Online Learning from data streams via decentralized and asynchronous SGD

Authors

Tosi M.D.L., Theobald M.

Reference

Future Generation Computer Systems, vol. 175, art. no. 108052, 2026

Description

Online Learning (OL) is a sub-field of Machine Learning (ML) that focuses on solving time-sensitive problems through iterative learning from data streams. This emerging field is characterized by the challenge of concept drifts, where the underlying distribution of the incoming data values evolves over time. Traditional OL algorithms, while efficient and less resource-intensive than conventional ML methods, often fall short in solving non-linear, high-dimensional problems. This prevalent gap has recently led to the integration of Artificial Neural Networks (ANN) into OL settings. These models support real-time inference; however, because they rely on offline training, their performance often degrades during or shortly after concept drifts. In this paper, we extend TensAIR, an online stream-processing engine that we specifically designed for the distributed training of ANN models. Our extensions allow TensAIR to automatically identify concept drifts using the OPTWIN drift detector algorithm, triggering the retraining of the ANN models as soon as drifts are detected. Additionally, we propose a novel decentralized and asynchronous stochastic gradient descent (DASGD) algorithm, which is central to TensAIR's performance improvements over existing methods, and we formally prove its convergence under the specified conditions. We assessed TensAIR in both single-server and HPC settings, evaluating its distributed training performance across various multi-CPU and multi-GPU scenarios. As a result, we show that TensAIR converges within the best known theoretical bounds while achieving up to 78× higher sustainable throughput than state-of-the-art baselines. Based on our results, we expect to inspire further research and applications exploiting the distributed training of ANN models on HPC platforms for a wide range of OL settings.
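To illustrate the idea behind decentralized and asynchronous SGD as described in the abstract, the following minimal sketch shows worker replicas that apply their own mini-batch gradients immediately and merge (possibly stale) gradients received from peers, with no central parameter server. All names here (`Worker`, `local_step`, `apply_peer_gradient`) are our own illustrative assumptions, not TensAIR's actual API, and the loop is a simplified synchronous simulation of the asynchronous exchange.

```python
import numpy as np

# Illustrative sketch only: this is NOT TensAIR's implementation.
# Each worker holds a local model replica, applies its own gradient
# right away, and asynchronously merges gradients from peers.

class Worker:
    def __init__(self, dim, lr=0.1):
        self.w = np.zeros(dim)   # local model replica
        self.lr = lr

    def local_step(self, X, y):
        # mini-batch gradient of the squared loss 1/2 * ||Xw - y||^2
        grad = X.T @ (X @ self.w - y) / len(y)
        self.w -= self.lr * grad  # apply locally without waiting for peers
        return grad               # broadcast this gradient to peers

    def apply_peer_gradient(self, grad):
        # merge a (possibly stale) gradient received from another worker
        self.w -= self.lr * grad

# Two workers learning the same linear target w* = [1, -2]
w_true = np.array([1.0, -2.0])
rng = np.random.default_rng(42)
workers = [Worker(dim=2) for _ in range(2)]
for _ in range(200):
    for i, wk in enumerate(workers):
        X = rng.normal(size=(8, 2))
        y = X @ w_true
        g = wk.local_step(X, y)
        workers[1 - i].apply_peer_gradient(g)  # share with the other worker

print(workers[0].w)  # both replicas approach w*
```

Even though each replica sees gradients computed against slightly different (stale) model states, both replicas converge to the same solution, which is the key property the paper's convergence proof formalizes for DASGD.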

Link

doi:10.1016/j.future.2025.108052
