Training model trees on data streams with missing values
O. Parisot, Y. Didry, T. Tamisier, and B. Otjacques
in 4th International Conference, DATA 2015, Colmar, France, July 20-22, 2015, Data Management Technologies and Applications, Revised Selected Papers of DATA 2015, Springer, pp. 81-97, 2016
Model trees combine the interpretability of decision trees with the efficiency of multiple linear regressions, making them useful for performing predictive analysis dynamically on data streams. However, missing values in data streams are a problem during the training phase of a model tree. In this article, we compare different approaches for handling incomplete streams and measure their impact on the accuracy of the resulting model tree. Moreover, we propose an online method to estimate and adjust missing values during stream processing. To evaluate these approaches, a prototype has been developed and tested on several benchmarks.
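
The abstract does not describe how the proposed online method works. As a generic illustration of what estimating missing values on a stream can look like, the sketch below fills each missing attribute with the running mean of the values seen so far for that attribute. This is an assumed baseline, not the authors' method: the class name RunningMeanImputer, the use of None or NaN as the missing marker, and the 0.0 fallback are all hypothetical choices, and the sketch does not revise earlier estimates afterwards.

```python
import math


class RunningMeanImputer:
    """Hypothetical baseline: impute missing values with per-attribute running means."""

    def __init__(self, n_features):
        self.counts = [0] * n_features   # observed (non-missing) values per attribute
        self.means = [0.0] * n_features  # running mean per attribute

    @staticmethod
    def _is_missing(value):
        # Assumption: missing values arrive as None or float NaN.
        return value is None or (isinstance(value, float) and math.isnan(value))

    def process(self, row):
        """Return a copy of `row` with missing entries filled; update means in one pass."""
        filled = []
        for i, value in enumerate(row):
            if self._is_missing(value):
                # Fall back to 0.0 if nothing has been observed yet for this attribute.
                filled.append(self.means[i] if self.counts[i] else 0.0)
            else:
                # Incremental mean update: no past instances need to be stored.
                self.counts[i] += 1
                self.means[i] += (value - self.means[i]) / self.counts[i]
                filled.append(value)
        return filled


# Usage on a tiny stream of two-attribute instances.
imputer = RunningMeanImputer(n_features=2)
stream = [[1.0, 10.0], [3.0, None], [None, 20.0]]
completed = [imputer.process(row) for row in stream]
print(completed)
```

Each completed instance could then be passed to an incremental model tree learner. Because only a count and a mean are kept per attribute, memory use stays constant regardless of the stream length.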