Solved – Incremental learning methods in R

machine-learning, online-algorithms, r

I am looking for R libraries that support incremental learning (also called online or sequential learning). Compared with traditional batch methods, the use case is processing large amounts of data, such as streams and sensor data, where it is not feasible to keep using the same fixed model or to rebuild the model from scratch every time. Any machine learning algorithm that can update the model from a single new example would suffice. However, the model itself must not hold on to old data (as you can imagine, it would soon get too big); instead it should keep only some summary statistics about the data.
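To make the "statistics only" constraint concrete, here is a minimal sketch of Welford's online algorithm for a running mean and variance. It is written in Python rather than R purely for illustration; the class name `RunningStats` and the sample readings are made up for this example and do not come from any library.

```python
class RunningStats:
    """Running mean and variance via Welford's algorithm, using O(1) memory."""

    def __init__(self):
        self.n = 0        # number of observations seen so far
        self.mean = 0.0   # running mean
        self.m2 = 0.0     # running sum of squared deviations from the mean

    def update(self, x):
        """Fold one new observation into the statistics, then forget it."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Sample variance; undefined (NaN) with fewer than two observations."""
        return self.m2 / (self.n - 1) if self.n > 1 else float("nan")


# Hypothetical sensor readings arriving one at a time.
stats = RunningStats()
for reading in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(reading)

print(stats.n, stats.mean, stats.variance)
```

Only three numbers are stored no matter how many readings arrive, which is the property I am after in a full learning algorithm.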

For multivariate regression, an online approach such as stochastic gradient descent would be a good option. For regression / model trees, something like this article comes to mind. I am looking for a library whose evolving model can achieve relatively good prediction accuracy compared with traditional batch methods.
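As a rough illustration of the kind of update I mean, here is a minimal sketch of linear regression trained by stochastic gradient descent, one example at a time. It is in Python rather than R, and everything in it is an assumption made for the example: the class name `OnlineLinearRegression`, the learning rate, and the synthetic data stream. It is not the API of any existing package.

```python
import random


class OnlineLinearRegression:
    """Linear regression updated by one SGD step per example (squared loss)."""

    def __init__(self, n_features, learning_rate=0.05):
        self.w = [0.0] * n_features  # one weight per feature
        self.b = 0.0                 # intercept
        self.lr = learning_rate      # assumed fixed step size

    def predict(self, x):
        return self.b + sum(wi * xi for wi, xi in zip(self.w, x))

    def update(self, x, y):
        """Take a single gradient step on 0.5 * (prediction - y)^2."""
        error = self.predict(x) - y
        self.w = [wi - self.lr * error * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * error


# Synthetic stream: y = 3*x1 - 2*x2 + 1 plus a little Gaussian noise.
rng = random.Random(0)
model = OnlineLinearRegression(n_features=2)
for _ in range(5000):
    x = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
    y = 3.0 * x[0] - 2.0 * x[1] + 1.0 + rng.gauss(0.0, 0.1)
    model.update(x, y)  # the example is discarded after this call

print(model.w, model.b)
```

The model keeps only its weights and intercept, never the examples themselves, and the estimates should drift toward the true coefficients as more data streams in.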

Best Answer

I'd suggest starting by taking a look at MOA (Massive Online Analysis) from the University of Waikato in New Zealand. This is the same group behind Weka. (As an aside, both the moa and the weka are native New Zealand species, though the former is now extinct.)

https://moa.cms.waikato.ac.nz/

"MOA is the most popular open source framework for data stream mining, with a very active growing community (blog). It includes a collection of machine learning algorithms (classification, regression, clustering, outlier detection, concept drift detection and recommender systems) and tools for evaluation. Related to the WEKA project, MOA is also written in Java, while scaling to more demanding problems."

There is an R wrapper, but I've not tried it; based on its Git history it may be a bit out of date. Core MOA is actively maintained.

HTH Chris (from New Zealand...)
