Solved – the difference between Online Learning Algorithms and Streaming Learning Algorithms

online-algorithms

From my understanding, Online Learning, as opposed to Batch Learning, takes actions at each time step instead of accumulating computation over entire (or a large epoch) dataset; Streaming Learning refers to having limited time and space constraints, as well as number of passes run on data.

From my professor's lecture slides, their main differences are:

------------------------------------------------------------------------------
|          Online           |                    Streaming                    |
------------------------------------------------------------------------------
| Endless data stream       | Stream of (known and typically big) length of N |
------------------------------------------------------------------------------
| Fixed amount of memory    |              Memory available is o(N)           | 
------------------------------------------------------------------------------
| Tested at every time step |         Tested only once at the very end        |
------------------------------------------------------------------------------
| Each point seen only once |        More than one pass may be possible       |
 ------------------------------------------------------------------------------

And from wikipedia Streaming Algorithm entry:

These algorithms have many similarities with online algorithms since they both require decisions to be made before all data are available, but they are not identical. Data stream algorithms only have limited memory available but they may be able to defer action until a group of points arrive, while online algorithms are required to take action as soon as each point arrives.

I still fail to understand how they are fundamentally different: Online Algorithm has pretty limited time, space and number of passes, which is really similar to Streaming Algorithms.

Best Answer

An example of an online algorithm application would be investing, where you have to reconsider the investment decision upon the arrival of new data. You cannot defer this decision and you cannot revise old decisions. I am not familiar with streaming algorithms, but it seems (based on your professor's slides) that data compression is a fair example where a file can be scanned multiple times to determine an approximately optimal compression codebook, but it is still infeasible (because of memory requirements) to analyze the entire file offline.

Related Question