Let's say I have a logistic regression classifier. In normal batch learning, I'd have a regularizer term to prevent overfitting and keep my weights small. I'd also normalize and scale my features.
In an online learning setting, I'm getting a continuous stream of data. I do a gradient descent update with each example and then discard it. Am I supposed to use feature scaling and a regularization term in online learning? If yes, how can I do that? For example, I don't have a fixed set of training data to scale against. I also don't have a validation set to tune my regularization parameter. If no, why not?
In my online learning, I get a stream of examples continuously. For each new example, I do a prediction. Then in the next time step, I get the actual target and do the gradient descent update.
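The predict-then-update loop described above can be sketched as follows. This is a minimal illustration (class name, learning rate, and toy stream are all made up for the example), not a reference implementation:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class OnlineLogistic:
    """Plain online SGD for logistic regression: predict first, update later."""
    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.lr = lr

    def predict(self, x):
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)))

    def update(self, x, y):
        # gradient of the log-loss for a single example is (p - y) * x
        p = self.predict(x)
        g = p - y
        for i, xi in enumerate(x):
            self.w[i] -= self.lr * g * xi

# toy stream: at each time step we predict, then the label arrives
model = OnlineLogistic(n_features=2)
stream = [([1.0, 0.0], 1), ([0.0, 1.0], 0), ([1.0, 0.5], 1)]
for x, y in stream:
    p = model.predict(x)   # prediction made before the label is known
    model.update(x, y)     # learn from the revealed target, then discard
```

Note that the example is discarded after the update; only the weight vector persists, which is what makes the scaling and regularization questions above non-trivial.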
Best Answer
The open-source project vowpal wabbit includes an implementation of online SGD that is enhanced by on-the-fly (online) computation of three additional factors affecting the weight updates. These factors can be enabled or disabled by their respective command-line options (by default all three are turned on; the --sgd option turns them all off, i.e. falls back to "classic" SGD). The three SGD-enhancing options are:

- --normalized : updates adjusted for the scale of each feature
- --adaptive : uses adaptive gradients (AdaGrad) (Duchi, Hazan, Singer)
- --invariant : importance-aware updates (Karampatziakis, Langford)

Together, they ensure that the online learning process automatically compensates for feature scale, per-feature learning rates, and example importance.
The upshot is that there's no need to pre-normalize or scale different features to make the learner less biased and more effective.
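To make the "no pre-normalization needed" point concrete, here is a simplified sketch of the two core ideas, online per-feature scaling and AdaGrad-style per-weight learning rates. This is an illustration of the concepts only, not vowpal wabbit's actual implementation (its normalized/adaptive updates are more sophisticated):

```python
import math

class NormalizedAdaptiveSGD:
    """Illustrative online logistic SGD with a running per-feature scale
    and AdaGrad-style per-weight learning rates (not vowpal wabbit's code)."""
    def __init__(self, n_features, lr=0.5):
        self.w = [0.0] * n_features
        self.s = [1e-8] * n_features   # running max |x_i| seen so far
        self.g2 = [1e-8] * n_features  # accumulated squared gradients
        self.lr = lr

    def predict(self, x):
        z = sum(w * (xi / si) for w, xi, si in zip(self.w, x, self.s))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, y):
        # refresh the per-feature scale estimate as new data arrives
        for i, xi in enumerate(x):
            self.s[i] = max(self.s[i], abs(xi))
        p = self.predict(x)
        g = p - y
        for i, xi in enumerate(x):
            gi = g * (xi / self.s[i])          # gradient w.r.t. scaled feature
            self.g2[i] += gi * gi
            self.w[i] -= self.lr * gi / math.sqrt(self.g2[i])  # AdaGrad rate

# features on wildly different scales; no manual pre-scaling is done
model = NormalizedAdaptiveSGD(n_features=2)
for x, y in [([1.0, 900.0], 1), ([0.0, 30.0], 0), ([1.0, 1000.0], 1)]:
    model.update(x, y)
```

The scale estimates and gradient statistics are themselves learned from the stream, which is exactly what lets the learner cope without a training set to normalize against.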
In addition, vowpal wabbit also implements online regularization via truncated gradient (Langford, Li, Zhang) with the regularization options:

- --l1 (L1-norm)
- --l2 (L2-norm)

My experience with these enhancements on multiple data sets was that each of them, when introduced into the code, significantly improved model accuracy and smoothed convergence.
Here are some academic papers with more detail on these enhancements: