Solved – Regularization strength and problem size


Let's say I run an ordinary least squares regression with Ridge regularization on 100,000 points randomly sampled from a huge dataset. The best regularization strength found is C=1.

Approximately what optimal regularization strength can I expect if I run the same algorithm on 1,000,000 points from the same dataset?

Are there general rules that link the optimal regularization strength to the problem size? Do these rules rely on statistical assumptions, and how robust are they?

Thanks

Best Answer

In general, if you multiply the number of data points, $n$, by a factor $s$ while leaving the number of predictors, $p$, unchanged, then the misfit term $\|X\beta-y\|_{2}^{2}$ will grow by roughly a factor of $s$, while the penalty term $\| \beta \|_{2}^{2}$ won't change much. To keep the same balance between the misfit term and the regularization term, you'll have to increase the regularization parameter by a factor of $s$ (or by $\sqrt{s}$ if your regularization parameter is squared in the objective function).
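One way to see why the factor is exactly $s$ in the idealized case: if you replicate the data set $s$ times, the misfit term is multiplied by exactly $s$ while $\|\beta\|_2^2$ is unchanged, so scaling $\lambda$ by $s$ recovers the identical ridge solution. A minimal numpy sketch (the data here is synthetic, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, s = 50, 3, 4
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=n)

def ridge(X, y, lam):
    """Closed-form ridge solution: (X'X + lam*I)^{-1} X'y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

lam = 1.0
b_small = ridge(X, y, lam)

# Replicate the data s times: ||X b - y||^2 is multiplied by exactly s,
# while ||b||^2 is unchanged, so lam must scale by s for the same balance.
Xs, ys = np.tile(X, (s, 1)), np.tile(y, s)
b_big = ridge(Xs, ys, lam * s)

print(np.allclose(b_small, b_big))  # the two solutions coincide
```

Real data from the same distribution isn't an exact replica, of course, which is why the rule only holds "roughly" there.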

Although this is a good rule of thumb and a reasonable starting point for searching for a regularization parameter, you generally shouldn't just set the parameter by it. Rather, you should rerun whatever selection method you used on the smaller data set (e.g. cross-validation) on the larger data set, perhaps using the scaled value as the center of your search.
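A minimal sketch of that workflow in numpy, using a hold-out split rather than full cross-validation for brevity; the data and the "rule-of-thumb" starting value `lam_scaled` are hypothetical stand-ins:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 1000, 5
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + rng.normal(scale=0.5, size=n)

# Hold-out split for validating candidate regularization strengths.
idx = rng.permutation(n)
tr, va = idx[:800], idx[800:]

def ridge(X, y, lam):
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

lam_scaled = 10.0  # hypothetical value from the rule of thumb (s * old lambda)

# Search a log-spaced grid centered on the rule-of-thumb value,
# rather than trusting that value directly.
grid = lam_scaled * np.logspace(-2, 2, 9)
val_errs = [np.mean((X[va] @ ridge(X[tr], y[tr], lam) - y[va]) ** 2)
            for lam in grid]
best_lam = grid[int(np.argmin(val_errs))]
print(best_lam)
```

The rule of thumb earns its keep here by centering the grid, so the search needs far fewer candidates than a blind scan over many orders of magnitude.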