Solved – How to choose the right number of parameters in Logistic Regression

machine learning, regression, regression-strategies

I am studying Andrew Ng's Machine Learning lecture notes. As I understand it, we can either choose the number of parameters manually or use regularization to get a model that fits correctly.

Are there any basic rules for choosing the right number of parameters? Can anyone explain?

Best Answer

Distortion of statistical properties can occur when you "fit to the data", so I think of this in terms of the number of parameters that I can afford to estimate, and how many of those I want to devote to the portion of the model that pertains to a given predictor. I use regression splines, place knots where $X$ is dense, and choose the number of knots (or choose the number of parameters and back-calculate the number of knots) by asking (1) what do the sample size and the distribution of $Y$ support, and (2) what is the signal:noise ratio in this dataset? When $n \uparrow$ or the signal:noise ratio $\uparrow$, I can use more knots.

There is no set formula for the number of parameters that should be fitted, although in a minority of situations you can use cross-validation or AIC to determine this. As you mentioned, shrinkage is a great alternative, because you can start with many parameters and then shrink the coefficients down to what cross-validation or effective AIC dictates.
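To make the knot-counting idea concrete, here is a minimal sketch in Python with NumPy. It is not the answerer's actual workflow: it assumes a linear truncated-power spline basis (simpler than the restricted cubic splines often used in practice), knots at evenly spaced quantiles of $X$ so that they land where $X$ is dense, a hand-rolled Newton-Raphson logistic fit, and simulated data. The function names `spline_basis`, `fit_logistic`, and `aic_by_knots` are hypothetical helpers written for this illustration. It compares AIC across knot counts, which, as noted above, is only appropriate in a minority of situations.

```python
import numpy as np


def spline_basis(x, n_knots):
    """Design matrix: intercept, x, and one (x - k)_+ column per knot.

    Knots are placed at evenly spaced interior quantiles of x,
    so they fall where x is dense. (Illustrative choice, not from the source.)
    """
    cols = [np.ones_like(x), x]
    if n_knots > 0:
        probs = np.linspace(0.0, 1.0, n_knots + 2)[1:-1]  # interior quantiles only
        for k in np.quantile(x, probs):
            cols.append(np.clip(x - k, 0.0, None))  # truncated linear term
    return np.column_stack(cols)


def fit_logistic(X, y, n_iter=50, jitter=1e-8):
    """Logistic regression by Newton-Raphson; returns (beta, log-likelihood).

    `jitter` is a tiny diagonal term added only for numerical stability;
    it is an assumption of this sketch, not a deliberate shrinkage penalty.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        p = 0.5 * (1.0 + np.tanh(0.5 * eta))  # numerically stable sigmoid
        w = p * (1.0 - p)
        grad = X.T @ (y - p)
        hess = (X * w[:, None]).T @ X + jitter * np.eye(X.shape[1])
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < 1e-8:
            break
    eta = X @ beta
    loglik = np.sum(y * eta - np.logaddexp(0.0, eta))
    return beta, loglik


def aic_by_knots(x, y, knot_counts):
    """AIC = 2 * (number of parameters) - 2 * log-likelihood, per knot count."""
    result = {}
    for m in knot_counts:
        X = spline_basis(x, m)
        _, loglik = fit_logistic(X, y)
        result[m] = 2 * X.shape[1] - 2 * loglik
    return result


# Simulated data (hypothetical): a nonlinear effect of x on the log-odds.
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
true_logit = np.sin(2.0 * x)
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-true_logit))).astype(float)

aic = aic_by_knots(x, y, knot_counts=[0, 1, 2, 3, 4, 5])
best = min(aic, key=aic.get)  # knot count with the lowest AIC
for m, value in aic.items():
    print(f"knots={m}  parameters={m + 2}  AIC={value:.1f}")
print("lowest AIC at knots =", best)
```

Rerunning with a smaller `n` or a weaker signal (for example, scaling `true_logit` down) should tend to favor fewer knots, in line with the sample-size and signal:noise reasoning above. For the shrinkage route, one would instead keep a generous number of knots and tune a penalty on the coefficients by cross-validation.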