Regression – Impact of Ill-Conditioned Problem on SGD

gradient descent, loss-functions, multicollinearity, optimization, regression

By an ill-conditioned regression problem, I mean that the feature matrix $X$ is not full rank, for example because $X$ contains two or more columns that are perfectly (or almost perfectly) correlated, i.e. linearly dependent. In that case $X^\top X$ is singular, so the closed-form solution $\hat\beta = (X^\top X)^{-1} X^\top y$ cannot be computed.
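For concreteness, here is a small NumPy sketch of the situation I have in mind (synthetic, made-up data, illustrative only): duplicating a column makes $X^\top X$ rank deficient, so the normal-equations inverse cannot be formed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Design matrix whose third column is an exact copy of the first,
# so X has only rank 2 even though it has 3 columns.
x1 = rng.normal(size=100)
x2 = rng.normal(size=100)
X = np.column_stack([x1, x2, x1])

XtX = X.T @ X
print(np.linalg.matrix_rank(XtX))       # 2, not 3 -> X^T X is singular

try:
    np.linalg.inv(XtX)                  # normal-equations inverse
except np.linalg.LinAlgError as err:
    print("cannot invert X^T X:", err)  # "Singular matrix"
```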

My understanding is that we cannot perform stochastic gradient descent (SGD) either, but I don't know the proper explanation. Does this mean that the loss landscape is not convex? What would happen if we applied SGD anyway? Would the algorithm converge to a local minimum?

Best Answer

SGD has nothing to do with this problem: no estimation method can resolve it, because a singular $X^\top X$ means the least-squares problem does not have a unique solution. The loss surface is still convex (just not *strictly* convex), so the issue is not local minima; rather, there is a whole affine subspace of coefficient vectors that all attain the minimal loss.
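As a hedged illustration (synthetic data, not from your problem), the sketch below uses NumPy's minimum-norm least-squares solver and shows that adding a null-space vector to the coefficients leaves the fitted values, and hence the loss, unchanged:

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = rng.normal(size=100)
X = np.column_stack([x1, x2, x1])        # rank-deficient design: column 3 duplicates column 1
y = 1.0 * x1 + 2.0 * x2 + rng.normal(scale=0.1, size=100)

# lstsq returns the minimum-norm least-squares solution even when X^T X is singular.
beta_min_norm, *_ = np.linalg.lstsq(X, y, rcond=None)

# The vector (1, 0, -1) lies in the null space of X, so adding it to the
# coefficients does not change X @ beta at all.
beta_other = beta_min_norm + np.array([1.0, 0.0, -1.0])

loss = lambda b: np.sum((X @ b - y) ** 2)
print(loss(beta_min_norm), loss(beta_other))   # (essentially) identical losses
print(beta_min_norm, beta_other)               # but different coefficient vectors
```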

SGD converges at a rate governed by the condition number of the Hessian $X^\top X$, i.e. the ratio of its largest to its smallest eigenvalue. In your case the smallest eigenvalue is zero, so the condition number is infinite and the Hessian has no inverse; the guaranteed progress per iteration, which scales like the reciprocal of the condition number, shrinks to zero. Along the flat (null-space) directions the gradient carries no information at all, so the iterates settle at *some* minimizer, and which one depends on where you start.
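To make this concrete, here is a sketch using plain full-batch gradient descent as a stand-in for SGD, again on made-up data: the smallest eigenvalue of $X^\top X$ is (numerically) zero, and two different initializations reach the same minimal loss but different coefficients, because the gradient never moves the component of the iterate that lies in the null space.

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = rng.normal(size=100)
X = np.column_stack([x1, x2, x1])             # singular X^T X, as before
y = 1.0 * x1 + 2.0 * x2 + rng.normal(scale=0.1, size=100)

H = X.T @ X                                   # proportional to the Hessian of the squared-error loss
print(np.linalg.eigvalsh(H))                  # smallest eigenvalue ~ 0 -> infinite condition number

def gd(beta0, lr=1e-3, steps=5000):
    """Full-batch gradient descent on ||X beta - y||^2."""
    beta = beta0.copy()
    for _ in range(steps):
        grad = 2 * X.T @ (X @ beta - y)       # gradient of the squared-error loss
        beta -= lr * grad
    return beta

b_a = gd(np.zeros(3))
b_b = gd(np.array([5.0, 0.0, -5.0]))          # start shifted along the null-space direction

loss = lambda b: np.sum((X @ b - y) ** 2)
print(loss(b_a), loss(b_b))                   # both reach (nearly) the same minimal loss
print(b_a, b_b)                               # but the coefficients differ along the flat direction
```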
