Solved – Estimating standard error of parameters of linear model fitted using gradient descent

gradient descent, least squares, regression, standard error

Given a linear model

$$y = X\beta + \epsilon$$

we can estimate the parameters $\hat{\beta}$ in two different ways: ordinary least squares (OLS) and gradient descent (GD). Both boil down to minimizing the mean squared error (MSE) by finding its global minimum. The difference is that OLS finds the exact solution in closed form, while GD approaches it iteratively and may never reach the exact answer.
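To make the contrast concrete, here is a minimal NumPy sketch (all names and settings are my own illustrative choices, not from the question): the closed-form OLS solution and a plain GD loop minimize the same MSE and converge to essentially the same $\hat{\beta}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])  # intercept + 2 predictors
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.5, size=n)

# OLS: exact MSE minimizer via the normal equations
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# GD: iterative minimizer of the same MSE
beta_gd = np.zeros(p)
lr = 0.05
for _ in range(5000):
    grad = (2.0 / n) * X.T @ (X @ beta_gd - y)  # gradient of the MSE
    beta_gd -= lr * grad

print(beta_ols, beta_gd)  # nearly identical after enough iterations
```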

For OLS we have the usual set of inferential statistics for the parameter estimates, most notably the standard error $SE(\hat{\beta})$. But in some cases OLS is not an option (e.g. the data matrix is too large to fit in memory), so we have to use GD.

I'm trying to figure out:

  1. Does it make sense at all to apply SE to parameters learned using gradient descent?
  2. If so, how do we calculate it? Do dependent quantities like the t-statistic and significance tests take their usual form?
  3. What about stochastic gradient descent (SGD)? Is there any hope of assessing its parameters?

For common reference, the OLS estimate and its standard error are:

$$\hat{\beta} = (X^TX)^{-1}X^Ty$$

$$SE(\hat{\beta}_j) = \sqrt{\hat{\sigma}^2\left[(X^TX)^{-1}\right]_{jj}}, \qquad \hat{\sigma}^2 = \frac{\lVert y - X\hat{\beta} \rVert^2}{n - p}$$
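Translated to code, a sketch continuing the synthetic setup above (these are just the textbook formulas; the variable names are mine):

```python
# Classical OLS standard errors from the formulas above
resid = y - X @ beta_ols
sigma2_hat = resid @ resid / (n - p)            # sigma^2_hat = RSS / (n - p)
cov_beta = sigma2_hat * np.linalg.inv(X.T @ X)  # estimated covariance of beta_hat
se_ols = np.sqrt(np.diag(cov_beta))             # SE(beta_hat_j)
```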

Best Answer

I found that the bootstrap gives estimates that are pretty close to those from OLS, and it works with literally any training algorithm.

The bootstrap is a Monte Carlo method: roughly, you repeatedly sample with replacement from the original dataset, refit the model on each resample, and collect the resulting values of the target statistic (here, the components of $\hat{\beta}$). Given that collection of values, calculating their mean and standard error is trivial. G. James et al. provide experimental evidence that OLS and bootstrap results are close. Without further explanation, here is a link to their excellent work, *An Introduction to Statistical Learning* (see pages 187-190 for the bootstrap explanation and 195-197 for the experiments):
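Here is a minimal sketch of that procedure, reusing the synthetic data and GD fit from the question above; `n_boot` and the GD settings are illustrative choices of mine, not from the book:

```python
def fit_gd(X, y, lr=0.05, n_iter=5000):
    """Fit linear regression by plain gradient descent on the MSE."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        beta -= lr * (2.0 / len(y)) * X.T @ (X @ beta - y)
    return beta

n_boot = 1000
boot_betas = np.empty((n_boot, p))
for b in range(n_boot):
    idx = rng.integers(0, n, size=n)        # resample rows with replacement
    boot_betas[b] = fit_gd(X[idx], y[idx])  # refit on the bootstrap sample

se_boot = boot_betas.std(axis=0, ddof=1)    # bootstrap standard errors
print(se_boot)  # typically close to se_ols from the closed-form formula
```

The same loop works unchanged if `fit_gd` is swapped for SGD or any other training algorithm, which is the point: the bootstrap makes no assumptions about how $\hat{\beta}$ was obtained.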
