[Math] Linear regression using gradient descent: is the whole weight vector updated by the same number?

linear regression, regression

I'm using gradient descent with mean squared error as the error function to do linear regression. Take a look at the equations first.
$$\hat{y}_i = b + \sum_{j=1}^{p} W_j x_{ij} \tag{1}$$

$$E = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 \tag{2}$$

$$\frac{\partial E}{\partial W} = -\frac{2}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right) x_i, \qquad \frac{\partial E}{\partial b} = -\frac{2}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right) \tag{3}$$

$$W \leftarrow W - \alpha \frac{\partial E}{\partial W}, \qquad b \leftarrow b - \alpha \frac{\partial E}{\partial b} \tag{4}$$
As you can see in eq. 1, the prediction is made with a bias term $b$ and a weight vector $W$. Eq. 2 shows the error function (MSE), while eq. 3 shows the partial derivatives used to update the weights (eq. 4). My question is: should all the weights in the weight vector be updated by the same number each iteration? It seems like eq. 3 should return a single number, not a vector.

Best Answer

Equations are usually written in this form when you're working in a neural-network-style setting, where the bias term is also a vector.

In the case of linear regression, since the bias term is a single scalar, a more intuitive way to look at these equations is to write the summation as $\sum_{j=0}^{p} W_j x_{ij}$, treating $x_{i0}$ as 1 and $b$ as $W_0$.
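Just to spell it out, writing the gradient of the MSE (eq. 2) per component under this notation shows that each weight gets its own partial derivative, so eq. 3 is a vector with one entry per weight and the update in eq. 4 generally moves each $W_j$ by a different amount:

$$\frac{\partial E}{\partial W_j} = -\frac{2}{n} \sum_{i=1}^{n} \left( y_i - \sum_{k=0}^{p} W_k x_{ik} \right) x_{ij}, \qquad j = 0, 1, \dots, p$$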

Programmatically speaking, this translates to padding your data matrix with a column of 1's to the left. You'll find this being done often in a bunch of tutorials that walk you through linear regression!
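As a rough illustration (the function name, data, and learning rate here are made up, not from your post), a minimal NumPy sketch of that padding trick with a single weight-vector update might look like this:

```python
import numpy as np

def fit_linear_regression(X, y, alpha=0.05, n_iters=5000):
    """Gradient descent on MSE, with the bias folded into the weight vector."""
    n, p = X.shape
    X_padded = np.hstack([np.ones((n, 1)), X])  # column of 1's: x_{i0} = 1
    W = np.zeros(p + 1)                         # W[0] plays the role of b

    for _ in range(n_iters):
        y_hat = X_padded @ W                          # eq. 1, with b absorbed into W
        grad = -(2.0 / n) * X_padded.T @ (y - y_hat)  # eq. 3: one entry per weight
        W -= alpha * grad                             # eq. 4: a single update for W
    return W

# Quick check on synthetic data: W[0] should land near the true bias (4.0)
# and W[1:] near the true weights [1.5, -2.0, 0.5].
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 4.0 + rng.normal(scale=0.1, size=200)
print(fit_linear_regression(X, y))
```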

Folding the bias in this way gets rid of the second gradient equation and reduces the procedure to a single update of $W$ each iteration. Hope this helps :)