Solved – Equivalence of neural networks to linear regression

neural networks, regression

Are neural networks equivalent to linear regression if the activation function is linear ($g(x) = x$), and backpropagation is basically just SGD for a least-squares problem? Or is that only true for single-layer neural networks?

I'm very new to neural networks and I basically have very little idea what's going on, so any intuition anyone could give would be appreciated. Thanks!

Edit: I'm going to write some math to establish some things.

Assume that I have a simple network with $K$ hidden layers, and each hidden layer has $K$ units. My input has $M$ features, and my output is one of $P$ classes.

Let's just consider one sample for now.

Input vector: $x$, of length $M$.

Hidden layer 1: $a_1 = \phi(W_1 [x;1])$, where $W_1$ is a $K \times (M+1)$ matrix and $a_1$ is the first activation vector, of length $K$. All the activation functions are the same, are denoted $\phi(\cdot)$, and are applied element-wise.

Hidden layers $2, \dots, K$: $a_k = \phi(W_k [a_{k-1};1])$, where $W_k$ is a $K \times (K+1)$ matrix and $a_k$ is the $k$th activation vector, of length $K$.

Output: $y = \phi(W_{K+1}[a_K;1])$, where $W_{K+1}$ is a $P \times (K+1)$ matrix and $y$ is the output, of length $P$.
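For concreteness, here is a minimal NumPy sketch of that forward pass (the function name `forward`, the `tanh` placeholder for $\phi$, and the example sizes are my own choices, not part of the question):

```python
import numpy as np

def forward(x, weights, phi=np.tanh):
    """Forward pass for the network above: a_k = phi(W_k [a_{k-1}; 1]).

    x       : input vector of length M
    weights : [W_1, ..., W_{K+1}]; W_1 is K x (M+1), the middle matrices
              are K x (K+1), and W_{K+1} is P x (K+1)
    phi     : activation, applied element-wise (tanh here is only a placeholder)
    """
    a = x
    for W in weights:
        a = phi(W @ np.append(a, 1.0))  # np.append(a, 1.0) builds [a; 1]
    return a  # the output y, of length P

# Arbitrary example sizes: M = 4 inputs, K = 3 units per layer, P = 2 outputs.
rng = np.random.default_rng(0)
M, K, P = 4, 3, 2
weights = ([rng.normal(size=(K, M + 1))]
           + [rng.normal(size=(K, K + 1))]
           + [rng.normal(size=(P, K + 1))])
y = forward(rng.normal(size=M), weights)
```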

So, written out as a single composed function, it's

$y = \phi(W_{K+1}[\phi(W_K[\phi(W_{K-1}[\cdots[\phi(W_1[x;1]);1]\cdots;1]);1]);1])$.

Clearly this is a very nonlinear and nonconvex function of the $W_k$'s, as pointed out in the comments, so as an optimization problem in the weights it is not the same thing as linear regression.
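For the special case the question asks about, $\phi(x) = x$, the composition above nonetheless collapses to a single affine map of $x$, which is why stacking layers buys nothing in terms of what the model can represent. A rough numerical check of that collapse (the sizes and random weights are arbitrary, chosen only for the check):

```python
import numpy as np

rng = np.random.default_rng(0)
M, K, P = 4, 3, 2  # arbitrary sizes for the check
weights = ([rng.normal(size=(K, M + 1))]
           + [rng.normal(size=(K, K + 1))]
           + [rng.normal(size=(P, K + 1))])
x = rng.normal(size=M)

# Network output with phi(x) = x: just the recurrence a_k = W_k [a_{k-1}; 1].
a = x
for W in weights:
    a = W @ np.append(a, 1.0)
y_network = a

# The same composition collapsed into one affine map y = A x + b.
# Each layer is a -> W[:, :-1] a + W[:, -1], and affine maps compose to an affine map.
A_total, b_total = np.eye(M), np.zeros(M)
for W in weights:
    A, b = W[:, :-1], W[:, -1]
    A_total, b_total = A @ A_total, A @ b_total + b

print(np.allclose(y_network, A_total @ x + b_total))  # True
```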

Best Answer

Yes, they are equivalent, but more expensive to compute and train. The point is that combining linear regressions in a linear fashion does not change anything: the composition of linear (affine) maps is just another linear (affine) map, so the extra layers add no expressive power. I'm not absolutely sure that backpropagation gets you exactly the same coefficients as ordinary least squares (I'd expect it to converge there), but it certainly can perform no better in that particular setting than ordinary regression.
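A rough sketch of that last point, assuming simulated data and a single hidden layer with identity activations: the network below is fit by plain full-batch gradient descent (the gradients are written out by hand, which amounts to backpropagation for this tiny architecture), and its mean squared error approaches, but never drops below, the ordinary least-squares fit on the same data. The data generation, learning rate, and iteration count are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, M, K = 200, 3, 5
X = rng.normal(size=(n, M))
y = X @ rng.normal(size=M) + 0.1 * rng.normal(size=n)  # simulated linear data

# Ordinary least squares (with intercept) as the reference fit.
Xb = np.hstack([X, np.ones((n, 1))])
beta, *_ = np.linalg.lstsq(Xb, y, rcond=None)
mse_ols = np.mean((Xb @ beta - y) ** 2)

# One-hidden-layer network with identity activations: y_hat = (X W1 + b1) w2 + b2,
# trained by full-batch gradient descent on the loss 0.5 * mean((y_hat - y)^2).
W1, b1 = 0.1 * rng.normal(size=(M, K)), np.zeros(K)
w2, b2 = 0.1 * rng.normal(size=K), 0.0
lr = 0.01
for _ in range(20000):
    H = X @ W1 + b1                 # hidden layer, linear activation
    r = H @ w2 + b2 - y             # residuals
    grad_w2 = H.T @ r / n           # gradients of 0.5 * mean squared error
    grad_b2 = r.mean()
    grad_H = np.outer(r, w2) / n
    W1 -= lr * (X.T @ grad_H)
    b1 -= lr * grad_H.sum(axis=0)
    w2 -= lr * grad_w2
    b2 -= lr * grad_b2

mse_net = np.mean(((X @ W1 + b1) @ w2 + b2 - y) ** 2)
print(mse_ols, mse_net)  # the network approaches, but cannot beat, the OLS error
```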