[Math] Machine Learning, why not use matrix multiplication instead of gradient descent

Tags: gradient-descent, machine-learning, matrices, regression

If we want to minimize our cost function for a given set of data, why do we use gradient descent, repeatedly updating our guess until we find a minimizing value of theta, when we can just solve for theta directly with matrix multiplication via the equation:

$$a = (M^T M)^{-1} M^T y$$

where ${}^T$ denotes the transpose and $a$ is the column vector of $\theta$ values.

Best Answer

The quick answer is that for optimization problems over big matrices, matrix inversion can be very computationally expensive, so avoiding inversion becomes an important time-saving and space-saving measure: forming $M^T M$ costs $O(np^2)$ for $n$ samples and $p$ features, and inverting it costs roughly $O(p^3)$, whereas a single gradient-descent step costs only $O(np)$. For a sufficiently small set of data, you might be better off directly computing $(M^T M)^{-1}$.
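As a rough illustration (not from the original answer), here is a minimal NumPy sketch comparing the closed-form solution of the normal equation with plain gradient descent on a small synthetic least-squares problem; the data, learning rate, and iteration count are all arbitrary choices for demonstration:

```python
import numpy as np

# Synthetic regression data: y = M @ theta + noise (sizes chosen arbitrarily).
rng = np.random.default_rng(0)
M = rng.normal(size=(100, 3))            # design matrix: 100 samples, 3 features
true_theta = np.array([2.0, -1.0, 0.5])
y = M @ true_theta + 0.01 * rng.normal(size=100)

# Closed form: theta = (M^T M)^{-1} M^T y.
# Solving the linear system avoids forming the inverse explicitly,
# which is cheaper and numerically safer than np.linalg.inv.
theta_closed = np.linalg.solve(M.T @ M, M.T @ y)

# Gradient descent on the cost J(theta) = ||M @ theta - y||^2 / (2n):
# the gradient is M^T (M @ theta - y) / n.
theta_gd = np.zeros(3)
lr = 0.1
for _ in range(2000):
    grad = M.T @ (M @ theta_gd - y) / len(y)
    theta_gd -= lr * grad

print(theta_closed)
print(theta_gd)  # should agree with the closed-form solution
```

Both routes recover essentially the same $\theta$ here; the trade-off only bites when the feature dimension $p$ is large, where the $O(p^3)$ solve dominates and iterative methods become attractive.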
