Ordinary Least Squared derivation using Index Notation

index-notationmatrix-calculus

The following function in $\mathbf{x}$ is given:
$$f(\mathbf{x}) = \frac{1}{2}\left\lVert \mathbf{Ax – b} \right\rVert_2^2$$

I want to calculate $\frac{\mathrm{d}f(\mathbf{x})}{\mathrm{d}\mathbf{x}}$ using index notation, where the conversion from linear algebra notation should happen immediately (i.e. no expansion of the square using $\left\lVert \mathbf{x} \right\rVert_2^2 = \mathbf{x}^\intercal\mathbf{x}$). Therefore the above is rewritten as $$ f(\mathbf{x}) = \frac{1}{2} (A_{ij}x_j – b_i)^2. $$ Using the chain rule, I immediately arrive at $$ \frac{\mathrm{d}f(\mathbf{x})}{\mathrm{d}\mathrm{x}_l} = 2\left(\frac{1}{2}(A_{ij}x_j-b_i)A_{ij}\delta_{jl}\right) = A_{il}^2x_l-A_{il}b_i, $$ or, converted to LA notation $$\frac{\mathrm{d}f(\mathbf{x})}{\mathrm{d}\mathbf{x}} = \mathbf{A}\mathbf{A}\mathbf{x}-\mathbf{A}\mathbf{b} $$ which is of course the wrong solution. I am fairly confident with matrix calculus using matrix notation, but I cant wrap my head around where the needed transpose would come from using the index notation (one can see that the result would be correct if the latter $A_{ij}$ in the above were $A_{ji}$, i.e. the transpose).

Best Answer

Since $f=\frac{1}{2}(A_{ij}x_j-b_i)(A_{ik}x_k-b_i)$, $\partial_l f=A_{il}(A_{ik}x_k-b_i)$. Contracting out the repeated index $i$ by the definition of matrix multiplication requires $A_{il}$ to be written as $A^T_{li}$, so $\partial_l f = A^T_{li}(Ax-b)_i=(A^TAx-A^T b)_l$.