Derivative of a scalar function with respect to vector input

derivatives, maxima-minima, optimization, vectors

Consider the function
\begin{align*}
&\phi: \mathbb{R}^m \to \mathbb{R} \\
&\phi: x \mapsto \frac{1}{2} |Ax|^2 + f(x).
\end{align*}

Here $f$ is a scalar-valued function of $x$, and $A$ is an $m \times m$ matrix.

I am attempting to maximise $\phi$ w.r.t. $x$. From https://www.cs.huji.ac.il/~csip/tirgul3_derivatives.pdf, I get that $\frac{\partial f}{\partial x}$ is a row vector, since $x$ is a column vector. However, differentiating $\frac{1}{2} |Ax|^2$, I get $A^T A x$, which is obviously a column vector. This means the dimensions of the derivatives of the two terms in $\phi$ do not match, which seems to suggest I cannot simply set $\frac{\partial \phi}{\partial x} = 0$ in order to maximise. (We assume that $\phi$ is concave.)
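To spell out my computation for the first term: since $|Ax|^2 = (Ax)^T(Ax) = x^T A^T A x$, treating the derivative as a column vector (the gradient) gives
\begin{align*}
\frac{\partial}{\partial x}\left(\tfrac{1}{2} x^T A^T A x\right) = \tfrac{1}{2}\left(A^T A + (A^T A)^T\right)x = A^T A x,
\end{align*}
using the symmetry of $A^T A$.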

I was wondering how this mismatch can be resolved. Thanks for reading.

Best Answer

The derivative you seek is the vector whose $i$th component is $\frac{\partial\phi}{\partial x_i}$, and whether it's a column or row vector is an arbitrary conventional choice. You've differentiated the two terms with incompatible conventions, so just transpose one of them and you'll be fine. If you stick with $\frac{\partial f}{\partial x}$ being a row vector, the "derivative" you want to take of $\tfrac12x^TA^TAx$ is $x^TA^TA$, not $A^TAx$.
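As a quick sanity check, here is a minimal numerical sketch; the particular $A$ and the illustrative choice $f(x) = c^T x$ are placeholders I've picked, not taken from the question. It compares the column-vector gradient $A^T A x + \nabla f(x)$ with a central finite-difference approximation; the row-vector form is simply its transpose.

```python
import numpy as np

# Minimal sketch: the specific A and the choice f(x) = c^T x are illustrative
# placeholders, not taken from the original question.
rng = np.random.default_rng(0)
m = 4
A = rng.standard_normal((m, m))
c = rng.standard_normal(m)

def phi(x):
    # phi(x) = 1/2 |Ax|^2 + f(x), with f(x) = c^T x as the illustrative choice.
    return 0.5 * np.linalg.norm(A @ x) ** 2 + c @ x

def grad_phi(x):
    # Column-vector (gradient) convention:
    # d/dx [1/2 |Ax|^2] = A^T A x  and  d/dx [c^T x] = c.
    # The row-vector convention gives the transpose, x^T A^T A + c^T.
    return A.T @ A @ x + c

x = rng.standard_normal(m)

# Central finite-difference approximation of the gradient, component by component.
eps = 1e-6
fd = np.array([(phi(x + eps * e) - phi(x - eps * e)) / (2 * eps) for e in np.eye(m)])

print(np.allclose(grad_phi(x), fd, atol=1e-5))  # True: analytic gradient matches
```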
