Solved – derivative of loss function

derivative, loss-functions, maximum-likelihood

Suppose we have a penalized loss function of the form
\begin{align}
L(\beta)& =\frac{1}{2}(y-X\beta)^T(y-X\beta)+
\lambda \beta^T f(\beta)
\end{align}
where $X$ is $n\times m$, $\beta$ is $m \times 1$, and $f(\beta)$ is a column vector. What is the derivative of this function with respect to $\beta$? Note that this is a derivative with respect to a vector:
\begin{align}
\frac{\partial L(\beta)}{\partial \beta}& =\frac{\partial}{\partial \beta}\bigg(\frac{1}{2}(y-X\beta)^T(y-X\beta)+
\lambda \beta^T f(\beta)\bigg)
\end{align}

Sorry for the silly question. I just confused myself.

Best Answer

Reference

From (84):

$$ \begin{align} \frac{\partial}{\partial \beta}\bigg((y-X\beta)^T(y-X\beta)\bigg) & = -2 X^T(y-X\beta) \end{align} $$

And using (93):

$$ \begin{align} \frac{\partial}{\partial \beta}\bigg(\lambda\beta^Tf(\beta)\bigg) &= \lambda \bigg( {\bigg[ \frac{\partial \beta}{\partial \beta} \bigg]}^Tf(\beta) + \bigg[ \frac{\partial f(\beta)}{\partial \beta}\bigg]_{m\times m}^T\beta\bigg) \\ &= \lambda\bigg(I_{m \times m}f(\beta)+\bigg[ \frac{\partial f(\beta)}{\partial \beta}\bigg]_{m\times m}^T\beta_{m \times 1}\bigg) \end{align} $$

Adding the two terms (the factor $\frac{1}{2}$ cancels the $2$ from the first derivative) gives the full gradient:

$$ \begin{align} \frac{\partial L(\beta)}{\partial \beta} &= -X^T(y-X\beta) + \lambda\bigg(f(\beta)+\bigg[ \frac{\partial f(\beta)}{\partial \beta}\bigg]^T\beta\bigg) \end{align} $$
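As a sanity check, here is a small numeric verification of the derivation, assuming the concrete (hypothetical) choice $f(\beta)=\beta$, i.e. the ridge penalty $\lambda\beta^T\beta$. In that case $\partial f/\partial \beta = I$, so the gradient from the two pieces above reduces to $-X^T(y-X\beta) + 2\lambda\beta$, which we compare against a central finite-difference approximation:

```python
import numpy as np

# Hypothetical concrete case: f(beta) = beta (ridge penalty).
# Derived gradient:  dL/dbeta = -X^T (y - X beta) + 2 * lam * beta
rng = np.random.default_rng(0)
n, m = 8, 3
X = rng.standard_normal((n, m))
y = rng.standard_normal(n)
beta = rng.standard_normal(m)
lam = 0.7

def L(b):
    """Penalized loss L(b) = 0.5 * ||y - X b||^2 + lam * b^T b."""
    r = y - X @ b
    return 0.5 * r @ r + lam * b @ b

# Analytic gradient from the derivation above.
analytic = -X.T @ (y - X @ beta) + 2 * lam * beta

# Central finite differences, one coordinate at a time.
eps = 1e-6
numeric = np.array([
    (L(beta + eps * e) - L(beta - eps * e)) / (2 * eps)
    for e in np.eye(m)
])

print(np.allclose(analytic, numeric, atol=1e-5))
```

The two gradients agree to finite-difference precision, which confirms the sign and the shape bookkeeping in the derivation.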
