Gradient and Hessian of loss function

calculus, gradient, hessian, loss-functions, self-study

I'm trying to clear up the calculation of the gradient and Hessian of a loss function in an article that I am currently reading. The loss function is given by

$$\ell(\beta)=\sum_{i=1}^{N} e^{-y_{i}{{x}}^{\top}_{i} \beta}$$
where $x_i$, $\beta$ are vectors of the same length, say $p \times 1$, and $y_{i}=\pm 1$. Now let $X$ denote the design matrix $X=\left[x_{1},{x}_{2},\cdots,{x}_{N} \right]^{\top}$, let $\beta$ be the coefficient vector, and set $\eta=X\beta$.

Then the author states that $\dot{\ell}(\beta)$, $\ddot{\ell}(\beta)$ and ${\ell}'(\eta)$, $\ell''(\eta)$ denote the gradient and Hessian of the loss function with respect to $\beta$ and $\eta$, respectively.

The author does not write these out explicitly, and my attempts to derive them keep going wrong because I am not sure whether $\eta$ should be substituted into the loss function before taking the first and second derivatives, or whether I should treat $\eta$ as a function of $\beta$ and use the chain rule.

Updates:

\begin{align*}
\ell(\beta)&=\sum_{i=1}^{N} e^{-y_{i}{x}^{\top}_{i} \beta}\\
\dot{\ell}(\beta)&=\frac{\partial \ell(\beta)}{\partial \beta}= -\sum_{i=1}^{N} y_{i}{x}_{i} e^{-y_{i}{x}^{\top}_{i} \beta} \\
\ddot{\ell}(\beta)&= \frac{\partial^{2} \ell(\beta)}{\partial \beta \, \partial \beta^{\top}}= \sum_{i=1}^{N} {x}_{i}{x}^{\top}_{i} e^{-y_{i}{x}^{\top}_{i} \beta} \qquad (\text{using } y_{i}^{2}=1) \\
\end{align*}
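The $\beta$-derivatives can be sanity-checked numerically. Here is a small sketch (random data; variable names are mine) comparing the closed-form gradient $-\sum_i y_i x_i e^{-y_i x_i^\top \beta}$ and Hessian $\sum_i x_i x_i^\top e^{-y_i x_i^\top \beta}$, in the column-vector convention, against central finite differences:

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 6, 3
X = rng.normal(size=(N, p))           # rows are x_i^T
y = rng.choice([-1.0, 1.0], size=N)   # labels y_i = +/- 1
beta = rng.normal(size=p)

def loss(b):
    return np.exp(-y * (X @ b)).sum()

# Closed-form gradient and Hessian in the column-vector convention:
#   grad = -sum_i y_i x_i e^{-y_i x_i^T beta}
#   H    =  sum_i x_i x_i^T e^{-y_i x_i^T beta}   (y_i^2 = 1 drops out)
w = np.exp(-y * (X @ beta))
grad = -(X.T @ (y * w))
H = X.T @ (w[:, None] * X)

# Independent check via central finite differences
eps = 1e-6
E = np.eye(p)
grad_fd = np.array([(loss(beta + eps * E[j]) - loss(beta - eps * E[j])) / (2 * eps)
                    for j in range(p)])

def grad_fn(b):
    wb = np.exp(-y * (X @ b))
    return -(X.T @ (y * wb))

H_fd = np.column_stack([(grad_fn(beta + eps * E[j]) - grad_fn(beta - eps * E[j])) / (2 * eps)
                        for j in range(p)])
```

Both `grad_fd` and `H_fd` should agree with the closed forms to finite-difference accuracy.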

\begin{align*}
\ell(\eta)&=\sum_{i=1}^{N} e^{-y_{i}{x}^{\top}_{i} \eta}\\
{\ell}^{'}(\eta)&=\frac{\partial \ell(\eta)}{\partial \eta}= -\sum_{i=1}^{N} y_{i}{x}^{\top}_{i} e^{-y_{i}{x}^{\top}_{i} \eta} \\
\ell^{''}(\eta)&= \frac{\partial^{2} \ell(\eta)}{\partial \eta^{2}}= \sum_{i=1}^{N} ( y_{i}{x}^{\top}_{i})^{2} e^{-y_{i}{x}^{\top}_{i} \eta} \\
\end{align*}

Updates:

Suppose I have the following.

$\boldsymbol{y}=\left[\begin{array}{c}
y_{1} \\
y_{2} \\
y_{3} \\
\vdots \\
y_{N}
\end{array}\right]_{N \times 1}, \quad \boldsymbol{X}=\left[\begin{array}{ccccc}
x_{1,1} & x_{1,2} & \cdots & \cdots & x_{1, p} \\
x_{2,1} & x_{2,2} & \cdots & \cdots & x_{2, p} \\
x_{3,1} & x_{3,2} & \cdots & \cdots & x_{3, p} \\
\vdots & \vdots & & & \vdots \\
x_{N, 1} & x_{N, 2} & \cdots & \cdots & x_{N, p}
\end{array}\right]_{N \times p}$

$\boldsymbol{\beta}=\left[\begin{array}{c}
\beta_{1} \\
\beta_{2} \\
\beta_{3} \\
\vdots \\
\beta_{p}
\end{array}\right]_{p \times 1}$

Constructing $\eta_{N \times 1}=\boldsymbol{X}\boldsymbol{\beta}$.

Now, $-y_{i}{{x}}^{\top}_{i} \eta=-y_{i}{{x}}^{\top}_{i} \boldsymbol{X}\boldsymbol{\beta}$. But the dimensions do not match: $x_{i}^{\top}$ is $1 \times p$ while $\eta$ is $N \times 1$. What am I missing here?

Thank you!

Best Answer

You have a loss function that compares $y_i$ with predictions $\eta_i$:

$$\ell(\eta) =\sum_{i=1}^{N} e^{-y_{i}\eta_i}$$

You can rewrite this in terms of the parameter vector $\beta$ by expressing the predictions as $$\eta_i = {{x}}^{\top}_{i} \beta,$$

which becomes

$$\ell(\beta) =\sum_{i=1}^{N} e^{-y_{i}{{x}}^{\top}_{i} \beta}$$

For a given vector $\eta$ you can compute how $\ell(\eta)$ changes as function of the change in the vector $\eta$.

For a given vector $\beta$ you can compute how $\ell(\beta)$ changes as function of the change in the vector $\beta$.
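One way to see the connection: since $\eta = X\beta$, the chain rule gives $\dot{\ell}(\beta) = X^{\top}\ell'(\eta)$ and $\ddot{\ell}(\beta) = X^{\top}\,\mathrm{diag}(\ell''(\eta))\,X$, where $\ell'(\eta)_i = -y_i e^{-y_i \eta_i}$ and $\ell''(\eta)_i = e^{-y_i \eta_i}$ (using $y_i^2 = 1$). A short numeric sketch (random data; names are mine) checks the chain-rule gradient against finite differences of $\ell(\beta)$:

```python
import numpy as np

rng = np.random.default_rng(1)
N, p = 5, 2
X = rng.normal(size=(N, p))
y = rng.choice([-1.0, 1.0], size=N)
beta = rng.normal(size=p)

def loss(b):
    return np.exp(-y * (X @ b)).sum()

eta = X @ beta                       # eta_i = x_i^T beta

# Derivatives with respect to eta (elementwise, since the loss is a sum over i)
dl_deta = -y * np.exp(-y * eta)      # ell'(eta)_i = -y_i e^{-y_i eta_i}
d2l_deta = np.exp(-y * eta)          # diagonal of ell''(eta); y_i^2 = 1

# Chain rule through eta = X beta
grad_beta = X.T @ dl_deta            # dot-ell(beta) = X^T ell'(eta)
H_beta = X.T @ (d2l_deta[:, None] * X)  # ddot-ell(beta) = X^T diag(ell'') X

# Independent check: central finite differences of loss(beta)
eps = 1e-6
E = np.eye(p)
grad_fd = np.array([(loss(beta + eps * E[j]) - loss(beta - eps * E[j])) / (2 * eps)
                    for j in range(p)])
```

So differentiating in $\eta$ and mapping back through $X$ reproduces the direct $\beta$-derivatives.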


Explicit example. Let $$X = \begin{bmatrix}x_{11} & x_{12} \\ x_{21} & x_{22} \\ x_{31} & x_{32} \\ \end{bmatrix}$$

and

$$\beta = \begin{bmatrix}\beta_1\\ \beta_2 \end{bmatrix}$$

then

$$\eta = \begin{bmatrix}x_{11} & x_{12} \\ x_{21} & x_{22} \\ x_{31} & x_{32} \\ \end{bmatrix}\cdot \begin{bmatrix}\beta_1\\ \beta_2 \end{bmatrix} =\begin{bmatrix}x_{11} \beta_1+ x_{12} \beta_2 \\ x_{21} \beta_1+ x_{22} \beta_2\\ x_{31} \beta_1+ x_{32}\beta_2 \\ \end{bmatrix}$$

$$\begin{array}{rcccccccl} \ell(\eta_1,\eta_2,\eta_3) &=& e^{-y_1\eta_1}& +& e^{-y_2\eta_2} &+ &e^{-y_3\eta_3} \\&=& e^{-y_1(x_{11} \beta_1+ x_{12} \beta_2)}& + &e^{-y_2(x_{21} \beta_1+ x_{22} \beta_2)}& + &e^{-y_3(x_{31} \beta_1+ x_{32}\beta_2)}& = &\ell(\beta_1,\beta_2)\end{array} $$
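Plugging arbitrary numbers into this $3 \times 2$ example confirms that the two parameterizations give the same loss value; a minimal sketch (the numeric values are made up):

```python
import numpy as np

# Arbitrary numbers for the 3x2 example above
X = np.array([[1.0, 2.0],
              [0.5, -1.0],
              [-2.0, 0.3]])
y = np.array([1.0, -1.0, 1.0])
beta = np.array([0.7, -0.4])

# ell(eta_1, eta_2, eta_3): first form, via eta = X beta
eta = X @ beta
loss_eta = np.exp(-y * eta).sum()

# ell(beta_1, beta_2): second form, written out term by term as in the display
loss_beta = (np.exp(-y[0] * (X[0, 0] * beta[0] + X[0, 1] * beta[1]))
             + np.exp(-y[1] * (X[1, 0] * beta[0] + X[1, 1] * beta[1]))
             + np.exp(-y[2] * (X[2, 0] * beta[0] + X[2, 1] * beta[1])))
```

Both expressions evaluate to the same number, which is the point of the identity above.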
