I'm trying to clear up the calculation of the gradient and Hessian of a loss function in an article that I am currently reading. The loss function is given by
$$\ell(\beta)=\sum_{i=1}^{N} e^{-y_{i}{{x}}^{\top}_{i} \beta}$$
where $x_i$ and $\beta$ are vectors of the same length, say $p \times 1$, and $y_{i}=\pm 1$. Now, let $X$ denote the design matrix $X=\left[x_{1},x_{2},\cdots,x_{N} \right]^{\top}$, let $\beta$ be the coefficient vector, and let $\eta=X\beta$.
The author then lets $\dot{\ell}(\beta)$, $\ddot{\ell}(\beta)$, ${\ell}^{'}(\eta)$, $\ell^{''}(\eta)$ denote the gradient and Hessian of the loss function with respect to $\beta$ and $\eta$, respectively.
The author does not write out what these look like. When I try to derive them, my calculations go wrong, because I am not sure whether $\eta$ should be substituted into the loss function first and then differentiated twice, or whether I should treat $\eta$ as a function of $X$ and use the chain rule.
Updates:
\begin{align*}
\ell(\beta)&=\sum_{i=1}^{N} e^{-y_{i}{x}^{\top}_{i} \beta}\\
\dot{\ell}(\beta)&=\frac{\partial \ell(\beta)}{\partial \beta}= -\sum_{i=1}^{N} y_{i}{x}_{i} e^{-y_{i}{x}^{\top}_{i} \beta} \, \\
\ddot{\ell}(\beta)&= \frac{\partial^{2} \ell(\beta)}{\partial \beta\,\partial \beta^{\top}}= \sum_{i=1}^{N} ( y_{i}{x}_{i})( y_{i}{x}_{i})^{\top} e^{-y_{i}{x}^{\top}_{i} \beta} \\
\end{align*}
\begin{align*}
\ell(\eta)&=\sum_{i=1}^{N} e^{-y_{i}{x}^{\top}_{i} \eta}\\
{\ell}^{'}(\eta)&=\frac{\partial \ell(\eta)}{\partial \eta}= -\sum_{i=1}^{N} y_{i}{x}^{\top}_{i} e^{-y_{i}{x}^{\top}_{i} \eta} \, \\
\ell^{''}(\eta)&= \frac{\partial^{2} \ell(\eta)}{\partial \eta^{2}}= \sum_{i=1}^{N} ( y_{i}{x}^{\top}_{i})^{2} e^{-y_{i}{x}^{\top}_{i} \eta} \\
\end{align*}
Updates:
Suppose I have the following.
$$\boldsymbol{y}=\begin{bmatrix} y_{1} \\ y_{2} \\ \vdots \\ y_{N} \end{bmatrix}_{N \times 1}, \quad \boldsymbol{X}=\begin{bmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,p} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,p} \\ \vdots & \vdots & \ddots & \vdots \\ x_{N,1} & x_{N,2} & \cdots & x_{N,p} \end{bmatrix}_{N \times p}, \quad \boldsymbol{\beta}=\begin{bmatrix} \beta_{1} \\ \beta_{2} \\ \vdots \\ \beta_{p} \end{bmatrix}_{p \times 1}$$
Then $\boldsymbol{\eta}_{N \times 1}=\boldsymbol{X}\boldsymbol{\beta}$, so $-y_{i}{{x}}^{\top}_{i} \eta=-y_{i}{{x}}^{\top}_{i} \boldsymbol{X}\boldsymbol{\beta}$. But the dimensions do not match: ${x}^{\top}_{i}$ is $1 \times p$ and $\boldsymbol{X}\boldsymbol{\beta}$ is $N \times 1$. What am I missing here?
Thank you!
Best Answer
You have a loss function that compares $y_i$ with predictions $\eta_i$
$$\ell(\eta) =\sum_{i=1}^{N} e^{-y_{i}\eta_i}$$
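Because the $i$-th term depends only on $\eta_i$, the derivatives with respect to $\eta$ can be taken entry by entry. Written out as a worked sketch (the last step uses $y_i^2 = 1$):
$$\ell'(\eta)=\frac{\partial \ell}{\partial \eta}=\begin{bmatrix}-y_1e^{-y_1\eta_1}\\ \vdots\\ -y_Ne^{-y_N\eta_N}\end{bmatrix}_{N\times 1},\qquad \ell''(\eta)=\frac{\partial^2 \ell}{\partial \eta\,\partial \eta^{\top}}=\operatorname{diag}\!\left(y_1^2e^{-y_1\eta_1},\dots,y_N^2e^{-y_N\eta_N}\right)=\operatorname{diag}\!\left(e^{-y_1\eta_1},\dots,e^{-y_N\eta_N}\right)_{N\times N}$$
The Hessian with respect to $\eta$ is diagonal because $\partial^2\ell/\partial\eta_i\,\partial\eta_j=0$ whenever $i\neq j$.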
You can rewrite this in terms of the parameter vector $\beta$ by expressing each prediction as $$\eta_i = {{x}}^{\top}_{i} \beta$$
which becomes
$$\ell(\beta) =\sum_{i=1}^{N} e^{-y_{i}{{x}}^{\top}_{i} \beta}$$
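Since $\eta = X\beta$ is linear in $\beta$ with Jacobian $\partial\eta/\partial\beta^{\top}=X$ ($N\times p$), the chain rule links the two sets of derivatives. The Hessian picks up no extra term because $\eta$ has zero second derivative in $\beta$. Sketched out (again using $y_i^2=1$):
$$\dot{\ell}(\beta)=X^{\top}\ell'(\eta)=-\sum_{i=1}^{N} y_i\,x_i\,e^{-y_i x_i^{\top}\beta}\quad(p\times 1),\qquad \ddot{\ell}(\beta)=X^{\top}\ell''(\eta)\,X=\sum_{i=1}^{N} x_i x_i^{\top}\,e^{-y_i x_i^{\top}\beta}\quad(p\times p)$$
where $\ell'(\eta)$ has entries $-y_ie^{-y_i\eta_i}$ and $\ell''(\eta)=\operatorname{diag}(e^{-y_i\eta_i})$. Note that $\eta$ is never plugged into $x_i^{\top}(\cdot)$: the loss only uses the scalar $\eta_i = x_i^{\top}\beta$.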
For a given vector $\eta$ you can compute how $\ell(\eta)$ changes as a function of changes in the vector $\eta$; this gives $\ell'(\eta)$ and $\ell''(\eta)$.
For a given vector $\beta$ you can compute how $\ell(\beta)$ changes as a function of changes in the vector $\beta$; this gives $\dot{\ell}(\beta)$ and $\ddot{\ell}(\beta)$.
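A minimal NumPy sketch of both sets of derivatives, assuming $y_i \in \{-1,+1\}$ and randomly generated stand-in data. The function names (`loss`, `grad_eta`, `hess_eta`, `grad_beta`, `hess_beta`, `fd_grad`) are hypothetical and not from the article. It computes $\ell'(\eta)$ and $\ell''(\eta)$ directly, gets the $\beta$-derivatives through the chain rule $X^{\top}\ell'(\eta)$ and $X^{\top}\ell''(\eta)X$, and checks them against central finite differences:

```python
import numpy as np

def loss(beta, X, y):
    """Exponential loss l(beta) = sum_i exp(-y_i * x_i^T beta)."""
    eta = X @ beta                      # eta = X beta, shape (N,)
    return np.sum(np.exp(-y * eta))

def grad_eta(eta, y):
    """l'(eta): N-vector with entries -y_i * exp(-y_i * eta_i)."""
    return -y * np.exp(-y * eta)

def hess_eta(eta, y):
    """l''(eta): N x N diagonal matrix with entries y_i^2 * exp(-y_i * eta_i)."""
    return np.diag(y**2 * np.exp(-y * eta))

def grad_beta(beta, X, y):
    """Chain rule: gradient w.r.t. beta is X^T l'(eta), shape (p,)."""
    return X.T @ grad_eta(X @ beta, y)

def hess_beta(beta, X, y):
    """Chain rule: Hessian w.r.t. beta is X^T l''(eta) X, shape (p, p)."""
    return X.T @ hess_eta(X @ beta, y) @ X

def fd_grad(f, beta, h=1e-6):
    """Central finite-difference gradient of a scalar function f at beta."""
    g = np.zeros_like(beta)
    for j in range(beta.size):
        e = np.zeros_like(beta)
        e[j] = h
        g[j] = (f(beta + e) - f(beta - e)) / (2 * h)
    return g

# Hypothetical random data: N = 5 observations, p = 3 features, labels in {-1, +1}.
rng = np.random.default_rng(0)
N, p = 5, 3
X = rng.normal(size=(N, p))
y = rng.choice([-1.0, 1.0], size=N)
beta = 0.5 * rng.normal(size=p)

g = grad_beta(beta, X, y)
H = hess_beta(beta, X, y)
print(g.shape, H.shape)   # (3,) (3, 3)

# Finite-difference checks: the gradient of the loss, and the gradient of
# each gradient entry (which gives the rows of the Hessian).
g_fd = fd_grad(lambda b: loss(b, X, y), beta)
H_fd = np.array([fd_grad(lambda b, j=j: grad_beta(b, X, y)[j], beta) for j in range(p)])
print(np.allclose(g, g_fd, atol=1e-5), np.allclose(H, H_fd, atol=1e-5))
```

Building the explicit $N\times N$ diagonal matrix is only for readability; for large $N$ one would scale the rows of $X$ by $e^{-y_i\eta_i}$ instead.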
Explicit example. Let $$X = \begin{bmatrix}x_{11} & x_{12} \\ x_{21} & x_{22} \\ x_{31} & x_{32} \\ \end{bmatrix}$$
and
$$\beta = \begin{bmatrix}\beta_1\\ \beta_2 \end{bmatrix}$$
then
$$\eta = \begin{bmatrix}x_{11} & x_{12} \\ x_{21} & x_{22} \\ x_{31} & x_{32} \\ \end{bmatrix}\cdot \begin{bmatrix}\beta_1\\ \beta_2 \end{bmatrix} =\begin{bmatrix}x_{11} \beta_1+ x_{12} \beta_2 \\ x_{21} \beta_1+ x_{22} \beta_2\\ x_{31} \beta_1+ x_{32}\beta_2 \\ \end{bmatrix}$$
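The same construction in NumPy, with made-up numbers standing in for the $x_{ij}$ and $\beta_j$ (a sketch for checking shapes only). Row $i$ of $X$ is $x_i^{\top}$, so $x_i^{\top}\beta$ is the scalar $\eta_i$. An expression like $x_i^{\top}\eta$ would need $p = N$ and is not what the loss uses:

```python
import numpy as np

# Hypothetical numbers for the 3 x 2 example: N = 3 observations, p = 2 features.
X = np.array([[1.0, 2.0],
              [0.5, -1.0],
              [3.0, 0.0]])
beta = np.array([0.2, -0.1])

eta = X @ beta          # eta = X beta has one entry per observation
print(eta.shape)        # (3,)

# x_i^T is the i-th ROW of X (length p), and x_i^T beta is the scalar eta_i.
for i in range(X.shape[0]):
    x_i = X[i]                      # shape (2,)
    print(i, x_i @ beta, eta[i])    # the two numbers agree
```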
$$\begin{array}{rcccccccl} \ell(\eta_1,\eta_2,\eta_3) &=& e^{-y_1\eta_1}& +& e^{-y_2\eta_2} &+ &e^{-y_3\eta_3} \\&=& e^{-y_1(x_{11} \beta_1+ x_{12} \beta_2)}& + &e^{-y_2(x_{21} \beta_1+ x_{22} \beta_2)}& + &e^{-y_3(x_{31} \beta_1+ x_{32}\beta_2)}& = &\ell(\beta_1,\beta_2)\end{array} $$
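Differentiating the expanded form term by term gives, for example, $\partial\ell/\partial\beta_1=\sum_{i=1}^{3}-y_i x_{i1}e^{-y_i\eta_i}$, which is exactly the chain rule $\sum_i (\partial\ell/\partial\eta_i)(\partial\eta_i/\partial\beta_1)$. A small NumPy check of this, with hypothetical values for $X$, $y$ and $\beta$:

```python
import numpy as np

# Hypothetical 3 x 2 data; labels y_i in {-1, +1}.
X = np.array([[1.0, 2.0],
              [0.5, -1.0],
              [3.0, 0.0]])
y = np.array([1.0, -1.0, 1.0])
beta = np.array([0.2, -0.1])
eta = X @ beta

# Term-by-term derivative of the expanded loss:
# d/d beta_j of e^{-y_i (x_i1 beta_1 + x_i2 beta_2)} = -y_i x_ij e^{-y_i eta_i}
dl_dbeta1 = sum(-y[i] * X[i, 0] * np.exp(-y[i] * eta[i]) for i in range(3))
dl_dbeta2 = sum(-y[i] * X[i, 1] * np.exp(-y[i] * eta[i]) for i in range(3))

# Chain rule in matrix form: gradient w.r.t. beta = X^T l'(eta),
# where l'(eta)_i = -y_i e^{-y_i eta_i}.
grad_eta = -y * np.exp(-y * eta)
grad_beta = X.T @ grad_eta

print(np.allclose([dl_dbeta1, dl_dbeta2], grad_beta))  # True
```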