Introduction to Statistical Learning Eq. 6.12 and 6.13
lasso, machine learning, ridge regression
Can someone please explain to me how the optimization of 6.12 leads to 6.14 and how the optimization of 6.13 leads to 6.15?
Best Answer
For the first equation, the result follows from setting the gradient to zero:
$$
\begin{aligned}
S &= \sum_{j=1}^p (y_j-\beta_j)^2 +\lambda\sum_{j=1}^p\beta_j^2\\
\end{aligned}
$$
at extrema,
$$
\begin{aligned}
\frac{\partial S}{\partial \beta_j} &=0\\
-2(y_j -\beta_j) +2\lambda\beta_j &= 0\\
\beta_j &= \frac{y_j}{1+\lambda}.
\end{aligned}
$$
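If you want to sanity-check that closed form numerically, here is a minimal sketch (assuming NumPy and SciPy are available; the $y_j$ values and $\lambda$ below are made up for illustration):

```python
# Quick numerical check of the ridge closed form beta_j = y_j / (1 + lambda).
# The y values and lambda are hypothetical, chosen only for illustration.
import numpy as np
from scipy.optimize import minimize

y = np.array([1.5, -0.3, 2.0, 0.7])   # responses in the n = p, X = I special case
lam = 0.8                              # ridge penalty

def S(beta):
    # Objective from the answer: sum (y_j - beta_j)^2 + lambda * sum beta_j^2
    return np.sum((y - beta) ** 2) + lam * np.sum(beta ** 2)

numeric = minimize(S, x0=np.zeros_like(y)).x
closed_form = y / (1 + lam)
print(numeric)
print(closed_form)
print(np.allclose(numeric, closed_form, atol=1e-4))  # True
```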
I think you should be able to derive the other expression using the same technique shown above, together with the fact that
$$
\vert \beta_j \vert = \begin{cases} \beta_j & \text{if } \beta_j \ge 0\\ -\beta_j & \text{if } \beta_j < 0\end{cases}.
$$
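Once you have worked through the case analysis, you can check the result numerically. The sketch below (assuming SciPy is available, with made-up $y_j$ and $\lambda$) minimizes the one-dimensional lasso objective directly and compares it with the piecewise (soft-thresholded) expression of Eq. 6.15:

```python
# Check the lasso case numerically: the minimizer of
# (y_j - beta_j)^2 + lambda * |beta_j| should match the piecewise solution.
# The y values and lambda are hypothetical, for illustration only.
from scipy.optimize import minimize_scalar

lam = 0.8
for y_j in [1.5, 0.2, -0.1, -2.0]:
    numeric = minimize_scalar(lambda b: (y_j - b) ** 2 + lam * abs(b),
                              bounds=(-10, 10), method="bounded").x
    # Piecewise solution from the case analysis (ISL Eq. 6.15)
    if y_j > lam / 2:
        closed = y_j - lam / 2
    elif y_j < -lam / 2:
        closed = y_j + lam / 2
    else:
        closed = 0.0
    print(y_j, round(numeric, 4), round(closed, 4))
```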
> The first principal component direction of the data is that along which the observations vary the most.
This is referring to the projections of the data onto that line, i.e., the variance explained by that line.
I think you might be interpreting it as something like:
> The first principal component direction is the dimension on which the residuals vary the most.
which is really the opposite statement: the residuals are the deviations from that line, and they carry the variance of everything that is not captured by the first principal component.
The first principal component is the one that captures as much of the variance as possible along that dimension.
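A small numerical illustration of that distinction, as a sketch on simulated 2-D data (assuming NumPy is available): the projections onto the first principal component have the largest possible variance, and the residuals hold whatever variance is left over.

```python
# Illustrate: variance of projections onto the first PC vs. variance of residuals.
# The data are simulated for illustration only.
import numpy as np

rng = np.random.default_rng(0)
X = rng.multivariate_normal([0, 0], [[3.0, 1.2], [1.2, 1.0]], size=500)
X = X - X.mean(axis=0)                 # center the data

# First principal component direction = top right-singular vector of X
_, _, Vt = np.linalg.svd(X, full_matrices=False)
v1 = Vt[0]

scores = X @ v1                        # projections of the data onto the PC line
residuals = X - np.outer(scores, v1)   # deviations from that line

print("variance of projections:   ", scores.var())
print("variance left in residuals:", residuals.var(axis=0).sum())
print("total variance:            ", X.var(axis=0).sum())  # = projections + residuals
```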