For typing convenience, define the following symmetric matrices
$$\eqalign{
A &= -Lyy^TL = A^T \\
V &= H^{-1} = V^T \\
}$$
The main problem with your analysis is that the quantity $\left(\frac{\partial H^{-2}}{\partial w}\right)$ is a third-order tensor, so it cannot possibly be equal to $-2H^{-3}$ as you've assumed.
However, the differential of a matrix is just another matrix, and is much easier to work with than a third-order tensor.
Let's start with the differential of the inverse, and then its square.
$$\eqalign{
I &= HV \\
0 &= dH\,V + H\,dV \\ 0 &= V\,dH\,V+dV \\
dV &= -V\,dH\,V \\
\\
V^2 &= V\,V\\
dV^2 &= dV\,V + V\,dV \\ &= -(V\,dH\,V^2+V^2dH\,V) \\
}$$
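Here is a quick numerical sanity check of these two differentials (a sketch in numpy, using an arbitrary symmetric positive-definite $H$ and a small symmetric perturbation; all sizes and data are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

# Hypothetical symmetric positive-definite H and its inverse V = H^{-1}
M = rng.standard_normal((n, n))
H = M @ M.T + n * np.eye(n)
V = np.linalg.inv(H)

# A small symmetric perturbation dH
dM = rng.standard_normal((n, n))
dH = 1e-6 * (dM + dM.T)

# First-order predictions from the differentials above
dV_pred  = -V @ dH @ V
dV2_pred = -(V @ dH @ V @ V + V @ V @ dH @ V)

# Compare against the actual change in V and V^2
V_new = np.linalg.inv(H + dH)
err1 = np.linalg.norm((V_new - V) - dV_pred)
err2 = np.linalg.norm((V_new @ V_new - V @ V) - dV2_pred)
print(err1 < 1e-10, err2 < 1e-10)
```

Both errors are second order in $\|dH\|$, which is exactly what a correct first-order differential should leave behind.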
Next calculate the differential and gradient of the objective function.
$$\eqalign{
f &= y^TLH^{-2}Ly \\&= Lyy^TL:V^2 \\&= -A:V^2 \\
df &= -A:dV^2 \\
&= +A:(V\,dH\,V^2+V^2dH\,V) \\
&= (VAV^2:dH) + (V^2AV:dH) \\
&= V(VA+AV)V:dH \\
}$$
At this point, note that
$$\eqalign{
H &= L + \operatorname{Diag}(w) \\
dH &= \operatorname{Diag}(dw) \\
}$$
and substitute to obtain
$$\eqalign{
df &= V(VA+AV)V:{\rm Diag}(dw) \\
&= {\rm diag}\Big(V(VA+AV)V\Big):dw \\
\frac{\partial f}{\partial w}
&= {\rm diag}\Big(V(VA+AV)V\Big) \\
&= -{\,\rm diag}\Big(V(VLyy^TL+Lyy^TLV)V\Big) \\
&= -{\,\rm diag}\Big(H^{-2}Lyy^TLH^{-1}+H^{-1}Lyy^TLH^{-2}\Big) \\
}$$
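The closed-form gradient can be verified against central finite differences (again a numpy sketch with made-up $L$, $y$, $w$; the constant shift on $w$ just keeps $H$ well conditioned):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6

# Hypothetical data: symmetric L, vector y, weights w; H = L + Diag(w)
Lm = rng.standard_normal((n, n)); Lm = Lm + Lm.T
y = rng.standard_normal(n)
w = rng.standard_normal(n) + 20.0   # shift keeps H well conditioned

def f(w):
    H = Lm + np.diag(w)
    V = np.linalg.inv(H)
    return y @ Lm @ V @ V @ Lm @ y

# Closed-form gradient: diag(V(VA + AV)V) with A = -L y y^T L
H = Lm + np.diag(w)
V = np.linalg.inv(H)
A = -np.outer(Lm @ y, Lm @ y)
grad = np.diag(V @ (V @ A + A @ V) @ V)

# Central finite differences, one component of w at a time
eps = 1e-6
fd = np.array([(f(w + eps*np.eye(n)[k]) - f(w - eps*np.eye(n)[k])) / (2*eps)
               for k in range(n)])
print(np.allclose(grad, fd, atol=1e-5))
```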
NB: In the above, a colon is used as a convenient notation for the trace/Frobenius product, i.e.
$$A:B = {\rm Tr}(A^TB)$$
The cyclic property of the trace allows terms in such a product to be rearranged in a number of ways, e.g.
$$\eqalign{A:BC &= AC^T:B \\&= B^TA:C \\&= BC:A \\&= \ldots}$$
The diag() function extracts the main diagonal of its matrix argument and returns it as a column vector, while the Diag() function takes a vector argument and returns a diagonal matrix.
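These identities are easy to confirm numerically (numpy sketch; `frob` is a hypothetical helper implementing the colon product):

```python
import numpy as np

rng = np.random.default_rng(2)
A, B, C = (rng.standard_normal((4, 4)) for _ in range(3))

# Frobenius product  A:B = Tr(A^T B)
frob = lambda X, Y: np.trace(X.T @ Y)

# Cyclic rearrangements of  A:BC
lhs = frob(A, B @ C)
assert np.isclose(lhs, frob(A @ C.T, B))   # A:BC = AC^T:B
assert np.isclose(lhs, frob(B.T @ A, C))   # A:BC = B^T A:C
assert np.isclose(lhs, frob(B @ C, A))     # A:BC = BC:A

# diag() extracts the main diagonal; Diag() builds a diagonal matrix.
# In numpy, np.diag plays both roles depending on its argument's shape.
w = rng.standard_normal(4)
assert np.array_equal(np.diag(np.diag(w)), w)   # diag(Diag(w)) = w
print("ok")
```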
Update
Since you asked about it, here is how the third-order gradient can be calculated.
Start by introducing a third-order tensor ${\cal F}$ and a fourth-order tensor ${\cal E}$ whose components can be written as
$$\eqalign{
{\cal F}_{ijk}
&= \begin{cases}
1 \quad&{\rm if\;} i=j=k \\
0 \quad&{\rm otherwise} \\
\end{cases} \\
{\cal E}_{ijkl}
&= \begin{cases}
1 \quad&{\rm if\;} i=k {\rm\;and\,} j=l \\
0 \quad&{\rm otherwise} \\
\end{cases} \\
}$$
These tensors are useful because of the following properties
$$\eqalign{
{\rm Diag}(w) &= {\cal F}\cdot w \\
{\rm diag}(A) &= {\cal F}:A \\
ABC &= \big(A\cdot{\cal E}\cdot C^T\big):B \\
}$$
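A numpy sketch confirming all three properties, with ${\cal F}$ and ${\cal E}$ built explicitly and the dot/colon contractions spelled out via `einsum` (sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4

# Build the 3rd-order F and 4th-order E tensors from their definitions
F = np.zeros((n, n, n))
for i in range(n):
    F[i, i, i] = 1.0
E = np.einsum('ik,jl->ijkl', np.eye(n), np.eye(n))   # E_ijkl = d_ik d_jl

w = rng.standard_normal(n)
A, B, C = (rng.standard_normal((n, n)) for _ in range(3))

# Diag(w) = F . w   (single contraction over the last index of F)
assert np.allclose(np.einsum('ijk,k->ij', F, w), np.diag(w))

# diag(A) = F : A   (double contraction over the last two indices of F)
assert np.allclose(np.einsum('ijk,jk->i', F, A), np.diag(A))

# ABC = (A . E . C^T) : B
T = np.einsum('im,mjkl->ijkl', A, E)       # A . E
T = np.einsum('ijkm,ml->ijkl', T, C.T)     # (A . E) . C^T
assert np.allclose(np.einsum('ijkl,kl->ij', T, B), A @ B @ C)
print("ok")
```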
Applying this to the above differential formula yields
$$\eqalign{
dV^2
&= -(V\,dH\,V^2+V^2dH\,V) \\
&= -(V\cdot{\cal E}\cdot V^2+V^2\cdot{\cal E}\cdot V):dH \\
dH^{-2}
&= -(V\cdot{\cal E}\cdot V^2+V^2\cdot{\cal E}\cdot V):\big({\cal F}\cdot dw\big) \\
\frac{\partial H^{-2}}{\partial w}
&= -(V\cdot{\cal E}\cdot V^2+V^2\cdot{\cal E}\cdot V):{\cal F} \\
}$$
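Contracting with ${\cal F}$ collapses the index pair coming from $dH={\rm Diag}(dw)$, so in components the third-order gradient reads $\big(\partial H^{-2}/\partial w\big)_{ijk} = -\big(V_{ik}(V^2)_{kj}+(V^2)_{ik}V_{kj}\big)$. A finite-difference check of that component formula (numpy sketch, made-up data):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4

# Hypothetical symmetric L and weights w; H = L + Diag(w), V = H^{-1}
Lm = rng.standard_normal((n, n)); Lm = Lm + Lm.T
w = rng.standard_normal(n) + 20.0

H = Lm + np.diag(w)
V = np.linalg.inv(H)
V2 = V @ V

# G_ijk = d(H^{-2})_ij / dw_k = -(V_ik (V^2)_kj + (V^2)_ik V_kj)
G = -(np.einsum('ik,kj->ijk', V, V2) + np.einsum('ik,kj->ijk', V2, V))

# Finite-difference check, one weight at a time
eps = 1e-6
for k in range(n):
    dw = np.zeros(n); dw[k] = eps
    Vp = np.linalg.inv(Lm + np.diag(w + dw))
    Vm = np.linalg.inv(Lm + np.diag(w - dw))
    fd = (Vp @ Vp - Vm @ Vm) / (2 * eps)
    assert np.allclose(G[:, :, k], fd, atol=1e-8)
print("ok")
```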
where the various dot products with tensors are defined in index notation as
$$\eqalign{
{\cal P} &= {\cal B}:{\cal C}
\quad&\implies
{\cal P}_{ijmn} &= \sum_k\sum_l{\cal B}_{ijkl}\,{\cal C}_{klmn} \\
{\cal Q} &= {\cal B}\cdot{\cal C}
&\implies
{\cal Q}_{ijkmnp} &= \sum_l{\cal B}_{ijkl}\,{\cal C}_{lmnp} \\
}$$
Having derived an expression for a typical higher-order tensor gradient, I hope you understand why you will never need it. The only reason anyone asks for it is that they want to use it in a misguided attempt to apply the chain rule.
But instead of the chain rule, one should approach these problems using differentials.
Another workable approach is to use vectorization (aka column-stacking) to reshape every matrix into a (long) column vector.
For typing convenience, define the matrices
$$\eqalign{
Y &= XW \\
J &= 1_{n\times K} \qquad&({\rm all\,ones\,matrix}) \\
S &= {\rm sign}(Y) \\
A &= S\odot Y \qquad&({\rm absolute\,value\,of\,}Y) \\
B &= A-J \\
Y &= S\odot A \qquad&({\rm sign\,property}) \\
}$$
where $\odot$ denotes the elementwise/Hadamard product and the sign function is applied element-wise. Use these new variables to rewrite the function, then calculate its gradient.
$$\eqalign{
\phi &= \|B\|_F^2 \\&= B:B \\
d\phi &= 2B:dB \\
&= 2(A-J):dA \\
&= 2(A-J):S\odot dY \\
&= 2S\odot(A-J):dY \\
&= 2(S\odot A-S\odot J):dY \\
&= 2(Y-S):dY \\
&= 2(Y-S):X\,dW \\
&= 2X^T(Y-S):dW \\
\frac{\partial\phi}{\partial W} &= 2X^T(Y-S) \\
}$$
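A finite-difference check of this gradient (numpy sketch; the sizes $n,d,K$ are hypothetical, and $J$ appears as the constant $1$ inside the objective; with random data, no entry of $Y$ lands near zero, so the gradient is well defined almost surely):

```python
import numpy as np

rng = np.random.default_rng(5)
n, d, K = 8, 5, 3   # hypothetical sizes

X = rng.standard_normal((n, d))
W = rng.standard_normal((d, K))

def phi(W):
    Y = X @ W
    return np.sum((np.abs(Y) - 1.0) ** 2)   # ||A - J||_F^2 with A = |Y|

# Closed-form gradient: 2 X^T (Y - S)
Y = X @ W
S = np.sign(Y)
grad = 2 * X.T @ (Y - S)

# Central finite differences over every entry of W
eps = 1e-6
fd = np.zeros_like(W)
for i in range(d):
    for j in range(K):
        E = np.zeros_like(W); E[i, j] = eps
        fd[i, j] = (phi(W + E) - phi(W - E)) / (2 * eps)
print(np.allclose(grad, fd, atol=1e-5))
```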
where a colon denotes the trace/Frobenius product, i.e.
$$\eqalign{
A:B
= {\rm Tr}(A^TB)
= {\rm Tr}(AB^T)
= B:A
}$$
The cyclic property of the trace allows such products to be rearranged in various ways
$$\eqalign{
A:BC &= B^TA:C \\
&= AC^T:B \\
}$$
Finally, when $(A,B,C)$ are all the same size, their Hadamard and Frobenius products commute with each other
$$\eqalign{
A:B\odot C &= A\odot B:C \\
}$$
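This commutation is immediate, since both sides reduce to the elementwise sum $\sum_{ij}A_{ij}B_{ij}C_{ij}$; a one-line numpy check (with arbitrary data):

```python
import numpy as np

rng = np.random.default_rng(6)
A, B, C = (rng.standard_normal((4, 4)) for _ in range(3))

frob = lambda X, Y: np.trace(X.T @ Y)   # A:B = Tr(A^T B)

# A : (B o C) = (A o B) : C   when all three matrices share a shape
lhs = frob(A, B * C)
rhs = frob(A * B, C)
print(np.isclose(lhs, rhs))
```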
NB: When an element of $\,Y$ equals zero, the gradient is undefined. This behavior is similar to the derivative of $\,|x|\,$ in the scalar case.
Best Answer
The hat matrix of $A$ is defined as $$H = A(A^TA)^{-1}A^T$$ This matrix is an orthoprojector, since $\,H^2=H=H^T$
The matrix $P=(I-H)$ is also an orthoprojector,
however, $\;Z=(H-I)$ is not an orthoprojector since $Z^2 = -Z \ne Z$.
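These projector identities are cheap to verify numerically (numpy sketch; $A$ is random and hence almost surely has full column rank):

```python
import numpy as np

rng = np.random.default_rng(7)
m, n = 7, 3
A = rng.standard_normal((m, n))   # assumed full column rank

H = A @ np.linalg.inv(A.T @ A) @ A.T   # hat matrix
P = np.eye(m) - H
Z = H - np.eye(m)

assert np.allclose(H @ H, H) and np.allclose(H, H.T)   # H is an orthoprojector
assert np.allclose(P @ P, P) and np.allclose(P, P.T)   # so is I - H
assert np.allclose(Z @ Z, -Z)                          # but Z^2 = -Z != Z
print("ok")
```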
The objective function can be written using the orthoprojector
$$\eqalign{ f &= b^T(-P)^T(-P)b \cr&= b^TPb\cr&=bb^T:P\cr&=bb^T:(I-H) }$$ where a colon denotes the trace/Frobenius product, i.e. $\;A:B = {\rm Tr}(A^TB)$
Calculate the differential and gradient of the function.
${\tt [}\,$For convenience, define $B=A^TA.{\tt ]}$ $$\eqalign{ df &= bb^T:(-dH) \cr &= -bb^T:d(AB^{-1}A^T) \cr &= -bb^T:(dA\,B^{-1}A^T+A\,dB^{-1}A^T+AB^{-1}dA^T) \cr &= -bb^T:(2\,dA\,B^{-1}A^T+A\,dB^{-1}A^T) \cr &= -bb^T:(2\,dA\,B^{-1}A^T-AB^{-1}\,dB\,B^{-1}A^T) \cr &= bb^T:AB^{-1}\,dB\,B^{-1}A^T \,-\, 2bb^T:dA\,B^{-1}A^T \cr &= B^{-1}A^Tbb^TAB^{-1}:dB \,-\, 2bb^TAB^{-1}:dA \cr &= B^{-1}A^Tbb^TAB^{-1}:(dA^TA+A^TdA) \,-\, 2bb^TAB^{-1}:dA \cr &= 2AB^{-1}A^Tbb^TAB^{-1}:dA \,-\, 2bb^TAB^{-1}:dA \cr &= 2\big(Hbb^TAB^{-1} - bb^TAB^{-1}\big):dA \cr &= 2\big(H-I\big)\,bb^TA(A^TA)^{-1} : dA \cr \frac{\partial f}{\partial A} &= 2\big(H-I\big)\,bb^TA(A^TA)^{-1} \cr &= 2\,Zbb^TA(A^TA)^{-1} \cr\cr }$$ NB: The cyclic property of the trace allows a Frobenius product to be rearranged in many different ways. For example $$\eqalign{ &A:BC \;&=\; AC^T:B \;=\; B^TA:C \;=\; I:A^TBC \cr &A:X^T \;&=\; A^T:X \cr &A:B \;&=\; B:A \cr }$$
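Finally, the closed-form gradient $2\,Zbb^TA(A^TA)^{-1}$ can be checked entrywise against central finite differences (numpy sketch with made-up $A$ and $b$):

```python
import numpy as np

rng = np.random.default_rng(8)
m, n = 6, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

def f(A):
    H = A @ np.linalg.inv(A.T @ A) @ A.T
    return b @ (np.eye(m) - H) @ b      # f = b^T P b

# Closed-form gradient: 2 Z b b^T A (A^T A)^{-1} with Z = H - I
Binv = np.linalg.inv(A.T @ A)
H = A @ Binv @ A.T
Z = H - np.eye(m)
grad = 2 * Z @ np.outer(b, b) @ A @ Binv

# Central finite differences over every entry of A
eps = 1e-6
fd = np.zeros_like(A)
for i in range(m):
    for j in range(n):
        E = np.zeros_like(A); E[i, j] = eps
        fd[i, j] = (f(A + E) - f(A - E)) / (2 * eps)
print(np.allclose(grad, fd, atol=1e-5))
```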