The Hessian of a Radial Basis Function

Tags: hessian-matrix, matrix-calculus, multivariable-calculus, partial-derivative

In this paper the Hessian of an RBF is shown as

$$
H_j = \frac{\beta_j}{r_j} \bigg(
\phi'(r_j)\, I +
\bigg[
\phi''(r_j) - \frac{\phi'(r_j)}{r_j}
\bigg]
(x-x_j) \frac{\partial r_j}{\partial x}
\bigg)
$$

where

$$
r_j = \|x-x_j\| = \sqrt{(x-x_j)^T (x-x_j)}
$$

and $\phi(r)$ can be any of the usual radial basis functions, such as the Gaussian $\phi(r) = \exp(-cr^2)$.

The RBF model itself is of course

$$
f(x) = \beta^T g(x) = \sum_i \beta_i \phi(r_i)
$$
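To convince myself that the formula is at least correct, I ran a quick finite-difference check for a single Gaussian center. This is my own throwaway code, not from the paper; the names `phi`, `xj`, `beta` and the shape parameter `c` are arbitrary choices:

```python
# Sketch: check the Hessian formula above for one Gaussian center
# against a central-difference Hessian of f(x) = beta * phi(||x - xj||).
import numpy as np

c = 0.7                                   # Gaussian shape parameter (assumed)
phi   = lambda r: np.exp(-c * r**2)                         # phi(r)
dphi  = lambda r: -2*c*r * np.exp(-c * r**2)                # phi'(r)
ddphi = lambda r: (4*c**2*r**2 - 2*c) * np.exp(-c * r**2)   # phi''(r)

xj   = np.array([0.3, -1.0, 0.5])         # one RBF center (arbitrary)
beta = 1.7                                # its weight (arbitrary)

def f(x):                                 # f(x) = beta * phi(||x - xj||)
    return beta * phi(np.linalg.norm(x - xj))

def H_formula(x):   # beta/r * ( phi'(r) I + [phi''(r) - phi'(r)/r] w w^T / r )
    w = x - xj
    r = np.linalg.norm(w)
    return beta/r * (dphi(r)*np.eye(3) + (ddphi(r) - dphi(r)/r) * np.outer(w, w)/r)

x = np.array([1.0, 0.2, -0.4])
h, H_fd = 1e-5, np.zeros((3, 3))
for i in range(3):                        # central-difference Hessian
    for j in range(3):
        ei, ej = np.eye(3)[i]*h, np.eye(3)[j]*h
        H_fd[i, j] = (f(x+ei+ej) - f(x+ei-ej) - f(x-ei+ej) + f(x-ei-ej)) / (4*h*h)

print(np.max(np.abs(H_formula(x) - H_fd)))   # tiny (~1e-6): formula checks out
```

So the formula appears to be numerically correct; it is the derivation I cannot reproduce.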

I understand how the Jacobian is obtained, but was not able to derive the Hessian. Where did the identity matrix come from? Why $H_j$ and not $H_{ij}$?

Any help would be appreciated.

Best Answer

Define the radial vectors $\,w_k = (x - x_k)\in{\mathbb R}^n\,$ and note that their differentials are all identical: $\,dw_k = dx$.

Concatenate the vectors into a single matrix $$\eqalign{ \Omega&=\big[\,w_1\;w_2\;\ldots\;w_m\big]&\in&{\mathbb R}^{n\times m} \\ d\Omega&=\big[\,dx\;dx\;\ldots\;dx\big] &= &dx\,{\tt\large 1}^T }$$ Note that the $\,r_j=\|w_j\|\,$ are themselves elements of a vector $\,r\in{\mathbb R}^m$.
The Cartesian basis vectors $\,e_k\in{\mathbb R}^m$ allow us to write all of the above as $$w_k=\Omega\,e_k,\quad dx=d\Omega\,e_k,\quad r_j=e_j^Tr$$ Applying $\phi$ element-wise avoids summations and subscripts. Simply write the derivatives and differentials as $$\eqalign{ &g=\phi(r),\quad g'=\phi'(r),\quad g''=\phi''(r)\;&\in{\mathbb R}^m \\ &dg=g'\odot dr,\quad dg'=g''\odot dr \;&\in{\mathbb R}^m \\ }$$ where $\odot$ is the elementwise/Hadamard product.

It will also be useful to switch between vectors and diagonal matrices, which will be indicated by an uppercase letter, e.g. $$\eqalign{ &R={\rm Diag}(r),\quad &G={\rm Diag}(g),\quad &G''={\rm Diag}(g'')\;\in{\mathbb R}^{m\times m} \\ &r = {\rm diag}(R),\quad &g = {\rm diag}(G),\quad &g''=\ldots \\ &r = R{\tt\large 1},\quad &g = G{\tt\large 1},\quad &g''=\ldots \\ &dg = G'dr,\quad &dg' = G''dr \\ }$$ and it will prove convenient to define $$ P=R^{-1}\quad\implies PR=I,\;\;p\odot r = {\tt\large 1} $$ where $p={\rm diag}(P)$ is the vector of reciprocals $1/r_j$.
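If it helps to make this notation concrete, here is one way the objects above might be set up in NumPy. This is only a sketch; the Gaussian $\phi$ with $c=0.7$ and the random centers are assumed purely for illustration:

```python
# Sketch: concrete NumPy versions of Omega, r, g, g', g'', R, P.
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 5
X = rng.normal(size=(n, m))              # columns of X are the centers x_k
x = rng.normal(size=n)                   # evaluation point

Omega = x[:, None] - X                   # Omega = [w_1 ... w_m], w_k = x - x_k
r = np.linalg.norm(Omega, axis=0)        # r in R^m, r_j = ||w_j||

c = 0.7                                  # Gaussian shape parameter (assumed)
g   = np.exp(-c * r**2)                  # g   = phi(r), elementwise
gp  = -2*c*r * g                         # g'  = phi'(r)
gpp = (4*c**2*r**2 - 2*c) * g            # g'' = phi''(r)

R, G, Gp, Gpp = (np.diag(v) for v in (r, g, gp, gpp))   # uppercase = Diag(.)
P = np.diag(1.0 / r)                     # P = R^{-1}
assert np.allclose(P @ R, np.eye(m))     # PR = I
```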

Now write down the main relationship and differentiate it. $$\eqalign{ r\odot r &= {\rm diag}(\Omega^T\Omega) \\ 2r\odot dr &= {\rm diag}(\Omega^Td\Omega+d\Omega^T\Omega) \;=\; 2{\,\rm {diag}}(\Omega^Td\Omega) \\ R\,dr &= {\rm diag}(\Omega^Tdx\,{\tt\large 1}^T) \;=\; \Omega^Tdx \\ dr &= P\Omega^Tdx \\ \frac{\partial r}{\partial x} &= P\Omega^T \\ }$$ A quick check of the $i^{th}$ component recovers Eqn (18) from the paper and confirms that we're on the right path: $$\eqalign{ e_i^T\bigg(&\frac{\partial r}{\partial x}\bigg) = e_i^TP\Omega^T \\ &\frac{\partial r_i}{\partial x} \;=\; \frac{1}{r_i}\;e_i^T\Omega^T \;=\; \frac{w_i^T}{\|w_i\|} \\ }$$ The model function (replacing $\beta$ with the easier to type $b$) is $$f = b^Tg = b:g$$ where the colon is the Frobenius product notation for the trace, i.e. $\;A:B = {\rm Tr}(A^TB)$
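As a numerical sanity check of $\,\frac{\partial r}{\partial x}=P\,\Omega^T$, a finite-difference comparison (again with assumed random centers) confirms the formula:

```python
# Sketch: finite-difference check that dr/dx = P Omega^T.
import numpy as np

rng = np.random.default_rng(1)
n, m = 3, 5
X = rng.normal(size=(n, m))                        # centers as columns (assumed)

def r_of(x):                                       # the vector r(x) in R^m
    return np.linalg.norm(x[:, None] - X, axis=0)

x = rng.normal(size=n)
Omega = x[:, None] - X
P = np.diag(1.0 / r_of(x))
J_r = P @ Omega.T                                  # claimed dr/dx, an m x n matrix

h = 1e-6
J_fd = np.zeros((m, n))
for k in range(n):                                 # central differences, column by column
    e = np.zeros(n); e[k] = h
    J_fd[:, k] = (r_of(x + e) - r_of(x - e)) / (2*h)

print(np.max(np.abs(J_r - J_fd)))                  # ~1e-10: the formulas agree
```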

Calculate the differential and gradient of the model. $$\eqalign{ df &= b:dg \;=\; b:g'\odot dr \;=\; g'\odot b:dr \\ &= (g'\odot b):P\,\Omega^Tdx \\ &= \Omega P(g'\odot b):dx \\ &= \Omega PG'B{\tt\large 1}:dx \\ J=\frac{\partial f}{\partial x} &= \Omega PG'B{\tt\large 1} \;=\; \bigg(\frac{\partial r}{\partial x}\bigg)^T\Big(b\odot g'\Big) \\ }$$ All that hard work produced the Jacobian (gradient). Now on to the Hessian. $$\eqalign{ dJ &= d\Omega\,PG'B{\tt\large 1} + \Omega PdG'B{\tt\large 1} + \Omega\,dP\,G'B{\tt\large 1} \\ &= dx\,{\tt\large 1}^TPG'B{\tt\large 1} + \Omega PB\,dg' - \Omega (P\,dR\,P)G'B{\tt\large 1} \\ &= dx\,({\tt\large 1}^TPG'B{\tt\large 1}) +\Omega PB\,dg' -\Omega PG'PB\,dr \\ &= (G':PB)\,dx +\Omega PBG''\,dr -\Omega PG'PB\,dr \\ &= \Big((G':PB)I +\Omega PB(G'' - PG')P\Omega^T\Big)\,dx \\ H = \frac{\partial J}{\partial x} &= (G':PB)I + \Omega PB(G''-PG')P\Omega^T \\ &= \Big((p\odot b):g'\Big)\,I \;+\; \bigg(\frac{\partial r}{\partial x}\bigg)^T\Big(BG''-BPG'\Big)\bigg(\frac{\partial r}{\partial x}\bigg) \\ }$$ Although it may not look like it, this result is identical to Eqn (21). The above form has the advantage that it is obviously symmetric (as it must be) whereas the result in the paper is awkward.
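If you want to verify the final expressions numerically, here is a minimal sketch (assumed Gaussian $\phi$, random centers and weights) that evaluates $J$ and $H$ exactly as derived above and compares them to finite differences of $f$:

```python
# Sketch: evaluate J = (dr/dx)^T (b o g') and
# H = ((p o b):g') I + (dr/dx)^T B(G''-PG') (dr/dx), then check numerically.
import numpy as np

rng = np.random.default_rng(2)
n, m, c = 3, 5, 0.7
X = rng.normal(size=(n, m))                        # centers as columns (assumed)
b = rng.normal(size=m)                             # weights beta (assumed)

def f(x):                                          # f = b^T phi(r(x))
    r = np.linalg.norm(x[:, None] - X, axis=0)
    return b @ np.exp(-c * r**2)

x = rng.normal(size=n)
Omega = x[:, None] - X
r = np.linalg.norm(Omega, axis=0)
g   = np.exp(-c * r**2)
gp  = -2*c*r * g                                   # g'  = phi'(r)
gpp = (4*c**2*r**2 - 2*c) * g                      # g'' = phi''(r)
p = 1.0 / r                                        # p = diag(P)
Jr = Omega.T * p[:, None]                          # dr/dx = P Omega^T

J = Jr.T @ (b * gp)                                # Jacobian (gradient)
H = ((p * b) @ gp) * np.eye(n) + Jr.T @ np.diag(b * (gpp - p * gp)) @ Jr

h = 1e-5                                           # finite-difference checks
J_fd = np.array([(f(x + h*e) - f(x - h*e)) / (2*h) for e in np.eye(n)])
H_fd = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        ei, ej = h*np.eye(n)[i], h*np.eye(n)[j]
        H_fd[i, j] = (f(x+ei+ej) - f(x+ei-ej) - f(x-ei+ej) + f(x-ei-ej)) / (4*h*h)

print(np.max(np.abs(J - J_fd)), np.max(np.abs(H - H_fd)))   # both tiny
print(np.allclose(H, H.T))                         # True: symmetric, as it must be
```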

When manipulating the above expressions, keep in mind that the $(R,G,B)$ matrices are all diagonal and therefore commute with one another; $\Omega$, however, is a full matrix and does not commute with the others.
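A two-line check of that commutation fact (illustrative code, not tied to the paper):

```python
# Sketch: diagonal matrices commute with each other; a full matrix does not.
import numpy as np

rng = np.random.default_rng(3)
D1, D2 = np.diag(rng.normal(size=4)), np.diag(rng.normal(size=4))
A = rng.normal(size=(4, 4))

print(np.allclose(D1 @ D2, D2 @ D1))   # True:  diagonal matrices commute
print(np.allclose(D1 @ A,  A @ D1))    # False: a full matrix generally does not
```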

The clumsy notation $H_j$ does not refer to a vector or matrix element, but rather to the fact that $H(x)$ was evaluated at the point $x=x_j$. All of the variables above, except for $(b,1,e_k,B,I)$, are implicitly functions of $x$. So you could slap a $j$ subscript on each of them to calculate the expression for $H_j$.