I think this book is suitable for further study of this question: "Representation Theory of Finite Groups: An Introductory Approach" by Benjamin Steinberg. It has several chapters related to your question.
The derivative can be found via implicit differentiation. That is,
$$
\frac{\mathrm{d}\operatorname{vec}\left(Y\right)}{\mathrm{d}\operatorname{vec}\left(X\right)} = \left(\frac{\mathrm{d} \operatorname{vec}\left(X\right)}{\mathrm{d}\operatorname{vec}\left(Y\right)}\right)^{-1}.$$
It is relatively easy to compute the derivative of $A$ with respect to $f(A)$, since $A = f(A)f(A)^{\top}$. The only tricky part is restricting $f(A)$ to be lower triangular.
For general $X$, we have
$$
\frac{\mathrm{d} \operatorname{vec}\left(XX^{\top}\right)}{\mathrm{d} \operatorname{vec}\left(X\right)} = \left(I + K\right)\left(X\otimes I\right),$$
where $K$ is the Commutation Matrix.
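As a sanity check (not part of the original answer), the Jacobian formula above can be verified numerically. The following Python/NumPy sketch builds the commutation matrix by hand and compares $(I + K)(X \otimes I)$ against a finite-difference approximation:

```python
import numpy as np

def commutation_matrix(n):
    # K satisfies K @ vec(A) = vec(A.T) for n-by-n A (column-major vec)
    K = np.zeros((n * n, n * n))
    for i in range(n):
        for j in range(n):
            K[j * n + i, i * n + j] = 1.0
    return K

def vec(M):
    # column-major (Fortran-order) vectorization
    return M.flatten(order="F")

rng = np.random.default_rng(0)
n = 4
X = rng.standard_normal((n, n))
K = commutation_matrix(n)

# analytic Jacobian of vec(X X^T) with respect to vec(X)
J = (np.eye(n * n) + K) @ np.kron(X, np.eye(n))

# finite-difference approximation, one coordinate of vec(X) at a time
eps = 1e-6
J_num = np.zeros_like(J)
base = vec(X @ X.T)
for k in range(n * n):
    Xp = X.copy()
    Xp[k % n, k // n] += eps  # entry (i, j) of X, with k = j*n + i
    J_num[:, k] = (vec(Xp @ Xp.T) - base) / eps

err = np.max(np.abs(J - J_num))
print(err)  # small, on the order of eps
```

Since $X \mapsto XX^{\top}$ is quadratic, the forward-difference error is of order $\varepsilon$, matching what is printed.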
Now, obtaining the derivative with respect to the $\operatorname{vech}$ requires the chain rule. This gives
$$
\frac{\mathrm{d} \operatorname{vech}\left(XX^{\top}\right)}{\mathrm{d} \operatorname{vech}_{\Delta}\left(X\right)} = L \left(I + K\right)\left(X\otimes I\right) D,$$
where here $L$ is the elimination matrix, and $D$ is the "lower triangular duplication matrix" which has the property that $D \operatorname{vech}\left(M\right) = \operatorname{vec}\left(M\right)$ for lower triangular matrices $M$. The sought derivative is the matrix inverse of the above expression.
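Again as an illustration rather than part of the original answer, here is a small Python/NumPy sketch of the elimination matrix $L$ and the "lower triangular duplication matrix" $D = L^{\top}$, checking the property $D\operatorname{vech}(M) = \operatorname{vec}(M)$ for lower-triangular $M$:

```python
import numpy as np

def elimination_matrix(n):
    # L satisfies L @ vec(M) = vech(M): it keeps the on-and-below-diagonal
    # entries of a column-major vec of an n-by-n matrix
    keep = [j * n + i for j in range(n) for i in range(j, n)]
    L = np.zeros((len(keep), n * n))
    for row, col in enumerate(keep):
        L[row, col] = 1.0
    return L

n = 3
L = elimination_matrix(n)
M = np.tril(np.arange(1.0, n * n + 1).reshape(n, n))  # lower-triangular example

vecM = M.flatten(order="F")
vechM = L @ vecM   # half-vectorization: [1, 4, 7, 5, 8, 9]
D = L.T            # scatters vech back into vec, with zeros above the diagonal
print(np.allclose(D @ vechM, vecM))  # True for lower-triangular M
```

Note that $L L^{\top} = I$, so $D = L^{\top}$ is a right inverse of $L$; it reproduces $\operatorname{vec}(M)$ exactly only when $M$ has no entries above the diagonal.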
Numerical confirmation: here is a numerical confirmation in R. (Note that the chol function in R maps positive definite matrices to upper triangular matrices, i.e., it returns the upper Cholesky factor, thus some mucking about with transposes.)
require(matrixcalc)
set.seed(2349024)
n <- 6
X <- cov(matrix(rnorm(1000*n),ncol=n))
fnc <- function(X) t(chol(X))
Y <- fnc(X)
d0 <- (diag(1,nrow=n^2) + commutation.matrix(r=n)) %*% (Y %x% diag(1,nrow=n))
L <- elimination.matrix(n)
d1 <- L %*% d0 %*% t(L)
dfin <- solve(d1)
# now compute the approximate derivative
apx.d <- matrix(rep(NA,length(dfin)),nrow=dim(dfin)[1])
my.eps <- 1e-6
low.idx <- which(lower.tri(diag(1,n),diag=TRUE))
for (iii in c(1:length(low.idx))) {
Xalt <- X
tweak <- low.idx[iii]
Xalt[tweak] <- Xalt[tweak] + my.eps
# "Note that only the upper triangular part of 'x' is used..."
Yalt <- fnc(t(Xalt))
dY <- (Yalt - Y) / my.eps
apx.d[,iii] <- dY[low.idx]
}
apx.error <- apx.d - dfin
max(abs(apx.error))
apx.error
The maximum absolute error I get is 5.606e-07, on the order of the perturbation in the input variable, 1e-06.
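For readers without R, here is a Python/NumPy translation of the same check (a sketch, not the original code). NumPy's np.linalg.cholesky returns the lower-triangular factor directly and reads only the lower triangle of its argument, so the transpose games above are not needed:

```python
import numpy as np

rng = np.random.default_rng(2349024)
n = 6
X = np.cov(rng.standard_normal((1000, n)), rowvar=False)  # random covariance
Y = np.linalg.cholesky(X)  # lower-triangular factor: X = Y @ Y.T

def vec(M):
    return M.flatten(order="F")

# commutation matrix: K @ vec(A) = vec(A.T)
K = np.zeros((n * n, n * n))
for i in range(n):
    for j in range(n):
        K[j * n + i, i * n + j] = 1.0

# elimination matrix: L @ vec = vech (on-and-below-diagonal entries)
low = [j * n + i for j in range(n) for i in range(j, n)]
L = np.zeros((len(low), n * n))
for row, col in enumerate(low):
    L[row, col] = 1.0

# analytic d vech(X) / d vech(Y); its inverse is the sought derivative
d1 = L @ (np.eye(n * n) + K) @ np.kron(Y, np.eye(n)) @ L.T
dfin = np.linalg.inv(d1)

# finite-difference approximation: perturb each lower-triangular entry of X
eps = 1e-6
apx = np.zeros_like(dfin)
for col, idx in enumerate(low):
    Xp = X.copy()
    Xp[idx % n, idx // n] += eps  # lower-triangle entry (i, j), idx = j*n + i
    Yp = np.linalg.cholesky(Xp)   # only the lower triangle of Xp is read
    apx[:, col] = ((vec(Yp) - vec(Y)) / eps)[low]

err = np.max(np.abs(apx - dfin))
print(err)  # on the order of eps, as in the R run
```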
Best Answer
$\newcommand\E{\mathsf E}$You want to find $$E:=\E\frac{X^T BX}{X^T AX},$$ where $X$ is a random vector uniformly distributed on the unit sphere $S^2$, so that $X$ equals $$\frac{[G_1,G_2,G_3]^T}{\sqrt{G_1^2+G_2^2+G_3^2}}$$ in distribution, where $G_1,G_2,G_3$ are independent standard normal random variables.
Also, by spherical symmetry, we may without loss of generality take the matrix $A$ to be diagonal. So, for some real $b_{j,k}$'s and some positive real $a_i$'s, $$E=\E\frac{\sum_{j,k=1}^3 b_{j,k}G_jG_k}{\sum_{i=1}^3 a_i G_i^2}=\sum_{j,k=1}^3 b_{j,k}\E R_{j,k},$$ where $$R_{j,k}:=\frac{G_jG_k}{\sum_{i=1}^3 a_i G_i^2}.$$ Note that $R_{j,k}$ turns into $-R_{j,k}$ if $j\ne k$ and $G_j$ is replaced by $-G_j$. So, if $j\ne k$, then the distribution of $R_{j,k}$ is symmetric and hence $\E R_{j,k}=0$. Therefore, $$E=\sum_{j=1}^3 b_jE_j,$$ where $b_j:=b_{j,j}$ and $$E_j:=\E R_{j,j}=\E\frac{G_j^2}{\sum_{i=1}^3 a_i G_i^2}.$$
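As a quick Monte Carlo sanity check (a Python/NumPy sketch, not part of the original answer, with example values for the $a_i$'s): the off-diagonal terms do average to zero, and the $E_j$ satisfy the exact identity $\sum_j a_j E_j = 1$, since the numerators then sum to the denominator.

```python
import numpy as np

rng = np.random.default_rng(1)
a = np.array([1.0, 2.0, 3.0])           # example positive a_i's (an assumption)
G = rng.standard_normal((500_000, 3))   # samples of (G_1, G_2, G_3)
denom = (a * G**2).sum(axis=1)          # sum_i a_i G_i^2

# off-diagonal term: vanishes by the sign-flip symmetry argument above
R12 = np.mean(G[:, 0] * G[:, 1] / denom)

# diagonal terms E_j, which must satisfy sum_j a_j E_j = 1
E = [np.mean(G[:, j]**2 / denom) for j in range(3)]
total = sum(a[j] * E[j] for j in range(3))

print(R12)    # close to 0
print(total)  # close to 1 (exact up to floating-point rounding)
```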
So, the problem is equivalent to computing $E_j$. However, it is highly unlikely that $E_j$ can be expressed in terms of elementary or even special functions, even for specific values of the $a_i$'s. For instance, Mathematica cannot do anything with $E_j$ even when $j=1$, $a_1=1$, $a_2=2$, and $a_3=3$.
However, we have this: