Gradient of joint moment generating function of multivariate normal distribution

moment-generating-functions, multivariable-calculus, normal-distribution, probability-distributions, vector-analysis

I'm trying to find the gradient of the joint MGF of the multivariate normal distribution, $M_{\pmb{X}}(\pmb{t}) = \exp\left[\pmb{t}^\mathsf{T} \pmb{\mu} + \frac{1}{2}\pmb{t}^\mathsf{T}\Sigma \pmb{t}\right]$, and am having trouble finding clues online.

I understand how to derive the MGF and that the gradient should be

$\nabla M_{\pmb{X}}(\pmb{t}) = \begin{bmatrix}\frac{\partial M_{\pmb{X}}(\pmb{t})}{\partial t_1}\\ \vdots \\ \frac{\partial M_{\pmb{X}}(\pmb{t})}{\partial t_n} \end{bmatrix}$

but can't figure out how to express the partial derivatives in terms of $\mu$, $\Sigma$, and $t$. Differentiating $\exp\left[\pmb{t}^\mathsf{T} \pmb{\mu} + \frac{1}{2}\pmb{t}^\mathsf{T}\Sigma \pmb{t}\right]$ with respect to each $t$ seems to lead to some really messy algebra.

Since $\pmb{t}^\mathsf{T} \pmb{\mu}$ and $\pmb{t}^\mathsf{T}\Sigma \pmb{t}$ are scalars, is it possible to write $M_{\pmb{X}}(\pmb{t}) = \exp\left[\pmb{t}^\mathsf{T} \pmb{\mu}\right] \cdot \exp\left[\frac{1}{2}\pmb{t}^\mathsf{T}\Sigma \pmb{t}\right]$ and use the product rule for two scalar-valued $n$-variable functions? I.e.

$\nabla M_{\pmb{X}}(\pmb{t}) = \nabla [f(\pmb{t}) \cdot g(\pmb{t})] = f(\pmb{t}) \cdot \nabla g(\pmb{t}) + g(\pmb{t}) \cdot \nabla f(\pmb{t})$

where $f(\pmb{t}) = \exp\left[\pmb{t}^\mathsf{T} \pmb{\mu}\right]$ and $g(\pmb{t}) = \exp\left[\frac{1}{2}\pmb{t}^\mathsf{T}\Sigma \pmb{t}\right]$.

However, this also leads to some mess with $\nabla g$ and the covariance matrix $\Sigma$, though I can keep trying down this road if it is the best approach.

Any help/hints would be greatly appreciated, thanks!

Best Answer

If you want to avoid vector/matrix stuff, just write $$M_X(t) = \exp\left(\sum_i t_i \mu_i + \frac{1}{2}\sum_i \sum_j\Sigma_{ij} t_i t_j\right)$$ and find the partial derivatives using basic calculus. Doing this can actually give you insight into the "slick" ways of doing this using vector/matrix differentiation rules.
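To see that the scalar double-sum and the vector/matrix form really are the same function, here is a small numerical sketch (the values of $\mu$, $\Sigma$, and $t$ are made up for illustration):

```python
import numpy as np

# Hypothetical example values (not from the question): 2-D mean and covariance.
mu = np.array([1.0, -0.5])
Sigma = np.array([[2.0, 0.3],
                  [0.3, 1.0]])
t = np.array([0.2, 0.4])

# Scalar (double-sum) form of the exponent: sum_i t_i mu_i + (1/2) sum_i sum_j Sigma_ij t_i t_j.
exponent_scalar = sum(t[i] * mu[i] for i in range(2)) \
    + 0.5 * sum(Sigma[i, j] * t[i] * t[j] for i in range(2) for j in range(2))

# Vector/matrix form of the exponent: t' mu + (1/2) t' Sigma t.
exponent_vector = t @ mu + 0.5 * t @ Sigma @ t

M = np.exp(exponent_vector)
print(np.isclose(exponent_scalar, exponent_vector))  # the two forms agree
```

Working partial derivatives out on the double-sum form and then recognizing the pattern is exactly how the matrix identities below are usually derived.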


If you want to use shortcuts:

By the chain rule, $$\nabla_t M_X(t) = e^{t^\top \mu + \frac{1}{2} t^\top \Sigma t} \nabla_t (t^\top \mu + \frac{1}{2} t^\top \Sigma t) = e^{t^\top \mu + \frac{1}{2} t^\top \Sigma t} [\nabla_t (t^\top \mu) + \frac{1}{2} \nabla_t(t^\top \Sigma t)].$$

The first term is $\nabla_t(t^\top \mu) = \mu$. (If this isn't obvious, go through the approach at the beginning of my post, by working with $t^\top \mu = \sum_i t_i \mu_i$.)

The second term is $\frac{1}{2} \nabla_t(t^\top \Sigma t) = \Sigma t$. (This is the analogue of $\frac{d}{dx} \frac{1}{2} ax^2 = ax$. To prove the general case, it can be helpful to just go through the basic approach at the beginning of my post, by working with $t^\top \Sigma t = \sum_i \sum_j \Sigma_{ij} t_i t_j$ directly.)
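Combining the two terms gives the closed form $\nabla_t M_X(t) = e^{t^\top \mu + \frac{1}{2} t^\top \Sigma t}\,(\mu + \Sigma t) = M_X(t)\,(\mu + \Sigma t)$. As a sanity check, here is a short numerical sketch (with made-up values for $\mu$, $\Sigma$, $t$) comparing this formula against a central finite-difference gradient:

```python
import numpy as np

# Hypothetical example values (not from the question).
mu = np.array([1.0, -0.5])
Sigma = np.array([[2.0, 0.3],
                  [0.3, 1.0]])
t = np.array([0.2, 0.4])

def M(t):
    """Joint MGF of a multivariate normal: exp(t'mu + (1/2) t'Sigma t)."""
    return np.exp(t @ mu + 0.5 * t @ Sigma @ t)

# Closed-form gradient derived above: M(t) * (mu + Sigma t).
grad_closed = M(t) * (mu + Sigma @ t)

# Central finite differences as an independent check.
h = 1e-6
grad_fd = np.array([
    (M(t + h * e) - M(t - h * e)) / (2 * h)
    for e in np.eye(2)
])

print(np.allclose(grad_closed, grad_fd))
```

Note also that evaluating the closed form at $t = 0$ recovers $\nabla_t M_X(0) = \mu = E[X]$, which is the usual sanity check on an MGF gradient.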
