Hypothesis Testing – Approximate Distribution of Test Statistic for Weighted Sample Mean

approximation, gamma distribution, hypothesis testing, t-test, weighted-variance

Let
$$ R_{i}(t) \sim \mathcal{N}(\mu_i, \sigma_i^2), $$
denote the one-period return distribution for asset $i$, from which we observe the iid samples $\{R_i(t)\}_{t=1}^{n_i}$. The (MLE) sample mean and the unbiased sample variance are given by $\hat{\mu}_i:=\frac{1}{n_i}\sum_{t=1}^{n_i}R_i(t)$ and $\hat{\sigma}_i^2 := \tfrac{1}{n_i-1}\sum_{t=1}^{n_i}(R_i(t)-\hat{\mu}_i)^2$, respectively.

The total return for asset $i$ is then
$$X_i:= \sum_{t=1}^{n_i} R_i(t) \sim\mathcal{N}(n_i\mu_i, n_i\sigma_i^2).$$

Now let $w_i$ be the weight associated with return $X_i$ such that the weighted total return of $m$ independent assets is
\begin{align}
W:=\sum_{i=1}^{m} w_iX_i \sim \mathcal{N}\left(\sum_{i=1}^m n_i w_i \mu_i, \sum_{i=1}^m n_i w_i^2 \sigma_i^2\right) =: \mathcal{N}(\mu_W, \sigma_W^2).
\end{align}

We now want to test the following hypotheses
\begin{align}
\mathcal{H}_0&: \mu_W=0, \\
\mathcal{H}_1&: \mu_W<0.
\end{align}

Under $\mathcal{H}_0$ the test statistic is
\begin{align}
T:= \frac{W-0}{\hat{\sigma}_W},
\end{align}

and we reject $\mathcal{H}_0$ if $T<F_T^{-1}(\alpha)$, for some significance level $\alpha$, where $F_T^{-1}$ is the inverse CDF of $T$. Hence, we need to know the (inverse) CDF of $T$, or more specifically, the (inverse) left tail CDF of $T$.
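As a concrete sketch of the statistic (all sample sizes, weights, and return parameters below are illustrative assumptions, not values from the question):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data: m = 3 independent assets with n_i iid normal returns each.
returns = [rng.normal(0.0, 0.02, size=k) for k in (12, 20, 8)]
weights = np.array([0.5, 0.3, 0.2])

n = np.array([len(r) for r in returns])
mu_hat = np.array([r.mean() for r in returns])           # sample means
sigma2_hat = np.array([r.var(ddof=1) for r in returns])  # unbiased variances

W = np.sum(weights * n * mu_hat)                   # sum_i w_i X_i, X_i = n_i * mu_hat_i
sigma_W_hat = np.sqrt(np.sum(n * weights**2 * sigma2_hat))
T = W / sigma_W_hat                                # test statistic under H0: mu_W = 0
```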

The distribution of $T$

Since $(n_i-1)\hat{\sigma}_i^2/\sigma_i^2 \sim \chi_{n_i-1}^2$ and since $c\chi_{k}^2 \overset{d}{=} \Gamma_{\alpha\beta}\left(\frac{k}{2}, \frac{1}{2c} \right)$ (shape–rate parameterization),
we have that

\begin{align}
\hat{\sigma}_W^2 := \sum_{i=1}^m n_i w_i^2 \hat{\sigma}_i^2 \overset{d}{=} \sum_{i=1}^m \frac{n_iw_i^2 \sigma_i^2}{n_i-1} \chi_{n_i-1}^2 \overset{d}{=} \sum_{i=1}^m \Gamma_{\alpha\beta}\left(\frac{n_i-1}{2}, \frac{n_i-1}{2n_iw_i^2\sigma_i^2}\right),
\end{align}

which means that $\hat{\sigma}_W^2$ is a weighted sum of independent scaled chi-squared random variables, or equivalently, a sum of independent Gamma random variables with differing shapes and rates, which has no known closed-form distribution. It can, however, be approximated, e.g. by moment matching to a single gamma or chi-squared distribution.
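One standard route is Welch–Satterthwaite moment matching of $\hat{\sigma}_W^2$ to $c\chi_\nu^2$: equate $\mathbb{E}[\hat{\sigma}_W^2]=\sum_i a_i$ and $\mathrm{Var}[\hat{\sigma}_W^2]=2\sum_i a_i^2/(n_i-1)$, with $a_i = n_iw_i^2\sigma_i^2$, to $c\nu$ and $2c^2\nu$. A minimal sketch (the helper function and all inputs are my own, illustrative):

```python
import numpy as np

# Match sigma_hat_W^2 = sum_i a_i * chi2_{n_i-1}/(n_i-1), a_i = n_i w_i^2 sigma_i^2,
# to a scaled chi-squared c * chi2_nu by equating the first two moments.
def satterthwaite(n, w, sigma2):
    a = n * w**2 * sigma2                # a_i = n_i w_i^2 sigma_i^2
    mean = a.sum()                       # E[sigma_hat_W^2] = c * nu
    var = 2.0 * np.sum(a**2 / (n - 1))   # Var[sigma_hat_W^2] = 2 c^2 nu
    nu = 2.0 * mean**2 / var             # matched degrees of freedom
    return nu, mean / nu                 # (nu, scale c)

# Illustrative usage:
nu, c = satterthwaite(np.array([10, 15, 20]),
                      np.array([0.4, 0.35, 0.25]),
                      np.array([4e-4, 9e-4, 6.25e-4]))
```

With a single asset ($m=1$) the match is exact: $\nu = n_1 - 1$ and $c = n_1w_1^2\sigma_1^2/(n_1-1)$.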

Simulating with $1 < n_i < 30$, $1<m<30$, and $w_i^2\sigma_i^2\ll 1$ (so that $\alpha\gg \beta$), I observe that $T$ is indeed well approximated by a Student-t distribution, albeit with degrees of freedom that seem to depend on $\{n_i, m\}$.
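This Student-t behaviour is consistent with using the Satterthwaite-matched degrees of freedom $\nu = (\sum_i a_i)^2 / \sum_i a_i^2/(n_i-1)$, $a_i = n_iw_i^2\sigma_i^2$, as in Welch's t-test: $W$ is normal and independent of $\hat{\sigma}_W^2 \approx \sigma_W^2\chi_\nu^2/\nu$, so $T \approx t_\nu$. A Monte Carlo sketch under $\mathcal{H}_0$ (all settings are illustrative; scipy is assumed available):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Illustrative H0 settings (assumptions, not values from the question).
n = np.array([10, 15, 20])
w = np.array([0.4, 0.35, 0.25])
sigma = np.array([0.02, 0.03, 0.025])
alpha = 0.05

# Satterthwaite-matched degrees of freedom, as in Welch's t-test.
a = n * w**2 * sigma**2
nu = a.sum()**2 / np.sum(a**2 / (n - 1))
crit = stats.t.ppf(alpha, nu)            # left-tail critical value t_nu^{-1}(alpha)

reps = 20000
rejections = 0
for _ in range(reps):
    samples = [rng.normal(0.0, s, k) for s, k in zip(sigma, n)]
    W = sum(wi * x.sum() for wi, x in zip(w, samples))
    s2W = sum(ki * wi**2 * x.var(ddof=1) for ki, wi, x in zip(n, w, samples))
    rejections += W / np.sqrt(s2W) < crit

rate = rejections / reps                 # near alpha if the t_nu approximation fits
```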

Questions:

  1. If we assume $T$ follows a (possibly scaled) Student-t distribution, what are the degrees of freedom and, if applicable, the scaling?
  2. Should we moment-match $\hat{\sigma}_W^2$ to a chi-squared distribution, or $\hat{\sigma}_W$ to a chi distribution? There are convexity differences between the two.
  3. Do you know of a better approximation to the left-tail rejection value $F_T^{-1}(\alpha)$?

Best Answer

For moderate tail probabilities, the Satterthwaite approximation is better than it has any right to be. It's anticonservative for small tail probabilities.
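For reference, the Satterthwaite tail probability for a positively weighted sum $Q=\sum_i \lambda_i\chi^2_{d_i}$ can be sketched in Python as follows (the function is a hypothetical helper of mine, not CompQuadForm's API):

```python
import numpy as np
from scipy import stats

# Satterthwaite tail probability for Q = sum_i lam_i * chi2_{df_i}:
# match Q to c * chi2_nu via the first two moments, then read off the tail.
def satterthwaite_sf(q, lam, df):
    lam, df = np.asarray(lam, float), np.asarray(df, float)
    mean = np.sum(lam * df)                   # E[Q]
    var = 2.0 * np.sum(lam**2 * df)           # Var[Q]
    nu = 2.0 * mean**2 / var                  # matched degrees of freedom
    return stats.chi2.sf(q * nu / mean, nu)   # P(Q > q) under c * chi2_nu

# Example: with a single term the approximation is exact.
p = satterthwaite_sf(3.0, [1.0], [5.0])
```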

There are two approximations, due to Davies and to Farebrother, that would be arbitrarily accurate given enough terms and infinite-precision arithmetic. In practice, they are very good as long as the tail probability is much larger than machine epsilon and the number of terms $m$ is not too large (hundreds). These are available, for example, in the R package CompQuadForm.

At extreme tail probabilities, the saddlepoint approximation (described at your link) is better: the relative error is bounded in $q$ and decreases with increasing $m$. This is in the R survey package, in the pchisqsum function.

For large $m$, it's better to do an explicit convolution of the few largest terms and a Satterthwaite approximation to the remainder. More detail here
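A rough sketch of this hybrid idea, using Monte Carlo for the largest terms as a simple stand-in for the explicit convolution (function, names, and numbers are all illustrative):

```python
import numpy as np

# Hybrid sketch for large m: keep the k largest terms of
# Q = sum_i lam_i * chi2_{df_i} explicit, and collapse the remaining terms
# into a single Satterthwaite-matched scaled chi-squared.
def hybrid_sample(lam, df, k, size, rng):
    lam, df = np.asarray(lam, float), np.asarray(df, float)
    order = np.argsort(lam)[::-1]
    big, small = order[:k], order[k:]
    out = np.zeros(size)
    for i in big:                          # largest terms simulated exactly
        out += lam[i] * rng.chisquare(df[i], size)
    if small.size:                         # Satterthwaite remainder
        mean = np.sum(lam[small] * df[small])
        var = 2.0 * np.sum(lam[small]**2 * df[small])
        nu = 2.0 * mean**2 / var
        out += (mean / nu) * rng.chisquare(nu, size)
    return out
```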
