Solved – Asymptotic Distribution of the Wald Test Statistic

asymptotics, distributions, likelihood, mathematical-statistics, wald-test

I am trying to understand the asymptotic distribution of the Wald test statistic, specifically under the alternative hypothesis, for which I have found little reference.

For clarity, the binary hypothesis for an unknown parameter vector $\theta$ of size $(r \times 1)$ is:

\begin{align}
\text{H}_0 &: \theta = \theta_0 \nonumber \\
\text{H}_1 &: \theta \neq \theta_0 \nonumber
\end{align}

and the resulting log-likelihood ratio, once the asymptotic PDF of the MLE is assumed to hold ($N \rightarrow \infty$), can be modified to yield the Wald test statistic:

\begin{equation}
2 \ln L_G({\bf x} ) = T_W({\bf x}) = (\hat{\theta}_1 - \theta_0)^T {\bf I}(\hat{\theta}_1) (\hat{\theta}_1 - \theta_0)
\end{equation}

where $\hat{\theta}_1$ denotes the unrestricted MLE and ${\bf I}$ is the Fisher information matrix (FIM) of the full sample.
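For completeness, my understanding of how the GLRT is "modified" here is the usual argument: a second-order Taylor expansion of $\ln p({\bf x}; \theta_0)$ about $\hat{\theta}_1$, where the score vanishes, gives

\begin{equation}
2 \ln L_G({\bf x}) = 2 \left[ \ln p({\bf x}; \hat{\theta}_1) - \ln p({\bf x}; \theta_0) \right] \approx (\hat{\theta}_1 - \theta_0)^T \left[ - \frac{\partial^2 \ln p({\bf x}; \theta)}{\partial \theta \, \partial \theta^T} \bigg|_{\theta = \hat{\theta}_1} \right] (\hat{\theta}_1 - \theta_0)
\end{equation}

and the observed information is then replaced by the FIM ${\bf I}(\hat{\theta}_1)$. Now, to obtain the asymptotic distribution of this test statistic, we can note: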

\begin{equation}
\hat{\theta}_1 \sim
\begin{cases}
\mathcal{N}\Big( \theta_0, {\bf I}^{-1}(\theta_0) \Big) & \text{under } \text{H}_0 \\
\mathcal{N}\Big( \theta_1, {\bf I}^{-1}(\theta_1) \Big) & \text{under } \text{H}_1
\end{cases}
\end{equation}

So that, as $N \rightarrow \infty$, under the null hypothesis ($\text{H}_0$):

\begin{equation}
T_W({\bf x}) = (\hat{\theta}_1 - \theta_0)^T {\bf I}(\theta_0) (\hat{\theta}_1 - \theta_0) \sim \chi^2_r
\end{equation}
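A quick sanity check on the null distribution, using a toy scalar Gaussian-mean model of my own choosing (not an example from Kay): for $x_i \sim \mathcal{N}(\theta, 1)$ the MLE is the sample mean, the full-sample FIM is ${\bf I}(\theta) = N$, and so $T_W = N(\bar{x} - \theta_0)^2$ with $r = 1$.

```python
# Toy check (my own example): scalar Gaussian mean model x_i ~ N(theta, 1).
# The MLE is the sample mean and the full-sample FIM is N, so
# T_W = N * (xbar - theta0)^2 should be chi^2_1 under H0.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
N, trials, theta0 = 500, 20_000, 0.0

x = rng.normal(theta0, 1.0, size=(trials, N))   # data simulated under H0
theta_hat = x.mean(axis=1)                      # unrestricted MLE per trial
T_W = N * (theta_hat - theta0) ** 2             # Wald statistic, r = 1

# Kolmogorov-Smirnov test of T_W against the chi^2_1 CDF;
# a large p-value says the chi^2_1 fit is consistent with the simulation.
print(stats.kstest(T_W, stats.chi2(df=1).cdf))
```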

But it is the distribution under $\text{H}_1$ that confuses me. A variety of texts/proofs (e.g., Kay, Vol. II) state the following, as $N \rightarrow \infty$:

\begin{equation}
{\bf I}(\hat{\theta}_1) (\hat{\theta}_1 - \theta_0) = {\bf I}(\theta_0) (\hat{\theta}_1 - \theta_0)
\end{equation}

I have no idea how to validate the above equation. In any case, the proof continues so that under $\text{H}_1$:

\begin{equation}
T_W({\bf x}) = (\hat{\theta}_1 - \theta_0)^T {\bf I}(\theta_1) (\hat{\theta}_1 - \theta_0) \sim \chi^2_r (\lambda)
\end{equation}
where the noncentrality parameter is:
\begin{equation}
\lambda = ( \theta_1 - \theta_0 )^T{\bf I}(\theta_1) ( \theta_1 - \theta_0 )
\end{equation}

or equivalently, as $N \rightarrow \infty$:
\begin{equation}
\lambda = ( \theta_1 - \theta_0 )^T{\bf I}(\theta_0) ( \theta_1 - \theta_0 )
\end{equation}
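To make my confusion concrete, here is a quick numerical example of my own (an exponential model with rate $\theta$, whose per-sample Fisher information $1/\theta^2$ genuinely depends on $\theta$): for a fixed alternative, the two expressions for $\lambda$ are nowhere near equal.

```python
# My own toy example of the issue: exponential model with rate theta,
# per-sample Fisher information 1/theta^2, full-sample FIM N/theta^2.
# For a *fixed* alternative the two noncentrality parameters differ.
import numpy as np

N, theta0, theta1 = 1_000, 1.0, 1.5

lam_at_theta0 = N * (theta1 - theta0) ** 2 / theta0**2  # uses I(theta_0)
lam_at_theta1 = N * (theta1 - theta0) ** 2 / theta1**2  # uses I(theta_1)
print(lam_at_theta0, lam_at_theta1)  # 250.0 vs ~111.1 -- clearly unequal
```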

How can this possibly be true? Unless there is a strong assumption being made here that I am unaware of, there is no reason, even asymptotically, why the Fisher information matrices (and hence the asymptotic variances) at these two values ($\theta_1$ and $\theta_0$) should be equal. What am I missing? Is it something inherent in the Wald test statistic?

Any input on this would be appreciated as I'm completely vexed. Thanks!

Best Answer

I believe I have found the answer. What I suspected is actually true:

\begin{equation} {\bf I}(\theta_1) \approx {\bf I}(\theta_0) \end{equation}

This holds because $\theta_1$ is assumed to be near $\theta_0$, i.e., the alternatives are "contiguous" (local) to the null. As $N \rightarrow \infty$ we are also guaranteed convergence of the MLE to the true value, so the approximation is justified. It is a highly limiting assumption, but it explains why the distribution under the alternative hypothesis is rarely discussed in the literature.
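To spell out the assumption (this is my reading of the standard local-alternatives argument, not Kay's exact wording): treat the alternative as a sequence approaching the null,

\begin{equation}
\theta_1 = \theta_{1,N} = \theta_0 + \frac{\delta}{\sqrt{N}}, \qquad \delta \text{ fixed,}
\end{equation}

and write the full-sample FIM as ${\bf I}(\theta) = N \, {\bf i}(\theta)$ for i.i.d. data, with ${\bf i}(\cdot)$ continuous. Then

\begin{equation}
{\bf i}(\theta_{1,N}) \rightarrow {\bf i}(\theta_0), \qquad \lambda = N (\theta_{1,N} - \theta_0)^T {\bf i}(\theta_{1,N}) (\theta_{1,N} - \theta_0) \rightarrow \delta^T {\bf i}(\theta_0) \, \delta,
\end{equation}

so the two expressions for $\lambda$ coincide in the limit. Under a fixed alternative, by contrast, $\lambda$ grows like $N$ and $T_W$ simply diverges (the power tends to 1), so the noncentral $\chi^2_r$ description is only meaningful along such contiguous sequences.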

This also makes sense because the Wald, Rao (score), and likelihood-ratio tests are asymptotically equivalent to first order; their behavior diverges from second order onward.
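A quick numerical check of the local-alternative argument, continuing the exponential-rate toy model from the question (again my own construction, not from any text):

```python
# Continuing the exponential-rate toy model: under a contiguous
# alternative theta_1 = theta_0 + delta/sqrt(N), both noncentrality
# parameters approach delta^2/theta0^2 and T_W ~ noncentral chi^2_1.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
N, trials, theta0, delta = 2_000, 20_000, 1.0, 0.5
theta1 = theta0 + delta / np.sqrt(N)            # contiguous alternative

x = rng.exponential(1.0 / theta1, size=(trials, N))   # data under H_1
theta_hat = 1.0 / x.mean(axis=1)                      # MLE of the rate
T_W = (N / theta_hat**2) * (theta_hat - theta0) ** 2  # Wald with I(theta_hat)

lam0 = N * (theta1 - theta0) ** 2 / theta0**2   # lambda via I(theta_0)
lam1 = N * (theta1 - theta0) ** 2 / theta1**2   # lambda via I(theta_1)
print(lam0, lam1)                               # 0.25 vs ~0.2445: close

# The mean of a noncentral chi^2_1(lam) is 1 + lam; compare to the
# simulation, along with an upper-tail quantile.
print(T_W.mean(), 1 + lam0)
print(np.quantile(T_W, 0.95), stats.ncx2.ppf(0.95, df=1, nc=lam0))
```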
