Solved – Difference between Wald test and Chi-squared test

chi-squared-test, maximum-likelihood, wald-test

I am aware that both test statistics are compared to a chi-squared distribution. Here is what I know:

The Wald test, as a multivariable generalization of Student's t-test, tests for a statistical difference in means between groups. The chi-squared test, on the other hand, tests for a statistical difference in frequencies between groups.

Their calculations are similar; the difference lies in the denominator: a variance (Wald) versus a mean (chi-squared).

The Wald test can be used in a Cox model to test whether any variable is significantly different. The chi-squared test is used to test for independence (or homogeneity of proportions) in a contingency table.

I realize that these two tests are very similar in their assumptions, calculations, and usage. However, this is just a rough impression, and I have not been able to get an intuitive understanding of their relationship and differences. Can anyone share some comments on this?

Best Answer

The relationship between the Wald test and the Pearson $\chi^2$ is a particular example of the relationship between Wald tests and score tests.

The Wald test statistic for the difference between a value of a parameter $\hat \theta$ estimated from a data sample and a null-hypothesis value $\theta_0$ is:

$$W = \frac{ ( \widehat{ \theta}-\theta_0 )^2 }{\operatorname{var}(\hat \theta )}$$
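As a concrete sketch of that formula (the data and the null value $\theta_0 = 0$ are made up for illustration), the Wald statistic for a sample mean can be computed directly:

```python
import numpy as np
from scipy import stats

# Hypothetical example: test H0: theta = 0 for a population mean
rng = np.random.default_rng(0)
x = rng.normal(loc=0.5, scale=1.0, size=100)

theta_hat = x.mean()                 # estimate of the mean
var_hat = x.var(ddof=1) / len(x)     # estimated variance of theta_hat
W = (theta_hat - 0.0) ** 2 / var_hat

# Under H0, W is asymptotically chi-squared with 1 degree of freedom
p_value = stats.chi2.sf(W, df=1)
```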

The Pearson $\chi^2$ test statistic for differences in a contingency table between a set of observed counts ($O_i$) and those expected under a null hypothesis such as independence ($E_i$) is:

$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$$

where the sum is over all cells $i$ of the contingency table. In both Wald and Pearson tests, the numerator terms represent squared differences between values found from the data sample and those expected under the null hypothesis.
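To make the formula concrete, here is a minimal sketch (the table values are invented) that computes the Pearson statistic by hand and cross-checks it against `scipy.stats.chi2_contingency`:

```python
import numpy as np
from scipy import stats

# Hypothetical 2x2 contingency table
table = np.array([[30, 20],
                  [10, 40]])

# Expected counts under independence: row_total * col_total / grand_total
row = table.sum(axis=1, keepdims=True)
col = table.sum(axis=0, keepdims=True)
expected = row * col / table.sum()

chi2_manual = ((table - expected) ** 2 / expected).sum()

# Cross-check (correction=False disables the Yates continuity correction,
# which would otherwise modify the plain Pearson statistic)
chi2_scipy, p, dof, exp = stats.chi2_contingency(table, correction=False)
```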

The relationship you intuit between the Pearson statistic and the Wald statistic becomes clear if you think about a contingency table as representing a sample from a set of Poisson-distributed count variables, one for each cell. (That's the basis of Poisson regression, also called log-linear modeling, for contingency-table analysis.) For a Poisson distribution the variance equals the mean, so the denominator terms can be considered the estimated variances of counts in each cell of the table under the null hypothesis.
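A quick numerical check of that key Poisson fact (the rate 20 and the sample size are arbitrary choices for this sketch):

```python
import numpy as np

# For Poisson counts the variance equals the mean, which is why E_i can
# serve as the null-hypothesis variance estimate in each Pearson term
rng = np.random.default_rng(42)
counts = rng.poisson(lam=20.0, size=100_000)

sample_mean = counts.mean()
sample_var = counts.var()
# Both should be close to the true rate, 20
```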

In contrast, the denominator in the single-parameter Wald statistic is the variance around the estimated value of the parameter. For Wald tests of more complicated hypotheses the denominator is related to the covariance matrix of the parameter estimates, still evaluated around the estimated parameter values.

So in both tests the denominator involves a variance estimate, one evaluated at the null hypothesis and the other around the estimated parameter values. As @gung explains in this answer about likelihood-ratio, Wald, and score tests, Wald tests are evaluated at the parameter values estimated by maximum likelihood, while score tests are evaluated at the null hypothesis; the Pearson test is a particular example of a score test.*
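A single binomial proportion makes the distinction concrete. In this sketch (60 successes in 100 trials and the null value 0.5 are invented numbers), the only difference between the two statistics is where the variance is evaluated, and the score statistic is exactly the Pearson $\chi^2$ for the corresponding 1×2 table of counts:

```python
# Hypothetical data: 60 successes in 100 trials, testing H0: p = 0.5
n, k, p0 = 100, 60, 0.5
p_hat = k / n

# Wald: variance of p_hat evaluated at the ML estimate p_hat
wald = (p_hat - p0) ** 2 / (p_hat * (1 - p_hat) / n)

# Score: variance evaluated at the null value p0
score = (p_hat - p0) ** 2 / (p0 * (1 - p0) / n)

# Pearson chi-squared on observed counts (60, 40) vs expected (50, 50)
pearson = (60 - 50) ** 2 / 50 + (40 - 50) ** 2 / 50

# score equals pearson exactly; wald differs because its variance uses p_hat
```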

In practice, statistical software will report all three tests for a full multiple-regression model fit by maximum likelihood, but usually only Wald tests for individual coefficients. That's mostly because the parameter covariance estimates for the Wald test can be obtained directly from the numerical approximations made in fitting the model, while the model has to be re-evaluated with respect to each parameter to get p-values and confidence intervals based on the other tests. That doesn't mean that Wald tests are "better" in any sense other than convenience. With small samples, Wald tests are often considered the least reliable.
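For completeness, here is a sketch of all three statistics side by side for a single binomial proportion (35 successes in 50 trials and the null value 0.5 are invented); the likelihood-ratio statistic needs the log-likelihood at both the estimate and the null, which is why it requires extra evaluation beyond the Wald statistic:

```python
import numpy as np
from scipy import stats

# Hypothetical data: 35 successes in 50 trials, testing H0: p = 0.5
n, k, p0 = 50, 35, 0.5
p_hat = k / n

def loglik(p):
    """Binomial log-likelihood, up to an additive constant."""
    return k * np.log(p) + (n - k) * np.log(1 - p)

lr = 2 * (loglik(p_hat) - loglik(p0))                   # likelihood ratio
wald = (p_hat - p0) ** 2 / (p_hat * (1 - p_hat) / n)    # variance at p_hat
score = (p_hat - p0) ** 2 / (p0 * (1 - p0) / n)         # variance at p0

# All three are referred to chi-squared(1); they agree asymptotically
# but give slightly different answers in finite samples
p_values = [stats.chi2.sf(t, df=1) for t in (lr, wald, score)]
```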


*This paper by Gordon Smyth provides a proof of how a more general Pearson goodness-of-fit test is equivalent to a score test. Score tests are sometimes called Lagrange-multiplier tests.