Correct equation for Breslow-Day statistic in homogeneity test of odds ratio

Tags: chi-squared-test, heteroscedasticity, odds-ratio

In Statistical Methods in Cancer Research, Volume 1: The Analysis of Case-Control Studies, Breslow and Day derive a statistic (equation 4.30) to test the homogeneity of odds ratios across strata. Given the value of the statistic, the test determines whether it is appropriate to combine the strata and compute a single odds ratio.

For example, if we have only one 2×2 contingency table:


[Figure: 2×2 contingency table with cells A and B in the first row and C and D in the second row (source: kean.edu)]

the odds ratio for getting a disease with a risk factor compared to not having the risk factor is:

$$
\psi = \frac{A D}{B C}
$$
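As a quick numerical check of this formula, here is a minimal Python sketch. The function name `odds_ratio` and the counts are hypothetical, and the cell layout (A, B in the first row, C, D in the second) is assumed from the row and column totals used later in the question.

```python
def odds_ratio(A: float, B: float, C: float, D: float) -> float:
    """Odds ratio psi = (A * D) / (B * C) for a single 2x2 table.

    Assumed layout: A, B in the first row and C, D in the second,
    so R1 = A + B, R2 = C + D and C1 = A + C.
    """
    if B == 0 or C == 0:
        raise ZeroDivisionError("the odds ratio is undefined when B or C is zero")
    return (A * D) / (B * C)


# Hypothetical counts, for illustration only.
print(odds_ratio(20, 80, 10, 90))  # 2.25
```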

If we have multiple contingency tables (for example, after stratifying by age group), we can use the Mantel-Haenszel estimate to compute a common odds ratio across all $I$ strata, where $N_i = A_i + B_i + C_i + D_i$ is the total count in stratum $i$:

$$
\psi_{mh} = \frac{\sum_{i=1}^{I}A_i D_i / N_i}{\sum_{i=1}^{I}B_i C_i / N_i}
$$
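A minimal sketch of the Mantel-Haenszel estimate above, assuming each stratum is passed as a tuple `(A, B, C, D)` in the same cell layout; `mantel_haenszel` and the two example strata are hypothetical.

```python
from typing import Sequence, Tuple

Table = Tuple[float, float, float, float]  # (A, B, C, D) for one stratum


def mantel_haenszel(tables: Sequence[Table]) -> float:
    """Mantel-Haenszel common odds ratio: sum(A*D/N) / sum(B*C/N)."""
    num = 0.0
    den = 0.0
    for A, B, C, D in tables:
        N = A + B + C + D  # N_i, the size of stratum i
        num += A * D / N
        den += B * C / N
    return num / den


# Two hypothetical strata, e.g. two age groups.
print(mantel_haenszel([(20, 80, 10, 90), (15, 35, 10, 40)]))  # 2.0
```

Each stratum contributes $A_i D_i / N_i$ to the numerator and $B_i C_i / N_i$ to the denominator, so larger strata carry more weight than they would in a plain average of the stratum-specific odds ratios.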

Within each stratum the margins $R1=A+B$, $R2=C+D$ and $C1=A+C$ are held fixed, so $B = R1 - A$, $C = C1 - A$ and $D = R2 - C1 + A$. Requiring that table's odds ratio to equal the common estimate $\psi_{mh}$ gives an equation in $A$ alone:

$$
\psi_{mh} = \frac{A D}{B C} = \frac{A (R2-C1+A)}{(R1-A)(C1-A)}
$$

which is a quadratic equation in $A$. Let $a$ be its solution; only one root gives a valid table, namely the one that keeps all four fitted cells $a$, $R1-a$, $C1-a$ and $R2-C1+a$ non-negative.
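Substituting and clearing denominators gives the explicit quadratic $(\psi_{mh} - 1)A^2 - \left[\psi_{mh}(R1 + C1) + R2 - C1\right]A + \psi_{mh} R1\,C1 = 0$, which reduces to the linear solution $A = R1\,C1/(R1 + R2)$ when $\psi_{mh} = 1$. The sketch below solves it and keeps the admissible root, the one in $[\max(0, C1 - R2), \min(R1, C1)]$. `fitted_a` is a hypothetical helper name, and the numerically stable form of the quadratic formula is a choice of this sketch, not something taken from Breslow and Day.

```python
import math


def fitted_a(R1: float, R2: float, C1: float, psi: float) -> float:
    """Fitted cell a for a 2x2 table with margins R1, R2, C1 and odds ratio psi.

    Solves (psi - 1)*a**2 - (psi*(R1 + C1) + R2 - C1)*a + psi*R1*C1 = 0
    and keeps the root for which all four fitted cells are non-negative.
    """
    qa = psi - 1.0
    qb = -(psi * (R1 + C1) + R2 - C1)
    qc = psi * R1 * C1
    if abs(qa) < 1e-12:
        # psi == 1: the equation is linear, a = R1*C1 / (R1 + R2)
        return -qc / qb
    disc = math.sqrt(max(qb * qb - 4.0 * qa * qc, 0.0))
    # Numerically stable quadratic formula (avoids cancellation).
    q = -0.5 * (qb + math.copysign(disc, qb))
    lo, hi = max(0.0, C1 - R2), min(R1, C1)
    for root in (q / qa, qc / q):
        if lo - 1e-9 <= root <= hi + 1e-9:
            return root
    raise ValueError("no admissible root; check the margins and psi")


# Hypothetical stratum (20, 80, 10, 90): R1 = 100, R2 = 100, C1 = 30.
print(fitted_a(100, 100, 30, 2.25))  # 20.0
```

Because that hypothetical table's own odds ratio is 2.25, the fitted cell reproduces the observed $A = 20$; any other value of $\psi_{mh}$ moves $a$ away from it.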

A reasonable test of the assumption of a common odds ratio is therefore to sum the squared deviations of the observed from the fitted values, each standardized by its variance:

$$
\chi^2 = \sum_{i=1}^{I}\frac{(a_i - A_i)^{2}}{V_i}
$$

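where the variance is evaluated at the fitted cells of stratum $i$, i.e. the table with the observed margins and odds ratio $\psi_{mh}$:

$$
V_i = \left( \frac{1}{a_i} + \frac{1}{R1_i - a_i} + \frac{1}{C1_i - a_i} + \frac{1}{R2_i - C1_i + a_i} \right)^{-1}
$$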
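Putting the pieces together, here is a hedged Python sketch of the statistic for a list of observed tables `(A, B, C, D)`. It plugs in the Mantel-Haenszel estimate, computes each $V_i$ from the fitted cells (which is how the Breslow-Day variance is usually defined), and does not apply Tarone's correction. `breslow_day` and `_fitted_a` are hypothetical names, and strata with empty margins are not handled.

```python
import math
from typing import Sequence, Tuple

Table = Tuple[float, float, float, float]  # observed (A, B, C, D) for one stratum


def _fitted_a(R1: float, R2: float, C1: float, psi: float) -> float:
    # Admissible root of (psi-1)a^2 - (psi(R1+C1) + R2 - C1)a + psi*R1*C1 = 0.
    qa, qb, qc = psi - 1.0, -(psi * (R1 + C1) + R2 - C1), psi * R1 * C1
    if abs(qa) < 1e-12:
        return -qc / qb
    disc = math.sqrt(max(qb * qb - 4.0 * qa * qc, 0.0))
    q = -0.5 * (qb + math.copysign(disc, qb))
    lo, hi = max(0.0, C1 - R2), min(R1, C1)
    return next(r for r in (q / qa, qc / q) if lo - 1e-9 <= r <= hi + 1e-9)


def breslow_day(tables: Sequence[Table]) -> Tuple[float, int]:
    """Breslow-Day homogeneity statistic (without Tarone's correction) and its df."""
    # Mantel-Haenszel common odds ratio.
    num = sum(A * D / (A + B + C + D) for A, B, C, D in tables)
    den = sum(B * C / (A + B + C + D) for A, B, C, D in tables)
    psi = num / den
    chi2 = 0.0
    for A, B, C, D in tables:
        R1, R2, C1 = A + B, C + D, A + C
        a = _fitted_a(R1, R2, C1, psi)
        # Variance from the fitted cells a, R1 - a, C1 - a, R2 - C1 + a.
        V = 1.0 / (1.0 / a + 1.0 / (R1 - a) + 1.0 / (C1 - a) + 1.0 / (R2 - C1 + a))
        chi2 += (a - A) ** 2 / V
    return chi2, len(tables) - 1


stat, df = breslow_day([(20, 80, 10, 90), (15, 35, 10, 40), (5, 45, 20, 30)])
print(stat, df)
```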

If the homogeneity assumption holds and the sample size is large relative to the number of strata, this statistic approximately follows a chi-square distribution on $I-1$ degrees of freedom, from which a p-value can be obtained.
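To turn the statistic into a p-value without a statistics package, here is a small pure-Python sketch of the chi-square upper tail for integer degrees of freedom (SciPy users would normally call `scipy.stats.chi2.sf` instead). The name `chi2_sf` is hypothetical; the sketch uses the standard recurrence for the regularized upper incomplete gamma function.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df.

    With y = x / 2, P(X > x) = Q(df / 2, y), the regularized upper incomplete
    gamma function. Start from Q(1/2, y) = erfc(sqrt(y)) (odd df) or
    Q(1, y) = exp(-y) (even df) and climb with
    Q(s + 1, y) = Q(s, y) + y**s * exp(-y) / Gamma(s + 1).
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2:
        s, q = 0.5, math.erfc(math.sqrt(y))
    else:
        s, q = 1.0, math.exp(-y)
    while s < df / 2.0:
        # Add y**s * exp(-y) / Gamma(s + 1), computed on the log scale.
        q += math.exp(s * math.log(y) - y - math.lgamma(s + 1.0))
        s += 1.0
    return q


print(round(chi2_sf(3.841458820694124, 1), 6))  # 0.05
```

With $I$ strata, the p-value for the statistic above is `chi2_sf(stat, I - 1)`.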

If instead we divide the $I$ strata into $H$ groups and we suspect the odds ratios are homogeneous within groups but not between them, Breslow and Day give an alternative statistic (equation 4.32):

$$
\chi^2 = \sum_{h=1}^{H}\frac{\left( \sum_i (a_i - A_i) \right)^{2}}{\sum_i V_i}
$$

where the $i$ summations run over the strata in the $h$-th group, and the statistic is approximately chi-square with only $H-1$ degrees of freedom (I assume a separate Mantel-Haenszel estimate is computed within each group).
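One way to read equation 4.32: within each group the deviations are summed before squaring, and because the strata are independent the variance of that sum is $\sum_i V_i$, so each term is a squared standardized group total. If the fitted $a_i$ come from a single common odds ratio estimated over all $I$ strata, the $H$ group totals are constrained to add to approximately zero, which is consistent with $H-1$ degrees of freedom; with a separate maximum-likelihood fit inside each group, every group total would be exactly zero. The hypothetical helper `grouped_homogeneity` below takes per-stratum observed counts, fitted counts, variances and group labels as given, however they were computed.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence, Tuple


def grouped_homogeneity(
    observed: Sequence[float],   # A_i: observed cell counts
    fitted: Sequence[float],     # a_i: fitted cells under the common odds ratio
    variances: Sequence[float],  # V_i: per-stratum variances
    groups: Sequence[Hashable],  # h: group label of each stratum
) -> Tuple[float, int]:
    """Equation 4.32 as written: sum_h (sum_i (a_i - A_i))**2 / sum_i V_i."""
    dev: Dict[Hashable, float] = defaultdict(float)
    var: Dict[Hashable, float] = defaultdict(float)
    for A, a, V, h in zip(observed, fitted, variances, groups):
        dev[h] += a - A  # deviations are summed *before* squaring
        var[h] += V      # independent strata: the variances add
    chi2 = sum(dev[h] ** 2 / var[h] for h in dev)
    return chi2, len(dev) - 1


# Hypothetical inputs: four strata split into two groups "x" and "y".
print(grouped_homogeneity([10, 12, 8, 9], [11, 13, 7, 8], [2, 2, 2, 2], ["x", "x", "y", "y"]))
# (2.0, 1)
```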

My question is that equation 4.32 does not seem right to me. If anything, I would expect it to have the form:

$$
\chi^2 = \sum_{h=1}^{H}\frac{ \sum_i \left(a_i - A_i\right)^{2} }{\sum_i V_i}
$$

or:

$$
\chi^2 = \sum_{h=1}^{H}\sum_{i}\frac{(a_i - A_i)^{2}}{V_i}
$$

with the latter approximately following a chi-square distribution on $I-1$ degrees of freedom.

Which of these equations should I be using?

Best Answer

This is handled more directly and more accurately by a binary logistic regression model with an interaction term between the risk factor and the stratification variable. The test that is usually best is the likelihood ratio $\chi^2$ test from such a model. The regression framework also lets one test continuous variables, adjust for other variables, and make a host of other extensions.
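The answer gives no code, so here is one hedged way to compute that likelihood ratio $\chi^2$ in plain Python for the stratified 2×2 setting of the question. It relies on two facts about a logistic model for disease with categorical stratum and exposure: the model with the exposure-by-stratum interaction is saturated, and the model without it has fitted tables that keep each stratum's observed margins, share one odds ratio, and reproduce the observed total $\sum_i A_i$. The likelihood ratio statistic is then the deviance $G^2 = 2\sum O \log(O/E)$ of the no-interaction fit, on $I-1$ degrees of freedom. In practice one would fit both models with a regression package and compare their log-likelihoods; `lr_interaction_test`, `_fitted_a` and the bisection bracket on $\log\psi$ are assumptions of this sketch.

```python
import math
from typing import Sequence, Tuple

Table = Tuple[float, float, float, float]  # (A, B, C, D) per stratum


def _fitted_a(R1: float, R2: float, C1: float, psi: float) -> float:
    # Fitted cell a with margins R1, R2, C1 and odds ratio psi (admissible root).
    qa, qb, qc = psi - 1.0, -(psi * (R1 + C1) + R2 - C1), psi * R1 * C1
    if abs(qa) < 1e-12:
        return -qc / qb
    disc = math.sqrt(max(qb * qb - 4.0 * qa * qc, 0.0))
    q = -0.5 * (qb + math.copysign(disc, qb))
    lo, hi = max(0.0, C1 - R2), min(R1, C1)
    return next(r for r in (q / qa, qc / q) if lo - 1e-9 <= r <= hi + 1e-9)


def lr_interaction_test(tables: Sequence[Table]) -> Tuple[float, int]:
    """Likelihood ratio chi-square for the exposure-by-stratum interaction.

    The interaction model is saturated, so the statistic equals the deviance
    G^2 = 2 * sum(O * log(O / E)) of the no-interaction (common odds ratio) fit.
    """
    margins = [(A + B, C + D, A + C) for A, B, C, D in tables]
    target = sum(A for A, _, _, _ in tables)
    # The ML common odds ratio makes the fitted a_i add up to the observed sum of A_i.
    # The sum of fitted a_i increases with psi, so bisect on log(psi);
    # the bracket [-30, 30] is an assumption of this sketch.
    lo, hi = -30.0, 30.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        total = sum(_fitted_a(R1, R2, C1, math.exp(mid)) for R1, R2, C1 in margins)
        if total < target:
            lo = mid
        else:
            hi = mid
    psi = math.exp(0.5 * (lo + hi))
    g2 = 0.0
    for (A, B, C, D), (R1, R2, C1) in zip(tables, margins):
        a = _fitted_a(R1, R2, C1, psi)
        for obs, fit in ((A, a), (B, R1 - a), (C, C1 - a), (D, R2 - C1 + a)):
            if obs > 0:  # cells with O = 0 contribute 0 to G^2
                g2 += 2.0 * obs * math.log(obs / fit)
    return g2, len(tables) - 1


g2, df = lr_interaction_test([(20, 80, 10, 90), (15, 35, 10, 40), (5, 45, 20, 30)])
print(g2, df)
```

Unlike the Breslow-Day statistic, which plugs in the Mantel-Haenszel estimate, this version uses the maximum-likelihood common odds ratio, so the two statistics generally differ somewhat on the same data.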

General comment: I think we spend too much time teaching special cases and would do well to use general tools so that we have more time to deal with complications such as missing data, high dimensionality, etc.
