How to estimate a standardized mean difference from two samples’ quartiles

Tags: effect-size, estimation, meta-analysis, normality-assumption, quantiles

I'd appreciate ideas about how to estimate the standardized mean difference (SMD) for two independent samples from each sample's size and middle three quartiles (i.e., median and 25th and 75th percentiles). This problem's context is a meta-analysis whose focal effect size is a post-intervention SMD between treatment and control groups on a continuous outcome variable: Many studies report both samples' sizes, means, and variances or equivalent results (e.g., from a t test), from which a SMD is readily estimated, but some studies report other results instead (e.g., sizes and quartiles). Below I state the problem more precisely, give an example, and offer a few remarks.

Suppose we're interested in the following SMD on a continuous dependent variable $Y$ for Groups 1 and 2: $\delta = (\mu_1 - \mu_2) / \sigma$, where $\mu_j = \mathrm{E}(Y_j)$ and $\sigma^2 = \mathrm{Var}(Y_j)$ for $j = 1, 2$ are the groups' means and (common) variance. For convenience let's assume normality for each group, so $Y_j \sim \mathcal{N}(\mu_j, \sigma^2)$. Now consider a study that reports for one simple random sample from each group the size, $n_j$, and estimates of the three middle quartiles, $q_{1j}$, $q_{2j}$, and $q_{3j}$. How can we estimate $\delta$ from these results?

EXAMPLE: The following randomized study is from a meta-analysis of interventions to improve medication adherence. In short, Group 1's treatment sample experienced a multi-component intervention, Group 2's control sample received no attention beyond enrollment and data collection, and the outcome variable was a pharmacy refill measure on which higher scores represent better adherence. The samples' sizes and quartiles are as follows:

  • Group 1: $n_1$ = 80, $q_{11}$ = 58, $q_{21}$ = 85, $q_{31}$ = 174

  • Group 2: $n_2$ = 46, $q_{12}$ = 31, $q_{22}$ = 79, $q_{32}$ = 158

The intervention seems to have improved adherence, in that all three quartiles are higher for the treatment group. Both groups also exhibit evidence of positive skewness. Answers to this question could use these results to demonstrate any proposed method for estimating $\delta$.

REMARKS: Below are several related issues that may be worth considering, in no particular order.

R1. Typically no details are reported about how exactly quartiles were computed. In principle one could obtain this information (and perhaps the original subject-level data) from the study's author(s), but in practice that's often not feasible.

R2. For now I'd be content with a "reasonable" estimator of $\delta$. I'm less interested in properties of the estimator (e.g., bias, consistency, efficiency) or in finding an optimal estimator (e.g., UMVUE), but those would be interesting to consider eventually.

R3. A potential complication is duplicate quartiles within a sample (e.g., $q_{11} = q_{21}$) or between samples (e.g., $q_{21} = q_{22}$), especially when reported results are rounded severely. We can ignore this for now, but it arises in practice. For instance, another study from the same meta-analysis as the above example reported all six quartiles as 3.

R4. One ad hoc method I've considered involves obtaining several estimates of $\delta$ and combining them in a way that accounts for their variances and correlations (e.g., generalized least-squares using estimated precision matrix). For instance, we could use Group 1's results to estimate $\sigma$ and $\mu_1$ as follows, where $\Phi^{-1}(p)$ is the standard normal inverse CDF (i.e., quantile function) at $p$:

$\hat\sigma = (q_{21} - q_{11}) / [\Phi^{-1}(1/2) - \Phi^{-1}(1/4)]$ .

$\hat\mu_1 = q_{11} - \Phi^{-1}(1/4)\hat\sigma$ .

We could use other pairs of quartiles for these estimates, do the same for Group 2, and estimate $\delta$ using various combinations of estimates from Groups 1 and 2 (e.g., mean difference and pooled variance). Note that $\delta$ estimates from different combinations of quartiles may differ in sign.
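As a minimal sketch of the pairwise estimates described in R4, the snippet below solves $q = \mu + \Phi^{-1}(p)\sigma$ for each pair of Group 1's quartiles under the normality assumption. The helper name `normal_from_quantile_pair` and the dictionary layout are hypothetical, chosen only for illustration; it does not attempt the combining step (e.g., generalized least squares).

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal; inv_cdf is its quantile function

def normal_from_quantile_pair(q_lo, p_lo, q_hi, p_hi):
    """Hypothetical helper: solve q = mu + z_p * sigma for (mu, sigma)
    from two reported quantiles, assuming a normal distribution."""
    z_lo, z_hi = Z.inv_cdf(p_lo), Z.inv_cdf(p_hi)
    sigma = (q_hi - q_lo) / (z_hi - z_lo)
    mu = q_lo - z_lo * sigma
    return mu, sigma

# Group 1 quartiles from the example, keyed by cumulative probability.
q_group1 = {0.25: 58, 0.50: 85, 0.75: 174}
pairs = [(0.25, 0.50), (0.50, 0.75), (0.25, 0.75)]

estimates_g1 = {
    (a, b): normal_from_quantile_pair(q_group1[a], a, q_group1[b], b)
    for a, b in pairs
}

for pair, (mu, sigma) in estimates_g1.items():
    print(pair, round(mu, 2), round(sigma, 2))
```

For these skewed data the three pairs give noticeably different values of $\hat\sigma$ (and the outer pair gives a different $\hat\mu_1$ than the pairs that include the median), which is the kind of disagreement a combining step would have to reconcile.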

R5. Some meta-analytic techniques require a sampling variance for the estimate of $\delta$, such as precision-weighted methods that use weighted least-squares. For now I'm willing to obtain this variance by resampling or simulation, but a closed-form expression would be convenient.

R6. I can imagine numerous useful variants on or extensions of this problem. For example, suppose a study reports for at least one group different or more quartiles (e.g., $q_{0j}$ [minimum] or $q_{4j}$ [maximum]), different or more quantiles (e.g., 10th and 90th percentiles), or other results (e.g., sample mean). Another idea is to consider monotonic transformation of the quartiles, say $X_j = f(Y_j)$, so that $X_1$ and $X_2$ better satisfy normality and homoscedasticity.

Best Answer

If you are willing to assume that $Y$ has a symmetric distribution within the two groups, then the medians of the two groups (i.e., $q_{21}$ and $q_{22}$) could be used in place of the means. Furthermore, if you are willing to assume that $Y$ is normally distributed within the two groups, then you could make use of the relationship between the IQR and the SD for the normal distribution, namely, $SD \approx IQR / 1.35$. So, you can compute the two IQRs with $IQR_1 = q_{31} - q_{11}$ and $IQR_2 = q_{32} - q_{12}$, transform them to SDs, pool those two SDs in the usual manner, and then you have all of the pieces to compute the standardized mean difference.
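For reference, the constant 1.35 follows directly from the normal quantile function. Under $Y \sim \mathcal{N}(\mu, \sigma^2)$, the first and third quartiles are $\mu + \Phi^{-1}(1/4)\sigma$ and $\mu + \Phi^{-1}(3/4)\sigma$, so

$$IQR = \left[\Phi^{-1}(3/4) - \Phi^{-1}(1/4)\right]\sigma = 2\,\Phi^{-1}(3/4)\,\sigma \approx 2(0.6745)\,\sigma \approx 1.349\,\sigma,$$

which rearranges to $\sigma \approx IQR / 1.35$.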

Example: For your example data, this would be $$IQR_1 = 174 - 58 = 116$$ $$IQR_2 = 158 - 31 = 127,$$ so $$SD_1 = 116 / 1.35 = 85.93$$ $$SD_2 = 127 / 1.35 = 94.07.$$ Therefore, $$SD_p = \sqrt{\frac{(80-1)85.93^2 + (46-1)94.07^2}{80+46-2}} = 88.97.$$ And finally: $$d = \frac{85-79}{88.97} = 0.07$$ Now you could use the usual equation to estimate the sampling variance of $d$ (Hedges & Olkin, 1985): $$v = \frac{1}{80} + \frac{1}{46} + \frac{0.07^2}{2(80+46)} = 0.034.$$
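The arithmetic above can be reproduced with a short script. This is only a sketch of the steps in this answer (median difference, IQR/1.35 as the SD, pooled SD, usual variance formula); the function name `smd_from_quartiles` and its argument layout are hypothetical. It carries unrounded intermediate values, so the last digits can differ slightly from the hand calculation.

```python
from math import sqrt

def smd_from_quartiles(n1, quartiles1, n2, quartiles2, iqr_to_sd=1.35):
    """Hypothetical helper: approximate SMD and its usual sampling variance
    from each group's size and (q1, median, q3), assuming normality."""
    q1_lo, med1, q1_hi = quartiles1
    q2_lo, med2, q2_hi = quartiles2
    sd1 = (q1_hi - q1_lo) / iqr_to_sd  # SD ~ IQR / 1.35 under normality
    sd2 = (q2_hi - q2_lo) / iqr_to_sd
    # Pool the two SDs in the usual way.
    sd_p = sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    d = (med1 - med2) / sd_p  # medians stand in for the means
    # Usual large-sample variance of d (Hedges & Olkin, 1985).
    v = 1 / n1 + 1 / n2 + d**2 / (2 * (n1 + n2))
    return d, v, sd_p

d, v, sd_p = smd_from_quartiles(80, (58, 85, 174), 46, (31, 79, 158))
print(round(sd_p, 2), round(d, 2), round(v, 3))
```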

Remarks: Under normality, $d$ should be a reasonable estimator of the true SMD. However, using medians in place of means and estimating the SDs from the IQRs both involve a loss of precision. The usual equation for the sampling variance of $d$ does not reflect that loss, so it yields values that are probably too small (on average).
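One way to gauge how much the usual formula understates the variance is a small parametric simulation. The sketch below assumes exactly normal data with unit variance, the example's sample sizes, a true SMD of 0.07, and Python's default quartile rule in `statistics.quantiles` (studies rarely say which rule they used). The function names are hypothetical, and the result applies only under these assumptions.

```python
import random
from math import sqrt
from statistics import quantiles, variance

def quartile_d(y1, y2, iqr_to_sd=1.35):
    """Quartile-based d: median difference over pooled IQR-derived SD."""
    lo1, med1, hi1 = quantiles(y1, n=4)  # default 'exclusive' method
    lo2, med2, hi2 = quantiles(y2, n=4)
    n1, n2 = len(y1), len(y2)
    sd1 = (hi1 - lo1) / iqr_to_sd
    sd2 = (hi2 - lo2) / iqr_to_sd
    sd_p = sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    return (med1 - med2) / sd_p

def simulate_variance(n1, n2, delta, reps=2000, seed=1):
    """Empirical variance of quartile_d across simulated normal samples."""
    rng = random.Random(seed)
    ds = []
    for _ in range(reps):
        y1 = [rng.gauss(delta, 1.0) for _ in range(n1)]  # treatment group
        y2 = [rng.gauss(0.0, 1.0) for _ in range(n2)]    # control group
        ds.append(quartile_d(y1, y2))
    return variance(ds)

n1, n2, delta = 80, 46, 0.07
v_sim = simulate_variance(n1, n2, delta)
v_formula = 1 / n1 + 1 / n2 + delta**2 / (2 * (n1 + n2))
print(v_sim, v_formula)
```

Under normality one would expect `v_sim` to exceed `v_formula`, since in large samples the sample median has roughly $\pi/2$ times the variance of the sample mean, and the IQR-based SD adds further noise. A simulated variance of this kind could be used in place of the formula when weighting, at the cost of depending on the assumed distribution.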

Also, the appropriateness of this method hinges on the symmetry/normality assumption. Unfortunately, authors typically choose to report medians and IQRs precisely when they suspect that $Y$ has a non-normal or asymmetric distribution. So, I would regard this method only as a rough approximation.

References:

Hedges, L. V., & Olkin, I. (1985). Statistical methods for meta-analysis. Orlando: Academic Press.
