The question is difficult to answer because it is indicative of a general confusion and muddled state of affairs in much of the meta-analytic literature (the OP is not to blame here -- it's the literature and the descriptions of the methods, models, and assumptions that are often a mess).
But to make a long story short: No. If you want to combine a bunch of estimates (that quantify some sort of effect, a degree of association, or some other outcome deemed relevant) and it is sensible to combine those numbers, then you can just take their (unweighted) average, and that is perfectly fine. There is nothing wrong with that, and under the models we typically assume when conducting a meta-analysis, it even gives you an unbiased estimate (assuming the estimates themselves are unbiased). So, no, you don't need the sampling variances to combine the estimates.
So why is inverse-variance weighting almost synonymous with actually doing a meta-analysis? This has to do with the general idea that we attach more credibility to large studies (with smaller sampling variances) than to small studies (with larger sampling variances). In fact, under the assumptions of the usual models, inverse-variance weighting leads to the uniformly minimum-variance unbiased estimator (UMVUE) -- well, kind of: this again assumes unbiased estimates and ignores the fact that the sampling variances are often not exactly known but are themselves estimated, and in random-effects models we must also estimate the variance component for heterogeneity, which we then treat as a known constant, which isn't quite right either. But yes, we kind of get the UMVUE with inverse-variance weighting if we squint our eyes very hard and ignore some of these issues.
So it's the efficiency of the estimator that is at stake here, not unbiasedness. But even an unweighted average will often not be much less efficient than an inverse-variance weighted average, especially in random-effects models when the amount of heterogeneity is large (in which case the usual weighting scheme leads to almost uniform weights anyway!). And even in fixed-effects models or with little heterogeneity, the difference often isn't overwhelming.
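To illustrate the point about efficiency (a minimal sketch of my own, not from the answer; the effect, heterogeneity, and variance values are made up), one can simulate meta-analyses under a simple random-effects model and compare the two averages:

set.seed(42)
mu   <- 0.3    # true average effect (hypothetical)
tau2 <- 0.1    # between-study heterogeneity (hypothetical)
k    <- 20     # number of studies per meta-analysis
sim <- replicate(10000, {
  vi <- runif(k, 0.01, 0.2)           # sampling variances (hypothetical range)
  yi <- rnorm(k, mu, sqrt(tau2 + vi)) # observed study estimates
  wi <- 1/(tau2 + vi)                 # inverse-variance weights
  c(unweighted = mean(yi), weighted = sum(wi*yi)/sum(wi))
})
rowMeans(sim)      # both averages are (essentially) unbiased for mu
apply(sim, 1, var) # the weighted average is only somewhat more efficient

The larger tau2 is relative to the vi, the smaller the efficiency gap becomes, in line with the point about almost uniform weights.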
And as you mention, one can also easily consider other weighting schemes, such as weighting by sample size or some function thereof, but again this is just an attempt to get something close to the inverse-variance weights (since the sampling variances are, to a large extent, determined by the sample sizes of the studies).
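For example, for r-to-z transformed correlation coefficients the sampling variance is $v_i = 1/(n_i - 3)$, so weighting by $n_i$ (or $n_i - 3$) amounts to inverse-variance weighting; similarly, for standardized mean differences, $v_i$ is approximately proportional to $1/n_i$, apart from a term involving the effect size itself.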
But really, one can and should 'decouple' the issue of weights from the issue of variances altogether. They are two separate pieces that one has to think about; that's just not how things are typically presented in the literature.
However, the point here is that you really need to think about both. Yes, you can take an unweighted average as your combined estimate, and that would, in essence, be a meta-analysis. But once you want to draw inferences based on that combined estimate (e.g., conduct a hypothesis test, construct a confidence interval), you need to know the sampling variances (and the amount of heterogeneity). Think about it this way: if you combine a bunch of small (and/or very heterogeneous) studies, your point estimate is going to be far less precise than if you combine the same number of very large (and/or homogeneous) studies -- regardless of how you weighted the estimates when calculating the combined value.
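To make this concrete (a small derivation of my own, under the usual random-effects model with independent studies): the unweighted average is $\bar{y} = \frac{1}{k}\sum_{i=1}^{k} y_i$ with $\mathrm{Var}(y_i) = \tau^2 + v_i$, so

$$\mathrm{Var}(\bar{y}) = \frac{1}{k^2}\sum_{i=1}^{k} (\tau^2 + v_i).$$

That is, the sampling variances $v_i$ and the heterogeneity variance $\tau^2$ enter the standard error of the combined estimate even though they played no role in the weighting.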
Actually, there are even some ways around not knowing the sampling variances (and the amount of heterogeneity) when doing inferential statistics. One can consider methods based on resampling (e.g., bootstrapping, permutation testing) or methods that yield consistent standard errors for the combined estimate even when parts of the model are misspecified -- but how well these approaches work needs to be evaluated carefully on a case-by-case basis.
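For instance, one could bootstrap over studies (a rough sketch of my own, not something endorsed in the answer; with only a handful of studies such percentile intervals can be unreliable):

boot_ci <- function(yi, B = 10000, level = 0.95) {
  # resample whole studies with replacement and recompute the unweighted average
  means <- replicate(B, mean(sample(yi, replace = TRUE)))
  quantile(means, c((1 - level)/2, 1 - (1 - level)/2))
}
yi <- c(0.42, 0.11, 0.35, 0.28, 0.57, 0.19, 0.44) # hypothetical study estimates
boot_ci(yi) # percentile confidence interval; no sampling variances needed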
This is an interesting question because (as far as I know) there is no widely used formula for computing the variance in this situation. Some time ago, I ran some simulations to examine the performance of different formulas for estimating the sampling variance of Cohen's d in the case of a one-sample t-test.
I was aware of three different formulas:
The formula used in the Comprehensive Meta-Analysis software:

$$\frac{1}{\sqrt{n_i}} \left( \sqrt{1 + \frac{d_i^2}{2}} \right)^{2},$$

with $n_i$ being the sample size of study $i$ and $d_i$ the observed Cohen's d. (Note that the squared square root cancels, so this is simply $(1 + d_i^2/2)/\sqrt{n_i}$, which is also what the R code below computes.)
Other people use the standard formula for the dependent-samples t-test (e.g., Borenstein, 2009) with the correlation between pre- and posttest ($r$) set equal to 0.5:

$$\frac{1}{n_i} + \frac{d_i^2}{2 n_i}$$
Another formula I have seen is one used in a paper by Koenig et al. (2011); it was obtained through personal communication with B. Becker:

$$\frac{1}{n_i} + \frac{d_i^2}{2 n_i (n_i - 1)}$$
I did a very small simulation study to examine the performance of these three formulas, with sample sizes ranging from 10 to 500 and population effect sizes ranging from 0 to 0.8. The differences between the formulas were most pronounced for a population effect size of 0.8.
Using the formula for the dependent-samples t-test with $r = 0.5$ yielded the least biased estimates. However, there may be other formulas with better properties. I am curious what other people think about this.
Code:
rm(list = ls()) # Clean workspace
set.seed(1234)  # added so the simulation is reproducible (any seed will do)

k <- 10000 # Number of simulated studies per condition
thetais <- c(0, 0.2, 0.5, 0.8) # Effect sizes in the population
nis <- c(10, 15, 20, 30, 50, 75, 100, 250, 500) # Sample sizes in the primary studies
sigma <- 1 # Standard deviation in the population

### Empty matrices for storing the results
vi.ac <- vi.beck <- vi.comp <- vi.dep <- matrix(NA, nrow = length(nis),
                                                ncol = length(thetais),
                                                dimnames = list(nis, thetais))
############################################
for(thetai in thetais) {
  for(ni in nis) {

    ### Actual variance of Cohen's d: simulate k studies directly,
    ### using (ni-1)*s^2/sigma^2 ~ chi-squared with ni-1 df
    sdi <- sqrt(sigma^2/(ni-1) * rchisq(k, df = ni-1)) # sigma^2, not sigma (identical here since sigma = 1)
    mi  <- rnorm(k, mean = thetai, sd = sigma/sqrt(ni))
    di  <- mi/sdi
    vi.ac[as.character(ni), as.character(thetai)] <- var(di)

    ### Suggestion by Becker in Koenig et al. (2011)
    vi <- (1/ni) + di^2/(2*ni*(ni-1))
    vi.beck[as.character(ni), as.character(thetai)] <- mean(vi)

    ### Comprehensive Meta-Analysis software
    vi <- (1/sqrt(ni)) * sqrt(1 + di^2/2)^2
    vi.comp[as.character(ni), as.character(thetai)] <- mean(vi)

    ### Dependent-samples t-test with r = 0.5
    vi <- (1/ni) + di^2/(2*ni)
    vi.dep[as.character(ni), as.character(thetai)] <- mean(vi)

  }
}
### Plot the estimated variances against the actual variance for each theta
for(j in seq_along(thetais)) {
  plot(x = nis, y = vi.ac[ ,j], type = "l", main = paste("theta =", thetais[j]),
       xlab = "Sample size", ylab = "Variance")
  lines(x = nis, y = vi.beck[ ,j], col = "red")
  lines(x = nis, y = vi.comp[ ,j], col = "blue")
  lines(x = nis, y = vi.dep[ ,j], col = "green")
  legend("topright", legend = c("Actual variance", "Becker", "CMA", "Dep. samples"),
         col = c("black", "red", "blue", "green"), lty = 1)
}

### Numerical results for theta = 0
data.frame(vi.ac[,1], vi.beck[,1], vi.comp[,1], vi.dep[,1])
References:
Borenstein, M. (2009). Effect sizes for continuous data. In H. Cooper, L. V. Hedges, & J. C. Valentine (Eds.), The Handbook of Research Synthesis and Meta-Analysis (pp. 221-236). New York: Russell Sage Foundation.
Koenig, A. M., Eagly, A. H., Mitchell, A. A., & Ristikari, T. (2011). Are leader stereotypes masculine? A meta-analysis of three research paradigms. Psychological Bulletin, 137(4), 616-642.
Best Answer
I don't understand how metafor limits you. vtype = "AV" has nothing to do with whether sample-size weights or inverse-variance weights are used when combining multiple raw or r-to-z transformed correlation coefficients. The default in rma() is to use inverse-variance weights, but the weights argument allows you to specify any other weights you like.

The vtype argument allows you to choose, for certain outcome measures, among different ways of calculating the sampling variances. For example, vtype = "AV" can be useful for raw correlation coefficients, since the estimates of the sampling variances can be quite inaccurate for raw correlations. For r-to-z transformed correlations, this is unnecessary, since the r-to-z transformation is a variance-stabilizing transformation for correlation coefficients, so the sampling variances of the transformed values are exactly known.[1]

All this aside, of course you can combine different aspects of different procedures. For example, using escalc(), you can work with raw Cronbach's alpha values or use the Hakstian-Whalen or the Bonett transformation,[2] you can use any weights you like in rma(), and you can choose among all those heterogeneity estimators you mentioned (and others). A minimal sketch follows after the notes.

[1] Well, not exactly, since $1/(n-3)$ is still an approximation, but it is very accurate unless $n$ is really small.

[2] escalc() does not give you the option to use the r-to-z transformation for Cronbach's alpha values, since this transformation is meant for Pearson product-moment correlation coefficients, not Cronbach's alpha values.
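To make the decoupling concrete, here is a minimal sketch with made-up data (the sketch is mine, not from the answer; measure "ABT", the ai/mi/ni arguments, and transf.iabt follow my reading of the metafor documentation for Cronbach's alpha outcomes): the Bonett transformation via escalc(), sample-size weights via the weights argument, and the Paule-Mandel heterogeneity estimator via method = "PM".

library(metafor)

### hypothetical data: observed alphas, number of items, sample sizes
dat <- data.frame(ai = c(0.78, 0.85, 0.91, 0.72),
                  mi = c(10, 10, 12, 8),
                  ni = c(50, 120, 200, 35))

### Bonett-transformed alpha values and their sampling variances
dat <- escalc(measure = "ABT", ai = ai, mi = mi, ni = ni, data = dat)

### random-effects model with sample-size weights and the Paule-Mandel estimator
res <- rma(yi, vi, weights = ni, method = "PM", data = dat)

### back-transform the combined estimate to the alpha scale
predict(res, transf = transf.iabt)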