I have been using the metafor package for some meta-analyses and would like to adjust for a single continuous covariate (mean age) using meta-regression. However, I would like some clarification on the outputs and what they mean. Below I have shared the output for the base-case analysis as well as for the meta-regression (same studies in both; the only difference is the addition of the covariate in the meta-regression).
Base case output
Random-Effects Model (k = 36; tau^2 estimator: DL)

  logLik  deviance       AIC       BIC      AICc
-18.8613   60.5927   41.7226   44.8896   42.0862

tau^2 (estimated amount of total heterogeneity): 0.0633 (SE = 0.0327)
tau (square root of estimated tau^2 value):      0.2515
I^2 (total heterogeneity / total variability):   51.46%
H^2 (total variability / sampling variability):  2.06

Test for Heterogeneity:
Q(df = 35) = 72.1031, p-val = 0.0002

Model Results:

estimate      se    zval    pval   ci.lb   ci.ub
  0.1266  0.0633  2.0014  0.0453  0.0026  0.2506  *

---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Meta-regression (output)
Mixed-Effects Model (k = 36; tau^2 estimator: DL)

  logLik  deviance       AIC       BIC      AICc
-18.7696   60.4092   43.5391   48.2897   44.2891

tau^2 (estimated amount of residual heterogeneity):     0.0677 (SE = 0.0346)
tau (square root of estimated tau^2 value):             0.2601
I^2 (residual heterogeneity / unaccounted variability): 52.84%
H^2 (unaccounted variability / sampling variability):   2.12
R^2 (amount of heterogeneity accounted for):            0.00%

Test for Residual Heterogeneity:
QE(df = 34) = 72.1024, p-val = 0.0001

Test of Moderators (coefficient(s) 2):
QM(df = 1) = 0.2456, p-val = 0.6202

Model Results:

         estimate      se     zval    pval    ci.lb   ci.ub
intrcpt   -0.3741  1.0140  -0.3690  0.7122  -2.3616  1.6133
mods       0.0085  0.0172   0.4955  0.6202  -0.0252  0.0423

---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
My questions are:
- Why are we observing an R^2 of 0% in the meta-regression (is it simply because the covariate is not significant, or do you suspect something is not correct)?
- How can we interpret the outputs of the meta-regression? After back-transforming the logHRs we suspect something like the below, but I would like to make sure that I am interpreting the ‘intrcpt’ and ‘mods’ values correctly.
  - I have assumed ‘mods’ represents the pooled HR, taking into account the adjustment for age.
  - I have assumed ‘intrcpt’ represents the covariate effect (beta), i.e. the amount by which the logHR changes for a one-unit increase in age. I have also back-transformed this output, which I am not sure is appropriate, or whether I should present it as is.
Best Answer
It is likely that you're observing $R^2 = 0$ because the model fits the data badly (i.e., there's no evidence that mean age has any relationship with the outcome). It is always a good idea to plot your model along with your data when you can. A plot can help you do a visual sanity check that the data tend to follow your regression line. Since you have just a single moderator, it would be fairly easy to make a scatter plot. Here is a metafor example with code you can use. Comparing with your base case, the estimated amount of total heterogeneity ($\tau^2 = 0.0633$) is essentially equal to (in fact slightly smaller than) the estimated amount of residual heterogeneity in the meta-regression ($\tau^2 = 0.0677$), so the addition of the mean age covariate hasn't explained any of the variability between studies.
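To see where that 0% comes from: for models like yours, metafor reports $R^2$ as the proportional reduction in $\tau^2$ relative to the model without the moderator, truncated at zero. Here is a minimal sketch of that arithmetic (in Python purely for illustration, since the calculation is independent of metafor), plugging in the $\tau^2$ values from your two outputs:

```python
# Pseudo-R^2 arithmetic as reported for the mixed-effects model
# (illustrative sketch; tau^2 values are taken from the outputs above).
tau2_total = 0.0633  # tau^2 from the base-case random-effects model
tau2_resid = 0.0677  # residual tau^2 from the meta-regression

# Proportional reduction in tau^2, truncated at zero: here the residual
# estimate is slightly *larger* than the total, so the "explained"
# proportion is negative and gets reported as 0%.
r2 = max(0.0, (tau2_total - tau2_resid) / tau2_total) * 100
print(f"R^2 = {r2:.2f}%")  # -> R^2 = 0.00%
```

That the residual estimate can come out slightly larger than the total is just sampling noise in the $\tau^2$ estimates, not a sign that something went wrong with the fit.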
Mods (0.0085) represents the estimated change in the true logHR for each one-unit increase in mean age. However, the p-value is high (0.6202) and the $95\%$ confidence interval ([-0.0252, 0.0423]) comfortably includes 0, indicating that you don't have significant evidence that this effect is any different from 0.
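To make that per-unit interpretation concrete, here is a small sketch (Python, for illustration only) that back-transforms the mods coefficient onto the HR scale; expressing the effect per 10 years is my own illustrative choice, not something in the metafor output:

```python
import math

b_age = 0.0085                  # 'mods': change in logHR per 1-year increase in mean age
ci_lb, ci_ub = -0.0252, 0.0423  # 95% CI on the logHR scale

hr_per_year = math.exp(b_age)         # multiplicative change in HR per year
hr_per_decade = math.exp(10 * b_age)  # same effect expressed per 10 years
ci_decade = (math.exp(10 * ci_lb), math.exp(10 * ci_ub))

print(f"HR ratio per year:   {hr_per_year:.4f}")
print(f"HR ratio per decade: {hr_per_decade:.4f} "
      f"(95% CI {ci_decade[0]:.3f} to {ci_decade[1]:.3f})")
```

Note that the back-transformed interval still straddles 1, which is the HR-scale version of "no significant evidence of an effect".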
Intrcpt represents the pooled logHR when the mean age is 0. It does not really make sense to interpret the intercept on its own, since a mean age of 0 is not plausible here (it lies far outside the range of your data). Back-transforming the output is fine (exponentiate the estimates, and their confidence limits, to place them on the HR scale).
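If you want a single adjusted HR, the usual move is to predict at a meaningful mean age rather than at the intercept. A sketch of the arithmetic (Python for illustration; age 60 is a hypothetical value I picked, so substitute a mean age typical of your studies):

```python
import math

intrcpt, b_age = -0.3741, 0.0085  # coefficients from the meta-regression output
age = 60                          # hypothetical mean age (pick one typical of your studies)

log_hr = intrcpt + b_age * age    # predicted logHR at that mean age
hr = math.exp(log_hr)             # back-transformed onto the HR scale

print(f"Predicted HR at mean age {age}: {hr:.3f}")
```

In metafor itself, predict() with the newmods argument gives the same point estimate along with a proper confidence interval.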
As a side note, be cautious about how you approach a meta-regression. If you try out many different covariates and stop when you find a significant one, you should report that this is what you did, because it weakens your evidence: about 1 in 20 truly null covariates will appear significant ($p < 0.05$) by chance alone, so trying many covariates inflates the probability that you detect an effect when none is truly present.
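The "by chance alone" point can be quantified: under the idealized assumption of independent tests of truly null covariates at $\alpha = 0.05$, the chance of at least one spurious hit grows quickly with the number of covariates tried. A quick Python sketch (testing 10 covariates is just an example number):

```python
alpha = 0.05  # per-test significance level
k = 10        # number of truly null, independent covariates tried

# Probability of at least one "significant" result by chance alone:
# the complement of every test correctly coming back non-significant.
p_any = 1 - (1 - alpha) ** k
print(f"P(at least one false positive among {k} tests) = {p_any:.3f}")
```

Even ten exploratory moderators already give roughly a 40% chance of at least one spurious "significant" finding under these assumptions.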