Why are 1/SE or 1/variance commonly used as weights in regressions

meta-analysis, regression, weighted-regression, weights

I'm trying to do a meta-analysis for the first time, comparing measurements of a simple experimental treatment against a control in a variety of species. I started by fitting a mixed-effects model to mean values collected from a set of published studies. This works fine, but it of course ignores the fact that the means are estimated with different levels of precision. It sounds like this is a problem that should be solved by weighting the means appropriately. Several of these studies published standard errors or their raw data, so I can use SEs for weighting.

My question is: What is special about $1/\text{SE}$ (or $1/\sigma^2$)? Why not use $2/\text{SE}$, or $10/\text{SE}$, or any other such weighting factor?

I was stimulated to ask this because the SEs differ by almost two orders of magnitude, due to a few of the studies using extremely precise instruments. Greater precision is obviously better, but it's not obvious to me that these precise estimates are worth as much as 100 of the less precisely estimated ones. (Weighting also introduced some algorithmic problems when fitting the model, but that is probably solvable; I'm more interested in the conceptual justification.)

Best Answer

You are asking about what is called "weighted least squares". The idea is this: if, for some values $X_i$ of the independent variable, your observations show more variation in the dependent variable $Y_i$ than they do for other $X_i$ values, then those observations will also tend to have larger residuals around the regression line, and hence will influence the sum of squared residuals more than the observations with smaller variation around the line do. In general, you would not want that.

If you know the standard deviation $\sigma_i$ of your $Y$ data for each particular $X_i$ value, you can transform your $Y_i$ and $X_i$ data by dividing them by the known standard deviation $\sigma_i$. If you then regress $\frac{Y_i}{\sigma_i}$ on $\frac{X_i}{\sigma_i}$, the residuals in this new regression have the same variance for each $X_i$ value, since $\mathrm{Var}(Y_i/\sigma_i) = \mathrm{Var}(Y_i)/\sigma_i^2 = 1$.
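As a quick empirical check of this variance-stabilizing effect, here is a minimal sketch (the values of sigma_i and y_i are made up for illustration, not taken from the question):

# Three groups of observations with known, very different standard deviations.
set.seed(42)
sigma_i <- rep(c(1, 5, 20), each = 1000)
y_i <- rnorm(3000, mean = 0, sd = sigma_i)

# Raw data: the spread differs strongly between the groups.
tapply(y_i, sigma_i, sd)

# After dividing by the known sigma_i, every group has spread close to 1.
tapply(y_i / sigma_i, sigma_i, sd)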

Usually, however, the true $\sigma_i$ are unknown, and the sample estimates $s_i$ are used instead. In the situation you describe, you know the standard errors of the means in all the studies, so you can divide by those standard errors.
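Incidentally, this inverse-variance weighting is exactly what dedicated meta-analysis software does by default. Here is a minimal sketch, assuming the metafor package; yi, sei, and dat are placeholder names for your study means, their standard errors, and your data frame:

library(metafor)

# rma() weights each study by 1/sei^2 (inverse variance) by default.
res <- rma(yi = yi, sei = sei, data = dat, method = "REML")
summary(res)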

EDIT

There is a tricky detail I forgot to mention: the intercept also has to be divided by $s_i$, and so does the residual term $r_i$. I will demonstrate this in the R script below as well. The regression equation thus changes from

$y_i = b_0 + b_1x_i + r_i$

into

$\frac{y_i}{s_i} = b_0\frac{1}{s_i} + b_1\frac{x_i}{s_i} + \frac{r_i}{s_i}$

In the "transformed" regression equation, the sum of squares of the (transformed) residuals which is minimized is $\sum\frac{r_i^2}{\ s_i^2}$. Due to the squares, it is said that the "regression weights" are equal to $\frac{1}{s_i^2}$.

If you used $\frac{9}{s_i^2}$ for the weights instead of simply $\frac{1}{s_i^2}$, the sum of squares of the residuals would be nine times as high, and hence the residual standard error would be $\sqrt{9}=3$ times as high, as you can verify in the script below by comparing the results of model2 and model3. But nothing changes for the regression coefficients, their standard errors, or their p-values!
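To see why the scale of the weights cannot matter for the coefficients, look at the weighted least squares estimator directly. With weight matrix $W = \mathrm{diag}(1/s_i^2)$ and any constant $c > 0$,

$\hat{b} = (X^\top (cW) X)^{-1} X^\top (cW) y = c^{-1}(X^\top W X)^{-1} \, c \, X^\top W y = (X^\top W X)^{-1} X^\top W y,$

so the constant cancels: only the relative weights matter. Multiplying all weights by $c$ does rescale the minimized sum of squares by $c$, and hence the residual standard error by $\sqrt{c}$, which is what the script below shows for $c=9$.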

# Generate some data: 100 cases, 10 for each x value 1 through 10.
set.seed(123)
x <- rep(1:10, each=10)
# Generate random error, dependent on the x value.
error <- rnorm(100,0,1) * x
y <- 1+0.5*x + error

# Notice the heteroscedasticity over the x values.
plot(x,y)

# Calculate sd per x value.
sd_i <- tapply(y, x, sd)

# Repeat each sd value 10 times, as each x value appears 10 times.
sd_i <- rep(sd_i, each=10)

# Calculate the weights variable for the linear model to be applied later.
theweights <- 1/sd_i^2

# Instead of running a linear model on y and x using the regression weights,
# we first (for illustrative purposes) transform the data as described and
# run a linear model on these transformed data, NOT using the weights.
y_w <- y / sd_i
x_w <- x / sd_i
newintercept <- 1 / sd_i

# Run a linear model on these transformed data.
model1 <- lm(y_w ~ 0 + newintercept + x_w)
summary(model1)

# Now, run a linear model using regression weights: results are equal to those
# of model1.
model2 <- lm(y ~ x, weights=theweights)
summary(model2)

# Use another arbitrary numerator in the weights.
theweights <- 9/sd_i^2
model3 <- lm(y ~ x, weights=theweights)
summary(model3)

The results of this script are:

[Scatterplot of y against x: larger variances for larger x values]

> model1 <- lm(y_w ~ 0 + newintercept + x_w)
> summary(model1)

Call:
lm(formula = y_w ~ 0 + newintercept + x_w)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.5047 -0.7491 -0.0990  0.7445  1.9961 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
newintercept   0.8120     0.3541   2.293    0.024 *  
x_w            0.7078     0.1228   5.765 9.48e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.006 on 98 degrees of freedom
Multiple R-squared:  0.5763,    Adjusted R-squared:  0.5677 
F-statistic: 66.65 on 2 and 98 DF,  p-value: < 2.2e-16

> 
> # Now, run a linear model using regression weights: results are equal to those
> # of model1.
> model2 <- lm(y ~ x, weights=theweights)
> summary(model2)

Call:
lm(formula = y ~ x, weights = theweights)

Weighted Residuals:
    Min      1Q  Median      3Q     Max 
-2.5047 -0.7491 -0.0990  0.7445  1.9961 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   0.8120     0.3541   2.293    0.024 *  
x             0.7078     0.1228   5.765 9.48e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.006 on 98 degrees of freedom
Multiple R-squared:  0.2533,    Adjusted R-squared:  0.2456 
F-statistic: 33.24 on 1 and 98 DF,  p-value: 9.475e-08

> 
> # Use another arbitrary numerator in the weights.
> theweights <- 9/sd_i^2
> model3 <- lm(y ~ x, weights=theweights)
> summary(model3)

Call:
lm(formula = y ~ x, weights = theweights)

Weighted Residuals:
   Min     1Q Median     3Q    Max 
-7.514 -2.247 -0.297  2.234  5.988 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   0.8120     0.3541   2.293    0.024 *  
x             0.7078     0.1228   5.765 9.48e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.019 on 98 degrees of freedom
Multiple R-squared:  0.2533,    Adjusted R-squared:  0.2456 
F-statistic: 33.24 on 1 and 98 DF,  p-value: 9.475e-08