Least Squares – How to Decompose Total Slope into Between-Group and Within-Group Contributions?

Tags: group-differences, least squares, multilevel-analysis, simpsons-paradox

Consider the following data (the data are given as a table at the bottom of this question).

[Figure: All Data]

These data fall into two groups, blue and orange. Within each group the relationship is positive, but pooling across groups gives a negative relationship, so this is an instance of Simpson's paradox. (In my application, the two groups blue and orange are school districts and the dots are schools, but that is not important for the question.)

If I run an OLS regression using all the data I get these estimates,

ppSpend = 9.481481 + -2.962963*pctPoor

Now if I average the data up to the group level (points weighted equally), I get this scatterplot:

[Figure: Group means]

If I run an OLS regression using the group averages (so this is the "between model") I get

ppSpend_GroupMean = 10.14286 + -4.285714*pctPoor_GroupMean

Finally, here is the data if I demean both the dependent and independent variables by group (plotting the points at different sizes so that they can be seen):

[Figure: Demeaned]

If I run an OLS regression on this "demeaned" model ("within model") I get

ppSpend_Demeaned = 0 + 10*pctPoor_Demeaned
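For anyone who wants to reproduce the three fits, here is a minimal sketch in R. It assumes the data from the table at the bottom of the question; the object names (`d`, `means`, `coef_total`, `coef_between`, `coef_within`) are my own choices for illustration.

```r
# Data copied from the CSV at the bottom of the question.
d <- data.frame(
  group   = rep(1:2, each = 4),
  pctPoor = c(0, 0.1, 0.2, 0.3, 0.7, 0.8, 0.9, 1),
  ppSpend = c(8, 9, 10, 11, 5, 6, 7, 8)
)

# Total model: pooled OLS on all eight schools.
coef_total <- coef(lm(ppSpend ~ pctPoor, data = d))

# Between model: OLS on the two group means (each group counted once).
means <- aggregate(cbind(pctPoor, ppSpend) ~ group, data = d, FUN = mean)
coef_between <- coef(lm(ppSpend ~ pctPoor, data = means))

# Within model: OLS after subtracting each group's mean from both variables.
d$x_dm <- d$pctPoor - ave(d$pctPoor, d$group)
d$y_dm <- d$ppSpend - ave(d$ppSpend, d$group)
coef_within <- coef(lm(y_dm ~ x_dm, data = d))

# One row per model: intercept and slope.
round(rbind(total = coef_total, between = coef_between, within = coef_within), 6)
```

The slopes in the second column should match the three equations quoted in the question.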

Here's my question: is there an interpretable weight A such that

Total Slope = A*(Between Slope) + (1-A)*(Within Slope) ?

In my example,

-2.962963 = A*(-4.285714) + (1-A)*(10)

Of course the specific number in my example is 0.907407, but I would like to know if there's some general expression for that number from interpretable things calculated from the data.
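For reference, that number comes from solving the equation above for A, which is possible whenever the between and within slopes differ. Writing t, b and w for the total, between and within slopes (my shorthand here), a short worked version is:

```latex
\[
t = A\,b + (1-A)\,w
\quad\Longrightarrow\quad
A = \frac{t - w}{b - w} \qquad (b \neq w),
\]
\[
A = \frac{-2.962963 - 10}{-4.285714 - 10}
  = \frac{-12.962963}{-14.285714}
  \approx 0.907407 .
\]
```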

Data used in the example (as csv):

group,pctPoor,ppSpend
1,0,8
1,.1,9
1,.2,10
1,.3,11
2,.7,5
2,.8,6
2,.9,7
2,1,8

Best Answer

Let g be the indicator of the first group. That is, it is a vector of length 8 whose first 4 elements are 1 and whose last 4 are 0.

Let P be the projection onto the space spanned by g and 1-g (with k groups we would use the space spanned by k indicator vectors, but here there are only two), and let Q = I - P be the projection onto its orthogonal complement. Also let y be ppSpend and x be pctPoor.

Let b, w and t be the between, within and total slopes. That is, they are the slopes of the regressions (each including an intercept) of y on Px, y on Qx and y on x respectively. We interpret the question as asking how b, w and t are related. The relationship is:

var(Px) * b + var(Qx) * w = var(x) * t

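which follows from the fact that the slopes are given by the three expressions below and that the numerators of b and w sum to the numerator of t, because x = Px + Qx and cov is linear in its first argument. The denominators add in the same way, var(Px) + var(Qx) = var(x), because Px and Qx are uncorrelated.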

b = cov(Px, y) / var(Px)
w = cov(Qx, y) / var(Qx)
t = cov(x, y) / var(x)
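The two additivity claims can be checked directly. The following is a sketch of the argument, assuming cov and var denote the usual sample covariance and variance, and writing \mathbf{1} for the vector of ones and \bar{x} for the overall mean of x (notation added here for the derivation):

```latex
\begin{align*}
% Numerators: x = Px + Qx and covariance is linear in its first argument.
\operatorname{cov}(x, y) &= \operatorname{cov}(Px, y) + \operatorname{cov}(Qx, y). \\[4pt]
% Denominators: expand var(Px + Qx).
\operatorname{var}(x) &= \operatorname{var}(Px) + \operatorname{var}(Qx)
                        + 2\operatorname{cov}(Px, Qx). \\[4pt]
% The constant vector lies in the span of g and 1-g, so P1 = 1 and Q1 = 0.
% Hence Qx has mean zero, Px - \bar{x} 1 = P(x - \bar{x} 1), and PQ = 0 gives
(n-1)\operatorname{cov}(Px, Qx)
  &= \bigl(P(x - \bar{x}\mathbf{1})\bigr)^{\top} Qx
   = (x - \bar{x}\mathbf{1})^{\top} P Q\, x = 0. \\[4pt]
% Combining the pieces:
\operatorname{var}(Px)\, b + \operatorname{var}(Qx)\, w
  &= \operatorname{cov}(Px, y) + \operatorname{cov}(Qx, y)
   = \operatorname{cov}(x, y)
   = \operatorname{var}(x)\, t.
\end{align*}
```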

Dividing the equation relating b, w and t through by var(x) and letting a = var(Px)/var(x), so that 1-a = var(Qx)/var(x) and a lies between 0 and 1, we can write it as this convex combination:

a * b + (1-a) * w = t

The ratio var(Px) / var(x) can be regarded as the squared cosine of the angle between Px and x if we measure squared length by var (that is, after centering). Equivalently, a is the share of the variance of x that lies between groups.
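Two other ways of computing the same weight may help with interpretation. The sketch below assumes the example data from the question and uses my own variable names (`a`, `r2`, `cos2`). It compares a with the R-squared from regressing x on the group indicator and with the squared correlation between the group means of x and x itself.

```r
g <- rep(1:0, each = 4)
x <- c(0, 0.1, 0.2, 0.3, 0.7, 0.8, 0.9, 1)

# Between-group share of the variance of x; ave(x, g) plays the role of Px.
a <- var(ave(x, g)) / var(x)

# R-squared from regressing x on the group dummies.
r2 <- summary(lm(x ~ factor(g)))$r.squared

# Squared correlation, i.e. squared cosine between the centered Px and centered x.
cos2 <- cor(ave(x, g), x)^2

c(a = a, r2 = r2, cos2 = cos2)
```

All three quantities agree (about 0.907 for these data), so a can be read as how much of the variation in x is explained by group membership.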

We can illustrate this using R.

g <- rep(1:0, each = 4)
x <- c(0, 0.1, 0.2, 0.3, 0.7, 0.8, 0.9, 1)
y <- c(8, 9, 10, 11, 5, 6, 7, 8)

n <- length(y)
G <- cbind(g, 1-g)
P <- G %*% solve(crossprod(G), t(G))
Q <- diag(n) - P

b = cov(P %*% x, y) / var(P %*% x); b  # or coef(lm(y ~ P %*% x))[[2]]
##           [,1]
## [1,] -4.285714

w = cov(Q %*% x, y) / var(Q %*% x); w  # or coef(lm(y ~ Q %*% x))[[2]]
##      [,1]
## [1,]   10

t = cov(x, y) / var(x); t  # or coef(lm(y ~ x))[[2]]
## [1] -2.962963

a <- var(P %*% x) / var(x); a
##           [,1]
## [1,] 0.9074074

# P %*% x also equals ave(x, g) in R so we can alternately write a as:
var(ave(x, g)) / var(x)
## [1] 0.9074074

# Using a, b and w from above, we see this equals the t shown above
a * b + (1-a) * w
##           [,1]
## [1,] -2.962963
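
The same calculation extends to any number of groups without building P explicitly. The function below is a sketch under that assumption; `decompose_slope` and its argument names are hypothetical, not part of any package. It relies on `ave(x, g)` being equal to `P %*% x`. Note that b here is the slope of y on Px over all observations, which weights each group by its size. With unequal group sizes it can differ from a regression on equally weighted group means.

```r
# Hypothetical helper: between, within and total slopes plus the weight a.
# x, y: numeric vectors; g: grouping vector of the same length.
decompose_slope <- function(x, y, g) {
  xb <- ave(x, g)   # P %*% x: each observation replaced by its group mean
  xw <- x - xb      # Q %*% x: deviation from the group mean
  b <- cov(xb, y) / var(xb)   # between slope
  w <- cov(xw, y) / var(xw)   # within slope (undefined if x is constant within groups)
  a <- var(xb) / var(x)       # weight on the between slope
  c(between = b, within = w, total = cov(x, y) / var(x),
    weight = a, recombined = a * b + (1 - a) * w)
}

# The example from the question.
g <- rep(1:0, each = 4)
x <- c(0, 0.1, 0.2, 0.3, 0.7, 0.8, 0.9, 1)
y <- c(8, 9, 10, 11, 5, 6, 7, 8)
res <- decompose_slope(x, y, g)
res

# Made-up data with three groups of unequal size, only to illustrate the call.
g3 <- rep(c("a", "b", "c"), times = c(2, 3, 5))
x3 <- c(1, 2, 4, 5, 7, 3, 6, 8, 9, 10)
y3 <- c(2, 1, 6, 4, 5, 9, 7, 8, 12, 10)
res3 <- decompose_slope(x3, y3, g3)
res3
```

In both calls the `recombined` entry should equal the `total` entry up to floating-point error.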