Solved – Comparing regression coefficients across models with standardized dependent variables

model comparison, regression, standardization

The Situation: I have four similar spatial regression models, differing only in their dependent variables. The independent variables consist of a standard set of variables derived from a principal component analysis. The dependent variables have been standardized by using the standard score for each observation.

The Question: Can the regression coefficients be directly compared across the models? For example, if the coefficient for an IV is .25 in the model using the first DV and .50 in the model using the second DV, can I say that the impact of the IV in the second model is twice its impact in the first model?

A little more background:

I am using four spatial regression models to look at which socio-demographic factors are associated with poverty in an area. The units of observation are census tracts.

The four models are identical, except for the dependent variable used. Each model has the same set of independent variables from the same location and data set. The independent variables were derived using factors from a principal component analysis.

The dependent variables represent four different ways of measuring the well-being of a census tract: 1) the percentage of poor people per census tract using poverty line A, 2) the percentage of poor people per census tract using poverty line B, 3) the per capita income of the census tract, and 4) the ratio of census tract income to average regional income.

Instead of using the raw observations for the DVs, however, I have used the standard score (z-score) of each observation, with the intention of making the regression results comparable across models.

To further complicate things: two of the DVs directly measure poverty, while the other two measure income, so the direction of the relationship between the IVs and the DVs is reversed between the two pairs. Would this cause any additional problems in directly comparing regression coefficients?

Best Answer

No, you cannot state that an independent variable has twice as large an impact on one DV (dependent variable) as another DV merely by comparing coefficients in the models. Why? Because your dependent variables are not measuring comparable quantities in all four cases above.

Let's take a different example to highlight the strangeness: in one model, rainfall predicts annual crop yield in tonnes of grain/acre/year (coef = 0.5), and in a separate model, it also predicts population density in people/acre (coef = 20). Does this mean that rainfall has a stronger influence on population density than on crop yield? Well, if you instead measured annual crop yield in kilograms of grain/acre/year, your rainfall coefficient would be 500 (0.5 * 1000, because 1 tonne = 1000 kg). This change of unit would reverse the ranking of the two coefficients, and with it your conclusions, which obviously does not make sense. So the basic problem is that annual crop yield and population density are not in comparable units.
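The unit dependence in the rainfall example can be checked numerically. The following is a minimal sketch on simulated data, not on anything from the question: the variable names, the sample size, and the noise level are all assumptions made for illustration. It fits the same simple regression twice, once with yield in tonnes and once in kilograms.

```python
import numpy as np

# Simulated data (an assumption for illustration, not the asker's data).
rng = np.random.default_rng(0)
n = 200
rainfall = rng.normal(50.0, 10.0, size=n)

# Crop yield in tonnes, generated with a true slope of 0.5 plus noise.
yield_tonnes = 0.5 * rainfall + rng.normal(0.0, 2.0, size=n)
# The same quantity expressed in kilograms (1 tonne = 1000 kg).
yield_kg = yield_tonnes * 1000.0


def ols_slope(x, y):
    """Slope of the least-squares line y = intercept + slope * x."""
    slope, _intercept = np.polyfit(x, y, deg=1)
    return slope


slope_tonnes = ols_slope(rainfall, yield_tonnes)
slope_kg = ols_slope(rainfall, yield_kg)

# The ratio is 1000 up to floating-point error: only the unit changed,
# not the relationship between rainfall and yield.
print(slope_tonnes, slope_kg, slope_kg / slope_tonnes)
```

The coefficient scales with the unit of the DV, so its raw size says nothing about how strong the relationship is.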

This could be addressed by standardizing the dependent variables, in which case the coefficient interpretation would be that a unit change in rainfall is associated with a change of $x$ standard deviations in either crop yield or population density. A larger coefficient in one model can then be interpreted as evidence for rainfall having a stronger effect on that quantity, relative to the variation of that quantity in your data.
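To illustrate why standardizing the DV removes the unit problem, here is a small sketch, again on simulated data with made-up names and parameters (all of them assumptions). It z-scores a crop yield variable measured in tonnes and in kilograms and fits the same regression to each; the helper functions `zscore` and `ols_slope` are defined here for the example only.

```python
import numpy as np

# Simulated data (an assumption for illustration, not the asker's data).
rng = np.random.default_rng(1)
n = 200
rainfall = rng.normal(50.0, 10.0, size=n)
yield_tonnes = 0.5 * rainfall + rng.normal(0.0, 2.0, size=n)
yield_kg = 1000.0 * yield_tonnes  # same quantity, different unit


def zscore(y):
    """Standard score: subtract the mean, divide by the sample SD."""
    return (y - y.mean()) / y.std(ddof=1)


def ols_slope(x, y):
    """Slope of the least-squares line y = intercept + slope * x."""
    slope, _intercept = np.polyfit(x, y, deg=1)
    return slope


slope_z_tonnes = ols_slope(rainfall, zscore(yield_tonnes))
slope_z_kg = ols_slope(rainfall, zscore(yield_kg))

# Standardizing the DV is the same as dividing the raw slope by SD(y).
slope_raw_over_sd = ols_slope(rainfall, yield_tonnes) / yield_tonnes.std(ddof=1)

# All three values agree up to floating-point error.
print(slope_z_tonnes, slope_z_kg, slope_raw_over_sd)
```

Because the standardized coefficient equals the raw coefficient divided by the standard deviation of the DV, it depends on how much the DV varies in the sample at hand, which is why the interpretation is always "given the variation in your data".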

Now, you do actually have two DVs that are in comparable units: % poor people by poverty lines A and B. So in principle, you can make the comparison you've asked about for these two cases (but not the others). But you should be careful when interpreting it, since both measure the same underlying quantity with different cutoffs. Differences in the effect of your independent variable are then telling you something about the cutoff, which should perhaps have been evident before you fit your model.
