Solved – Is it a mistaken idea to use standardized coefficients to assess the relative importance of regression predictors

regression, semi-partial, standardization

There are various questions that speak to the relative merits of various methods of assessing the importance of regression predictors, for example this one.

I noticed that in this comment @gung refers to the practice as a "mistaken idea", linking to this answer in support of this claim. The final paragraph of the answer is the relevant part.

I feel this issue deserves its own question, and I am also a little unsure about some aspects of the reasoning. The most important segment of the paragraph in question goes:

unless the true r is exactly 0, the estimated r is driven in large
part by the range of covariate values that are used.

Is this equivalent to saying that we shouldn't use standardized coefficients to assess importance because we might have randomly sampled a restricted range of $X_1$ values and a wider range of $X_2$ values? Then when we standardize this problem hasn't gone away, and we end up spuriously thinking that $X_1$ is a weaker predictor than $X_2$?

Why does the problem go away if the true $r$ is exactly 0?

How do other methods (e.g. looking at semipartial coefficients) do away with this problem?

Best Answer

gung's answer is, in my view, a critique of the idea of comparing the relative strength of different variables in an empirical analysis without a model in mind of how those variables interact, or of what the (true) joint distribution of all relevant variables looks like. Think of the example gung mentions about the importance of an athlete's height and weight. Nobody can prove that, for example, an additive linear regression is a good approximation of the conditional expectation function; in other words, height and weight might matter for an athlete's performance in a very complicated manner. You can run a linear regression including both variables and compare the standardized coefficients, but you do not know whether the results really make sense.

To give a Mickey Mouse example from sport climbing (my favorite sport), here is a list of top male climbers according to some performance measure, taken from the site 8a.nu, with information about their height, weight, and year of birth (only those for whom this information was available). We standardize all variables beforehand, so we can directly compare the association between a one-standard-deviation change in each predictor and a one-standard-deviation change in performance. Excluding, for the illustration, the number one, Adam Ondra, who is unusually tall, we get the following result:

    rm(list = ls(all = TRUE))
    # Show only two decimal places
    options(digits = 2)
    # Read the data
    climber <- read.table("https://drive.google.com/uc?export=&confirm=no_antivirus&id=0B70aDwYo0zuGNGJCRHNrY0ptSW8",
                          sep = "\t", header = TRUE)
    head(climber)
    # Drop the best climber, Adam Ondra, who is unusually tall (an outlier)
    climber <- subset(climber, name != "Adam Ondra")
    # Standardize outcome and predictors
    climber$performance_std <- as.numeric(scale(climber$performance))
    climber$height_std      <- as.numeric(scale(climber$height))
    climber$weight_std      <- as.numeric(scale(climber$weight))
    climber$born_std        <- as.numeric(scale(climber$born))
    # Simple regression; intercept dropped because everything is standardized
    lm(performance_std ~ height_std + weight_std - 1, data = climber)$coef
height_std weight_std 
 -0.16      -0.25 

Setting aside standard errors and the like entirely, it seems that weight is more important than height, or at least equally important. But one could argue that climbers have become better over time. Perhaps we should control for cohort effects, e.g. better training opportunities through modern indoor facilities? Let us include year of birth!

    # Add year of birth
    lm(performance_std~height_std+weight_std+born_std-1,data=climber)$coef
height_std weight_std   born_std 
-0.293     -0.076      0.256

Now we find that being young and being short are more important than being slim. But another person could argue that this holds only for top climbers. It could make sense to compare the standardized coefficients across the whole performance distribution (for example via quantile regression). And of course the picture might differ for female climbers, who are much smaller and slimmer. Nobody knows.
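To make the quantile-regression idea concrete, here is a minimal sketch using the `quantreg` package (which must be installed). Since the climber sample is tiny, it uses simulated data in which the effect of the predictor varies across the outcome distribution; all variable names here are hypothetical.

```r
library(quantreg)  # rq() fits linear quantile regressions

set.seed(4)
n <- 2000
x <- rnorm(n)
# The noise scale grows with x, so the slope on x differs by quantile
y <- x + exp(0.5 * x) * rnorm(n)
fit <- rq(y ~ x, tau = c(0.25, 0.50, 0.75))
coef(fit)  # one column of coefficients per quantile; the slope on x varies
```

Applied to standardized variables, the same comparison can be read, quantile by quantile, in standard-deviation units.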

This is a Mickey Mouse example of what I think gung refers to. I am not quite so skeptical: I think it can make sense to look at standardized coefficients if you believe you have specified the right model, or that additive separability makes sense. But, as so often, this depends on the question at hand.

Regarding the other questions:

Is this equivalent to saying that we shouldn't use standardized coefficients to assess importance because we might have randomly sampled a restricted range of X1 values and a wider range of X2 values? Then when we standardize this problem hasn't gone away, and we end up spuriously thinking that X1 is a weaker predictor than X2?

Yes, I think you could put it like that. The "wider range of X2 values" could also arise through omitted variable bias, for example by including important variables correlated with X1 while omitting those correlated with X2.
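A small simulation (hypothetical data, not the climber set) illustrates the restricted-range point: both predictors have the same raw coefficient of 1, but X1 is sampled from a much narrower range, so after standardization it looks far weaker.

```r
set.seed(1)
n  <- 1e4
x1 <- runif(n, -0.5, 0.5)  # restricted range
x2 <- runif(n, -3, 3)      # wide range
y  <- x1 + x2 + rnorm(n)   # identical true raw coefficients of 1
# Standardize everything and compare coefficients
d <- data.frame(scale(cbind(y, x1, x2)))
b <- lm(y ~ x1 + x2 - 1, data = d)$coef
b  # x1's standardized coefficient is much smaller, despite equal raw effects
```

The standardized coefficient is the raw coefficient times sd(x)/sd(y), so it inherits whatever range of x happened to be sampled.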

Why does the problem go away if the true r is exactly 0?

Omitted variable bias is again a good example of why this holds. Omitted variables only cause problems (bias) if they are correlated with the predictors as well as with the outcome; see the formula in the Wikipedia entry. If the true $r$ is exactly 0, then the variable is uncorrelated with the outcome and there is no problem (even if it is correlated with the predictors).
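A quick simulation (hypothetical data) shows both cases of the omitted-variable formula: omitting z biases the coefficient on x when z enters the outcome equation, but is harmless when z's true coefficient is 0, even though x and z are correlated in both cases.

```r
set.seed(2)
n  <- 1e5
z  <- rnorm(n)              # the omitted variable
x  <- 0.5 * z + rnorm(n)    # x is correlated with z in both scenarios
y1 <- x + z + rnorm(n)      # z also affects the outcome directly
y2 <- x + rnorm(n)          # z has a true coefficient of 0
b1 <- lm(y1 ~ x)$coef["x"]  # biased: roughly 1.4 instead of 1
b2 <- lm(y2 ~ x)$coef["x"]  # roughly 1: omitting z causes no bias here
c(biased = b1, unbiased = b2)
```

The bias matches the textbook formula: the true coefficient of z (here 1 or 0) times the slope from regressing z's effect onto x.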

How do other methods (e.g. looking at semipartial coefficients) do away with this problem?

Other methods, such as semipartial coefficients, face the same problem. If your dataset is large enough, you can, for example, run a nonparametric regression and try to estimate the full joint distribution without assumptions about the functional form (e.g. additive separability) to justify what you are doing, but this is never a proof.
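One way to relax the functional-form assumption while keeping additivity is a generalized additive model. The sketch below uses the `mgcv` package (shipped with R) on simulated, hypothetical data where one predictor acts linearly and the other does not; a linear fit would misjudge their relative importance.

```r
library(mgcv)  # gam() with smooth terms s()

set.seed(3)
n      <- 500
height <- rnorm(n)
weight <- rnorm(n)
# Performance depends linearly on height but nonlinearly on weight
perf <- -0.3 * height - weight^2 + rnorm(n)
fit  <- gam(perf ~ s(height) + s(weight))
edf  <- summary(fit)$edf  # effective degrees of freedom per smooth term
edf  # near 1 for height (effectively linear); well above 1 for weight
```

The effective degrees of freedom hint at where a linear specification would be misleading, but, as above, this is diagnostic rather than proof.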

Summing up, I think it can make sense to compare standardized or semipartial coefficients, but it depends, and you have to justify to yourself or to others why you think it makes sense.