Solved – How to get 95% CIs for standardized regression coefficients

Tags: categorical data, confidence interval, r, regression, regression coefficients

I am running a multiple linear regression with categorical variables, and I need 95% confidence intervals for the standardized regression coefficients. I searched around and found two methods:

1. Use the lm.beta function from the QuantPsyc package. However, lm.beta only returns the standardized coefficients, whereas I also need their 95% CIs. Is there a way to get them?
2. Standardize all the variables involved first, then fit the linear regression on the standardized variables; the estimates are then the standardized coefficients.

So here is my model:

model1 <- lm(Life_Satisfaction ~ Subjective + Age + Sex + CountryCat11 + 
                                 CountryCat12 + CountryCat13 + CountryCat14 + 
                                 CountryCat15 + CountryCat16 + CountryCat17 + 
                                 CountryCat18 + CountryCat19 + CountryCat20 + 
                                 CountryCat23 + CountryCat25 + CountryCat28 + 
                                 CountryCat29 + CountryCat30 + Education_ISCED1 + 
                                 Education_ISCED2 + Education_ISCED3 + 
                                 Education_ISCED4 + Education_ISCED5 + 
                                 Education_ISCED6 + Education_stillinschool + 
                                 Education_None + Education_other, data=lifesat)

library(QuantPsyc)  # provides lm.beta()
lm.beta(model1)

I ran that, but I cannot get the 95% CI.

So I tried the scale method:

model2 <- lm(scale(Life_Satisfaction) ~ scale(Subjective) + scale(Age) + 
               scale(Sex) + scale(CountryCat11) + 
               scale(CountryCat12) + scale(CountryCat13) + 
               scale(CountryCat14) + scale(CountryCat15) + 
               scale(CountryCat16) + scale(CountryCat17) + 
               scale(CountryCat18) + scale(CountryCat19) + 
               scale(CountryCat20) + scale(CountryCat23) + 
               scale(CountryCat25) + scale(CountryCat28) + 
               scale(CountryCat29) + scale(CountryCat30) + 
               scale(Education_ISCED1) + scale(Education_ISCED2) + 
               scale(Education_ISCED3) + scale(Education_ISCED4) + 
               scale(Education_ISCED5) + scale(Education_ISCED6) + 
               scale(Education_stillinschool) + scale(Education_None) + 
               scale(Education_other), data=lifesat)

summary(model2)

I ran that and got the standardized coefficients and 95% CIs, but they differ from the standardized results I got from SPSS. Did I do something wrong?

Best Answer

For simplicity, assume that there is one focal continuous predictor $x$ and a continuous outcome $y$. (Standardization doesn't really make a lot of sense with categorical predictors, in my opinion.) The regression model could include more predictors, but this answer focuses on only one of them. Then we have four possibilities:

1. Both $y$ and $x$ are standardized (meaning both have mean $0$ and standard deviation $1$). Denote the regression coefficient of $x$ as $\beta_{xy}$.
2. Only $x$ is standardized. Denote the regression coefficient of $x$ as $\beta_{x}$.
3. Only $y$ is standardized. Denote the regression coefficient of $x$ as $\beta_{y}$.
4. Neither $y$ nor $x$ is standardized. Denote the regression coefficient of $x$ as $\beta$.

Further, let $s_x$ and $s_y$ be the standard deviations of $x$ and $y$, respectively.

In the following section, I'm going to show how to convert the regression coefficients from the standardized models (cases 1-3) to the coefficient in the unstandardized model (case 4) and vice versa. The crucial thing to note is that the same conversion formulas can be applied to standard errors and confidence limits! An illustration of case 1 in R is at the bottom of this answer.
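
A brief sketch of why the same factor carries over, treating $s_x$ and $s_y$ as fixed constants (an assumption of this sketch; their own sampling variability is ignored). The symbols $z_x$, $z_y$, $\alpha$, $c$ and $t$ are notation introduced here, not part of the original answer. Write $z_x = (x - \bar{x})/s_x$ and $z_y = (y - \bar{y})/s_y$, and substitute $x = \bar{x} + s_x z_x$ and $y = \bar{y} + s_y z_y$ into $y = \alpha + \beta x + \dots$:

$$z_y = \frac{\alpha + \beta\bar{x} - \bar{y}}{s_y} + \beta\cdot\frac{s_x}{s_y}\, z_x + \dots$$

so $\beta_{xy} = c\,\beta$ with $c = s_x/s_y$. A positive constant multiple rescales the estimate and its standard error alike, and leaves the $t$ quantile unchanged:

$$\hat{\beta}_{xy} = c\,\hat{\beta}, \qquad \mathrm{SE}(\hat{\beta}_{xy}) = c\,\mathrm{SE}(\hat{\beta}), \qquad \hat{\beta}_{xy} \pm t\,\mathrm{SE}(\hat{\beta}_{xy}) = c\left(\hat{\beta} \pm t\,\mathrm{SE}(\hat{\beta})\right)$$

Cases 2 and 3 follow in the same way with $c = s_x$ and $c = 1/s_y$.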

Case 1: Both $y$ and $x$ are standardized

To convert from $\beta$ to $\beta_{xy}$ without running another model: $\beta_{xy} = \beta\cdot \frac{s_x}{s_y}$.
To convert from $\beta_{xy}$ to $\beta$ without running another model: $\beta = \beta_{xy}\cdot \frac{s_y}{s_x}$.

To answer your first question: Fit the regression model with no standardized variables. Then multiply the confidence limits of each regression coefficient by $\frac{s_x}{s_y}$, using the standard deviation $s_x$ of the corresponding predictor.
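
As a rough sketch of how this could be applied to every coefficient at once, the helper below rescales the estimates and confidence limits of an unstandardized fit. The function name std_confint is hypothetical (it is not part of base R or QuantPsyc), and the sketch assumes that every predictor is a numeric column entered as a plain main effect, with no factors, interactions or transformations in the formula.

```r
# Hypothetical helper (not from any package): rescale the coefficients and
# confidence limits of an unstandardized lm fit to the fully standardized
# metric (case 1). Assumes numeric main-effect predictors only.
std_confint <- function(fit, level = 0.95) {
  mf <- model.frame(fit)               # data actually used in the fit
  sy <- sd(mf[[1]])                    # the response is the first column
  sx <- sapply(mf[-1], sd)             # one standard deviation per predictor
  est <- coef(fit)[-1]                 # drop the intercept
  ci <- confint(fit, level = level)[-1, , drop = FALSE]
  # sx is recycled down the rows, so row i is multiplied by sx[i]
  cbind(beta_std = est * sx / sy, ci * sx / sy)
}

fit <- lm(Infant.Mortality ~ Fertility + Agriculture, data = swiss)
res <- std_confint(fit)
res
```

Taking the standard deviations from the model frame rather than from the raw data keeps them consistent with the rows that were actually used when some observations are dropped for missing values.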

Case 2: Only $x$ is standardized

To convert from $\beta$ to $\beta_{x}$ without running another model: $\beta_{x} = \beta\cdot s_x$.
To convert from $\beta_{x}$ to $\beta$ without running another model: $\beta = \beta_{x}\cdot \frac{1}{s_x}$.

Case 3: Only $y$ is standardized

To convert from $\beta$ to $\beta_{y}$ without running another model: $\beta_{y} = \beta\cdot \frac{1}{s_y}$.
To convert from $\beta_{y}$ to $\beta$ without running another model: $\beta = \beta_{y}\cdot s_y$.

Here is a short illustration in R for the first case. The focal predictor is Fertility:

# Standard deviations
sx <- sd(swiss$Fertility)
sy <- sd(swiss$Infant.Mortality)

# Models
mod_unstand <- lm(Infant.Mortality~Fertility + Agriculture, data = swiss)
mod_fully_stand <- lm(scale(Infant.Mortality)~scale(Fertility) + scale(Agriculture), data = swiss)

coef(mod_unstand)[2]

Fertility 
0.1166856

# Convert unstandardized coefficient of "Fertility" to a fully standardized one
0.11668557*(sx/sy)

[1] 0.50043

# Check
coef(mod_fully_stand)[2]

scale(Fertility) 
         0.50043

For the confidence intervals, we use the same conversions:

# Confidence interval for the unstandardized coefficient
confint(mod_unstand)[2, ]

     2.5 %     97.5 % 
0.04993591 0.18343524

# Convert the confidence limits from the unstandardized model to a full standardized model
confint(mod_unstand)[2, ]*(sx/sy)

    2.5 %    97.5 % 
0.2141604 0.7866996

# Check
confint(mod_fully_stand)[2, ]

    2.5 %    97.5 % 
0.2141604 0.7866996
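
Cases 2 and 3 can be checked in the same way. The following is a minimal sketch on the same swiss data; the object names mod_x_stand, mod_y_stand, ci_x and ci_y are introduced here for illustration only.

```r
# Standard deviations of the focal predictor and the outcome
sx <- sd(swiss$Fertility)
sy <- sd(swiss$Infant.Mortality)

# Unstandardized model (case 4)
mod_unstand <- lm(Infant.Mortality ~ Fertility + Agriculture, data = swiss)

# Case 2: only the focal predictor is standardized
mod_x_stand <- lm(Infant.Mortality ~ scale(Fertility) + Agriculture, data = swiss)

# Case 3: only the outcome is standardized
mod_y_stand <- lm(scale(Infant.Mortality) ~ Fertility + Agriculture, data = swiss)

# Convert the confidence limits of the unstandardized model
ci_x <- confint(mod_unstand)[2, ] * sx   # case 2: multiply by s_x
ci_y <- confint(mod_unstand)[2, ] / sy   # case 3: divide by s_y

# Compare with the limits from the refitted models
all.equal(ci_x, confint(mod_x_stand)[2, ])
all.equal(ci_y, confint(mod_y_stand)[2, ])
```

Both comparisons should agree up to numerical tolerance, since each refitted model is only a linear rescaling of the unstandardized one.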

Related Solutions

Solved – How to interpret coefficients of categorical predictors in the negative binomial regression model

I'm going to answer this using a Poisson model, which is precisely a negative binomial model without overdispersion, because the math will be simpler. The Poisson model predicts the probability of observing $y_i$ to be a particular non-negative integer: $$P(y_i|X) = \dfrac{\exp(-\lambda_i)\lambda_i ^{y_i}}{y_i!}$$

The conditional mean of this distribution is $\lambda_i$. $$E[y_i|x_i] = \lambda_i = \exp(x_i\beta)$$ $$\log \lambda_i = x_i\beta$$ The conditional variance of the Poisson model is also $\lambda_i$, but the variance of the negative binomial model is $\lambda_i + \alpha \lambda_i$. This is the only practical difference between the two models for the purposes of this answer.

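This is effectively a log-linear model. So the marginal effect of $x$ on $\lambda$ can be shown as

$$\dfrac{\partial E[y_i|x_i]}{\partial x_i} = \dfrac{\partial\lambda_i}{\partial x_i} = \beta\exp(x_i\beta) = \beta\lambda_i$$

In other words, a one-unit change in $x$ changes $\log \lambda_i$ by $\beta$ and multiplies $\lambda_i$ by $\exp(\beta)$.

So if you have a negative $\beta$ for a dummy variable $x$, you can say that "on average, $x$ lowers the expected value of $y$ by $(1 - \exp(\beta))\cdot 100$ percent," which is roughly $|\beta|\cdot 100$ percent when $\beta$ is close to zero.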
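
A small simulated check of the dummy-variable interpretation, using a Poisson fit as in this answer. The data, the seed and the true coefficient of -0.3 are made up for illustration. The sketch shows $\exp(\beta)$ as the ratio of expected counts between $x = 1$ and $x = 0$, so that $(\exp(\beta) - 1)\cdot 100$ is the percent change in $E[y]$.

```r
set.seed(42)                                  # arbitrary seed for reproducibility
n <- 2000
x <- rbinom(n, size = 1, prob = 0.5)          # simulated dummy predictor
y <- rpois(n, lambda = exp(1 - 0.3 * x))      # true beta = -0.3 (made-up value)

fit <- glm(y ~ x, family = poisson)
b <- unname(coef(fit)["x"])

rate_ratio <- exp(b)                          # multiplicative effect on E[y]
pct_change <- (rate_ratio - 1) * 100          # percent change in E[y] when x = 1

# With a single dummy predictor, exp(b) equals the ratio of the group means
group_ratio <- mean(y[x == 1]) / mean(y[x == 0])
c(rate_ratio = rate_ratio, group_ratio = group_ratio, pct_change = pct_change)
```

The same reading applies to a negative binomial fit (for example with MASS::glm.nb), since it uses the same log link for the mean.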

Multiple Linear Regression – Magnitude of Standardized Coefficients (Beta) Explained

It's never easy telling your professor that they are wrong.

Standardized coefficients can be greater than 1.00, as that article explains and as is easy to demonstrate. Whether they should be excluded depends on why they happened - but probably not.

They are a sign that you have some pretty serious collinearity. One case where they often occur is when you have non-linear effects, such as when $x$ and $x^2$ are included as predictors in a model.

Here's a quick demonstration:

library(QuantPsyc)  # provides lm.beta()

data(cars)
cars$speed2 <- cars$speed^2
cars$speed3 <- cars$speed^3
fit1 <- lm(dist ~ speed, data=cars)
fit2 <- lm(dist ~ speed + speed2, data=cars)
fit3 <- lm(dist ~ speed + speed2 + speed3, data=cars)
summary(fit1)
summary(fit2)
summary(fit3)
lm.beta(fit1)
lm.beta(fit2)
lm.beta(fit3)

Final bit of output:

> lm.beta(fit3)
   speed    speed2    speed3 
  1.395526 -2.212406  1.681041

Or if you prefer you can standardize the variables first:

zcars <- as.data.frame(rapply(cars, scale, how="list"))
fit3 <- lm(dist ~ speed + speed2 + speed3, data=zcars)

summary(fit3)

Call:
lm(formula = dist ~ speed + speed2 + speed3, data = zcars)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.03496 -0.37258 -0.08659  0.27456  1.73426 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  7.440e-16  8.344e-02   0.000    1.000
speed        1.396e+00  1.396e+00   1.000    0.323
speed2      -2.212e+00  3.163e+00  -0.699    0.488
speed3       1.681e+00  1.853e+00   0.907    0.369

Residual standard error: 0.59 on 46 degrees of freedom
Multiple R-squared:  0.6732,    Adjusted R-squared:  0.6519 
F-statistic: 31.58 on 3 and 46 DF,  p-value: 3.074e-11

You don't need to do it with lm(), you can do it with matrix algebra if you prefer:

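library(MASS)  # provides ginv()

Rxx <- cor(cars)[c(1, 3, 4), c(1, 3, 4)]  # correlations among speed, speed2, speed3
Rxy <- cor(cars)[2, c(1, 3, 4)]           # correlations of dist with the predictors
B <- (ginv(Rxx)) %*% Rxy
B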

         [,1]
[1,]  1.395526
[2,] -2.212406
[3,]  1.681041
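
The collinearity mentioned above can be seen directly in the correlations among the polynomial terms. This is a minimal sketch; the object names cars_poly and poly_cor are introduced here, and the terms are rebuilt on a copy so the snippet runs on its own.

```r
# Rebuild the polynomial terms on a copy of the built-in cars data
cars_poly <- transform(cars[, c("speed", "dist")],
                       speed2 = speed^2,
                       speed3 = speed^3)

# Pairwise correlations among the three predictors
poly_cor <- cor(cars_poly[, c("speed", "speed2", "speed3")])
round(poly_cor, 3)
```

Correlations close to 1 among the predictors are what allow the standardized coefficients to exceed 1 in absolute value.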
