A quick way to get the standardized beta coefficients directly from any lm (or glm) model in R is to use lm.beta(model). In the example provided, this would be:
library("MASS")
nb = glm.nb(responseCountVar ~ predictor1 + predictor2 +
predictor3 + predictor4 + predictor5 + predictor6 +
predictor7 + predictor8 + predictor9 + predictor10 +
predictor11 + predictor12 + predictor13 + predictor14 +
predictor15 + predictor16 + predictor17 + predictor18 +
predictor19 + predictor20 + predictor21,
data=myData, control=glm.control(maxit=125))
summary(nb)
library(QuantPsyc)
lm.beta(nb)
For the standard linear regression model, the absolute value of the coefficient estimates and the p-values are not related in the way you describe. It is entirely possible to have large coefficients that are insignificant and small coefficients that are highly significant. What you're missing in your interpretation is the effect of the coefficient estimates' standard errors.
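A quick simulation illustrates this. The setup below is invented for illustration (the variable names y_noisy and y_precise are mine): one regression has a large true slope buried in large noise, the other a small true slope with almost no noise.

```r
set.seed(42)
n <- 100
x <- rnorm(n)
y_noisy   <- 5   * x + rnorm(n, sd = 100)  # big slope, huge error variance
y_precise <- 0.2 * x + rnorm(n, sd = 0.1)  # small slope, tiny error variance

# The first slope estimate is near 5 but has a large standard error, so it
# tends to be insignificant; the second is near 0.2 with a tiny standard
# error, so it tends to be highly significant.
summary(lm(y_noisy ~ x))$coefficients
summary(lm(y_precise ~ x))$coefficients
```

Comparing the two "Std. Error" columns makes the point: the size of the estimate alone tells you nothing about its p-value.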
The coefficients R reports (let's call them $b_1,b_2,b_3,...,b_k$) are the best linear unbiased estimators of the true parameters $\beta_1,\beta_2,\beta_3,...,\beta_k$ in that they minimize the sum of squared errors, or formally:
$$
\{b_1,b_2,...,b_k\} = \underset{\alpha}{\textrm{argmin}}\left\{ \sum_{i=1}^{n}(y_i-\alpha_1x_{i,1}-\dots-\alpha_kx_{i,k})^2\right\}
$$
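This minimization has a closed-form solution, $b = (X'X)^{-1}X'y$, and we can check on toy data (names like x1, x2, b_manual are invented here) that coef(lm(...)) returns exactly that solution:

```r
set.seed(1)
n  <- 50
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 3 * x2 + rnorm(n)

fit <- lm(y ~ x1 + x2)

X <- cbind(1, x1, x2)                      # design matrix with intercept column
b_manual <- solve(t(X) %*% X, t(X) %*% y)  # normal-equations solution

all.equal(unname(coef(fit)), as.vector(b_manual))  # TRUE
```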
The p-value R reports for the $i^{th}$ coefficient is the result of the following hypothesis test:
$H_0: \beta_i = 0$
$H_A: \beta_i \neq 0$
Assuming the regression is properly specified, it can be shown, via the central limit theorem, that each $b_i$ is a normally distributed random variable with mean $\beta_i$ and some standard deviation (also called the standard error) $\sigma_i$. This is because the $b$'s are estimated from a random sample, so they too are random variables (roughly speaking).
What determines the $i^{th}$ p-value is where 0 "lands" in the normal distribution $N(\beta_i,\sigma_i^2)$ (technically the test is done using a t-distribution...but the difference is not so important for addressing your question). If zero is in the tails of $N(\beta_i,\sigma_i^2)$ the p-value is low, if it's more in the middle the p-value is high.
So given two estimates $b_i$ and $b_j$ where $b_i$ is "super far away" from zero and $b_j$ is "super close to" zero, the p-value for $b_i$ would be lower than that for $b_j$, assuming $\sigma_i = \sigma_j$. The part you are missing in your interpretation is that $\sigma_i$ and $\sigma_j$ can be very different. Essentially, if $b_i$ is "huge" but $\sigma_i$ is also "huge", you can get a high p-value. Conversely, for a "small" $b_i$ and a "super small" $\sigma_i$, you can get a small p-value.
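In fact the whole comparison collapses to the ratio $b_i/\sigma_i$: R's reported p-value is $2\,P(T > |b_i/\sigma_i|)$ for a t-distribution with residual degrees of freedom. A small sketch (the data here are simulated for illustration) reproduces R's p-value by hand from the estimate and its standard error:

```r
set.seed(7)
n <- 40
x <- rnorm(n)
y <- 0.5 * x + rnorm(n)
fit <- lm(y ~ x)

est <- summary(fit)$coefficients["x", "Estimate"]
se  <- summary(fit)$coefficients["x", "Std. Error"]

# Two-sided p-value from the t statistic est/se with n - 2 residual df
p_manual <- 2 * pt(abs(est / se), df = fit$df.residual, lower.tail = FALSE)

all.equal(p_manual, summary(fit)$coefficients["x", "Pr(>|t|)"])  # TRUE
```

Doubling est while also doubling se leaves est/se, and hence the p-value, unchanged, which is exactly why a "huge" estimate can still be insignificant.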
I hope that helps!
Best Answer
R's lm() function outputs unstandardized coefficients. You can try the lm.beta package if you want to extract standardized ones. In its original documentation (in this link), you can find a simple example of how to use it.
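If you'd rather not install a package, a standardized beta is just the raw slope rescaled by sd(x)/sd(y); computing it by hand on the built-in mtcars data (my choice of example data) shows what such packages report:

```r
fit   <- lm(mpg ~ wt, data = mtcars)
b_raw <- coef(fit)["wt"]

# Standardized slope: raw slope times sd(predictor) / sd(response)
b_std <- b_raw * sd(mtcars$wt) / sd(mtcars$mpg)
b_std
```

In simple regression this standardized slope equals the correlation between the two variables, which is a handy sanity check on the result.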