Just to answer one part of your question:
Now I am having difficulties understanding how this is done with
categorical variables. I have read about dummy coding, and that a
categorical variable with k levels is divided into k−1 dummy variables
and so on, but I fail to see how this is actually implemented with
regard to the actual OLS estimation (formulas above). How would the
matrix of coefficients above look, if we are dealing with a
categorical variable and dummy coding?
Hopefully this block of code will help. Look at the X matrix:
set.seed(123987)
n <- 6
df <- data.frame(x=runif(n), categorical=factor(letters[1:3]))
df$y <- rnorm(n) + df$x + ifelse(df$categorical == "a", 0,
ifelse(df$categorical == "b", 2, 10))
fit <- lm(y ~ x + categorical, data=df)
fit$coefficients # Around -0.1, 2.5, 1.1 and 10.3
X <- matrix(1, nrow=n, ncol=length(fit$coefficients))
X[, 2] <- df$x
X[, 3] <- 1*(df$categorical == "b")
X[, 4] <- 1*(df$categorical == "c")
colnames(X) <- c("constant", "x", "indicator for b", "indicator for c") # Aka dummies
Y <- matrix(df$y, ncol=1)
beta_hat <- as.vector(solve(t(X) %*% X) %*% t(X) %*% Y) # OLS via the normal equations: (X'X)^(-1) X'Y
max(abs(beta_hat - fit$coefficients)) # Very small -- essentially equal
isTRUE(all.equal(beta_hat, fit$coefficients)) # FALSE, but only because beta_hat lacks the coefficient names
The matrix X has one column of 1s (the constant); a column of df$x (a continuous predictor); a column that is 1 when the categorical variable equals "b" and zero otherwise; and a similar column for "c". The level "a" gets no column of its own: since the constant is in the model, "a" acts as the reference level whose effect is absorbed by the intercept, and the coefficients on the "b" and "c" dummies measure differences from "a".
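As a cross-check, R builds this same design matrix internally when fitting the model; model.matrix(fit) returns it, with column names like categoricalb and categoricalc instead of "indicator for b" and "indicator for c":
model.matrix(fit)                 # R's own design matrix for the fit above
max(abs(model.matrix(fit) - X))   # 0 -- same values as the hand-built X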
Best Answer
Your interpretation of the continuous predictors you have entered in the regression model seems to be somewhat mistaken. A more appropriate way to understand a coefficient is as "the expected increase/decrease in the dependent variable for a one-unit change in the independent variable". It appears that you have confused this with the interpretation of the R² of the overall regression model. The interpretation of dummy variables follows the same principle: you can think of the coefficient as the expected increase/decrease in the dependent variable for a change from 0 to 1 in the independent variable.
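To make that contrast concrete, here is a minimal sketch with made-up data (names and numbers are mine, not from your model): the slope is a per-unit effect, while R² describes the fit of the whole model.
set.seed(42)
x <- runif(100)
y <- 3 + 2 * x + rnorm(100)
fit_cont <- lm(y ~ x)
coef(fit_cont)["x"]          # close to 2: expected change in y per one-unit increase in x
summary(fit_cont)$r.squared  # proportion of variance explained by the model as a whole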
Imagine you have dummy coded a variable representing gender and, for the sake of this example, let Male = 0 and Female = 1. Let's say the dependent variable is the time (in seconds) to complete a 100 m race. An unstandardized regression coefficient of +1.5 would mean that when the independent variable equals 1 (female), the expected time to run 100 m is 1.5 seconds longer than for males (the condition male = 0). Note that this refers to unstandardized regression coefficients; the discussion would not differ much for standardized regression coefficients.
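A small simulated sketch of this (the numbers are invented purely for illustration):
set.seed(1)
n <- 200
gender <- factor(rep(c("Male", "Female"), each = n/2), levels = c("Male", "Female"))
# Simulated 100 m times: about 13 s for males, 1.5 s slower for females, plus noise
time_100m <- 13 + 1.5 * (gender == "Female") + rnorm(n, sd = 0.5)
coef(lm(time_100m ~ gender))   # "genderFemale" should land near +1.5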
In the context of a multiple regression, the interpretation of a dummy independent variable is no different from what I just described; the only caveat is that the regression coefficient should be interpreted under the assumption that the remaining independent variables in the model are held constant (controlled for).
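Extending the same sketch with a hypothetical continuous covariate (say, weekly training hours) shows that the dummy coefficient keeps its interpretation, now holding the covariate fixed:
set.seed(2)
n <- 200
gender <- factor(rep(c("Male", "Female"), each = n/2), levels = c("Male", "Female"))
training <- rnorm(n, mean = 5, sd = 1)   # hypothetical covariate: weekly training hours
time_100m <- 13 + 1.5 * (gender == "Female") - 0.3 * training + rnorm(n, sd = 0.5)
coef(lm(time_100m ~ gender + training))
# "genderFemale" is still roughly +1.5: the expected female-male difference in time,
# holding training constant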