Regression – Interpreting Significance Codes in Linear Models with Factors Using R

linear model, r, regression

I am setting up a linear model in R and need help understanding the significance codes when one of my independent variables is a factor, i.e., it is represented by a dummy variable for each possible value.

For a scalar independent variable (e.g., age, income, height), it's straightforward – either the variable is significant in the model or it isn't. R tells you the p-value with nice little stars to code the different significance levels.

For a category/factor variable, like ethnicity or gender, what does it mean when some but not all of the dummy variables in the model have a small p-value?

Can you have an independent category variable that is significant only for individual categories? In the case of ethnicity, would it mean that ethnicity is only significant in the model when you're (specific value)?

I tagged the question with "R" although it's really a general question about interpreting p-values from a linear model.

Best Answer

Any p-value in a regression model comes from a hypothesis test of the null that the true coefficient is zero. For a factor, R by default reports one coefficient per non-reference level, and each one measures the difference between that level and the reference (baseline) level. A non-significant result for a specific level (e.g. "male" for a gender column, or "Australian" for a citizenship column) means that the confidence interval for that level's coefficient includes zero at the chosen significance level.
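To make the link between per-level p-values and confidence intervals concrete, here is a minimal sketch on simulated data. The variable names (`group`, `age`, `y`) and the effect sizes are made up for illustration, and it assumes R's default treatment contrasts, so level "A" is the reference.

```r
# Simulated (hypothetical) data: an outcome, a numeric covariate, and a 3-level factor
set.seed(1)
n <- 150
group <- factor(rep(c("A", "B", "C"), each = n / 3))
age <- runif(n, 20, 60)

# Effects built into the simulation, relative to level A: B = 0, C = +2
y <- 1 + 0.05 * age + c(A = 0, B = 0, C = 2)[as.character(group)] + rnorm(n)

fit <- lm(y ~ age + group)

# Coefficient table: the rows groupB and groupC are differences from level A,
# each with its own t-test against zero
coefs <- summary(fit)$coefficients
print(coefs)

# 95% confidence intervals: an interval excludes zero exactly when p < 0.05
ci <- confint(fit, level = 0.95)
print(ci)
```

Note that level "A" gets no row of its own: it is absorbed into the intercept, which is why each dummy's p-value is a statement about a difference from "A" rather than about the level in isolation.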

It's not that the factor as a whole is (in)significant, just that the coefficient for that specific level is (not).
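If what you want is a single test for the factor as a whole, one common approach (not something the per-level stars give you) is to compare the model with and without the factor using an F-test. The sketch below assumes simulated data with made-up names and effect sizes. It also shows that changing the reference level changes which pairwise differences are reported, but not the overall test.

```r
# Simulated (hypothetical) data, same structure as a model with one factor
set.seed(2)
n <- 150
group <- factor(rep(c("A", "B", "C"), each = n / 3))
age <- runif(n, 20, 60)
y <- 1 + 0.05 * age + c(A = 0, B = 0, C = 2)[as.character(group)] + rnorm(n)

full <- lm(y ~ age + group)
reduced <- lm(y ~ age)  # same model with the factor dropped

# F-test on nested models: do the group dummies jointly improve the fit?
overall <- anova(reduced, full)
print(overall)

# Refit with "C" as the reference level: the individual coefficients and their
# p-values change, but the joint F-test for the factor does not
full_c <- lm(y ~ age + relevel(group, ref = "C"))
overall_c <- anova(reduced, full_c)
print(summary(full_c)$coefficients)
```

So the stars on individual dummies answer "does this level differ from the reference?", while the nested-model comparison answers "does the factor matter at all?".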