I'm not totally sure what your question is, but I can remark on his claims and on your confusion about the example model.
Andrew is not quite clear about whether the scientific interest lies in the height-adjusted sex-income association or the sex-adjusted height-income association. In a causal model framework, sex causes height but height does not cause sex. So if we want the effect of sex, adjusting for height would introduce mediator bias (and possibly collider bias too, since rich people are taller!). I find it confusing and funny when I see applied research that interprets the other "covariates" (confounders and precision variables) included in a model. Those coefficients are nonsense in themselves; the variables simply provide adequate stratification to make the comparison that is needed. Adjusting for height, if you are interested in inference on sex-based differences in income, is the wrong thing to do.
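To make the mediator problem concrete, here is a toy structural model (all coefficient values invented for illustration): sex raises height, and both sex and height raise income. A height-adjusted model recovers only the direct path of sex on income, not the total effect.

```python
# Hypothetical structural coefficients: sex -> height -> income, plus a
# direct sex -> income path. None of these numbers come from real data.
b_sex_height = 5.0       # inches added by male sex
b_height_income = 800.0  # income per inch of height
b_sex_income = 2_000.0   # direct effect of sex on income

# Total effect of sex on income: direct path plus the mediated path.
total_effect = b_sex_income + b_sex_height * b_height_income   # 6000.0

# What a height-adjusted model estimates: the direct path only.
direct_effect = b_sex_income                                   # 2000.0

print(total_effect, direct_effect)
```

The gap between the two numbers is exactly the part of the sex effect transmitted through height, which adjustment removes.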
I agree that counterfactuals are not necessary to explain Simpson's paradox; it can simply be a trait intrinsic to the data. I think both crude and adjusted RRs are in some sense correct without being causal. Matters are more problematic, of course, when the objective is causal analysis, and overadjustment reveals problems of non-collapsibility (which inflates a conditional OR) and insufficient sample size.
As a reminder for readers: Simpson's paradox is a very specific phenomenon in which an association flips direction after controlling for another variable. The Berkeley admissions data were the motivating example. There, crude RRs showed women were less likely to be admitted to Berkeley. However, once stratified by department, the RRs showed that women were admitted at higher rates in most departments. Women were simply more likely to apply to the difficult departments that rejected many applicants of both sexes.
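A small numeric sketch of the flip (these counts are invented, not the real Berkeley data, and are chosen so the reversal appears in every stratum):

```python
# Hypothetical admissions counts as (admitted, applied) per department and
# gender; men mostly apply to the easy department, women to the hard one.
data = {
    "easy": {"men": (480, 800), "women": (70, 100)},
    "hard": {"men": (20, 200),  "women": (120, 800)},
}

def risk(admitted, applied):
    return admitted / applied

# Crude risk ratio (women vs men), pooling over departments.
adm = {g: sum(data[d][g][0] for d in data) for g in ("men", "women")}
app = {g: sum(data[d][g][1] for d in data) for g in ("men", "women")}
crude_rr = risk(adm["women"], app["women"]) / risk(adm["men"], app["men"])

# Department-specific risk ratios: both favor women.
strat_rr = {d: risk(*data[d]["women"]) / risk(*data[d]["men"]) for d in data}

print(round(crude_rr, 3))                              # 0.422, below 1
print({d: round(r, 3) for d, r in strat_rr.items()})   # both above 1
```

The crude RR says women fare worse; the stratified RRs say women fare better in each department, because the stratifying variable also determines how hard it is to get in.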
Now, in causal inference theory, we would be befuddled by the idea that the department one applies to causes gender. Gender is intrinsic, right? Well, yes and no. Miettinen argues for a "study base" approach to such problems: who is the population? It is not all eligible students; it is those who actually applied to Berkeley. The more competitive departments attracted women to apply to Berkeley who would not have applied otherwise. To expand: a profoundly intelligent woman wants to get into the best, say, engineering program. If Berkeley did not have a great engineering program, she would not have applied to Berkeley at all; she would have applied to MIT or CalPoly. In that light, within the "applying student" population, department causes gender and is a confounder. (Caveat: I'm a first-generation college student, so I don't know much about which programs are renowned for what.)
So how do we summarize these data? It is true that Berkeley was more likely to admit a man who applied than a woman who applied. And it is true that the departments of Berkeley were more likely to admit women than men. Crude and stratified RRs are both sensible measures even though they are non-causal. This underscores how important it is for us statisticians to be precise in our wording (the humble author does not presume himself to be remotely precise).
Confounding is a phenomenon distinct from non-collapsibility, another reason crude and adjusted estimates can differ, though one known to produce milder discrepancies. Unlike with logistic regression, non-collapsibility does not arise in linear regression, and the role of the continuous outcome in Gelman's example should have been described more thoroughly.
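Non-collapsibility can be shown with a toy calculation (all risks invented): the covariate below is balanced across arms, so it is not a confounder, yet the marginal OR still differs from the common conditional OR.

```python
# Made-up outcome risks by (stratum, arm); the two strata are equally sized
# in both arms, so the stratifying variable is not a confounder.
def odds(p):
    return p / (1 - p)

risk = {"s1": {"treated": 0.8, "control": 0.5},
        "s2": {"treated": 0.5, "control": 0.2}}

# Conditional ORs: both strata give an OR of 4.
cond_or = {s: odds(r["treated"]) / odds(r["control"]) for s, r in risk.items()}

# Marginal risks average over the equally sized strata.
marg = {arm: (risk["s1"][arm] + risk["s2"][arm]) / 2
        for arm in ("treated", "control")}
marg_or = odds(marg["treated"]) / odds(marg["control"])

print({s: round(v, 2) for s, v in cond_or.items()}, round(marg_or, 2))
# conditional ORs are 4.0, the marginal OR is about 3.45
```

The conditional OR (4.0) sits further from 1 than the marginal OR (about 3.45) even with no confounding at all, which is why adjusted ORs from logistic regression tend to look "inflated" relative to crude ones.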
Andrew's interpretation of the sex coefficient in his sex- and height-adjusted income model reveals the nature of the model's assumptions, chiefly linearity. Indeed, in the linear model such comparisons between men and women are possible because, for a specific woman, we can predict what a similar-height man might have earned, even if he was never observed. This remains the case if one allows for effect modification, so that the slope of the trend in women differs from that in men. On the other hand, I don't think it's so crazy to conceive of men and women of the same height; 66 inches would indeed be a tall woman and a short man. It seems a mild projection to me rather than gross extrapolation. Furthermore, since the model's assumptions can be stated clearly, they help readers understand that the sex-stratified income-height association carries information which is borrowed across, or averaged between, the samples of males and females. If such an association were the object of inference, the earnest statistician would obviously consider the possibility of effect modification.
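The "mild projection" can be written out explicitly. With invented coefficients for a model of income on sex and height, including an interaction (effect modification), the fitted line lets us compare a 66-inch woman to a 66-inch man even if no such man were in the sample:

```python
# Hypothetical fitted coefficients for income ~ sex * height; every number
# here is made up purely to illustrate the linearity assumption.
b0 = 10_000        # intercept
b_male = 4_000     # shift for male sex
b_height = 500     # income per inch for women
b_inter = 120      # extra income per inch for men (effect modification)

def predicted_income(male, height_in):
    # male is 0 for women, 1 for men; the slope differs by sex.
    return b0 + b_male * male + (b_height + b_inter * male) * height_in

woman_66 = predicted_income(0, 66)  # 10000 + 500*66      = 43000
man_66 = predicted_income(1, 66)    # 14000 + 620*66      = 54920
print(man_66 - woman_66)            # sex "effect" at 66 inches: 11920
```

Note that with the interaction term, the sex comparison depends on the height at which it is evaluated, which is exactly what effect modification means.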
The formula notation used to specify models in R can be quite handy, in this case with the factor variables explicitly noted:
Y ~ age + calendar + factor(teacher) + factor(gender) + factor(prep_course)
You could expand this to indicate more specifically that it is a logistic regression, and perhaps to indicate the reference levels of the factor variables (although that probably isn't so important for your presentation).
Best Answer
When you have a regression model with one or more categorical variables, one level of each of those variables is taken as the reference level, and the model is fitted taking these reference levels into account (for example, the level "man" of your gender variable).
Then you interpret it as follows: when gender is "man", the coefficient associated with "woman" has no effect on the response variable (you can think of the "woman" dummy as 0). When gender is "woman", this variable takes the value 1, so the response is shifted by the associated coefficient. So if the "woman" coefficient is positive, the model says that women have higher incomes on average; if it is negative, it's just the other way around.
The same happens with your education variable, but in this case it has three levels. "no qualification" is the reference level, and you should apply the coefficients of "higher-intermediate" or "graduate-or-more" only when predicting the response for people with those levels.
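The reference-level mechanics above can be sketched by hand. This is a minimal illustration of treatment (dummy) coding, assuming reference levels "man" and "no qualification"; the coefficient values are invented, not fitted from any data:

```python
# Invented coefficients in the naming style R uses for dummy-coded factors.
coefs = {
    "(Intercept)": 20_000,
    "genderwoman": -1_500,
    "educhigher-intermediate": 3_000,
    "educgraduate-or-more": 7_000,
}

def dummy_row(gender, educ):
    # Reference levels ("man", "no qualification") contribute nothing:
    # all of their dummies are zero, leaving only the intercept.
    return {
        "(Intercept)": 1,
        "genderwoman": 1 if gender == "woman" else 0,
        "educhigher-intermediate": 1 if educ == "higher-intermediate" else 0,
        "educgraduate-or-more": 1 if educ == "graduate-or-more" else 0,
    }

def predict(gender, educ):
    row = dummy_row(gender, educ)
    return sum(coefs[k] * row[k] for k in coefs)

print(predict("man", "no qualification"))    # 20000: intercept only
print(predict("woman", "graduate-or-more"))  # 20000 - 1500 + 7000 = 25500
```

A man with no qualification gets the intercept alone, since he sits at both reference levels; every other prediction adds the coefficients of whichever non-reference levels apply.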