Solved – How to calculate the coefficient of a dummy variable reference category

categorical data, categorical-encoding, multiple regression, regression, regression coefficients

I am currently building a regression model with numerous continuous, categorical (employing dummies) and interaction variables. I understand we must use k-1 dummies with one variable becoming the reference category and then the impact of the other dummies can be reported on relative to this reference category.

I would, however, like to identify the coefficient for the reference category specifically. In this instance the reference dummy will be the UK, with fund returns as the dependent variable. I have 6 country dummies, and at present I can only say (for example) that German funds underperformed the UK by 3%. Could any of you please advise me on how to ascertain the specific coefficient value for a reference dummy variable, so that I may be able to say UK funds performed x% throughout the period?

Best Answer

If the response variable is in the units in which you are interested (annual performance percentage), then what you really want to do is to take advantage of the intercept in your regression model. As one comment notes, the regression coefficient for UK per se is 0, by construction, if that is the reference category.

With the treatment contrasts that you seem to be using to do your analysis (comparing all levels of each categorical predictor against a reference category), that intercept will be the value of the response variable when all predictor categories are at their reference level and all continuous predictors are at 0. In particular, it will represent that situation specifically for UK funds. (The 0 coefficient for UK means you add 0 to the intercept to get the value for UK.)
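To make the role of the intercept concrete, here is a minimal sketch in Python with NumPy. The data, the country labels other than UK and Germany, and the variable names are all made up for illustration; with only country dummies in the model, the fitted intercept should equal the mean return of the reference group (UK), and each dummy coefficient should equal that country's mean minus the UK mean.

```python
import numpy as np

# Hypothetical data: annual returns (%) for funds in three countries.
countries = np.array(["UK", "UK", "UK", "DE", "DE", "FR", "FR", "FR"])
returns = np.array([5.0, 7.0, 6.0, 2.0, 4.0, 8.0, 9.0, 7.0])

# Treatment coding with UK as the reference category: an intercept column
# plus k-1 dummies (DE, FR). UK gets no column of its own.
X = np.column_stack([
    np.ones(len(returns)),
    (countries == "DE").astype(float),
    (countries == "FR").astype(float),
])

# Ordinary least squares fit.
beta, *_ = np.linalg.lstsq(X, returns, rcond=None)
intercept, b_de, b_fr = beta

# The intercept is the fitted value for the reference group (UK).
uk_mean = returns[countries == "UK"].mean()
print(f"intercept = {intercept:.2f}, UK group mean = {uk_mean:.2f}")
print(f"DE coefficient = {b_de:.2f} (DE mean minus UK mean)")
print(f"FR coefficient = {b_fr:.2f} (FR mean minus UK mean)")
```

In this toy data the UK value is read directly off the intercept, and the German coefficient is the German-minus-UK difference. Once continuous predictors are in the model, the intercept is the UK value only at 0 for each of them.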

You can then use the regression coefficients to add in the contributions from all the other predictors to get the response value for UK under other combinations of predictor values. For error estimates you incorporate information from the covariance matrix of the regression coefficients, using the formula for the variance of a sum of correlated variables.
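As a rough sketch of that calculation, the snippet below fits a model with one country dummy and one continuous predictor, then estimates the UK response at a chosen predictor value together with its standard error. The simulated data, the predictor name `size`, and the evaluation point 5.0 are assumptions made for illustration. The standard error comes from the quadratic form a' Σ a, where Σ is the estimated covariance matrix of the coefficients and a holds the weights on each coefficient; this is the variance-of-a-sum formula written in matrix form.

```python
import numpy as np

# Simulated, hypothetical data: UK is the reference, DE is the only dummy.
rng = np.random.default_rng(0)
n = 40
is_de = (np.arange(n) % 2).astype(float)   # 0 = UK (reference), 1 = DE
size = rng.uniform(1.0, 10.0, n)           # hypothetical continuous predictor
y = 6.0 - 3.0 * is_de + 0.5 * size + rng.normal(0.0, 1.0, n)

# Design matrix: intercept, DE dummy, continuous predictor.
X = np.column_stack([np.ones(n), is_de, size])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Estimated covariance matrix of the coefficients: sigma^2 * (X'X)^-1.
resid = y - X @ beta
sigma2 = resid @ resid / (n - X.shape[1])
cov_beta = sigma2 * np.linalg.inv(X.T @ X)

# Estimated mean response for a UK fund at size = 5.0:
# weight 1 on the intercept, 0 on the DE dummy, 5.0 on the slope.
a = np.array([1.0, 0.0, 5.0])
uk_pred = a @ beta
uk_se = np.sqrt(a @ cov_beta @ a)

# The same variance written out as a sum of correlated terms:
# Var(b0) + x^2 * Var(b_size) + 2 * x * Cov(b0, b_size).
var_by_hand = cov_beta[0, 0] + 5.0**2 * cov_beta[2, 2] + 2 * 5.0 * cov_beta[0, 2]

print(f"UK estimate at size 5.0: {uk_pred:.3f} (SE {uk_se:.3f})")
```

Note that this is the standard error of the estimated mean for UK funds at that predictor value, not a prediction interval for an individual fund.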

This assumes, however, that there is no interaction term involving your categorical variable country. If there is, then your interpretation of the 3% coefficient for Germany is incomplete: it represents the difference between Germany and UK only at the reference values of all other categorical variables and at 0 values of all continuous variables. You must also add in the contributions of all the interaction terms to compare Germany and UK in any other scenario.
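The effect of an interaction can be seen in a small sketch. The data here are noise-free and invented, and `size` is a hypothetical continuous predictor interacted with the Germany dummy. With that interaction in the model, the Germany coefficient is the Germany-minus-UK difference only at size 0; elsewhere the interaction coefficient times the predictor value must be added.

```python
import numpy as np

# Hypothetical, noise-free returns so the fit recovers the values exactly:
#   UK: 6 + 0.5 * size      DE: 3 + 1.0 * size
size = np.array([0.0, 2.0, 4.0, 6.0, 0.0, 2.0, 4.0, 6.0])
is_de = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
y = np.where(is_de == 1, 3.0 + 1.0 * size, 6.0 + 0.5 * size)

# Design matrix: intercept, DE dummy, size, and the DE x size interaction.
X = np.column_stack([np.ones(len(y)), is_de, size, is_de * size])
b0, b_de, b_size, b_int = np.linalg.lstsq(X, y, rcond=None)[0]

def de_minus_uk(x):
    """Germany-minus-UK difference in fitted returns at size = x."""
    return b_de + b_int * x

for x in (0.0, 4.0, 6.0):
    print(f"size = {x}: DE - UK = {de_minus_uk(x):.2f}")
```

In this made-up example the gap between Germany and the UK shrinks as `size` grows, so quoting the dummy coefficient alone would describe the comparison only at size 0.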
