Solved – Coding categorical variables for regression

categorical data, categorical-encoding, multiple regression

I'm not sure of the best way to code my categorical predictor variable for use in a hierarchical regression in order to test my specific hypothesis. This categorical variable has 3 levels representing 3 groups. I want to compare group 1 to group 2, group 1 to group 3, and group 2 to group 3. I know that for dummy coding I create k-1 variables (2 dummy variables in my case), code them with 0s and 1s, and choose one level of the categorical variable as the reference category.

However, I'm not sure this is the best way of making the comparisons I wish to make, as it appears I could only compare each group to the reference category. Am I correct? If group 3 were the reference category, I could compare group 1 to group 3 and group 2 to group 3, but I could not compare group 1 to group 2. What alternative method of coding should I use to make these comparisons? My regression model will also contain continuous variables. I'm an undergrad psychology student and statistics is not my strong point, so simple answers would be best for me. I use SPSS. Thank you!

Best Answer

Here is an example using the employee data.sav data set, which comes with the standard SPSS installation. Suppose salary is the dependent variable, job category (jobcat) is the categorical independent variable, and beginning salary (salbegin) is the continuous independent variable. Using GLM, you can perform pairwise comparisons between each pair of job categories. The steps are as follows:

  1. With the data set open, go to Analyze > General Linear Model > Univariate.

  2. Put the dependent variable and independent variable into the correct slots. Categorical independent variables go to "Fixed Factor(s)" and continuous ones go to "Covariate(s)." Do not worry about the Random Factors. When it's all set, click the "Model" button.

  3. In the Model panel, highlight the two independent variables, then change the build term to "Main effects," and then click the arrow button (indicated by the red circle) to bring the two variables over. When all set, click "Continue."

  4. Now, click the "Options" button.

  5. In the Options panel, do the following: 1) Highlight jobcat, 2) bring it over to the right by clicking the arrow button, 3) check "Compare Main Effects", 4) specify the adjustment you'd like to make for the multiple pairwise comparisons (I left it as LSD, which does not adjust for multiple tests), 5) check "Parameter Estimates" so that you'll also get the regression coefficients. When it's all done, click Continue and then OK to run the analysis.

  6. The regression coefficients appear in the "Parameter Estimates" table of the output.

  7. Scroll down a bit and you'll find the "Pairwise Comparisons" table, which compares each pair of job categories.
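
On the coding question itself, it may help to see why dummy coding does not actually lose the group 1 vs. group 2 comparison. The sketch below is not SPSS and is not part of the steps above. It is a small plain-Python illustration on made-up numbers (the `rows` data and the helper names `design_row`, `ols`, and `fit` are all hypothetical). It fits the same model twice with a different reference group and shows that the "missing" comparison is just the difference between the two dummy coefficients. It only produces the estimates, not the standard errors or p-values that the SPSS pairwise comparisons table reports.

```python
# Made-up illustrative data: (group, covariate x, outcome y). Not real results.
rows = [
    (1, 1.0, 3.1), (1, 2.0, 4.2), (1, 3.0, 4.9),
    (2, 1.0, 5.0), (2, 2.0, 6.1), (2, 3.0, 7.2),
    (3, 1.0, 8.1), (3, 2.0, 8.8), (3, 3.0, 10.1),
]
GROUPS = (1, 2, 3)


def design_row(group, x, reference):
    """Intercept, one 0/1 dummy per non-reference group, then the covariate."""
    dummies = [1.0 if group == g else 0.0 for g in GROUPS if g != reference]
    return [1.0] + dummies + [x]


def ols(X, y):
    """Least squares via the normal equations (X'X) b = X'y, Gauss-Jordan."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(r[i] * r[j] for r in X) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting: move the largest remaining entry onto the diagonal
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def fit(reference):
    """Return the dummy coefficients, keyed by the group each dummy marks."""
    X = [design_row(g, x, reference) for g, x, _ in rows]
    y = [outcome for _, _, outcome in rows]
    b = ols(X, y)
    others = [g for g in GROUPS if g != reference]
    return {g: b[1 + i] for i, g in enumerate(others)}


ref3 = fit(reference=3)  # coefficients: group 1 vs 3, group 2 vs 3
ref2 = fit(reference=2)  # coefficients: group 1 vs 2, group 3 vs 2

# Group 1 vs group 2, recovered from the model that used group 3 as reference
diff_1_vs_2 = ref3[1] - ref3[2]

print("group 1 vs 3:", round(ref3[1], 4))
print("group 2 vs 3:", round(ref3[2], 4))
print("group 1 vs 2 (difference of dummies):", round(diff_1_vs_2, 4))
print("group 1 vs 2 (group 2 as reference): ", round(ref2[1], 4))
```

The last two printed values agree, so changing the reference category (or asking the software for pairwise comparisons, as in the GLM steps above) gives the third comparison without needing a different coding scheme.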