Solved – Help with Anova of categorical and continuous variable in R and SPSS output

I am having some trouble running an ANOVA on categorical variables in R and matching the SPSS output. What I need to do is run an ANOVA on the dataset below (it's a made-up data set). But I need to know whether the mean of each category is significantly different from the total mean of all races.

Satisfaction    Race
3   Asian
4   Cacasion
5   African American
2   Other 
5   African American
3   African American
4   African American
5   African American
2   Asian
3   African American
1   Cacasion
1   Cacasion
1   Cacasion
5   Other 
5   Other 
5   Other 
5   African American
5   Asian
4   Asian
5   Other 
5   Other 
5   Other 
1   Cacasion
4   Cacasion

For example, the mean of all races is 3.5:

> mean(test$Satisfaction)
[1] 3.5

What I would like to know is whether the mean score for each race is significantly different from the total mean of 3.5, along with the p-value for each comparison.

I ran an ANOVA in R with the following model, but R sets one category as the reference and tests it against the others:

> lm.test <- lm(test$Satisfaction ~ test$Race)
> summary(lm.test)

Call:
lm(formula = test$Satisfaction ~ test$Race)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.5714 -1.0000  0.4286  0.8482  2.0000 

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
(Intercept)         3.8571     0.5023   7.679 2.18e-07 ***
test$RaceAsian     -0.6071     0.8330  -0.729   0.4745    
test$RaceCacasion  -1.8571     0.7394  -2.512   0.0207 *  
test$RaceOther      0.7143     0.7103   1.006   0.3266    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 1.329 on 20 degrees of freedom
Multiple R-squared: 0.391,  Adjusted R-squared: 0.2997 
F-statistic:  4.28 on 3 and 20 DF,  p-value: 0.01732

The output is telling me that the mean for African American is 3.8571 and that it is significantly different from the mean of the Caucasian group, but not from the means of the Asian and Other groups.

Is there a way for me to set the intercept to 3.5 in R and get significance tests against the overall mean rather than against the reference group? Or should I be using another test altogether? My stats isn't that great, so if it's another test, a brief explanation of which test and how to run it in R would be great.

Best Answer

I think the easiest approach is to center your dependent variable around the grand mean. Given your example:

test$Satisfaction <- as.numeric(scale(test$Satisfaction, center=TRUE, scale=FALSE))

This way, the grand mean is now 0, and the mean for each ethnic group is its deviation from the grand mean. Then you run your regression as usual. Once the intercept is suppressed as described below, the four tests that you get are whether each ethnic group's mean differs from the grand mean, because each one tests whether a group mean differs from 0, which is exactly the grand mean after you've centered your dependent variable.
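As a minimal sketch of the centering step, the snippet below rebuilds the data frame from the table in the question (retyped by hand, so treat the values as an assumption) and stores the centered scores in a new column. The name `Centered` is chosen here purely for illustration, so that the raw scores stay available. The per-group means of `Centered` are then each group's deviation from the grand mean.

```r
# Data retyped from the question's table, grouped by race (an assumption;
# results need not match the console output printed in the question).
test <- data.frame(
  Satisfaction = c(3, 2, 5, 4,             # Asian
                   4, 1, 1, 1, 1, 4,       # Cacasion (spelling as in the data)
                   5, 5, 3, 4, 5, 3, 5,    # African American
                   2, 5, 5, 5, 5, 5, 5),   # Other
  Race = rep(c("Asian", "Cacasion", "African American", "Other"),
             times = c(4, 6, 7, 7))
)

# Subtract the grand mean; 'Centered' is an illustrative column name.
test$Centered <- test$Satisfaction - mean(test$Satisfaction)

# The grand mean of the centered scores is 0 up to floating-point error.
mean(test$Centered)

# Each group's mean deviation from the grand mean.
group_dev <- tapply(test$Centered, test$Race, mean)
group_dev
```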

If you retain the intercept in the model (as you did in your example), then the significance test of the intercept is whether the mean of the reference group is significantly different from the grand mean, while the remaining coefficients still compare each group with the reference group. If you suppress the intercept by using:

lm(test$Satisfaction ~ 0 + test$Race)

then you get the same model fit (same fitted values and residuals, although R² and the F-statistic are computed differently), but instead of an intercept you get a coefficient labelled with the ethnic group that used to be the reference category. Each of the four coefficients is now that group's mean deviation from the grand mean, tested against 0. (See here for more information on R² calculations when the intercept is removed from the model.)
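To make the two parameterizations concrete, here is a sketch that fits both models on a centered copy of the scores. The data frame is retyped from the question's table and the column name `Centered` is an assumption for illustration. Note that these tests treat the grand mean as a fixed, known value.

```r
# Data retyped from the question's table, grouped by race (an assumption;
# results need not match the console output printed in the question).
test <- data.frame(
  Satisfaction = c(3, 2, 5, 4,  4, 1, 1, 1, 1, 4,
                   5, 5, 3, 4, 5, 3, 5,  2, 5, 5, 5, 5, 5, 5),
  Race = rep(c("Asian", "Cacasion", "African American", "Other"),
             times = c(4, 6, 7, 7))
)
test$Centered <- test$Satisfaction - mean(test$Satisfaction)

# With intercept: the intercept is the reference group's centered mean,
# the other rows are differences from the reference group.
with_int <- lm(Centered ~ Race, data = test)

# Without intercept: one row per group, each tested against 0,
# i.e. against the grand mean of the original scores.
no_int <- lm(Centered ~ 0 + Race, data = test)

coef(summary(with_int))
coef(summary(no_int))

# Both parameterizations describe the same fitted group means.
all.equal(fitted(with_int), fitted(no_int))
```

The p-values in the second table are the ones that answer the question as posed, one per group.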

Mean-centering your DV and re-running your regression is probably your best option. Alternatively, you could compute a separate one-sample t-test for each ethnic group, comparing that group's mean to the grand mean, e.g.:

t.test(subset(test, Race=="Asian")$Satisfaction, mu=mean(test$Satisfaction))

However, this is a less powerful approach, since both the degrees of freedom and the standard error will be computed based on only one group instead of your whole sample. Therefore, your best bet is to re-run your regression, but with your dependent variable mean-centered.
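If you do still want the separate one-sample t-tests, looping over the groups avoids repeating the call by hand. This is only a sketch: the data frame is retyped from the question's table, and `grand_mean` and `p_values` are illustrative names rather than anything from the original post.

```r
# Data retyped from the question's table, grouped by race (an assumption;
# results need not match the console output printed in the question).
test <- data.frame(
  Satisfaction = c(3, 2, 5, 4,  4, 1, 1, 1, 1, 4,
                   5, 5, 3, 4, 5, 3, 5,  2, 5, 5, 5, 5, 5, 5),
  Race = rep(c("Asian", "Cacasion", "African American", "Other"),
             times = c(4, 6, 7, 7))
)

# Grand mean of the raw (uncentered) scores, used as the null value.
grand_mean <- mean(test$Satisfaction)

# One one-sample t-test per group; each uses only that group's rows,
# so its degrees of freedom are (group size - 1).
p_values <- sapply(
  split(test$Satisfaction, test$Race),
  function(x) t.test(x, mu = grand_mean)$p.value
)
p_values
```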