How to compare the relative effects between values of a categorical IV

multiple-comparisons, regression, stata

I am a social science PhD student trying to figure out a statistical test for a small conference presentation I am working on. However, I realized I don't know how to run the model I want. Here is some background on my question:

DV: Change in a person's weight (continuous variable)

IV: Which fruit someone ate for breakfast (categorical variable with four categories: guavas, raspberries, pears, and strawberries)

Statistical Method: Simple OLS Regression

According to my theory, eating guavas should have the greatest effect on someone's weight, followed by raspberries, then pears, with strawberries having the least effect.

To test my theory, my original plan was to run a simple OLS regression with three binary variables (one for each fruit except strawberries, which serve as the baseline). However, I just realized that such a model would only tell me whether each fruit had a greater effect than strawberries; it would not tell me whether the effect of guavas was greater than that of raspberries, whether raspberries had a greater effect than pears, etc.
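For reference, a single pairwise comparison such as guavas vs. raspberries is just a linear contrast of the coefficients, so it can be computed from the same dummy-coded fit. The sketch below uses Python with NumPy on simulated data (the group means, sample size and seed are invented for illustration and are not from the question); in Stata the analogous step after `regress` would be `lincom`. It only gives a separate test for each pair, not the joint ordered test the question is after.

```python
import numpy as np

# Hypothetical simulated data: the true means below are made up for illustration.
rng = np.random.default_rng(0)
fruits = np.array(["strawberry", "pear", "raspberry", "guava"])
true_means = {"strawberry": 0.0, "pear": 0.5, "raspberry": 1.0, "guava": 1.5}
group = rng.choice(fruits, size=200)
y = np.array([true_means[g] for g in group]) + rng.normal(0.0, 1.0, size=200)

# Dummy coding with strawberries as the baseline: intercept + 3 indicators.
X = np.column_stack([
    np.ones(len(y)),
    (group == "pear").astype(float),
    (group == "raspberry").astype(float),
    (group == "guava").astype(float),
])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Classical OLS covariance matrix: sigma^2 (X'X)^{-1}.
n, p = X.shape
resid = y - X @ beta
sigma2 = resid @ resid / (n - p)
cov = sigma2 * np.linalg.inv(X.T @ X)

# Guava vs. raspberry as a linear combination c'beta of the coefficients.
c = np.array([0.0, 0.0, -1.0, 1.0])  # beta_guava - beta_raspberry
est = c @ beta
se = np.sqrt(c @ cov @ c)
t_stat = est / se
print(f"guava - raspberry: {est:.3f} (SE {se:.3f}, t = {t_stat:.2f})")
```

Because this model is saturated, the intercept equals the strawberry group mean and each dummy coefficient equals that group's mean minus the strawberry mean.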

Is there a simple method in an OLS regression (in Stata) that would allow me to test whether guavas had the greatest effect, raspberries the second greatest, pears the third greatest, and strawberries the least?
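In symbols, with strawberries as the baseline and $\beta_G, \beta_R, \beta_P$ denoting the coefficients on the guava, raspberry and pear dummies (notation introduced here for illustration), the hypothesized ranking is a set of inequalities on adjacent differences, which can be stacked into one contrast vector:

$$
\beta_G > \beta_R > \beta_P > 0
\quad\Longleftrightarrow\quad
\theta = C\beta =
\begin{pmatrix} 1 & -1 & 0 \\ 0 & 1 & -1 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} \beta_G \\ \beta_R \\ \beta_P \end{pmatrix}
=
\begin{pmatrix} \beta_G - \beta_R \\ \beta_R - \beta_P \\ \beta_P \end{pmatrix} > 0,
$$

where the last inequality is componentwise. Each component of $\hat\theta = C\hat\beta$ is a linear combination of OLS coefficients with covariance $C\,\widehat{\mathrm{Var}}(\hat\beta)\,C'$, so testing the whole ranking is a joint one-sided (inequality-constrained) problem rather than three separate pairwise tests.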

Thanks!

FYI: No, my paper is not really about fruit and weight loss. These variables were made up for the purposes of this question.

Best Answer

You are in a rather special situation of one-sided testing. Assuming your dummy coding is such that guava is the reference category, so that $\beta_j$ is the effect of fruit $j$ relative to guava, you have $H_0: \beta_1 \le 0, \ldots, \beta_k \le 0$ vs. $H_1:$ at least one of $\beta_1, \ldots, \beta_k > 0$. You can use the regular one-sided test, but it will be extremely conservative (increasingly so as $k$ grows). The proper null distributions are non-standard: they are mixtures of $\chi^2$ or $F$-distributions with varying degrees of freedom (often called $\bar\chi^2$, or "chi-bar-squared", distributions). Stata mentions the conservativeness of this test in the context where it most frequently arises, testing variance components in mixed models; see help j_xtmixedlr.
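A minimal Monte Carlo sketch of this test, in Python with NumPy, follows; the answer gives no code, and the function names are hypothetical. It assumes $\hat\beta$ is approximately normal with covariance $V$ (in practice the estimated covariance block of the fruit coefficients), uses the statistic $T = \min_{b \le 0} (\hat\beta - b)' V^{-1} (\hat\beta - b)$, i.e. the squared $V^{-1}$-distance from $\hat\beta$ to the null region, and approximates its $\bar\chi^2$ null distribution by simulating at $\beta = 0$, the least favourable point of the null. Simulation stands in for the analytic mixing weights treated in the references, and this is a large-sample approximation rather than the exact $F$-mixture.

```python
import itertools

import numpy as np


def dist_to_nonpositive_orthant(y, V):
    """Return min over b <= 0 of (y - b)' V^{-1} (y - b).

    Solved exactly by enumerating which coordinates of b are free (b_i < 0)
    and which are pinned at 0; only practical for small k (2^k subsets).
    """
    y = np.asarray(y, dtype=float)
    W = np.linalg.inv(V)
    k = y.size
    best = np.inf
    for mask in itertools.product([False, True], repeat=k):
        mask = np.array(mask)
        free = np.flatnonzero(mask)    # coordinates where b is unrestricted
        fixed = np.flatnonzero(~mask)  # coordinates where b_i = 0
        b = np.zeros(k)
        if free.size:
            # Stationarity in b_free: W_ff (y_f - b_f) + W_fz y_z = 0
            W_ff = W[np.ix_(free, free)]
            W_fz = W[np.ix_(free, fixed)]
            b[free] = y[free] + np.linalg.solve(W_ff, W_fz @ y[fixed])
            if np.any(b[free] > 1e-12):
                continue  # candidate violates b <= 0, so it is not feasible
        r = y - b
        best = min(best, float(r @ W @ r))
    return best


def chibar_pvalue(beta_hat, V, n_sim=20_000, seed=0):
    """Monte Carlo p-value for H0: beta <= 0 vs H1: some beta_j > 0.

    Draws the statistic under beta = 0, which approximates the
    chi-bar-squared null distribution (normal approximation, V treated as known).
    """
    rng = np.random.default_rng(seed)
    t_obs = dist_to_nonpositive_orthant(beta_hat, V)
    draws = rng.multivariate_normal(np.zeros(len(beta_hat)), V, size=n_sim)
    t_sim = np.array([dist_to_nonpositive_orthant(z, V) for z in draws])
    return t_obs, float(np.mean(t_sim >= t_obs))


if __name__ == "__main__":
    # Hypothetical estimates relative to guava, for illustration only.
    beta_hat = np.array([-0.4, -0.9, 0.3])
    V = np.array([[0.04, 0.02, 0.02],
                  [0.02, 0.05, 0.02],
                  [0.02, 0.02, 0.06]])
    t, p = chibar_pvalue(beta_hat, V, n_sim=5_000)
    print(f"T = {t:.3f}, Monte Carlo p-value = {p:.3f}")
```

With guava as the reference category, `beta_hat` would hold the $k = 3$ fruit coefficients and `V` their block of the estimated covariance matrix. A full ranking hypothesis can be put in the same form by passing $\hat\theta = C\hat\beta$ and $CVC'$ for an invertible matrix $C$ of adjacent differences, with signs chosen so that the theory corresponds to $\theta \le 0$.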

Even though the problem was first addressed more than fifty years ago (Chernoff 1954; Bartholomew 1961), it is still a relatively esoteric piece of statistical theory. If you can handle Econometrica articles, Andrews (2001) will be very helpful. A textbook-length treatment is given by Silvapulle & Sen (2004). I am not aware of a less technical discussion, though. Vika Savalei and I tried to retell the story for psychologists (Psychological Methods, 2008); you might find it somewhat more accessible. In particular, we discuss conditional inference, which is easier to understand and apply.