When a regression model contains no categorical variables, it's straightforward to include numeric predictors. For example, if we wanted to predict the weights of children in a school as a linear function of age, we could specify the following model:
\begin{eqnarray*}
Y_{i} & = & \beta_{0}+\beta_{age}X_{age,i}+\epsilon_{i}
\end{eqnarray*}
where $Y_i$ is the weight of the $i$th child, $X_{age,i}$ is the age of the $i$th child, $\epsilon_i$ is a random error term associated with the $i$th child, $\beta_0$ is an intercept parameter, and $\beta_{age}$ is the slope parameter associated with the variable age. A fitted version of this model might look like:
\begin{eqnarray*}
E[Y] & = & 35+3X_{age}
\end{eqnarray*}
This model simply says that the expected weight of any student can be estimated as $35 + 3$ times the student's age. So if a student's age is $7$, that student's expected weight is $35 + 3\times 7 = 56$.
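The arithmetic above is easy to check with a short snippet (a toy illustration: the coefficients $35$ and $3$ are the example's fitted values, not estimates from real data):

```python
def expected_weight(age):
    """Fitted model from the example: E[Y] = 35 + 3 * X_age."""
    return 35 + 3 * age

print(expected_weight(7))  # → 56
```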
Now suppose that instead of using age to predict weight, we were interested in predicting a student's weight based on race. How could this be represented in mathematical terms? After all, we can't multiply categories of race by estimated regression coefficients. For example, how would this function make any sense for a black student: $E[Y]=35 + 3 \times Black$? It doesn't make sense to multiply "3 times black" or "3 times white", or any category for that matter.
Dummy coding is a way to handle this. Dummy variables are a simple way to "code" (map, or translate) the categorical information in our dataset so that categorical groups can be represented in mathematical terms in a regression model. They also facilitate interpretation. If we have a categorical variable, say "Race," it may have several categories/levels: Black, White, and Asian. How can we build a regression model that includes race as a predictor, similar to the way we worked with age?
Well, it turns out that with dummy coding we create new variables (the dummy variables) that are coded either zero (0) or one (1) to represent the categories. Generally speaking, when $c$ categories are present, we need $c-1$ dummy variables. In the race example we have three race categories, so $c=3$, which means we need $c-1=3-1=2$ dummy variables to represent the $3$ racial categories in our regression model. We'll call the new variables $X_{black,i}$ and $X_{white,i}$ (if you are wondering where the Asian category went, hold tight: I'll explain shortly). Then we'll code the information in our dataset as follows:
\begin{eqnarray*}
X_{black,i} & = & \begin{cases}
1 & \text{if the $i$th student is black}\\
0 & \text{otherwise}
\end{cases}
\end{eqnarray*}
and
\begin{eqnarray*}
X_{white,i} & = & \begin{cases}
1 & \text{if the $i$th student is white}\\
0 & \text{otherwise}
\end{cases}
\end{eqnarray*}
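This coding scheme is easy to mechanize. As a sketch (the student labels below are made up for illustration), one way to build the two dummy variables from a list of race labels:

```python
def dummy_code(races):
    """Map each race label to (X_black, X_white).
    The reference category (Asian here) gets (0, 0)."""
    return [(1 if r == "Black" else 0, 1 if r == "White" else 0)
            for r in races]

students = ["Black", "White", "Asian", "White"]  # hypothetical data
print(dummy_code(students))  # → [(1, 0), (0, 1), (0, 0), (0, 1)]
```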
Using this dummy coding, a regression model for weight $Y_i$ based on these dummy variables would be:
\begin{eqnarray*}
Y_{i} & = & \beta_{0}+\beta_{black}X_{black,i}+\beta_{white}X_{white,i}+\epsilon_i
\end{eqnarray*}
and the corresponding response function might be something like:
\begin{eqnarray*}
E[Y] & = & 35+5X_{black}+3X_{white}
\end{eqnarray*}
where $\hat{\beta}_0=35$, $\hat{\beta}_{black}=5$, and $\hat{\beta}_{white}=3$. To interpret this model, it's instructive to write out the model that would be estimated for a black student. When a student is black, $X_{black}=1$ and $X_{white}=0$, so the response function becomes:
\begin{eqnarray*}
E[Y] & = & 35+5\times1+3\times0=35+5=40
\end{eqnarray*}
Now, when a student is white, $X_{black}=0$ and $X_{white}=1$, so the response function becomes:
\begin{eqnarray*}
E[Y] & = & 35+5\times0+3\times1=35+3=38
\end{eqnarray*}
If the student is Asian, then both $X_{black}=0$ and $X_{white}=0$, and the response function becomes:
\begin{eqnarray*}
E[Y] & = & 35+5\times0+3\times0=35+0+0=35
\end{eqnarray*}
As you can see, the Asian category is represented by the intercept alone, so we don't need any $X$-value coded $1$ for it: coding $X_{black,i}=0$ and $X_{white,i}=0$ represents the Asian racial category.
So, with this dummy coding, a black student has an expected mean weight of $40$ pounds, a white student a mean weight of $38$ pounds, and an Asian student a mean weight of only $35$ pounds.
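The three cases can be collected into a single evaluation of the fitted response function, using the example's coefficients $\hat{\beta}_0=35$, $\hat{\beta}_{black}=5$, and $\hat{\beta}_{white}=3$:

```python
b0, b_black, b_white = 35, 5, 3  # fitted coefficients from the example

def expected_weight(x_black, x_white):
    """E[Y] = b0 + b_black * X_black + b_white * X_white."""
    return b0 + b_black * x_black + b_white * x_white

print(expected_weight(1, 0))  # Black student → 40
print(expected_weight(0, 1))  # White student → 38
print(expected_weight(0, 0))  # Asian student (reference) → 35
```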
As you can see, dummy coding allows the regression model's predictions to change depending on the category being represented. If you'd like to see some additional examples of how dummy coding works, this website has some excellent examples and explanations.
Lastly, note that this type of coding applies throughout regression modeling, so it can be used with ANOVA models, mixed models, $3\times3$ factorial models, and so on.
Best Answer
Pre-post designs are quite common and the standard method is to use the pre-measure as a control variable and the post-measure as the response with the binary variable as the covariate.
This is quite simple:
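In R this would be along the lines of `lm(post ~ pre + condition)`. As a minimal sketch of that pre-as-covariate model (synthetic, noise-free data, plain least squares via the normal equations; all names and values here are illustrative assumptions, not the answer's original code):

```python
# ANCOVA-style model for a pre-post design:
#   post_i = b0 + b_pre * pre_i + b_cond * cond_i + error_i

def fit_ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))  # partial pivoting
        A[col], A[piv] = A[piv], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

pre = [50, 55, 60, 65, 70, 75]           # baseline measure
cond = [0, 1, 0, 1, 0, 1]                # binary treatment indicator
post = [10 + 0.8 * p + 5.0 * c for p, c in zip(pre, cond)]  # noise-free outcome

X = [[1.0, p, c] for p, c in zip(pre, cond)]  # intercept, pre, condition
b0, b_pre, b_cond = fit_ols(X, post)
print(round(b0, 3), round(b_pre, 3), round(b_cond, 3))  # ≈ 10.0 0.8 5.0
```

Because the synthetic data contain no noise, the fit recovers the generating coefficients exactly (up to floating-point error); with real data the same machinery yields the least-squares estimates.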
If this had been a repeated-measures design within subject (i.e., pre-post measures in each Condition), you would need to specify the error structure in the `aov` function, comparing a model in R where the errors for each Condition are nested within subject ID.

Ratio variables are notorious for being poorly behaved statistically (i.e., often having significant skew and blowing up as the denominator goes near zero). However, if the denominator is far from zero, you should not make the mistake of looking at the raw distributions and deciding that the data fail the assumptions needed for validity of linear methods. You need to look at the residuals before conducting any test. (And even then, there is some doubt whether minor departures from normality invalidate inferences.)

The help page for `aov` warns you to use orthogonal contrasts. There was an article in R News several years ago on multivariate (as opposed to multivariable) methods, which would provide further options for model comparisons and measures of sphericity. Another option is offered by the `Anova` function in John Fox's `car` package.

A useful link: http://blog.gribblelab.org/2009/03/09/repeated-measures-anova-using-r/
Another approach is to analyze the pairwise differences, but that approach has flaws.