Solved – Using ANOVA and t-tests with pre-aggregated data

anova, r, statistical-significance

I have a set of experiments that I'd like to run some significance tests over. It looks like:

Word Feature Model Test Score Correct Total
allow.v woc hac zellig 0.382353 26 68
allow.v woc hac bellig 1.000000 0 0
allow.v woc kmeans zellig 0.382353 26 68
allow.v woc kmeans bellig 1.000000 0 0
run.v woc eigen zellig 0.308824 21 68
run.v woc eigen bellig 1.000000 0 0
run.v woc agglo zellig 0.323529 22 68

This combines two different types of tests, which I'm calling zellig and bellig for lack of better names right now. Using this data, I considered treating each Score as a response variable that depends on Feature and Model, with each row as an observation. With this format, it's pretty easy to do standard evaluations like:

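# Keep only the rows for the "zellig" test
zelligData <- subset(senseData, Test == "zellig")
# Two-way ANOVA with a Feature x Model interaction
anova(lm(Score ~ Feature * Model, data = zelligData))
# Unadjusted pairwise t-tests across Feature levels, then across Model levels
pairwise.t.test(zelligData$Score, zelligData$Feature, p.adjust.method = "none")
pairwise.t.test(zelligData$Score, zelligData$Model, p.adjust.method = "none")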

That's well and good; however, Score is actually an aggregate of many observed events: it is the proportion of events for which each combination resulted in a correct evaluation. The Correct and Total fields hold the two counts used to generate Score (for example, 26 / 68 = 0.382353). Now I'd like to do significance tests for each word, with each event counted by Total treated as an observation. Concretely, I'd like to use the above commands on data that looks like:

Word Feature Model Test Score
allow.v woc hac zellig 1
allow.v woc hac zellig 1
allow.v woc hac zellig 1
allow.v woc hac zellig 1
....
allow.v woc hac zellig 0
allow.v woc hac zellig 0
allow.v woc hac zellig 0
allow.v pos hac zellig 1
allow.v pos hac zellig 1
allow.v pos hac zellig 1
...
allow.v pos hac zellig 0
allow.v pos hac zellig 0
allow.v pos hac zellig 0
...

However, I'd like to avoid creating this secondary view of the data, as some words have a very large number of events and it seems wasteful to turn the aggregate values into the expanded form.

Is it at all possible to perform ANOVA and pairwise t-tests using my already aggregated counts of events? Would this evaluation be different from just using the aggregated scores as they are? I imagine it would, as the aggregated Score value doesn't indicate how many events there really were in total.

Thanks in advance!

Edit: After reading how to do ANOVA manually, I should rephrase this question. Is there any way to pass the matrix of summary statistics for each factor, as described in Section 2 (Summary Statistics) of the instructions for doing it manually, to a function in R and have it compute the ANOVA result?

Best Answer

I think you've answered your own question quite well. By aggregating your data, you are discarding information about how many events lie behind each proportion. Thus, in your model, 500 out of 1000 correct evaluations is the same as 1 out of 2. This can significantly alter your interpretation of the results. In each case, given no other information, we would prefer a point estimate of 0.5 for the proportion of correct evaluations. However, in the first case we can be much more confident that the true rate of correct evaluations is close to 0.5. In the second, we are much less confident that the result did not occur simply by chance. If, as I suspect, this underlying proportion of correct evaluations is what interests you, you should use the full data set.
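To make the 1-of-2 versus 500-of-1000 point concrete, here is a minimal sketch in base R. It first compares exact binomial confidence intervals for the two cases, then fits a binomial GLM directly to aggregated counts using the `cbind(successes, failures)` response form. Note that this is logistic regression with an analysis-of-deviance table, a swapped-in technique rather than the ANOVA and t-tests from the question. The data frame `agg` and all of its values are hypothetical and only mimic the question's column layout.

```r
# Same point estimate (0.5), very different uncertainty.
ci_small <- binom.test(1, 2)$conf.int       # 1 correct out of 2
ci_large <- binom.test(500, 1000)$conf.int  # 500 correct out of 1000
width_small <- diff(ci_small)               # width of the 95% interval
width_large <- diff(ci_large)

# Hypothetical aggregated data in the question's layout
# (values invented for illustration only).
agg <- data.frame(
  Feature = c("woc", "woc", "pos", "pos"),
  Model   = c("hac", "kmeans", "hac", "kmeans"),
  Correct = c(26, 30, 21, 35),
  Total   = c(68, 68, 68, 68)
)

# Binomial GLM on the counts: each row contributes Total trials,
# so the data never has to be expanded to one row per event.
fit <- glm(cbind(Correct, Total - Correct) ~ Feature + Model,
           family = binomial, data = agg)

# Analysis-of-deviance table with chi-squared tests for each term.
dev_table <- anova(fit, test = "Chisq")
print(dev_table)
```

Fitting the same formula to the expanded one-row-per-event 0/1 data gives the same coefficient estimates, so the aggregated form loses nothing as long as both Correct and Total are kept.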

Related Solutions

Solved – Statistical power and minimum sample size for ANOVA with likert scale as dependent variable

The commonly used statistical methods assume that you take a sample of an infinite or very large population. ANOVA, too, has this assumption. When the subjects of your survey can be viewed as a representative sample of an existing or hypothetical much larger population, you do not need the finite population methods.

The second question is whether ANOVA is appropriate for analysing the data collected. 7-point Likert scales are, strictly speaking, ordinal scales, so methods for ordinal dependent variables may be best. However, in psychometrics it is usual to assume that the values from a Likert scale follow a distribution that can be approximated by a normal distribution. In this case ANOVA is an acceptable method; so is the t-test, although the latter compares only two groups. (Methods designed for binary (yes/no) outcomes may be used after setting a threshold on your Likert scale and dichotomising your data. However, unless this threshold also exists in the psychological mechanism, it will lead to loss of detail in your data and loss of power in your test, so it is not generally recommended.)

You need to check, or at least think over, whether the homoscedasticity assumption of ANOVA is likely to be met. If it is, use a power formula for ANOVA; you need not worry about specifying the population size.
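As a rough illustration of such a power calculation, base R's `power.anova.test` (in the stats package) solves for whichever argument is left unspecified. The sketch below solves for the per-group sample size. The number of groups and both variances are hypothetical planning values, not figures from the question; in practice they would come from pilot data or prior studies.

```r
# Hypothetical planning values, for illustration only.
n_groups    <- 4     # number of groups to compare
between_var <- 0.25  # assumed variance of the group means
within_var  <- 2.0   # assumed common within-group variance (homoscedasticity)

# Leave n unspecified so the function solves for the per-group
# sample size giving 80% power at a 5% significance level.
pw <- power.anova.test(groups = n_groups,
                       between.var = between_var,
                       within.var = within_var,
                       sig.level = 0.05,
                       power = 0.80)

# Round up, since a fractional subject cannot be recruited.
n_per_group <- ceiling(pw$n)
print(n_per_group)
```

The calculation takes no population-size argument at all, which matches the point above about sampling from a much larger population.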

Solved – Help with Anova of categorical and continuous variable in R and SPSS output

I think that the easiest approach is to center your dependent variable around the grand mean. Given your example:

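# scale = FALSE centers without also dividing by the standard deviation
test$Satisfaction <- as.numeric(scale(test$Satisfaction, center = TRUE, scale = FALSE))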

This way, the grand mean is now 0, and the mean for each ethnic group is the deviation from the grand mean. Then you run your regression as usual, but the four tests that you get are whether each ethnic group's mean differs from the grand mean, because those are tests of whether the mean differs from 0, which is exactly the grand mean after you've centered your dependent variable.

If you retain the intercept in the model (as you did in your example), then the significance test of the intercept is whether the mean of the reference group is significantly different from the grand mean. If you suppress the intercept by using:

lm(test$Satisfaction ~ 0 + test$Race)

then you get exactly the same results (barring some difference on the adjusted R²), but instead of having an intercept, you get the label for your 4th ethnic group, the one that used to be the reference category. (See here for more information on R² calculations when the intercept is removed from the model.)
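The no-intercept fit on a centered outcome can be sketched as follows. This is a minimal illustration on synthetic data: the data frame `test`, its group labels, and the column `SatCentered` are hypothetical stand-ins for the asker's data. It checks that each coefficient equals that group's mean minus the grand mean.

```r
set.seed(1)  # reproducible synthetic data; all values are hypothetical
test <- data.frame(
  Race = factor(rep(c("Asian", "Black", "Hispanic", "White"), each = 25)),
  Satisfaction = rnorm(100, mean = 5, sd = 1)
)

grand_mean  <- mean(test$Satisfaction)
group_means <- tapply(test$Satisfaction, test$Race, mean)

# Center only (no rescaling), so the units of Satisfaction are preserved.
test$SatCentered <- test$Satisfaction - grand_mean

# No-intercept model: one coefficient per group, each tested against 0,
# which corresponds to the grand mean of the original variable.
fit <- lm(SatCentered ~ 0 + Race, data = test)
deviations <- coef(fit)

# Estimate, standard error, t value and p value for each group.
print(summary(fit)$coefficients)
```

Each row of the coefficient table is then a test of one group's deviation from the grand mean, using the pooled residual variance from the whole sample.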

Mean-centering your DV and re-running your regression is probably your best option. Alternatively, you could compute a separate one-sample t-test for each ethnic group, comparing the group's mean to the grand mean, e.g.:

t.test(subset(test, Race=="Asian")$Satisfaction, mu=mean(test$Satisfaction))

However, this is a less powerful approach, since both the degrees of freedom and the standard error will be computed based on only one group instead of your whole sample. Therefore, your best bet is to re-run your regression, but with your dependent variable mean-centered.
