Solved – Using ANOVA and t-tests with pre-aggregated data

anova, r, statistical-significance

I have a set of experiments that I'd like to run some significance tests over. It looks like:

Word Feature Model Test Score Correct Total
allow.v woc hac zellig 0.382353 26 68
allow.v woc hac bellig 1.000000 0 0
allow.v woc kmeans zellig 0.382353 26 68
allow.v woc kmeans bellig 1.000000 0 0
run.v woc eigen zellig 0.308824 21 68
run.v woc eigen bellig 1.000000 0 0
run.v woc agglo zellig 0.323529 22 68

This combines two different types of tests, which I'm calling zellig and bellig for lack of better names right now. Using this data, I considered treating each score as a response variable that depends on the Feature and Model, with each row as an observation. With this format, it's pretty easy to do standard evaluations like:

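# Keep only the zellig rows, then test Feature, Model, and their interaction
zelligData <- subset(senseData, Test == "zellig")
anova(lm(Score ~ Feature * Model, data = zelligData))
# Unadjusted pairwise t-tests across the levels of each factor
pairwise.t.test(zelligData$Score, zelligData$Feature, p.adjust.method = "none")
pairwise.t.test(zelligData$Score, zelligData$Model, p.adjust.method = "none")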

That's well and good. However, Score is actually an aggregate of many observed events: it is the proportion of events for each combination that resulted in a correct evaluation. The Correct and Total fields hold the two counts used to compute Score (Score = Correct / Total). Now I'd like to do significance tests for each word, with each event counted by Total treated as an observation. Concretely, I'd like to use the above commands on data that looks like:

Word Feature Model Test Score
allow.v woc hac zellig 1
allow.v woc hac zellig 1
allow.v woc hac zellig 1
allow.v woc hac zellig 1
....
allow.v woc hac zellig 0
allow.v woc hac zellig 0
allow.v woc hac zellig 0
allow.v pos hac zellig 1
allow.v pos hac zellig 1
allow.v pos hac zellig 1
...
allow.v pos hac zellig 0
allow.v pos hac zellig 0
allow.v pos hac zellig 0
...

However, I'd like to avoid creating this secondary view of the data, as some words have a very large number of events and it seems wasteful to turn the aggregate values into the expanded form.

Is it at all possible to perform ANOVA and pairwise t-tests using my already aggregated counts of events? Would this evaluation be different from just using the aggregated scores as they are? I imagine it would, as the aggregated Score value doesn't indicate how many events there really were in total.

Thanks in advance!

Edit: After reading how to do ANOVA manually, I should rephrase this question. Is there any way to pass the matrix of summary statistics for each factor, as described in Section 2 (Summary Statistics) of the instructions for doing it manually, to a function in R and have it compute the ANOVA result?

Best Answer

I think you've answered your own question quite well. By aggregating your data, you discard the information about how many events stand behind each proportion. Thus, in your model, 500 out of 1000 correct evaluations is treated the same as 1 out of 2. This can substantially alter your interpretation of the results. In each case, given no other information, we would prefer a point estimate of .5 for the proportion of correct evaluations. However, in the first case we can be much more confident that the true rate of correct evaluations is close to .5. In the second, we are much less confident that the result did not occur simply by chance. If, as I suspect, it is this underlying proportion of correct evaluations that interests you, you should use the full data set.
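To make the 500/1000 versus 1/2 point concrete, and to show one way of keeping the full information without expanding the rows, here is a minimal sketch. It swaps in a different technique from the `lm`/ANOVA in the question: a binomial GLM (logistic regression) fitted directly to the Correct and Total counts. The data frame `agg` and its counts are made up for illustration, and the additive `Feature + Model` model is an assumption, not something taken from the question.

```r
# Hypothetical aggregated data in the question's layout (counts are invented).
agg <- data.frame(
  Feature = rep(c("woc", "pos"), each = 3),
  Model   = rep(c("hac", "kmeans", "eigen"), times = 2),
  Correct = c(26, 30, 21, 35, 28, 24),
  Total   = rep(68, 6)
)

# 1. Same point estimate (0.5), very different uncertainty.
ci_large <- binom.test(500, 1000)$conf.int  # exact 95% interval for 500/1000
ci_small <- binom.test(1, 2)$conf.int       # exact 95% interval for 1/2
print(ci_large)
print(ci_small)

# 2. Binomial GLM on the aggregated counts: the response is a two-column
#    matrix of (successes, failures), so no row expansion is needed.
fit_agg <- glm(cbind(Correct, Total - Correct) ~ Feature + Model,
               family = binomial, data = agg)

# For comparison only: build the expanded one-row-per-event view and fit
# the same model to the 0/1 outcomes.
expanded <- agg[rep(seq_len(nrow(agg)), agg$Total), c("Feature", "Model")]
expanded$Score <- unlist(Map(function(k, n) rep(c(1, 0), c(k, n - k)),
                             agg$Correct, agg$Total))
fit_long <- glm(Score ~ Feature + Model, family = binomial, data = expanded)

# The two fits should agree on the coefficients up to numerical tolerance.
print(all.equal(coef(fit_agg), coef(fit_long), tolerance = 1e-6))

# Analysis-of-deviance table, the GLM analogue of the ANOVA table.
print(anova(fit_agg, test = "Chisq"))
```

The first interval is only a few percentage points wide, while the second covers almost the whole range from 0 to 1. Because the aggregated and expanded fits maximise the same likelihood up to a constant, the tests on Feature and Model should come out the same either way, so the expanded view is not needed for this kind of model.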
