Solved – How to cross-classify two categorical variables

Tags: categorical data, classification, r

I have a data frame with two categorical variables (representing different methods) that each classify objects into three categories: cat1, cat2, cat3. The two variables classify the objects somewhat differently, and I would like to perform the test described as "Measuring Agreement Between Observers" in Agresti's CDA, Section 10.5 (after Thompson 2009, p. 192) to check whether the two methods perform differently.

I want to follow the example in the exactLoglinTest package, but I am struggling to create a cross-classification data frame similar to the one loaded by library(exactLoglinTest); data(pathologist.dat)

My variables are in the following form:

classification.1    classification.2
cat1                cat1
cat1                cat2
cat3                cat3
cat2                cat2
cat2                cat3

…etc

I have more than a thousand classified observations and (as far as I understand) I need to count how often each combination occurs: cat1 & cat1, cat1 & cat2, cat1 & cat3, etc.

How can I achieve this?

References: Thompson, L.A. 2009. R (and S-PLUS) Manual to Accompany Agresti’s Categorical Data Analysis (2002), 2nd edition. Available online.

Best Answer

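# Simulated data: two categorical variables with three levels each
fakeData <- data.frame(
  var1 = sample(letters[1:3], 1000, replace = TRUE),
  var2 = sample(letters[1:3], 1000, replace = TRUE)
)

# Cross-tabulate the two variables into a 3 x 3 contingency table
yourTable <- with(fakeData, table(var1, var2))

# Option 1: base R, one row per combination with a Freq column
as.data.frame(yourTable)

# Option 2: reshape2, same layout but the count column is named "value"
library(reshape2)
melt(yourTable)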

Option 1 gives you something like this (the exact counts vary because the data are sampled at random):

  var1 var2 Freq
1    a    a  107
2    b    a  116
3    c    a  119
4    a    b  109
5    b    b  109
6    c    b  106
7    a    c  100
8    b    c  121
9    c    c  113
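
The same approach can be sketched with the column names from the question. This is only a minimal illustration: the data frame name methods, the object names agreement and counts, and the count column name are assumptions, and the five rows are just the sample rows shown in the question. Setting the factor levels explicitly is an optional step that keeps combinations that never occur in the table with a count of zero.

```r
# Hypothetical data frame holding the five sample rows from the question
lvls <- c("cat1", "cat2", "cat3")
methods <- data.frame(
  classification.1 = c("cat1", "cat1", "cat3", "cat2", "cat2"),
  classification.2 = c("cat1", "cat2", "cat3", "cat2", "cat3")
)

# Fix the factor levels so all 3 x 3 combinations are counted, even at zero
methods$classification.1 <- factor(methods$classification.1, levels = lvls)
methods$classification.2 <- factor(methods$classification.2, levels = lvls)

# Square contingency table: rows are method 1, columns are method 2
agreement <- table(methods$classification.1, methods$classification.2,
                   dnn = c("classification.1", "classification.2"))

# Long format: one row per combination, counts in a column named "count"
counts <- as.data.frame(agreement, responseName = "count")
counts
```

The diagonal of agreement holds the objects on which both methods agree, so sum(diag(agreement)) / sum(agreement) gives the raw proportion of agreement.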