Solved – Casting multidimensional data in R into a data frame

r

I'm trying to do some percentage based comparisons across different groups in a criminal sentencing data set (http://dl.dropbox.com/u/1156404/wightCrimRecords.csv)

I have a data in an array of the form:

    $Female
    x
                             Burglary                 Criminal Damage and Arson 
                          0.004950495                               0.017326733 
                     Driving Offences                                    Murder 
                          0.371287129                               0.000000000
    $Male
    x
                             Burglary                 Criminal Damage and Arson 
                          0.013001083                               0.058504875 
                     Driving Offences                                    Murder 
                          0.303358613                               0.000000000

    $`Not Stated`
    x
                             Burglary                 Criminal Damage and Arson 
                            0.0000000                                 0.0000000 
                     Driving Offences                                    Murder 
                            0.1111111                                 0.0000000 

This was derived from the original data as follows:

iw=read.csv("~/data/recordlevel.csv")
iwp=tapply(iw$Offence_type,iw$AGE,function(x){prop.table(table(x))})

What I would like to do is generate a single data frame that contains a gender column, a frequency column, and rows corresponding to Burglary, Murder etc.

I can extract a single datatable from the multidimensional array, eg using:

iwpF =data.frame(iwp['Female'])

which generates a separate row for each offence and columns referring to offence type and frequency, but can't see how to generate a single datatable.

PS I was also wondering whether it's possible to pull out even more structured data, that for example counts the percentages of offence type sex and age group, so for example I could lookup up what percentage of convictions for males in the 35+ age range are related to murder.

Best Answer

I'm not sure I followed the PS part of your question, but maybe this will get you on the right path. The trick is to use melt() to get the data into long format, then use ddply() to group by:

library(plyr)
library(reshape2)
iw <- read.csv("http://dl.dropbox.com/u/1156404/wightCrimRecords.csv")
iw.m <- melt(iw, id.vars = "sex", measure.vars = "Offence_type")
ddply(iw.m, "sex", function(x) as.data.frame(prop.table(table(x$value))))

Gives us:

          sex                                      Var1        Freq
1      Female                                  Burglary 0.004950495
2      Female                 Criminal Damage and Arson 0.017326733
3      Female                          Driving Offences 0.371287129
...
50      Other                           Supply of drugs 0.000000000
51      Other                             Vehicle Crime 0.000000000
52      Other                             Violent Crime 0.000000000

EDIT - after reading the PS again, I think this is what you had in mind:

iw.m <- melt(iw, id.vars = c("sex", "AGE"), measure.vars = "Offence_type")
ddply(iw.m, c("sex", "AGE"), function(x) as.data.frame(prop.table(table(x$value))))

           sex   AGE                                      Var1        Freq
1       Female 18-24                                  Burglary 0.011764706
2       Female 18-24                 Criminal Damage and Arson 0.047058824
3       Female 18-24                          Driving Offences 0.188235294
....

You can obviously continue to add ID variables which then get passed into plyr to group on to any level of detail that is sufficient for your purposes.

Related Question