I have a database of experiments. Each experiment has a flag that says whether it was properly registered. I am trying to show with a statistical test that, over time, experiments are increasingly likely to be properly registered (the ratio column rises from 56% in 2006 to 72% in 2011):
> print(ascii(c,include.rownames=F,format='nice'),type='org')
| year | all_experiments | properly registered | ratio |
|------+-----------------+---------------------+-------|
| 2006 | 6431 | 3604 | 56.04 |
| 2007 | 7013 | 3990 | 56.89 |
| 2008 | 8285 | 4899 | 59.13 |
| 2009 | 7523 | 5063 | 67.3 |
| 2010 | 7296 | 5210 | 71.41 |
| 2011 | 7243 | 5243 | 72.39 |
What statistical approach should I use to show this?
- Should I just take the percentages, plot them, fit a line through the data and point to the slope of that line? How would I do that, and is it enough?
- Is there a more sophisticated test that takes into account the counts behind the percentages, and how would I run it?
I also need to run the analysis in R, so general advice is useful, but hints about which R commands to use would be best.
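A sketch of the first option, fitting an ordinary least-squares line to the six yearly percentages with lm() and plotting it, using the counts from the table above (the vector names year, total, reg and pct are only illustrative):

```r
# Yearly counts copied from the table in the question (illustrative names)
year  <- 2006:2011
total <- c(6431, 7013, 8285, 7523, 7296, 7243)
reg   <- c(3604, 3990, 4899, 5063, 5210, 5243)
pct   <- 100 * reg / total   # percentage properly registered per year

# Straight line through the six percentages; the slope is in percentage points per year
fit_lm <- lm(pct ~ year)
summary(fit_lm)              # slope estimate with its t-test (4 residual df)

plot(year, pct, xlab = "Year", ylab = "Properly registered (%)")
abline(fit_lm)               # overlay the fitted line on the scatter plot
```

Note that this treats each year's percentage as one observation of equal precision: the t-test on the slope has only 4 residual degrees of freedom and ignores the thousands of experiments behind each percentage, which is exactly what the second option is about.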
The data partitioned by year come from the same dataset and were produced like this:
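library(plyr)  # ddply() comes from the plyr package
# a: number of experiments per year (year taken from First.Received)
a = ddply(s1b, .(format(First.Received, "%Y")), .fun = function(x) nrow(x))
# b: number of properly registered experiments per year
b = ddply(s1b, .(format(First.Received, "%Y")), .fun = function(x) nrow(subset(x, properly_registered == 1)))
...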
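If it helps, the same per-year table can be built in one pass with base R. This is a sketch that assumes s1b has one row per experiment, First.Received stored as a Date and properly_registered coded 0/1; the tiny s1b below is a hypothetical stand-in, not the real data:

```r
# Hypothetical toy stand-in for s1b (same two columns the ddply() calls use)
s1b <- data.frame(
  First.Received      = as.Date(c("2006-02-10", "2006-09-01", "2007-04-15",
                                  "2007-06-30", "2007-12-24")),
  properly_registered = c(1, 0, 1, 1, 0)
)

yr <- format(s1b$First.Received, "%Y")   # year of receipt, as character

# table() and tapply() both order groups by the sorted year labels, so rows line up
counts <- data.frame(
  year                = as.integer(names(table(yr))),
  all_experiments     = as.vector(table(yr)),
  properly_registered = as.vector(tapply(s1b$properly_registered == 1, yr, sum))
)
counts$ratio <- round(100 * counts$properly_registered / counts$all_experiments, 2)
counts
```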
Best Answer
This is the sort of problem that the Cochran-Armitage test for trend was designed to solve.
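In R, a trend test of this kind is available in base R as stats::prop.trend.test(x, n, score), a chi-squared test for a linear trend in proportions. A minimal sketch with the counts from the question (vector names are illustrative):

```r
# Counts from the table in the question
year  <- 2006:2011
total <- c(6431, 7013, 8285, 7523, 7296, 7243)
reg   <- c(3604, 3990, 4899, 5063, 5210, 5243)

# Chi-squared test for trend in proportions, with the years as scores
trend <- prop.trend.test(reg, total, score = year)
trend   # X-squared on 1 df and its p-value
```

The statistic depends only on the relative spacing of the scores, so score = year gives the same result as the default score = 1:6 for equally spaced years.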
These days, though, most statisticians would probably use logistic regression instead, with year (or perhaps year - 2000, to avoid any risk of numerical issues) treated as a continuous explanatory variable.
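A minimal sketch of that logistic regression in R: glm() with a binomial family fitted to the grouped counts, where each year contributes cbind(registered, not registered), and year - 2000 as the covariate (names are illustrative):

```r
# Counts from the table in the question
year  <- 2006:2011
total <- c(6431, 7013, 8285, 7523, 7296, 7243)
reg   <- c(3604, 3990, 4899, 5063, 5210, 5243)

yr  <- year - 2000   # centred-ish year, as suggested above
# Grouped binomial response: successes and failures per year
fit <- glm(cbind(reg, total - reg) ~ yr, family = binomial)

summary(fit)                 # 'yr' row: change in log-odds per year and its Wald z-test
exp(coef(fit)[["yr"]])       # estimated odds ratio per additional year
anova(fit, test = "Chisq")   # likelihood-ratio test for the yr term
```

It is worth comparing the residual deviance with its degrees of freedom: if it is much larger, the year-to-year variation exceeds what binomial noise allows, and refitting with family = quasibinomial gives more honest standard errors.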