Solved – What test can I use to prove there is an upward trend in a time series of ratios in R

hypothesis-testing, r, time-series

I have a database in which experiments are stored. Each experiment has a flag indicating whether it was properly registered. I am trying to show with a statistical test that there is a trend over time: experiments are increasingly being properly registered (the ratio column rises from 56% in 2006 to 72% in 2011).

> print(ascii(c,include.rownames=F,format='nice'),type='org')
| year | all_experiments | properly registered | ratio |
|------+-----------------+---------------------+-------|
| 2006 | 6431            | 3604                | 56.04 |
| 2007 | 7013            | 3990                | 56.89 |
| 2008 | 8285            | 4899                | 59.13 |
| 2009 | 7523            | 5063                | 67.3  |
| 2010 | 7296            | 5210                | 71.41 |
| 2011 | 7243            | 5243                | 72.39 |

What statistical approach should I use to prove it?

  1. Do I just take the percentages, plot them, fit a line through the data, and point to the slope of that line? How do I do that, and is it enough?
  2. Is there a more sophisticated test that takes into account the counts behind the percentages, and how would I do it?

I also need to run the analysis in R, so general advice is useful, but hints about which R commands to use would be best.

The data partitioned by year come from the same dataset and were produced like this:

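library(plyr)  # provides ddply() and .()
# Per-year counts: a = all experiments, b = properly registered ones
a <- ddply(s1b, .(format(First.Received, "%Y")), .fun = function(x) nrow(x))
b <- ddply(s1b, .(format(First.Received, "%Y")), .fun = function(x) nrow(subset(x, properly_registered == 1)))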
...

Best Answer

This is the sort of problem that the Cochran-Armitage test for trend was designed to solve.
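
As a rough sketch of how this might look in R: the stats package ships `prop.trend.test`, a chi-squared test for trend in proportions that is commonly used as the Cochran-Armitage test. The counts below are copied from the table in the question. The vector names `registered`, `total` and `year` are made up for this illustration, and using the calendar year as the score is an assumption (any equally spaced scores should give the same statistic).

```r
# Counts copied from the table in the question
registered <- c(3604, 3990, 4899, 5063, 5210, 5243)  # properly registered
total      <- c(6431, 7013, 8285, 7523, 7296, 7243)  # all experiments
year       <- 2006:2011

# Chi-squared test for a linear trend in the yearly proportions,
# with the year used as the group score (assumption for this sketch)
trend <- prop.trend.test(x = registered, n = total, score = year)
print(trend)
```

A small p-value here says the proportions change with the score. The direction of the change still has to be read off the yearly proportions themselves.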

These days, though, most statisticians would probably use logistic regression instead, with year (or perhaps year-2000 to avoid any risk of numerical issues) treated as a continuous explanatory variable.
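
A minimal sketch of the logistic regression approach, assuming the yearly counts from the question are typed in by hand. The data frame `dat` and the column `year_c` (year minus 2000) are names invented for this example. The response is given as a two-column matrix of successes and failures, which is the way `glm` accepts grouped binomial data.

```r
# Counts copied from the table in the question
dat <- data.frame(
  year       = 2006:2011,
  total      = c(6431, 7013, 8285, 7523, 7296, 7243),
  registered = c(3604, 3990, 4899, 5063, 5210, 5243)
)
dat$year_c <- dat$year - 2000  # shifted year, as suggested above

# Grouped binomial response: cbind(successes, failures)
fit <- glm(cbind(registered, total - registered) ~ year_c,
           family = binomial, data = dat)

print(summary(fit))        # Wald test for the year_c slope
print(exp(coef(fit)[2]))   # estimated odds ratio per additional year
```

A positive, significant coefficient on `year_c` indicates that the odds of proper registration rise from year to year. Unlike a line fitted to the six percentages, this model weights each year by the number of experiments behind it.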