Solved – Time series for count data, with counts < 20

count-dataepidemiologypoisson distributionrtime series

I recently started working for a tuberculosis clinic. We meet periodically to discuss the number of TB cases we're currently treating, the number of tests administered, etc. I'd like to start modeling these counts so that we're not just guessing whether something is unusual or not. Unfortunately, I've had very little training in time series, and most of my exposure has been to models for very continuous data (stock prices) or very large numbers of counts (influenza). But we deal with 0-18 cases per month (mean 6.68, median 7, var 12.3), which are distributed like this:

[image lost to the mists of time]

[image eaten by a grue]

I've found a few articles that address models like this, but I'd greatly appreciate hearing suggestions from you – both for approaches and for R packages that I could use to implement those approaches.

EDIT: mbq's answer has forced me to think more carefully about what I'm asking here; I got too hung-up on the monthly counts and lost the actual focus of the question. What I'd like to know is: does the (fairly visible) decline from, say, 2008 onward reflect a downward trend in the overall number of cases? It looks to me like the number of cases monthly from 2001-2007 reflects a stable process; maybe some seasonality, but overall stable. From 2008 through the present, it looks like that process is changing: the overall number of cases is declining, even though the monthly counts might wobble up and down due to randomness and seasonality. How can I test if there's a real change in the process? And if I can identify a decline, how could I use that trend and whatever seasonality there might be to estimate the number of cases we might see in the upcoming months?

Best Answer

To assess the historical trend, I'd use a gam with trend and seasonal components. For example

require(mgcv)
require(forecast)
x <- ts(rpois(100,1+sin(seq(0,3*pi,l=100))),f=12)
tt <- 1:100
season <- seasonaldummy(x)
fit <- gam(x ~ s(tt,k=5) + season, family="poisson")
plot(fit)

Then summary(fit) will give you a test of significance of the change in trend and the plot will give you some confidence intervals. The assumptions here are that the observations are independent and the conditional distribution is Poisson. Because the mean is allowed to change smoothly over time, these are not particularly strong assumptions.

To forecast is more difficult as you need to project the trend into the future. If you are willing to accept a linear extrapolation of the trend at the end of the data (which is certainly dodgy but probably ok for a few months), then use

fcast <- predict(fit,se.fit=TRUE,
               newdata=list(tt=101:112,season=seasonaldummyf(x,h=12)))

To see the forecasts on the same graph:

plot(x,xlim=c(0,10.5))
lines(ts(exp(fcast$fit),f=12,s=112/12),col=2)
lines(ts(exp(fcast$fit-2*fcast$se),f=12,s=112/12),col=2,lty=2)
lines(ts(exp(fcast$fit+2*fcast$se),f=12,s=112/12),col=2,lty=2)

You can spot the unusual months by looking for outliers in the (deviance) residuals of the fit.

Related Question