What about a Poisson regression?
I created a data frame containing your data, plus an index t
for the time (in months) and a variable monthdays
for the number of days in each month.
T <- read.table("suicide.txt", header=TRUE)
U <- data.frame( year = as.numeric(rep(rownames(T),each=12)),
month = rep(colnames(T),nrow(T)),
t = seq(0, length = nrow(T)*ncol(T)),
suicides = as.vector(t(T)))
U$monthdays <- c(31,28,31,30,31,30,31,31,30,31,30,31)
U$monthdays[ !(U$year %% 4) & U$month == "Feb" ] <- 29
So it looks like this:
> head(U,14)
year month t suicides monthdays
1 1995 Jan 0 62 31
2 1995 Feb 1 47 28
3 1995 Mar 2 55 31
4 1995 Apr 3 74 30
5 1995 May 4 71 31
6 1995 Jun 5 70 30
7 1995 Jul 6 67 31
8 1995 Aug 7 69 31
9 1995 Sep 8 61 30
10 1995 Oct 9 76 31
11 1995 Nov 10 68 30
12 1995 Dec 11 68 31
13 1996 Jan 12 64 31
14 1996 Feb 13 69 29
Now let’s compare a model with a time effect and a number of days effect with a model in which we add a month effect:
> a0 <- glm( suicides ~ t + monthdays, family="poisson", data = U )
> a1 <- glm( suicides ~ t + monthdays + month, family="poisson", data = U )
Here is the summary of the "small" model:
> summary(a0)
Call:
glm(formula = suicides ~ t + monthdays, family = "poisson", data = U)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.7163 -0.6865 -0.1161 0.6363 3.2104
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.8060135 0.3259116 8.610 < 2e-16 ***
t 0.0013650 0.0001443 9.461 < 2e-16 ***
monthdays 0.0418509 0.0106874 3.916 9.01e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 302.67 on 203 degrees of freedom
Residual deviance: 196.64 on 201 degrees of freedom
AIC: 1437.2
Number of Fisher Scoring iterations: 4
You can see that the two variables have largely significant marginal effects. Now look at the larger model:
> summary(a1)
Call:
glm(formula = suicides ~ t + monthdays + month, family = "poisson",
data = U)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.56164 -0.72363 -0.05581 0.58897 3.09423
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.4559200 2.1586699 0.674 0.500
t 0.0013810 0.0001446 9.550 <2e-16 ***
monthdays 0.0869293 0.0719304 1.209 0.227
monthAug -0.0845759 0.0832327 -1.016 0.310
monthDec -0.1094669 0.0833577 -1.313 0.189
monthFeb 0.0657800 0.1331944 0.494 0.621
monthJan -0.0372652 0.0830087 -0.449 0.653
monthJul -0.0125179 0.0828694 -0.151 0.880
monthJun 0.0452746 0.0414287 1.093 0.274
monthMar -0.0638177 0.0831378 -0.768 0.443
monthMay -0.0146418 0.0828840 -0.177 0.860
monthNov -0.0381897 0.0422365 -0.904 0.366
monthOct -0.0463416 0.0830329 -0.558 0.577
monthSep 0.0070567 0.0417829 0.169 0.866
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 302.67 on 203 degrees of freedom
Residual deviance: 182.72 on 190 degrees of freedom
AIC: 1445.3
Number of Fisher Scoring iterations: 4
Well, of course the monthdays
effect vanishes; it can be estimated only thanks to leap years!! Keeping it in the model (and taking into account leap years) allows to use the residual deviances to compare the two models.
> anova(a0, a1, test="Chisq")
Analysis of Deviance Table
Model 1: suicides ~ t + monthdays
Model 2: suicides ~ t + monthdays + month
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1 201 196.65
2 190 182.72 11 13.928 0.237
So, no (significant) month effect? But what about a seasonal effect? We can try to capture seasonality using two variables $\cos\left( {2\pi t \over 12}\right)$ and $\sin\left( {2\pi t \over 12}\right)$:
> a2 <- glm( suicides ~ t + monthdays + cos(2*pi*t/12) + sin(2*pi*t/12),
family="poisson", data = U )
> summary(a2)
Call:
glm(formula = suicides ~ t + monthdays + cos(2 * pi * t/12) +
sin(2 * pi * t/12), family = "poisson", data = U)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.4782 -0.7095 -0.0544 0.6471 3.2236
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.8676170 0.3381954 8.479 < 2e-16 ***
t 0.0013711 0.0001444 9.493 < 2e-16 ***
monthdays 0.0397990 0.0110877 3.589 0.000331 ***
cos(2 * pi * t/12) -0.0245589 0.0122658 -2.002 0.045261 *
sin(2 * pi * t/12) 0.0172967 0.0121591 1.423 0.154874
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 302.67 on 203 degrees of freedom
Residual deviance: 190.37 on 199 degrees of freedom
AIC: 1434.9
Number of Fisher Scoring iterations: 4
Now compare it with the null model:
> anova(a0, a2, test="Chisq")
Analysis of Deviance Table
Model 1: suicides ~ t + monthdays
Model 2: suicides ~ t + monthdays + cos(2 * pi * t/12) + sin(2 * pi *
t/12)
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1 201 196.65
2 199 190.38 2 6.2698 0.0435 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
So, one can surely say that this suggests a seasonal effect...
Best Answer
To assess the historical trend, I'd use a gam with trend and seasonal components. For example
Then
summary(fit)
will give you a test of significance of the change in trend and the plot will give you some confidence intervals. The assumptions here are that the observations are independent and the conditional distribution is Poisson. Because the mean is allowed to change smoothly over time, these are not particularly strong assumptions.To forecast is more difficult as you need to project the trend into the future. If you are willing to accept a linear extrapolation of the trend at the end of the data (which is certainly dodgy but probably ok for a few months), then use
To see the forecasts on the same graph:
You can spot the unusual months by looking for outliers in the (deviance) residuals of the fit.