Solved – Difference between temporal trends

count-data, trend

I have a dataset of observations from a population, which have been described graphically as annual trends: for example, male infection rate per year and female infection rate per year.

The infections are aggregated count data, and population-based denominator data are available.

Which statistical test would I use to determine whether the male infection temporal trend is different from the female infection temporal trend?

Thanks.

Best Answer

Let's start with some considerations:

  • One usually begins with simple reasonable models, as suggested by theory and restricted by data limitations, and moves to more complex models only if the simpler ones are inadequate. This is how statistical analysis operationalizes the scientific call for parsimony.

  • Fitting a trend is a form of regression analysis.

  • Because you have count data, you would naturally first consider binomial regression or Poisson regression. The former is appropriate in any case, while the latter is an excellent approximation for relatively low rates (which is what one hopes with infections!) and is widely available in software. (Ordinary least squares (OLS) is a further approximation that would be valid provided all the annual infection counts are fairly large, say in the tens to hundreds or more, and the infection counts are fairly constant over time.)

  • When a longish time series of data is available (usually 20-30+ years), you can consider using time series analysis to help account for correlations in rates from year to year. Usually, though, you would first exhaust plausible regression models to account for nonlinear changes over time, perhaps by including quadratic terms, "level shifts," or (more generally) splines. Note that the flexibility to model changes in slope over time is built in to all forms of regression; it is not a special feature of some particular approach.
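To get a feel for when the Poisson approximation to the binomial is adequate, here is a minimal sketch that computes the total variation distance between the two distributions for a low rate and for a high rate with the same expected count. The population sizes and rates are made up for illustration, and the function names are my own; only the Python standard library is used.

```python
import math

def log_binom_pmf(k, n, p):
    # log of the Binomial(n, p) probability of k events, computed in log space
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))

def log_poisson_pmf(k, lam):
    # log of the Poisson(lam) probability of k events
    return -lam + k * math.log(lam) - math.lgamma(k + 1)

def total_variation(n, p):
    """Total variation distance between Binomial(n, p) and Poisson(n * p)."""
    lam = n * p
    abs_diff = 0.0
    poisson_mass = 0.0
    for k in range(n + 1):
        b = math.exp(log_binom_pmf(k, n, p))
        q = math.exp(log_poisson_pmf(k, lam))
        abs_diff += abs(b - q)
        poisson_mass += q
    # Poisson mass above n has no binomial counterpart
    abs_diff += max(0.0, 1.0 - poisson_mass)
    return 0.5 * abs_diff

# Hypothetical settings: both have an expected count of 20 infections.
tv_low_rate = total_variation(10_000, 0.002)   # 0.2% infection rate
tv_high_rate = total_variation(100, 0.2)       # 20% infection rate

print(f"low rate:  TV distance = {tv_low_rate:.5f}")
print(f"high rate: TV distance = {tv_high_rate:.5f}")
```

The distance for the low-rate setting should come out much smaller than for the high-rate one, which is the sense in which Poisson regression is a safe stand-in for binomial regression when infections are rare.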

In any of the regression models you can include separate terms for the male and female trends. This is done by introducing male/female as a covariate by means of "indicator" or "dummy variable" coding and including its interactions with the time terms. This has recently been discussed on this site here and here, where you can find the statistical model explicitly stated.
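As a rough illustration of the interaction approach, the sketch below fits a Poisson regression with a log-population offset and the terms intercept, year, male, and male × year, then reports a Wald test of the male × year coefficient (zero means the two trends are parallel on the log scale). All counts and populations are invented, the helper names are hypothetical, and the fit uses a hand-written Newton (IRLS) loop so that it runs on the standard library alone; in practice you would use your statistics package's GLM routine.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def score_and_info(X, y, offset, beta):
    # Gradient and Fisher information of the Poisson log-likelihood (log link)
    p = len(beta)
    mu = [math.exp(o + sum(b * x for b, x in zip(beta, row)))
          for row, o in zip(X, offset)]
    score = [sum(row[j] * (yi - mi) for row, yi, mi in zip(X, y, mu))
             for j in range(p)]
    info = [[sum(row[j] * row[k] * mi for row, mi in zip(X, mu))
             for k in range(p)] for j in range(p)]
    return score, info

def fit_poisson(X, y, offset, max_iter=50, tol=1e-10):
    """Newton iterations; assumes the first column of X is the intercept."""
    p = len(X[0])
    beta = [0.0] * p
    beta[0] = math.log(sum(y) / sum(math.exp(o) for o in offset))
    for _ in range(max_iter):
        score, info = score_and_info(X, y, offset, beta)
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    _, info = score_and_info(X, y, offset, beta)
    # Standard errors: square roots of the diagonal of the inverse information
    se = [math.sqrt(solve(info, [1.0 if i == j else 0.0 for i in range(p)])[j])
          for j in range(p)]
    return beta, se

# Invented data: 10 years of infection counts and a shared population at risk.
years = list(range(10))
population = [100_000 + 500 * t for t in years]
female = [50, 52, 55, 53, 58, 60, 61, 65, 63, 68]
male = [80, 88, 95, 104, 110, 123, 130, 145, 150, 168]

X, y, offset = [], [], []
for is_male, counts in ((0, female), (1, male)):
    for t, c, n in zip(years, counts, population):
        X.append([1.0, t, is_male, is_male * t])  # last column is the interaction
        y.append(c)
        offset.append(math.log(n))

beta, se = fit_poisson(X, y, offset)
z = beta[3] / se[3]                            # Wald statistic for male x year
p_value = math.erfc(abs(z) / math.sqrt(2))     # two-sided normal p-value

print(f"difference in log-rate slopes: {beta[3]:.4f} (SE {se[3]:.4f})")
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A likelihood ratio test comparing the models with and without the interaction column would be a reasonable alternative to the Wald test. If the residual deviance suggests overdispersion, the standard errors here will be too small.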

In the extreme case where (a) you contemplate the possibility of all regression coefficients differing between the two groups (the intercept and the slope and the coefficients of any other covariates) and (b) you are using the OLS approximation, this analysis reduces to the Chow Test. The link is to a nice exposition by William Gould, who provides plain-spoken advice ("I blame ... teachers for ... unnecessary jargon") and clear examples. Don't worry that the software is Stata; the output is what matters and it's standard.
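For the OLS case, the Chow statistic can be sketched directly from residual sums of squares: fit one straight line to the pooled data, fit separate lines to each group, and compare. The rates below (per 100,000) are invented and the helper name is my own; this is a bare-bones illustration with an intercept and a slope (k = 2 coefficients per group), not a substitute for a regression package.

```python
def rss_linear(x, y):
    """Residual sum of squares from a simple least-squares line of y on x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    syy = sum((yi - ybar) ** 2 for yi in y)
    return syy - sxy ** 2 / sxx

# Invented annual infection rates per 100,000 for 10 years.
years = list(range(10))
female_rate = [50.0, 52.0, 55.0, 53.0, 58.0, 60.0, 61.0, 65.0, 63.0, 68.0]
male_rate = [80.0, 88.0, 95.0, 104.0, 110.0, 123.0, 130.0, 145.0, 150.0, 168.0]

k = 2  # coefficients per group: intercept and slope

rss_f = rss_linear(years, female_rate)
rss_m = rss_linear(years, male_rate)
rss_pooled = rss_linear(years + years, female_rate + male_rate)

df_num = k
df_den = len(female_rate) + len(male_rate) - 2 * k

# Chow statistic: extra error from forcing one common line, relative to the
# error left after fitting a separate line to each group.
chow_f = ((rss_pooled - rss_f - rss_m) / df_num) / ((rss_f + rss_m) / df_den)

print(f"Chow F = {chow_f:.1f} on ({df_num}, {df_den}) degrees of freedom")
```

The statistic is referred to an F distribution with (k, n1 + n2 - 2k) degrees of freedom; if SciPy is available, `scipy.stats.f.sf(chow_f, df_num, df_den)` gives the p-value. Note that this tests the intercept and the slope jointly, so a large value may reflect a difference in level rather than in trend; the interaction term alone isolates the trend.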
