Data Normalization – How to Normalize Count Data of Time Periods with Different Lengths

count-data, normalization, standardization

I have count data from two time periods of different lengths. The same kind of event is counted in both periods.

Period 1 is 120 hours
Period 2 is 48 hours

At the end I have something like this table:

Event occurred in Period 1       Event occurred in Period 2
        275 times                          129 times

I want to compare the two periods with, e.g., a chi-squared test. Of course, if I did this without normalization, the result wouldn't be reliable. What is a good normalization/standardization method (I know these are different terms) to accomplish this? I appreciate any thoughts on the topic.

EDIT: Accidentally I switched the periods. I corrected the data.

Best Answer

Generally you don't make the periods comparable by transforming the counts; instead, you account for the different exposure times when computing the expected values in the chi-squared test.

Under a null hypothesis of equal event rates (events per hour), the two periods can simply be combined to estimate the rate ... that is $275+129$ events in $120+48$ hours, so we estimate the rate as $\frac{275+129}{120+48}$ events per hour, and the expected count in period 1 is then $(275+129)\frac{120}{120+48}\approx 288.57$ and in period 2 is $(275+129)\frac{48}{120+48}\approx 115.43$.
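As a quick check of this arithmetic, here is a minimal R sketch that computes the observed per-hour rates, the pooled rate under the null, and the expected counts. The variable names (`events`, `hours`, and so on) are illustrative and not part of the original question.

```r
# Observed counts and exposure times (hours) from the question
events <- c(period1 = 275, period2 = 129)
hours  <- c(period1 = 120, period2 = 48)

# Observed event rates per hour in each period
observed_rate <- events / hours

# Pooled rate under the null hypothesis of equal rates
pooled_rate <- sum(events) / sum(hours)

# Expected counts: pooled rate times each period's exposure time
expected <- pooled_rate * hours

print(observed_rate)
print(pooled_rate)
print(expected)
```

Dividing each count by its exposure time is the "normalization" the question asks about, but the test itself works on the raw counts and the expected counts.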

With those expected values, the chi-square goodness of fit statistic, $\sum_i \frac{(O_i-E_i)^2}{E_i}$ is straightforward to calculate by hand; it has $k-1=1$ degree of freedom in this example. However, it's a pretty standard calculation - for example, here it is in R:

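# Observed event counts and exposure times (hours) for periods 1 and 2
eventcounts <- c(275, 129)
exposuretime <- c(120, 48)
# rescale.p = TRUE rescales the exposure times to sum to 1 (null proportions)
chisq.test(eventcounts, p = exposuretime, rescale.p = TRUE)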

        Chi-squared test for given probabilities

data:  eventcounts
X-squared = 2.2339, df = 1, p-value = 0.135

which is the same result as doing it by hand.
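For reference, the by-hand calculation can be sketched in R as follows. It is only a sketch with illustrative variable names: it rebuilds the expected counts from the exposure times, sums $(O_i-E_i)^2/E_i$, and uses `pchisq` for the upper-tail p-value on 1 degree of freedom.

```r
# Observed counts and exposure times (hours) for the two periods
observed <- c(275, 129)
hours    <- c(120, 48)

# Expected counts under equal event rates: total count split in
# proportion to exposure time
expected <- sum(observed) * hours / sum(hours)

# Chi-squared goodness-of-fit statistic: sum of (O - E)^2 / E
stat <- sum((observed - expected)^2 / expected)

# Upper-tail p-value with k - 1 = 1 degree of freedom
p_value <- pchisq(stat, df = length(observed) - 1, lower.tail = FALSE)

print(c(statistic = stat, p.value = p_value))
```

This should reproduce the `X-squared = 2.2339` and `p-value = 0.135` reported by `chisq.test` above.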
