Solved – Testing if frequency of event occurrences over week days is uniform

count-data, statistical-significance, time-series

I’m trying to figure out if there’s a calculation that can be done to show you have enough data to draw conclusions on trends within the data set.

I have observed that a class of equipment seems to throw a particular error code more often on one specific day of the week than on the others. The events are independent (i.e., when one unit throws the code, the others don't).
This error code occurs infrequently – I’ve only observed it 17 times in total over several years. Here is the breakdown:

$$\begin{array}{lr}
\mathrm{Monday} & 3 \\
\mathrm{Tuesday} & 2 \\
\mathrm{Wednesday} & 1 \\
\mathrm{Thursday} & 2 \\
\mathrm{Friday} & 1 \\
\mathrm{Saturday} & 2 \\
\mathrm{Sunday} & 6 \\
\end{array}$$

My hypothesis is that the error code is caused by a design flaw in the unit that causes an infrequent failure. However, at face value it appears there’s a strong correlation between Sunday and failures, despite there not being anything unique occurring on Sunday – it’s just like any other day as far as the equipment is concerned.

I think that we just haven’t observed this error enough times. Is there a calculation that can be done to indicate when you’ve collected enough data?

If this error occurred more times, I’d bet the occurrences would even out.

Best Answer

The choice of test determines how to assess how much data are needed. However, standard tests, such as the $\chi^2$, would seem to be inferior or inappropriate here, for two reasons:

  1. The alternative hypothesis is more specific than mere lack of independence: it focuses on a high count during one particular day.

  2. More importantly, the hypothesis was inspired by the data itself.

Let's examine these in turn and then draw conclusions.

Standard tests may lack power

For reference, here is a standard test of independence:

x <- c(3,2,1,2,1,2,6)                            # The data
chisq.test(x, simulate.p.value=TRUE, B=9999)

X-squared = 7.2941, df = NA, p-value = 0.3263

(The p-value of $0.33$ is computed via simulation because the $\chi^2$ approximation to the distribution of the test statistic begins breaking down with such small counts.)
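
(As a quick check, not part of the original answer: running the asymptotic version of the same test shows the problem directly, because with only about $17/7 \approx 2.4$ expected events per day R warns that the $\chi^2$ approximation may be inaccurate.)

sum(x) / length(x)   # Expected count per day under uniformity: about 2.4
chisq.test(x)        # Warns: "Chi-squared approximation may be incorrect"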

If, before seeing the data, it had been hypothesized that weekends might provoke more errors, then it would be more powerful to compare the Saturday+Sunday total to the Monday–Friday total, rather than using the $\chi^2$ statistic. Although we can analyze this special test fully (and obtain analytical results), it's simplest and more flexible just to perform a quick simulation. (The following is R code for 100,000 iterations; it takes under a second to execute.)

n.iter <- 1e5                                    # Number of iterations
set.seed(17)                                     # Start a reproducible simulation
n <- sum(x)                                      # Sum of all data
sim <- rmultinom(n.iter, n, rep(1, length(x)))   # The simulated data, in columns
x.satsun <- sum(x[6:7])                          # The test statistic
sim.satsun <- colSums(sim[6:7, ])                # The simulation distribution
cat(mean(c(sim.satsun >= x.satsun, 1)))          # Estimated p-value

0.08357916

The output, shown on the last line, is the p-value of this test. It is much smaller than the $\chi^2$ p-value previously computed. This result would be considered significant by anyone needing 90% confidence, whereas few people would consider the $\chi^2$ p-value significant. That's evidence of the greater power to detect a difference.
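
(As a cross-check, not in the original answer: the analytical result alluded to above is simple here. Under the null hypothesis of uniformity, the Saturday+Sunday total is Binomial with size $n = 17$ and success probability $2/7$, so the one-sided p-value is just a binomial tail probability.)

pbinom(x.satsun - 1, n, 2/7, lower.tail=FALSE)   # P(weekend total >= 8), roughly 0.08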

Greater power is important: it leads to much smaller sample sizes. But I won't develop this idea, due to the conclusions in the next section.
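
(Purely as an editorial sketch of how such a development could go, and not part of the original answer: pick a hypothetical alternative, say that each weekend day is twice as error-prone as a weekday, and estimate by simulation the power of the one-sided weekend test at the 10% level for several total event counts; the smallest total giving acceptable power, say 80%, indicates roughly how much data would be needed.)

power.weekend <- function(n.total, n.iter=1e4, alpha=0.10) {
  p.alt <- c(rep(1, 5), 2, 2) / 9                # Hypothetical alternative: weekend days twice as likely
  crit <- qbinom(1 - alpha, n.total, 2/7)        # Reject when weekend total exceeds this (size <= alpha)
  sim.alt <- rmultinom(n.iter, n.total, p.alt)   # Data simulated under the alternative
  mean(colSums(sim.alt[6:7, ]) > crit)           # Proportion of rejections = estimated power
}
sapply(c(20, 50, 100, 200), power.weekend)       # Power at several total event counts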

A data-generated hypothesis gives false confidence

It is a much more serious issue that the hypothesis was inspired by the data. What we really need to test is this:

If there were no association between events and day of the week, what are the chances that the analyst would nevertheless have observed an unusual pattern "at face value"?

Although this is not definitively answerable, because we have no way to model the analyst's thought process, we can still make progress by considering some realistic possibilities. To be honest about it, we must contemplate patterns other than the one that actually appeared. For instance, if there had been 8 events on Wednesday and no more than 3 on any other day, it's a good bet that such a pattern would have been noted (leading to a hypothesis that Wednesdays are somehow error-inducing).

Other patterns likely to be noted by any observant, interested analyst would include all apparent clusters of data, such as:

  • Any single day with a high count.

  • Any two adjacent days with a high count.

  • Any three adjacent days with a high count.

"Adjacent" of course means in a circular sense: Sunday is adjacent to Monday even though those days are far apart in the data listing. Other patterns are possible, such as two separate days with high counts. I will not attempt an exhaustive list; these three patterns will suffice to make the point.

It is useful to evaluate the chance that a perfectly random dataset would have evoked notice in this sense. We can evaluate that chance by simulating many random datasets and counting any that look at least as unusual as the actual data on any of these criteria. Since we already have our simulation, the analysis is a matter of a few seconds' more work:

stat <- function(y) {
  y.2 <- c(y[-1], y[1]) + y         # Totals of adjacent days
  y.3 <- y.2 + c(y[-(1:2)], y[1:2]) # Totals of 3-day groups
  c(max(y), max(y.2), max(y.3))     # Largest values for 1, 2, 3 days
}
sim.stat <- apply(sim, 2, stat)               # Statistics of each simulated dataset
x.stat <- stat(x)                             # Statistics of the actual data
extreme <- colSums(sim.stat >= x.stat) >= 1   # At least as extreme on any criterion?
cat(p.value <- mean(c(extreme, 1)))           # Estimated p-value

0.3889561

This result is a much more realistic assessment of the situation than we have seen before. It suggests there is almost no objective evidence that events are related to day of week.

Conclusions

The best solution, then, might be to conclude there likely is not anything unusual going on. Keep monitoring the events, but do not worry about how much time will be needed to produce "significant" results.
