I take it that you are investigating whether the correlation between two quantities is larger than $0$ and that you wish to know how many patients you need for your study to be able to show that it really is larger. In other words, I assume that you are using a one-sided test.
First of all, even if you collect a million samples, there is no guarantee that you will get a significant result. If the correlation actually is $0$, then you likely won't get a significant result. But even if it is non-zero, there is always a possibility that you, due to randomness, won't get a significant result.
Second, how large the sample needs to be depends on how large the true correlation is.
I ran a quick computer simulation ($10,000$ repetitions) to investigate how large the sample size needs to be in order to get a high probability of a significant result. It is based on the assumption that the quantities that you measure are normally distributed. If that is not the case, then these calculations will be in error. Not necessarily a large error, but nevertheless in error.
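For concreteness, here is a minimal sketch of such a power simulation in R; it is my reconstruction under the bivariate-normal assumption, not the exact code used for the plots:
power.sim <- function(n, rho, n.rep = 1e4) {
  p <- replicate(n.rep, {
    x <- rnorm(n)                                 # first measured quantity
    y <- rho * x + sqrt(1 - rho^2) * rnorm(n)     # second quantity, population correlation rho with x
    cor.test(x, y, alternative = "greater")$p.value
  })
  mean(p < 0.05)                                  # estimated power at the 5 % level
}
power.sim(n = 80, rho = 0.2)                      # estimated power for this scenario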
The plots below show the probability of getting a significant ($p<0.05$) result (called the power of the test) for different sample sizes ($n$) and different true values of the population correlation (rho = $\rho$):
If $\rho=0.2$ and $n=80$, the probability of a significant result is roughly $50~\%$. If $\rho=0.1$ and $n=80$, the probability is about $20~\%$. As you can see, it is easier to detect a large correlation than a small one.
What is typically done in these cases is to say "if $\rho=0.2$, then I want at least an $80~\%$ probability of a significant result" and to choose the smallest $n$ that satisfies that condition.
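If you prefer an analytical answer over simulation, the pwr package solves for the required sample size directly via Fisher's z transformation; this call is my addition, not part of the simulation above:
library(pwr)
pwr.r.test(r = 0.2, power = 0.8, sig.level = 0.05,
           alternative = "greater")   # returns the smallest n meeting that condition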
As a final remark, there are sequential sampling methods where you keep collecting samples until you get a significant result, but there are some caveats to them. If you're thinking of using such a sampling strategy, I recommend that you consult a statistician to make sure that you use it in the right way.
How much data are needed depends on which test is used to assess the question. However, standard tests, such as the $\chi^2$, seem inferior or inappropriate here, for two reasons:
The alternative hypothesis is more specific than mere lack of independence: it focuses on a high count during one particular day.
More importantly, the hypothesis was inspired by the data itself.
Let's examine these in turn and then draw conclusions.
Standard tests may lack power
For reference, here is a standard test of independence:
x <- c(3,2,1,2,1,2,6) # The data
chisq.test(x, simulate.p.value=TRUE, B=9999)
X-squared = 7.2941, df = NA, p-value = 0.3263
(The p-value of $0.33$ is computed via simulation because the $\chi^2$ approximation to the distribution of the test statistic begins breaking down with such small counts.)
If, before seeing the data, it had been hypothesized that weekends might provoke more errors, then it would be more powerful to compare the Saturday+Sunday total to the Monday-Friday total, rather than using the $\chi^2$ statistic. Although we can analyze this special test fully (and obtain analytical results), it's simplest and more flexible just to perform a quick simulation. (The following is R code for $100,000$ iterations; it takes under a second to execute.)
n.iter <- 1e5 # Number of iterations
set.seed(17) # Start a reproducible simulation
n <- sum(x) # Sum of all data
sim <- rmultinom(n.iter, n, rep(1, length(x))) # The simulated data, in columns
x.satsun <- sum(x[6:7]) # The test statistic
sim.satsun <- colSums(sim[6:7, ]) # The simulation distribution
cat(mean(c(sim.satsun >= x.satsun, 1))) # Estimated p-value
0.08357916
The output, shown on the last line, is the p-value of this test. It is much smaller than the $\chi^2$ p-value previously computed. This result would be considered significant by anyone needing 90% confidence, whereas few people would consider the $\chi^2$ p-value significant. That's evidence of the greater power to detect a difference.
Greater power is important: it leads to much smaller sample sizes. But I won't develop this idea, due to the conclusions in the next section.
A data-generated hypothesis gives false confidence
It is a much more serious issue that the hypothesis was inspired by the data. What we really need to test is this:
If there were no association between events and day of the week, what are the chances that the analyst would nevertheless have observed an unusual pattern "at face value"?
Although this is not definitively answerable, because we have no way to model the analyst's thought process, we can still make progress by considering some realistic possibilities. To be honest about it, we must contemplate patterns other than the one that actually appeared. For instance, if there had been 8 events on Wednesday and no more than 3 on any other day, it's a good bet that such a pattern would have been noted (leading to a hypothesis that Wednesdays are somehow error-inducing).
Other patterns I believe likely to be noted by any observant, interested analyst would include all apparent clusters of data, including:
Any single day with a high count.
Any two adjacent days with a high count.
Any three adjacent days with a high count.
"Adjacent" of course means in a circular sense: Sunday is adjacent to Monday even though those days are far apart in the data listing. Other patterns are possible, such as two separate days with high counts. I will not attempt an exhaustive list; these three patterns will suffice to make the point.
It is useful to evaluate the chance that a perfectly random dataset would have evoked notice in this sense. We can evaluate that chance by simulating many random datasets and counting any that look at least as unusual as the actual data on any of these criteria. Since we already have our simulation, the analysis is a matter of a few seconds' more work:
stat <- function(y) {
y.2 <- c(y[-1], y[1]) + y # Totals of adjacent days
y.3 <- y.2 + c(y[-(1:2)], y[1:2]) # Totals of 3-day groups
c(max(y), max(y.2), max(y.3)) # Largest values for 1, 2, 3 days
}
sim.stat <- apply(sim, 2, stat)
x.stat <- stat(x)
extreme <- colSums(sim.stat >= x.stat) >= 1
cat(p.value <- mean(c(extreme, 1)))
0.3889561
This result is a much more realistic assessment of the situation than we have seen before. It suggests there is almost no objective evidence that events are related to day of week.
Conclusions
The best solution, then, might be to conclude there likely is not anything unusual going on. Keep monitoring the events, but do not worry about how much time will be needed to produce "significant" results.
Best Answer
My first approach to this issue would be simple linear regression: run a regression with the week number as the independent variable and the traffic count as the dependent variable. Assuming a linear relation, this captures the long-term trend. However, this approach requires the OLS assumptions to hold, and in time series (which is your case) they are easily violated.
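A minimal sketch of that regression, assuming a data frame called traffic with hypothetical columns week (week number) and count (traffic count), neither of which is taken from the question:
fit1 <- lm(count ~ week, data = traffic)   # straight-line trend over weeks
summary(fit1)                              # the coefficient on week estimates the long-term trend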
If you're using R, you could use the auto.arima function and add an external regressor for the long-term trend. Let me show you an example with R (I tried to paste the data but was blocked by SO limitations). Looking at the results of fit2, notice that the last coefficient, $-0.6449$, which captures the long-term drop, is significant.
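For completeness, here is a sketch of what that could look like; the traffic data frame and the way fit2 is built are my hypothetical reconstruction (the original code and data are not shown here), with only the forecast-package calls themselves being standard:
library(forecast)
y     <- ts(traffic$count)                       # weekly traffic counts as a time series
trend <- seq_along(y)                            # external regressor capturing the long-term trend
fit2  <- auto.arima(y, xreg = trend)             # ARIMA errors around the linear trend
summary(fit2)                                    # the xreg coefficient shows the long-term drop
fc <- forecast(fit2, xreg = max(trend) + 1:12)   # extend the trend regressor to forecast ahead
plot(fc)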
Here you can see that the prediction manages to capture the long-term drop: