I take it that you are investigating whether the correlation between two quantities is larger than $0$ and that you wish to know how many patients you need for your study to be able to show that it really is larger. In other words, I assume that you are using a one-sided test.
First of all, even if you collect a million samples, there is no guarantee that you will get a significant result. If the correlation actually is $0$, then you likely won't get a significant result. But even if it is non-zero, there is always a possibility that you, due to randomness, won't get a significant result.
Second, how large the sample needs to be depends on how large the true correlation is.
I ran a quick computer simulation ($10,000$ repetitions) to investigate how large the sample size needs to be in order to get a high probability of a significant result. It is based on the assumption that the quantities that you measure are normally distributed. If that is not the case, then these calculations will be in error. Not necessarily a large error, but nevertheless in error.
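For concreteness, here is a minimal sketch of such a power simulation in R; it is my reconstruction under the bivariate-normal assumption, not the exact code used for the plots:
power.sim <- function(n, rho, n.rep = 1e4) {
  p <- replicate(n.rep, {
    x <- rnorm(n)                                 # first measured quantity
    y <- rho * x + sqrt(1 - rho^2) * rnorm(n)     # second quantity, population correlation rho with x
    cor.test(x, y, alternative = "greater")$p.value
  })
  mean(p < 0.05)                                  # estimated power at the 5 % level
}
power.sim(n = 80, rho = 0.2)                      # estimated power for this scenario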
The plots below show the probability of getting a significant ($p<0.05$) result (called the power of the test) for different sample sizes ($n$) and different true values of the population correlation (rho = $\rho$):
If $\rho=0.2$ and $n=80$, the probability of a significant result is roughly $50~\%$. If $\rho=0.1$ and $n=80$, the probability is about $20~\%$. As you can see, it is easier to detect a large correlation than a small one.
What is typically done in these cases is to say "if $\rho=0.2$, then I want at least an $80~\%$ probability of a significant result" and to choose the smallest $n$ that satisfies that condition.
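If you prefer an analytical answer over simulation, the pwr package solves for the required sample size directly via Fisher's z transformation; this call is my addition, not part of the simulation above:
library(pwr)
pwr.r.test(r = 0.2, power = 0.8, sig.level = 0.05,
           alternative = "greater")   # returns the smallest n meeting that condition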
As a final remark, there are sequential sampling methods where you keep collecting samples until you get a significant result, but there are some caveats to them. If you're thinking of using such a sampling strategy, I recommend that you consult a statistician to make sure that you use it in the right way.
How much data are needed depends on which test is used to assess the question. However, standard tests, such as the $\chi^2$, seem inferior or inappropriate here, for two reasons:
The alternative hypothesis is more specific than mere lack of independence: it focuses on a high count during one particular day.
More importantly, the hypothesis was inspired by the data itself.
Let's examine these in turn and then draw conclusions.
Standard tests may lack power
For reference, here is a standard test of independence:
x <- c(3,2,1,2,1,2,6) # The data
chisq.test(x, simulate.p.value=TRUE, B=9999)
X-squared = 7.2941, df = NA, p-value = 0.3263
(The p-value of $0.33$ is computed via simulation because the $\chi^2$ approximation to the distribution of the test statistic begins breaking down with such small counts.)
If, before seeing the data, it had been hypothesized that weekends might provoke more errors, then it would be more powerful to compare the Saturday+Sunday total to the Monday-Friday total, rather than using the $\chi^2$ statistic. Although we can analyze this special test fully (and obtain analytical results), it's simplest and more flexible just to perform a quick simulation. (The following is R code for $100,000$ iterations; it takes under a second to execute.)
n.iter <- 1e5 # Number of iterations
set.seed(17) # Start a reproducible simulation
n <- sum(x) # Sum of all data
sim <- rmultinom(n.iter, n, rep(1, length(x))) # The simulated data, in columns
x.satsun <- sum(x[6:7]) # The test statistic
sim.satsun <- colSums(sim[6:7, ]) # The simulation distribution
cat(mean(c(sim.satsun >= x.satsun, 1))) # Estimated p-value
0.08357916
The output, shown on the last line, is the p-value of this test. It is much smaller than the $\chi^2$ p-value previously computed. This result would be considered significant by anyone needing 90% confidence, whereas few people would consider the $\chi^2$ p-value significant. That's evidence of the greater power to detect a difference.
Greater power is important: it leads to much smaller sample sizes. But I won't develop this idea, due to the conclusions in the next section.
A data-generated hypothesis gives false confidence
It is a much more serious issue that the hypothesis was inspired by the data. What we really need to test is this:
If there were no association between events and day of the week, what are the chances that the analyst would nevertheless have observed an unusual pattern "at face value"?
Although this is not definitively answerable, because we have no way to model the analyst's thought process, we can still make progress by considering some realistic possibilities. To be honest about it, we must contemplate patterns other than the one that actually appeared. For instance, if there had been 8 events on Wednesday and no more than 3 on any other day, it's a good bet that such a pattern would have been noted (leading to a hypothesis that Wednesdays are somehow error-inducing).
Other patterns I believe likely to be noted by any observant, interested analyst would include all apparent clusters of data, including:
Any single day with a high count.
Any two adjacent days with a high count.
Any three adjacent days with a high count.
"Adjacent" of course means in a circular sense: Sunday is adjacent to Monday even though those days are far apart in the data listing. Other patterns are possible, such as two separate days with high counts. I will not attempt an exhaustive list; these three patterns will suffice to make the point.
It is useful to evaluate the chance that a perfectly random dataset would have evoked notice in this sense. We can evaluate that chance by simulating many random datasets and counting any that look at least as unusual as the actual data on any of these criteria. Since we already have our simulation, the analysis is a matter of a few seconds' more work:
stat <- function(y) {
y.2 <- c(y[-1], y[1]) + y # Totals of adjacent days
y.3 <- y.2 + c(y[-(1:2)], y[1:2]) # Totals of 3-day groups
c(max(y), max(y.2), max(y.3)) # Largest values for 1, 2, 3 days
}
sim.stat <- apply(sim, 2, stat)
x.stat <- stat(x)
extreme <- colSums(sim.stat >= x.stat) >= 1
cat(p.value <- mean(c(extreme, 1)))
0.3889561
This result is a much more realistic assessment of the situation than we have seen before. It suggests there is almost no objective evidence that events are related to day of week.
Conclusions
The best solution, then, might be to conclude there likely is not anything unusual going on. Keep monitoring the events, but do not worry about how much time will be needed to produce "significant" results.
Best Answer
My first approach to this issue would be simple linear regression: run a regression with the week number as the independent variable and the traffic count as the dependent variable. Assuming a linear relation, this captures the long-term trend. However, this approach requires the OLS assumptions to hold, and in time series (which is your case) they are easily violated.
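A minimal sketch of that regression, assuming a data frame called traffic with hypothetical columns week (week number) and count (traffic count), neither of which is taken from the question:
fit1 <- lm(count ~ week, data = traffic)   # straight-line trend over weeks
summary(fit1)                              # the coefficient on week estimates the long-term trend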
If you're using R, you could use the auto.arima function and add an external regressor for the long-term trend. Let me show you an example with R (I tried to paste the data but was blocked by SO limitations). Looking at the results of fit2, notice that the last coefficient, $-0.6449$, which captures the long-term drop, is significant.
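For completeness, here is a sketch of what that could look like; the traffic data frame and the way fit2 is built are my hypothetical reconstruction (the original code and data are not shown here), with only the forecast-package calls themselves being standard:
library(forecast)
y     <- ts(traffic$count)                       # weekly traffic counts as a time series
trend <- seq_along(y)                            # external regressor capturing the long-term trend
fit2  <- auto.arima(y, xreg = trend)             # ARIMA errors around the linear trend
summary(fit2)                                    # the xreg coefficient shows the long-term drop
fc <- forecast(fit2, xreg = max(trend) + 1:12)   # extend the trend regressor to forecast ahead
plot(fc)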
Here you can see that the prediction manages to capture the long-term drop: