Difference between Monte Carlo-based analysis and a hypothesis test

hypothesis testing, monte carlo, resampling

Is it reasonable to use Monte Carlo methods to resample a dataset of weekly rainfall amounts to test for a difference between two time series? That is, randomly pull ~30 paired observations for two gauges, conduct a test of differences (e.g., a sign test), repeat 1000 times, and count the number of times the p-value is less than 0.05. Power analysis would suggest that if 80 percent of the p-values are less than 0.05, then the test has adequate power to detect a true difference. However, I don't think this is the same as accepting the null unless more than 80 percent of the results have a p-value less than 0.05? The datasets have large sample sizes and the observations are serially correlated, so I am trying to remove that correlation and test differences under reasonable sample sizes. Perhaps there is a better / simpler method?
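A minimal sketch of the resampling procedure described in the question, assuming the two gauges' weekly totals are already aligned into two equal-length lists. The function names are hypothetical, each subsample of ~30 weeks is drawn without replacement (an assumption about "randomly pull"), and the exact two-sided sign test is written out by hand with `math.comb` rather than taken from a statistics library.

```python
import math
import random


def sign_test_p(diffs):
    """Exact two-sided sign test p-value for paired differences.

    Zero differences (ties) are dropped, as is conventional for the sign test.
    """
    pos = sum(1 for d in diffs if d > 0)
    neg = sum(1 for d in diffs if d < 0)
    n = pos + neg
    if n == 0:
        return 1.0
    k = min(pos, neg)
    # P(X <= k) under Binomial(n, 0.5), doubled for a two-sided test, capped at 1.
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def resampled_rejection_rate(gauge_a, gauge_b, m=30, reps=1000, alpha=0.05, seed=0):
    """Draw m random weeks (without replacement), run a sign test on the
    paired differences, repeat `reps` times, and return the fraction of
    subsamples whose p-value is below alpha."""
    rng = random.Random(seed)
    weeks = range(len(gauge_a))
    hits = 0
    for _ in range(reps):
        idx = rng.sample(weeks, m)
        diffs = [gauge_a[i] - gauge_b[i] for i in idx]
        if sign_test_p(diffs) < alpha:
            hits += 1
    return hits / reps
```

The returned value is the fraction of subsamples in which the sign test rejected at `alpha`. It describes how often a 30-week subsample detects the difference, which is a power-style quantity rather than a p-value for the full record.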

Best Answer

You are sort of describing a bootstrap approach to the problem. Resampling rows (paired observations) from the data you have collected gives you a robust way to calculate a confidence interval for some quantity of interest, whether a mean difference, a rank-based statistic, or a p-value resulting from such a test.

As you know, you can perform inference by checking whether a 95% confidence interval for an effect contains the value specified by the null hypothesis. In this case, you are essentially concerned with whether the mean difference between your two time series is 0.
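A minimal sketch of the row-resampling bootstrap and the "does the interval contain 0?" check, assuming two equal-length lists of paired weekly values. The function names, the 2000 replicates, and the simple percentile-index rule are illustrative choices, not the answer author's code. Resampling the paired differences with replacement is the same as resampling whole rows, since each difference keeps one week's two gauge readings together.

```python
import random
import statistics


def bootstrap_mean_diff_ci(a, b, reps=2000, level=0.95, seed=0):
    """Percentile bootstrap CI for the mean paired difference a - b.

    Each replicate resamples the n paired differences with replacement,
    which is equivalent to resampling whole (a_i, b_i) rows.
    """
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    rng = random.Random(seed)
    means = sorted(statistics.fmean(rng.choices(diffs, k=n)) for _ in range(reps))
    alpha = 1 - level
    # Simple percentile rule: nearest order statistics to alpha/2 and 1 - alpha/2.
    lo = means[round(alpha / 2 * (reps - 1))]
    hi = means[round((1 - alpha / 2) * (reps - 1))]
    return lo, hi


def rejects_zero_difference(a, b, **kwargs):
    """Reject 'mean difference = 0' when 0 lies outside the bootstrap CI."""
    lo, hi = bootstrap_mean_diff_ci(a, b, **kwargs)
    return not (lo <= 0.0 <= hi)
```

Note that this plain bootstrap treats the rows as independent draws, so on its own it does not address serial correlation.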

It turns out that computing the mean difference on bootstrapped data, forming a 95% confidence interval, and checking whether that interval includes 0 is asymptotically equivalent to conducting a paired t-test (which is what you have called a hypothesis test in your problem description).
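A short sketch of why the two intervals agree in large samples, assuming n independent paired differences d_i = a_i - b_i with sample mean \bar d and sample standard deviation s_d (notation introduced here for illustration). The starred quantities are taken over the bootstrap resampling distribution.

```latex
\begin{aligned}
\text{t-interval:}\quad & \bar d \pm t_{n-1,\,0.975}\,\frac{s_d}{\sqrt{n}},
  \qquad s_d^2 = \frac{1}{n-1}\sum_{i=1}^{n}(d_i-\bar d)^2 \\
\text{bootstrap mean:}\quad & \mathrm{E}^*[\bar d^*] = \bar d,\qquad
  \mathrm{Var}^*(\bar d^*) = \frac{\hat\sigma^2}{n},\qquad
  \hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(d_i-\bar d)^2 = \frac{n-1}{n}\,s_d^2 \\
\text{percentile interval:}\quad & \approx \bar d \pm z_{0.975}\,\frac{s_d}{\sqrt{n}}\sqrt{\frac{n-1}{n}}
  \quad\text{(normal approximation to } \bar d^* \text{ for large } n\text{)} \\
\text{as } n\to\infty:\quad & \sqrt{\tfrac{n-1}{n}}\to 1,\qquad t_{n-1,\,0.975}\to z_{0.975}\approx 1.96
\end{aligned}
```

Both intervals are centered at \bar d and the ratio of their half-widths tends to 1, so they reach the same decision about 0 except when 0 sits very close to an endpoint.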

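The bootstrap approach has many desirable properties: large-sample correctness for one, and relatively few distributional assumptions (although the plain bootstrap still assumes independent observations), at the cost of poorer efficiency and bias in very small samples.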
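Because the question notes that the weekly observations are serially correlated, resampling individual rows can understate the variability of the mean difference. One common alternative, swapped in here and not part of the original answer, is the moving block bootstrap: resample contiguous runs of weeks so that short-range dependence inside each block is preserved. This is a minimal sketch; the function name is hypothetical, and the default block length of 8 weeks is an arbitrary illustrative value, not a recommendation.

```python
import random
import statistics


def moving_block_bootstrap_ci(a, b, block_len=8, reps=2000, level=0.95, seed=0):
    """Percentile CI for the mean paired difference a - b using a moving
    block bootstrap, which keeps runs of consecutive weeks together."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    if not 1 <= block_len <= n:
        raise ValueError("block_len must be between 1 and len(diffs)")
    # Start indices of all overlapping blocks of consecutive weeks.
    starts = range(n - block_len + 1)
    n_blocks = -(-n // block_len)  # ceiling division: enough blocks to cover n weeks
    rng = random.Random(seed)
    means = []
    for _ in range(reps):
        sample = []
        for s in rng.choices(starts, k=n_blocks):
            sample.extend(diffs[s:s + block_len])
        # Trim to the original length so every replicate has n weeks.
        means.append(statistics.fmean(sample[:n]))
    means.sort()
    alpha = 1 - level
    lo = means[round(alpha / 2 * (reps - 1))]
    hi = means[round((1 - alpha / 2) * (reps - 1))]
    return lo, hi
```

With `block_len=1` this reduces to the ordinary row-resampling bootstrap; longer blocks retain more of the autocorrelation structure but leave fewer distinct blocks to draw from.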

So depending on the sample size, I would consider the bootstrap if you have more than, say, 50 observations in each time series. Otherwise, a regular t-test will probably be more efficient and is a reasonable test.
