I'm using the MannKendall function of the Kendall package in R to compute the statistics of the Mann-Kendall trend test on a huge time series (19 million elements). It has been running for 22 hours and still hasn't finished. Can you suggest a faster approach?
Solved – Mann-Kendall trend test of a huge time-series in R
hypothesis-testing, r, time-series, trend
Related Solutions
I realize that this question is quite old and the answer will come too late for you, but perhaps it could be useful for others having similar problems. Since you mention the Gretl program, I will try to explain how it can be done with Gretl.
Mann-Kendall trend test: A function package (add-on functionality) named MannKendall.gfn has very recently been made available, which you can use. (Full disclosure: co-authored by myself...)
Seasonality: There are of course many ways, including the popular X13 seasonal adjustment procedure, but as an easy start I would suggest adding seasonal dummies to your model. Given your daily data, monthly dummies may be appropriate. Since the data's sampling frequency (daily) is not equal to the desired seasonal dummy periodicity (monthly), I don't think you can use the built-in seasonals() function (or the menus) for that. Instead, to create for example a dummy for observations in January, in Gretl you could do
series M1 = ($obsminor == 1)
because $obsminor gives you the month information here.
Structural breaks: Again, there are many ways. If you have an idea about the location of the break, you could apply a Chow breakpoint test. If the break date is unknown, you could use another function package for Gretl, named StrucBreak.gfn. This package implements the break tests of Bai and Perron (2003, Journal of Applied Econometrics); given the complexity of its options it cannot (yet) be used from the menus, but it comes with a fairly comprehensive help document that explains how to use it.
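For readers working in R (the language of the original question) rather than Gretl, here is a minimal sketch of the same two break-testing situations, assuming the strucchange package and made-up data; these functions are my own suggestion, not part of the Gretl-based answer above:
library(strucchange)
set.seed(1)
y <- c(rnorm(100), rnorm(100, mean = 2))  # artificial series with a level shift at t = 101
# Known candidate break location: Chow test at observation 100
sctest(y ~ 1, type = "Chow", point = 100)
# Unknown break date: Bai-Perron style breakpoint estimation
bp <- breakpoints(y ~ 1)
summary(bp)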
The test statistic is obtained by calculating the Kendall correlation of the time series with the time sequence $1, 2, \ldots, n$. Getting identical Kendall correlations with very small samples is not a surprise.
When there are no ties, this correlation amounts to counting the number of increases (pairs with $y_j>y_i$ for $j>i$) minus the number of decreases (pairs with $y_j<y_i$ for $j>i$), divided by the total number of such pairs.
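As a minimal sketch, this counting definition can be checked directly in R on made-up, tie-free data:
y <- c(2.3, 0.7, 1.5, 3.1, 2.8)            # artificial series, no ties
n <- length(y)
ij <- combn(n, 2)                          # all index pairs with i < j
incr <- sum(y[ij[2, ]] > y[ij[1, ]])       # times y_j > y_i
decr <- sum(y[ij[2, ]] < y[ij[1, ]])       # times y_j < y_i
(incr - decr) / ncol(ij)                   # 0.4
cor(y, 1:n, method = "kendall")            # agrees: 0.4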
Your 3 sets of ranks are
rank(x); rank(y); rank(z)
[1] 3 5 4 1 2
[1] 5 3 1 4 2
[1] 5 3 1 4 2
The last two sets are identical, so the Kendall correlation for z must be the same as for y; let's just look at the first two, then. "3 5 4 1 2" has 3 increases (3 vs 5, 3 vs 4, and 1 vs 2) and the rest are decreases. "5 3 1 4 2" has 3 increases (3 vs 4, 1 vs 4, and 1 vs 2), with the rest decreases. This means the Kendall correlation with time in each case is $(3 - 7)/10 = -0.4$
... which is what your test says they are; the p-values are then the same because the sample sizes are the same.
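A quick check in R: since Kendall's tau depends only on the ordering, feeding the rank vectors themselves to cor() reproduces the statistic.
cor(c(3, 5, 4, 1, 2), 1:5, method = "kendall")   # -0.4
cor(c(5, 3, 1, 4, 2), 1:5, method = "kendall")   # -0.4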
Best Answer
I'd suggest thinking more about what you want to know about your data. The Mann-Kendall test is almost sure to be significant; with that many data points, the variance of Kendall's tau (the nonparametric correlation used here) is 2.33e-08, so a correlation of 0.001, which is unlikely to be practically significant, would still have a p-value of about 6e-11.
Computationally, the MannKendall function uses Fortran under the hood, so it's unlikely it could be sped up much; the problem is that there are 1.805e14 pairs to consider, and that's a lot of pairs!
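For what it's worth, the figures above can be reproduced in R with the usual normal approximation for Kendall's tau under the null hypothesis, whose variance is $2(2n+5)/(9n(n-1))$:
n <- 19e6
n * (n - 1) / 2                             # ~1.805e14 pairs
v <- 2 * (2 * n + 5) / (9 * n * (n - 1))    # ~2.33e-08
z <- 0.001 / sqrt(v)                        # z-statistic for tau = 0.001
2 * pnorm(-abs(z))                          # ~6e-11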