Solved – How to do time series (longitudinal) clustering based entirely on the shape of the curves

clustering, k-means, panel data, time series

I have a longitudinal (panel) dataset of investment growth for 120 countries covering 1960 to 2008. Essentially, it can be viewed as 120 time series.

I am interested in grouping countries based on the shape of their growth curves over time. Whether two countries have similarly shaped curves is the only criterion I need for grouping them.

I have tried the KmL package (K-means for Longitudinal Data), but it seems (please correct me if I am wrong) that this method groups countries by similar mean value (or magnitude) of investment growth, not by similar shape. For example, KmL tends to form groups of countries with high, medium, and low investment growth. The countries within those groups may have very differently shaped curves over time.

What I am looking for is a grouping that ignores the absolute value of investment growth. As long as two countries exhibit a similar pattern in their growth curves over time, they should be placed in the same group.

Could anyone tell me a way to implement this clustering? I have noticed from previous posts that a cointegration test may work. Any suggestions will be greatly appreciated!

Best Answer

If you z-standardize each of your series, $(X_i-\bar{X})/\sigma$, where $\bar{X}$ and $\sigma$ are that series' own mean and standard deviation, you first equalize the level of the series and then their swing (amplitude), so the only difference that remains is the difference in shape. Compute Euclidean distances (or a similar measure) between the 120 series and perform hierarchical clustering. You might also want to smooth the curves mildly before doing any of this.
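The recipe above might be sketched as follows in Python with NumPy and SciPy. This is only an illustration under stated assumptions: the data are taken to be a complete array with one row per country and one column per year (no missing values, no constant series), the function name `cluster_by_shape`, the choice of average linkage, and the number of clusters are all hypothetical, and the demo curves are synthetic rather than real investment data. The optional smoothing step is left out.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist


def cluster_by_shape(series, n_clusters, method="average"):
    """Cluster rows of `series` (one row per country) by curve shape.

    `cluster_by_shape`, `n_clusters`, and `method` are illustrative names
    and defaults, not part of the original answer.
    """
    x = np.asarray(series, dtype=float)
    # z-standardize each series with its OWN mean and standard deviation,
    # which removes differences in level and in swing (amplitude).
    mean = x.mean(axis=1, keepdims=True)
    std = x.std(axis=1, ddof=1, keepdims=True)  # assumes no constant series
    z = (x - mean) / std
    # Pairwise Euclidean distances between the standardized series.
    dist = pdist(z, metric="euclidean")
    # Agglomerative (hierarchical) clustering on those distances.
    tree = linkage(dist, method=method)
    # Cut the tree so that at most `n_clusters` groups remain.
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Synthetic demo: three steadily rising curves and three hump-shaped curves,
# each with a very different level and amplitude.
years = np.linspace(0.0, 1.0, 49)  # 49 points, like 1960 to 2008
rising = np.array([a + b * years for a, b in [(0, 1), (50, 5), (-3, 0.2)]])
humped = np.array(
    [a + b * np.sin(np.pi * years) for a, b in [(0, 1), (20, 4), (5, 10)]]
)
labels = cluster_by_shape(np.vstack([rising, humped]), n_clusters=2)
print(labels)
```

In the demo, curves with the same shape end up in the same cluster even though their levels and amplitudes differ widely, which is the behaviour the question asks for. With real data, missing years or a series with zero variance would need handling before the standardization step.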