Cluster analysis on panel data

Tags: clustering, k-means, panel-data, stata

I have a panel data set (country and year) on which I would like to run a cluster analysis by country. My data set has around 20 variables.

Here's a summary for my panel data:

panel variable: country (strongly balanced)
time variable: year, 2010 to 2013

Running a kmeans cluster analysis on 2013 data only is pretty straightforward. But how would you do the analysis considering all observations in the 2010-2013 period? Is k-means clustering an appropriate approach?

Here's what I ran in Stata for 2013 only:

cluster kmeans var1 var2 var3 var4 var5 var6 if year==2013, k(4) name(test1)

Thanks!

Best Answer

I would reshape the data wide, so that each country becomes a single row and each variable-year combination is its own column, and then cluster. This groups countries that follow similar time paths for your 6 variables.
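A minimal sketch of what the reshape step does to the data, written in plain Python rather than Stata purely for illustration. The function `reshape_wide`, the dict-based row format and the toy values are hypothetical. Column names mimic Stata's default `reshape wide` naming (stub followed by the year, e.g. `var12010`), not the `var@1` placement used in the Stata code of this answer. It assumes a balanced panel, as in the question, so every country ends up with the same columns.

```python
from collections import defaultdict


def reshape_wide(rows, id_key, time_key, value_keys):
    """Reshape long panel rows (one per unit-year) into one dict per unit.

    Hypothetical helper: column names are stub + year, e.g. var1 in 2010
    becomes "var12010", like Stata's default reshape wide naming.
    """
    wide = defaultdict(dict)
    for row in rows:
        unit, year = row[id_key], row[time_key]
        for stub in value_keys:
            col = f"{stub}{year}"
            # Like Stata, refuse duplicate unit-year observations.
            if col in wide[unit]:
                raise ValueError(f"duplicate observation for {unit}, {year}")
            wide[unit][col] = row[stub]
    return dict(wide)


# Toy long-format panel: two countries, two years, two variables.
rows = [
    {"country": "A", "year": 2010, "var1": 1.0, "var2": 2.0},
    {"country": "A", "year": 2011, "var1": 1.5, "var2": 2.5},
    {"country": "B", "year": 2010, "var1": 9.0, "var2": 8.0},
    {"country": "B", "year": 2011, "var1": 9.5, "var2": 8.5},
]
wide = reshape_wide(rows, "country", "year", ["var1", "var2"])
print(wide["A"])
```

Each country's row now carries its whole 2010-2013 path, which is what lets a single clustering run compare countries over time rather than in one year.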

Try something like this in Stata:

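* keep only the clustering variables; reshape wide fails if other variables vary within country
keep country year var1 var2 var3 var4 var5 var6
reshape wide var@1 var@2 var@3 var@4 var@5 var@6, i(country) j(year)
cluster kmeans var*1 var*2 var*3 var*4 var*5 var*6, k(4) name(test1)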
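To make the clustering step concrete, here is a minimal Lloyd's k-means in plain Python applied to a toy wide data set (one row per country, one column per variable-year pair). The function `kmeans`, the data and the use of the first k rows as starting centers are assumptions for illustration only; this is not how Stata's `cluster kmeans` is implemented, and Stata chooses starting centers through its own `start()` option.

```python
def kmeans(points, k, max_iter=100):
    """Lloyd's k-means with squared Euclidean distance.

    Assumption: the first k rows serve as starting centers, which keeps
    this sketch deterministic.
    """
    centers = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each row goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centers[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each center to the mean of its assigned rows.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Toy wide data (hypothetical): one row per country, columns are
# var1 for three years followed by var2 for the same three years.
countries = ["A", "B", "C", "D"]
wide_rows = [
    [1.0, 1.1, 1.2, 5.0, 5.1, 5.2],
    [0.9, 1.0, 1.1, 5.1, 5.0, 5.3],
    [8.0, 8.2, 8.5, 0.5, 0.4, 0.3],
    [7.9, 8.1, 8.4, 0.6, 0.5, 0.2],
]
labels, centers = kmeans(wide_rows, k=2)
print(dict(zip(countries, labels)))
```

Because every row stacks all years, two countries land in the same cluster only when their whole paths are close, so A and B group together and C and D group together here.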