I have a panel data set (country and year) on which I would like to run a cluster analysis by country. My data set has around 20 variables.
Here's a summary of my panel data:
panel variable: country (strongly balanced)
time variable: year, 2010 to 2013
Running a k-means cluster analysis on the 2013 data alone is straightforward. But how would you do the analysis using all observations from the 2010-2013 period? Is k-means clustering an appropriate approach here?
Here's what I ran in Stata for 2013 only:
cluster kmeans var1 var2 var3 var4 var5 var6 if year==2013, k(4) name(test1)
Thanks!
Best Answer
I would reshape the data to wide format, so that there is one row per country and each variable-year combination becomes its own variable, and then cluster. This groups countries whose 6 variables follow similar time paths over 2010-2013.
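A minimal sketch of the reshape-then-cluster idea, assuming the six variables are literally named var1-var6 as in the question and that `country` and `year` identify the panel. The standardization step, the seed, the hypothetical cluster name `panel_k4`, and the merge back into the long panel are illustrative additions, not part of the original answer.

```stata
* Assumed variable names, taken from the question's example; adjust to your data.
local vars var1 var2 var3 var4 var5 var6

preserve
keep country year `vars'

* Optional (assumption): standardize each variable over all country-years so
* that no single variable dominates the Euclidean distance k-means uses.
* Pooling across years keeps each country's trend over time intact.
foreach v of local vars {
    quietly summarize `v'
    replace `v' = (`v' - r(mean)) / r(sd)
}

* One row per country; each variable-year pair becomes its own column
* (var12010, var12011, ..., var62013).
reshape wide `vars', i(country) j(year)

* Cluster countries on their full 2010-2013 trajectories.
* The seed only makes the random starting centers reproducible.
set seed 12345
cluster kmeans var*, k(4) name(panel_k4)

* panel_k4 (hypothetical name) holds each country's group, 1 to 4.
* Save it, return to the long panel, and attach the group to every year.
keep country panel_k4
tempfile groups
save `groups'
restore
merge m:1 country using `groups', nogenerate
```

Because each country now contributes 4 × 6 = 24 columns, countries are grouped by how close their whole trajectories are, year by year, rather than by a single year's snapshot.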
Try something like this in Stata: