Solved – Cluster analysis on panel data

clustering, k-means, panel data, stata

I have a panel data set (country and year) on which I would like to run a cluster analysis by country. My data set has around 20 variables.

Here's a summary for my panel data:

panel variable: country (strongly balanced)
time variable: year, 2010 to 2013

Running a k-means cluster analysis on the 2013 data only is pretty straightforward. But how would you do the analysis using all observations in the 2010-2013 period? Is k-means clustering an appropriate approach?

Here's what I ran in Stata for 2013 only:

cluster kmeans var1 var2 var3 var4 var5 var6 if year==2013, k(4) name(test1)

Thanks!

Best Answer

I would reshape to wide format so that each variable in each year becomes its own variable, and then cluster. This will group countries that follow similar time paths for your 6 variables.

Try something like this in Stata:

reshape wide var@1 var@2 var@3 var@4 var@5 var@6, i(country) j(year)
cluster kmeans var*1 var*2 var*3 var*4 var*5 var*6, k(4) name(test1)
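The reshape-then-cluster idea can be tried end to end on simulated data. The following is only a sketch: the 12 countries, the data-generating step, the z_ prefix, the choice of k(3), and the name km_wide are all made up for illustration. Standardizing before clustering is an extra step not in the answer above; it is included because k-means uses Euclidean distance by default, so variables on larger scales would otherwise dominate.

```stata
* Toy balanced panel: 12 hypothetical countries, 2010-2013, six variables
clear
set seed 12345
set obs 12
generate country = _n
expand 4
bysort country: generate year = 2009 + _n
forvalues v = 1/6 {
    generate var`v' = rnormal() + mod(country, 3)
}

* One row per country; var1 in 2010 becomes var20101, and so on
reshape wide var@1 var@2 var@3 var@4 var@5 var@6, i(country) j(year)

* Standardize each wide variable (assumption: equal weight per variable-year)
foreach x of varlist var* {
    egen double z_`x' = std(`x')
}

* k-means on the 24 standardized columns, with a seeded random start
cluster kmeans z_var*, k(3) name(km_wide) start(krandom(2718))
tabulate km_wide
```

Note that reshape wide expects every variable not listed in the command to be constant within country, so with about 20 variables in the real data you would either list them all or drop the ones you do not need first.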

Related Solutions

Panel Regression – Understanding Unique Time Variable Fixed Effect in Panel Regression

Your emphasis is likely to be the wrong way round. Throwing away time as a variable is unlikely to be the best way forward.

You need to think hard about what is a panel. If you really want countries to be panels, then you may need to average or otherwise combine repeated observations for the same country and the same year.

But it sounds as if your panels are firms. If so, firm should be the panel identifier.

I don't know what a cusip identifier is.
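For the averaging option mentioned above, a minimal sketch might look like the following. The data, the variable x, and the identifier cid are invented for illustration; whether a mean is the right way to combine repeated observations depends on the data.

```stata
* Hypothetical data with repeated country-year observations
clear
input str3 country year x
"AAA" 2010 1
"AAA" 2010 3
"AAA" 2011 5
"BBB" 2010 2
"BBB" 2011 4
"BBB" 2011 6
end

* Show how many country-year pairs occur more than once
duplicates report country year

* Average the repeats so each country-year appears exactly once
collapse (mean) x, by(country year)

* xtset needs a numeric panel identifier
encode country, generate(cid)
xtset cid year
```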

Reading list:

http://www.stata.com/support/faqs/resources/statalist-faq/#spell

http://www.stata.com/support/faqs/data-management/repeated-time-values/index.html

Solved – Levin Lin Chu test in Stata

You can use a Fisher-type unit-root test, which is based on augmented Dickey-Fuller tests. The Stata command for the test is the following:

xtunitroot fisher varname, dfuller lags(1)
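To see the two tests side by side, here is a small sketch on simulated data. The panel dimensions (10 units, 30 periods) and the variable y are made up, and y is white noise, so it is stationary by construction. The Levin-Lin-Chu test requires a strongly balanced panel, while the Fisher-type test does not.

```stata
* Toy balanced panel: 10 hypothetical units observed for 30 periods
clear
set seed 2024
set obs 10
generate id = _n
expand 30
bysort id: generate t = _n
xtset id t

* White noise, so there is no unit root by construction
generate y = rnormal()

* Levin-Lin-Chu test (balanced panels only)
xtunitroot llc y, lags(1)

* Fisher-type test based on augmented Dickey-Fuller regressions
xtunitroot fisher y, dfuller lags(1)
```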
