Solved – Panel data descriptives, plots and ‘feel for the data’

data visualization, descriptive statistics, exploratory-data-analysis, panel data, stata

I have a dataset for around 40k firms over fiscal years 1950-2011 with about 430k firm-years. If I'm not mistaken I have panel data.

I created a unique identifier ticn for each firm. Years are indicated by fyear. For now my variables of interest are yearly sales sale, yearly advertising xad, and yearly R&D expenses xrd.

I want to get a feel for the data, give descriptives and maybe make a few plots. Can you give me ideas for some analyses?

The data are too large to get a good feel for them just by scrolling through. Because of the number of observations I decided to use Stata, also because I have a little experience with it (I have none with R, SAS or Matlab). Scrolling through the data I see gaps and missing values, so I conclude that the panel is unbalanced.


Here I have some descriptives but I can't think of more relevant descriptives, statistics, plots, or analyses to get a feel for the data.

xtsum ticn fyear xad xrd sale

Variable         |      Mean   Std. Dev.       Min        Max |    Observations
-----------------+--------------------------------------------+----------------
ticn     overall |  16839.99   9333.823          1      32930 |     N =  429426
         between |              9506.36          1      32930 |     n =   32929
         within  |                    0   16839.99   16839.99 | T-bar =  13.041
                 |                                            |
fyear    overall |  1990.307   13.88944       1950       2011 |     N =  429426
         between |             11.81111       1959       2011 |     n =   32929
         within  |             7.754303   1959.807   2020.807 | T-bar =  13.041
                 |                                            |
xad      overall |  35.16947   224.9083          0       9315 |     N =  110114
         between |              137.133          0   7171.027 |     n =   14310
         within  |             119.1316  -3952.802   6351.562 | T-bar =  7.6949
                 |                                            |
xrd      overall |  47.81559   340.4286          0      12183 |     N =  153958
         between |             197.5934          0     8502.3 |     n =   15455
         within  |             201.0934  -4044.806   10102.42 | T-bar =  9.9617
                 |                                            |
sale     overall |  1174.005   7084.566          0     470171 |     N =  379747
         between |             4244.681          0   172849.2 |     n =   29918
         within  |             4546.078  -115788.6   335794.2 | T-bar = 12.6929

Here is what I did to get these.

I dropped missing cases from my panel variables.

drop if missing(fyear)
drop if missing(ticn) 

I reported, listed, tagged and removed duplicates in the panel variables.

duplicates report ticn fyear
duplicates list ticn fyear
duplicates tag ticn fyear, gen(isdup) 
drop if isdup > 0
drop isdup
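
A point worth checking here: duplicates tag stores the number of additional copies of each observation, so a firm-year that appears three times gets the value 2, not 1. The sketch below illustrates this on made-up toy data (the five rows are invented purely for illustration) and shows duplicates drop as one possible alternative that keeps a single copy of each firm-year; whether keeping one copy is appropriate depends on why the duplicates exist.

```stata
* Toy data, invented for illustration: firm 1 has fyear 2000 three times
clear
input ticn fyear
1 2000
1 2000
1 2000
1 2001
2 2000
end

* isdup = number of additional copies: 2 for the tripled firm-year, 0 otherwise
duplicates tag ticn fyear, gen(isdup)
tabulate isdup

* Keep the first copy of each firm-year and drop the surplus copies
duplicates drop ticn fyear, force

* Stops with an error if ticn and fyear do not uniquely identify observations
isid ticn fyear
```

If isid finishes silently, the panel variables uniquely identify each observation and xtset should accept them.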

I removed negative values.

drop if xad < 0
drop if xrd < 0
drop if sale < 0

Finally I am able to declare panel data and enable panel data analyses!

xtset ticn fyear
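
Once the panel is declared, the gaps can be checked more systematically than by scrolling. What follows is only a sketch: it assumes the cleaned data are in memory, and nyears and firsttag are hypothetical helper variable names, not part of the original dataset.

```stata
* Assumes the cleaned data are in memory and xtset ticn fyear has been run

* Participation patterns: which years firms are observed and how common each pattern is
xtdescribe

* Number of non-missing observations per fiscal year for each variable
tabstat xad xrd sale, by(fyear) statistics(n)

* Distribution of the number of years observed per firm
* (nyears and firsttag are hypothetical helper variables)
bysort ticn (fyear): generate nyears = _N
egen firsttag = tag(ticn)
summarize nyears if firsttag, detail
```

The tabstat counts should show in which years xad and xrd are actually reported, which matters because they have far fewer observations than sale in the xtsum output.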

Best Answer

There's a variety of things you could do. Start by plotting the three series over time for a random subset of firms:

    * Draw a random integer 0-999, then plot every firm whose ID leaves that remainder mod 1000
    local rnd = floor(runiform()*1000)
    line xad xrd sale fyear if mod(ticn, 1000) == `rnd', by(ticn)

Run this several times: each run draws a new value of rnd, so a different subset of firms (roughly 30 of them, given about 33k firm IDs) gets plotted.
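
Since sale, xad and xrd are heavily right-skewed (the maxima in your xtsum output are hundreds of times the means), it may also help to look at logs and at yearly medians rather than raw levels. A rough sketch, assuming the cleaned data are in memory and xtset; the variable names lsale and nsale and the seed value are my own illustrative choices:

```stata
* Assumes the cleaned data are in memory and xtset ticn fyear has been run
* lsale, nsale and the seed value are illustrative choices
set seed 12345
generate lsale = log(1 + sale)

* Distribution of log sales across all firm-years
histogram lsale

* Log sales over time for one random subset of firms, all on a single graph
local rnd = floor(runiform()*1000)
xtline lsale if mod(ticn, 1000) == `rnd', overlay legend(off)

* Yearly medians and number of firms reporting sales, leaving the data in memory intact
preserve
collapse (median) sale xad xrd (count) nsale = sale, by(fyear)
line sale xad xrd fyear
list fyear nsale
restore
```

Using log(1 + sale) rather than log(sale) keeps the firm-years with zero sales, which exist according to the minimum in your table. Setting the seed only makes the random subset reproducible; drop that line to get a different subset on each run.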