When is cross-sectional data better than panel data? That might be a strange question: I personally think panel data is the best choice, but I still want to know whether there are exceptions where cross-sectional data offers unique value. Say, in which situations would you prefer cross-sectional data to panel data (assuming both are available)?
Solved – When is cross-sectional data better than panel data
cross-section, panel data
Related Solutions
Sure. All too many people get really hung up on this issue. In an applied environment with technically illiterate or even semi-literate people, it's usually the first question any modeler gets, "What's the r-square?"
Back in the 80s, Don Morrison, a prominent marketing scientist, published an article discussing how r-squares (or pseudo-r-squares) approaching zero can still provide predictive lift. In addition, Herbert Gans, a Columbia sociologist, wrote a separate paper that classified r-squares by the type of data used as inputs.
Here is a summary of those insights and observations. Note that all results are almost entirely dependent on the type of information under analysis:
In direct marketing and/or CRM industries, r-squares in the low single digits can still provide predictive lift.
One example of this is a logistic regression predicting the likelihood of an email campaign driving magazine subscriptions. By ranking and partitioning the potential recipients into deciles (or ventiles) and selecting the top loading buckets, the recipients most likely to subscribe can be targeted, minimizing campaign costs.
In cross-sectional modeling based on survey data, Gans felt that r-squares around 10%-20% were the norm. If the results are much higher than that, then there is a strong possibility that a regression assumption is being violated.
In business settings using, e.g., panel data models based on financial information, r-squares of 40%-60% are the norm.
In marketing science if you have product sales time series with a full set of "causal" factors -- e.g., price, promotion, distribution, marketing spend by store and/or markets -- then r-squares approaching 100% are not unusual since most of the variance is being explained.
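To make the decile targeting in the direct marketing example concrete, here is a minimal Python sketch. The function name `decile_lift`, its parameters, and the toy scores and outcomes are all hypothetical and not taken from Morrison or Gans; the sketch only assumes that each recipient already has a model score and an observed 0/1 response.

```python
def decile_lift(scores, outcomes, n_buckets=10):
    """Rank records by score (highest first), split them into n_buckets
    near-equal buckets, and report each bucket's response rate and its
    lift relative to the overall response rate."""
    if len(scores) != len(outcomes):
        raise ValueError("scores and outcomes must have the same length")
    n = len(scores)
    if n < n_buckets:
        raise ValueError("need at least one record per bucket")
    base_rate = sum(outcomes) / n
    if base_rate == 0:
        raise ValueError("lift is undefined when nobody responded")

    # Highest-scoring recipients come first.
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)

    rows = []
    for b in range(n_buckets):
        start = b * n // n_buckets
        stop = (b + 1) * n // n_buckets
        bucket = ranked[start:stop]
        rate = sum(y for _, y in bucket) / len(bucket)
        rows.append({"bucket": b + 1, "size": len(bucket),
                     "rate": rate, "lift": rate / base_rate})
    return rows


# Toy data (made up): 20 recipients, 4 subscribers, mostly near the top.
scores = list(range(20, 0, -1))
outcomes = [1, 1, 1, 0, 1] + [0] * 15
rows = decile_lift(scores, outcomes)
for row in rows[:3]:
    print(row)
```

A lift above 1 in the top buckets is what makes a low r-square model useful here: mailing only those buckets reaches a disproportionate share of the likely subscribers.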
I think the terms are often used interchangeably depending on the field, but the way I think of it is that panel data explicitly treats the repeated measurements as occurring through time, whereas repeated cross-sectional data can be indexed along some arbitrary dimension. Your example sounds pretty good to me.
To expand on it, let's say you have observational data on 100 patients' daily blood pressure measurements for 3 months. For a single patient $i$, we can think of their data as a time series with 3 months' worth of daily observations. This is panel data.
On the other hand, perhaps some patients only recorded measurements every other day while others recorded them several times a day. The data is still repeated through "time", but the time index is somewhat meaningless now, since the $j$th observation for two different patients may correspond to entirely different timestamps.
The practical implication is that if we wished to include time effects such as seasonality (perhaps there is an overall increase in blood pressure in the population over summer due to increased consumption of hot dogs), panel data can allow for identification whereas repeated cross-sectional data cannot.
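The misalignment described above can be checked mechanically. The following is a rough Python sketch with made-up readings; the names `days_by_ordinal` and `is_aligned` and the tuple layout `(patient_id, day, systolic_bp)` are assumptions for illustration only.

```python
from collections import defaultdict

# Hypothetical readings: (patient_id, day, systolic_bp).
# Patient "a" records daily, patient "b" every other day.
irregular = [("a", 1, 120), ("a", 2, 122), ("a", 3, 121), ("a", 4, 125),
             ("b", 1, 130), ("b", 3, 131), ("b", 5, 134), ("b", 7, 133)]

# Same patients, but both record on days 1 to 4.
regular = [("a", 1, 120), ("a", 2, 122), ("a", 3, 121), ("a", 4, 125),
           ("b", 1, 130), ("b", 2, 131), ("b", 3, 134), ("b", 4, 133)]


def days_by_ordinal(rows):
    """Map j -> set of days on which some patient's j-th reading was taken."""
    per_patient = defaultdict(list)
    for patient_id, day, _bp in rows:
        per_patient[patient_id].append(day)
    by_j = defaultdict(set)
    for days in per_patient.values():
        for j, day in enumerate(sorted(days), start=1):
            by_j[j].add(day)
    return dict(by_j)


def is_aligned(rows):
    """True when the j-th reading falls on the same day for every patient."""
    return all(len(days) == 1 for days in days_by_ordinal(rows).values())


print(days_by_ordinal(irregular))
print(is_aligned(irregular), is_aligned(regular))
```

When `is_aligned` is true, the observation index $j$ doubles as a shared time index, so a per-period effect has a well-defined meaning; when it is false, averaging over the $j$th observations mixes different days.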
Best Answer
Never. You can always get the same information from a panel as from cross-sectional data (just by discarding the additional years), so I can think of no situation where I would prefer cross-sectional data over panel data. For some questions you might not need panel data, in which case there is no point in paying for it. Assuming you don't have to pay anything for the data, my answer is thus never.
If you have to pay a fee of some sort, then you face a trade-off.
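The "discard the additional years" argument can be illustrated with a small Python sketch. The helper `cross_section`, the `(unit, year)` key layout, and the numbers are all hypothetical; the only point is that any single-year cross-section is a subset of the panel.

```python
def cross_section(panel, year):
    """Return {unit: value} for one year of a panel keyed by (unit, year)."""
    return {unit: value for (unit, y), value in panel.items() if y == year}


# Made-up panel: two units observed over three years.
panel = {
    ("firm_a", 2019): 1.0, ("firm_a", 2020): 1.2, ("firm_a", 2021): 1.5,
    ("firm_b", 2019): 2.0, ("firm_b", 2020): 1.9, ("firm_b", 2021): 2.3,
}

print(cross_section(panel, 2020))
```

The reverse does not work: a single cross-section cannot be expanded back into the panel, which is why the panel weakly dominates when cost is not a concern.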