Solved – An acceptable R-squared range for linear regressions on cross-sectional data

cross-section, descriptive statistics, r-squared, regression

What is an acceptable R-squared range for a cross-sectional data analysis using linear regression?
I think the bar is lower than for time series or panel data, but I would like to have at least a rough idea.

To put the question concretely: is an R-squared of 0.16 or 0.17 acceptable in this setting?

Best Answer

Sure, an r-square of 0.16 or 0.17 can be acceptable. All too many people get really hung up on this issue. In an applied environment with technically illiterate or even semi-literate people, it's usually the first question any modeler gets: "What's the r-square?"

Back in the 80s, Don Morrison, a prominent marketing scientist, published an article discussing how r-squares (or pseudo-r-squares) approaching zero can still provide predictive lift. In addition, Herbert Gans, a Columbia sociologist, wrote a separate paper that classified r-squares by the type of data used as inputs.

Here is a summary of those insights and observations. Note that what counts as a typical r-square depends almost entirely on the type of data under analysis:

  • In direct marketing and/or CRM industries, r-squares in the low single digits can still provide predictive lift.

    o One example of this is a logistic regression predicting the likelihood that an email campaign drives magazine subscriptions. By ranking the potential recipients by predicted score, partitioning them into deciles (or ventiles), and selecting the top-scoring buckets, the campaign can target the recipients most likely to subscribe and so reduce its costs.

  • In cross-sectional modeling based on survey data, Gans felt that r-squares around 10%-20% were the norm. If the results are much higher than that, then there is a strong possibility that a regression assumption is being violated.

  • In business settings using, e.g., panel data models based on financial information, r-squares of 40%-60% are the norm.

  • In marketing science, if you have product sales time series with a full set of "causal" factors -- e.g., price, promotion, distribution, marketing spend by store and/or market -- then r-squares approaching 100% are not unusual, since most of the variance is being explained.
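
The direct-marketing point above (a low r-square can still give useful lift) can be illustrated with a small simulation. This is only a sketch under assumptions that are not in the original answer: the data are synthetic, there is a single hypothetical predictor `x`, the base subscription rate is set to roughly 5%, and the raw predictor is used as the ranking score instead of a fitted logistic model (the ranking is the same because the true probability is monotone in `x`). The helper name `decile_lift` is made up for this example.

```python
import numpy as np


def decile_lift(scores, outcomes, n_buckets=10):
    """Response rate per score bucket divided by the overall response rate.

    Bucket 0 holds the highest-scoring records, the last bucket the lowest.
    """
    scores = np.asarray(scores, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    order = np.argsort(-scores)                      # highest score first
    buckets = np.array_split(outcomes[order], n_buckets)
    overall = outcomes.mean()
    return np.array([b.mean() / overall for b in buckets])


# Synthetic data (assumption): a weak signal and a rare outcome.
rng = np.random.default_rng(0)
n = 200_000
x = rng.normal(size=n)                               # hypothetical predictor
p = 1.0 / (1.0 + np.exp(-(-3.0 + 0.5 * x)))          # true subscribe probability
y = rng.binomial(1, p)                               # 1 = subscribed, 0 = did not

# R-squared of a simple linear regression of y on x equals the squared correlation.
r_squared = np.corrcoef(x, y)[0, 1] ** 2
lift = decile_lift(x, y)

print(f"R-squared: {r_squared:.3f}")
print("Lift by decile (top first):", np.round(lift, 2))
```

In this setup the r-square stays well under 5%, yet the top decile responds at a clearly higher rate than the population as a whole, which is what makes targeting the top buckets worthwhile. Swapping in scores from an actual fitted model would not change the logic of `decile_lift`.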
