Econometrics: Major Differences Between Econometrics and Other Statistical Fields

[econometrics] [philosophical] [terminology]

Econometrics has substantial overlap with traditional statistics, but often uses its own jargon about a variety of topics ("identification," "exogenous," etc.). I once heard an applied statistics professor in another field comment that frequently the terminology is different but the concepts are the same. Yet it also has its own methods and philosophical distinctions (Heckman's famous essay comes to mind).

What terminology differences exist between econometrics and mainstream statistics, and where do the fields diverge to become different in more than just terminology?

Best Answer

There are some terminological differences where the same thing is called different names in different disciplines:

  1. Longitudinal data in biostatistics are repeated observations of the same individuals = panel data in econometrics.
  2. The model for a binary dependent variable in which the probability of 1 is modeled as $1/(1+\exp[-x'\beta])$ is called a logit model in econometrics, and a logistic model in biostatistics. Biostatisticians tend to work with logistic regression in terms of odds ratios, as their $x$s are often binary, so the odds ratios represent the relative frequencies of the outcome of interest between the two groups in the population. This interpretation is so common that you will often see a continuous variable transformed into two categories (low vs. high blood pressure) to make it easier. (A sketch of the logit/odds-ratio link appears right after this list.)
  3. Statisticians' "estimating equations" are econometricians' "moment conditions". Statisticians' $M$-estimators are econometricians' extremum estimators.
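To make point 2 concrete, here is a minimal sketch on simulated data of my own invention (not from the answer itself), with a hand-rolled Newton/IRLS fit rather than any particular library. For a single binary regressor, the exponentiated logit slope is exactly the sample odds ratio:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
x = rng.integers(0, 2, n)                  # binary exposure (e.g., high blood pressure)
p = 1 / (1 + np.exp(-(-1.0 + 0.8 * x)))    # true model: logit(p) = -1 + 0.8 x
y = rng.binomial(1, p)

# Fit the logit model by Newton-Raphson (IRLS) on [1, x].
X = np.column_stack([np.ones(n), x])
beta = np.zeros(2)
for _ in range(25):
    mu = 1 / (1 + np.exp(-X @ beta))
    W = mu * (1 - mu)
    beta += np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (y - mu))

# The odds ratio straight from the 2x2 table.
n11 = np.sum((x == 1) & (y == 1)); n10 = np.sum((x == 1) & (y == 0))
n01 = np.sum((x == 0) & (y == 1)); n00 = np.sum((x == 0) & (y == 0))
odds_ratio = (n11 * n00) / (n10 * n01)

print(np.exp(beta[1]), odds_ratio)         # the two numbers coincide
```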

There are terminological differences where the same term is used to mean different things in different disciplines:

  1. Fixed effects stand for the $x'\beta$ in the regression equation for ANOVA statisticians, and for a "within" estimator in longitudinal/panel data models for econometricians. (For econometricians, random effects are practically a curse word, and for good reason.)
  2. Robust inference means heteroskedasticity-corrected standard errors for economists (with extensions to clustered and/or autocorrelation-corrected standard errors), and methods resistant to gross outliers for statisticians. (The first sketch after this list illustrates both the within estimator and the sandwich standard errors.)
  3. Economists seem to have the ridiculous idea that stratified samples are those in which probabilities of selection vary between observations. These should be called unequal probability samples. Stratified samples are those in which the population is split into pre-defined groups according to characteristics known before sampling takes place.
  4. Econometricians' "data mining" (at least in the 1980s literature) used to mean multiple testing and the pitfalls related to it, which have been wonderfully explained in Harrell's book. Computer scientists' (and statisticians') data mining procedures are non-parametric methods of finding patterns in the data, also known as statistical learning.
  5. The Horvitz-Thompson estimator is a non-parametric estimator of a finite population total in sampling statistics that relies on fixed probabilities of selection, with variance determined by the second-order selection probabilities. In econometrics, it has grown to denote inverse propensity weighting estimators that rely on a moderately long list of the standard causal inference assumptions (conditional independence, SUTVA, overlap, all that stuff that makes Rubin's counterfactuals work). Yeah, there is some sort of probability in the denominator in both, but understanding the estimator in one context gives you zero ability to understand the other. (The second sketch after this list contrasts the two.)
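A minimal numpy sketch of points 1 and 2, on simulated panel data of my own making: the within transformation sweeps out the individual effects, and the sandwich formula gives heteroskedasticity-robust standard errors next to the classical OLS ones.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, beta = 500, 5, 1.5
alpha = rng.normal(0, 2, N).repeat(T)      # individual effects, T rows each
x = rng.normal(0, 1, N * T) + 0.5 * alpha  # regressor correlated with alpha,
                                           # so pooled OLS would be biased
e = rng.normal(0, 1 + np.abs(x))           # heteroskedastic errors
y = alpha + beta * x + e

# Within transformation: subtract individual means, sweeping out alpha.
ids = np.arange(N).repeat(T)
def demean(v):
    return v - (np.bincount(ids, v) / T)[ids]
yd, xd = demean(y), demean(x)

X = xd[:, None]
b = np.linalg.lstsq(X, yd, rcond=None)[0]  # the "within" (fixed effects) estimate
u = yd - X @ b

# Classical OLS standard error vs. the heteroskedasticity-robust (HC0) sandwich;
# df corrections for the swept-out means are ignored in this sketch.
XtX_inv = np.linalg.inv(X.T @ X)
se_classical = float(np.sqrt(u @ u / (len(u) - 1) * XtX_inv[0, 0]))
meat = X.T @ (u[:, None] ** 2 * X)
se_robust = float(np.sqrt((XtX_inv @ meat @ XtX_inv)[0, 0]))
print(b[0], se_classical, se_robust)
```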
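And a companion sketch of point 5, again on made-up data: the same "probability in the denominator" doing two very different jobs. The Horvitz-Thompson half uses design-fixed inclusion probabilities; the IPW half uses the true propensity score for brevity, where in practice it would be estimated and the causal assumptions would do the heavy lifting.

```python
import numpy as np

rng = np.random.default_rng(2)

# (a) Sampling statistics: Horvitz-Thompson estimate of a finite-population
# total under Poisson sampling with known, design-fixed inclusion probabilities.
y_pop = rng.gamma(2, 10, 10_000)               # a finite population
pi = np.clip(500 * y_pop / y_pop.sum(), 0, 1)  # unequal, known probabilities
s = rng.random(10_000) < pi                    # the realized sample
ht_total = np.sum(y_pop[s] / pi[s])
print(ht_total, y_pop.sum())                   # close to the true total

# (b) Causal inference: inverse propensity weighting for an average treatment
# effect, with the true propensity score plugged in for brevity.
xc = rng.normal(0, 1, 10_000)
ps = 1 / (1 + np.exp(-xc))                     # propensity score
d = rng.binomial(1, ps)                        # treatment assignment
yo = 2 * d + xc + rng.normal(0, 1, 10_000)     # outcome; true effect is 2
ate = np.mean(d * yo / ps) - np.mean((1 - d) * yo / (1 - ps))
print(ate)                                     # close to 2
```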

I view the unique contributions of econometrics to be

  1. Ways to deal with endogeneity and poorly specified regression models, recognizing, as mpiktas has explained in another answer, that (i) the explanatory variables may themselves be random (and hence correlated with the regression errors, producing bias in the parameter estimates), (ii) the models can suffer from omitted variables (which then become part of the error term), and (iii) there may be unobserved heterogeneity in how economic agents react to the stimuli, complicating the standard regression models. Angrist & Pischke is a wonderful review of these issues, and statisticians will learn a lot about how to do regression analysis from it. At the very least, statisticians should learn and understand instrumental variables regression (the first sketch after this list shows the idea).
  2. More generally, economists want to make as few assumptions as possible about their models, so as to make sure that their findings do not hinge on something as ridiculous as multivariate normality. That's why GMM and empirical likelihood are hugely popular with economists, and never caught on in statistics (GMM was first described as minimum-$\chi^2$ estimation by Ferguson, and empirical likelihood by Jon Rao, both famous statisticians, in the late 1960s). That's why economists run their regressions with "robust" standard errors, while statisticians use the default OLS $s^2 (X'X)^{-1}$ standard errors.
  3. There's been a lot of work in the time domain with regularly spaced processes -- that's how macroeconomic data are collected. The unique contributions include integrated and cointegrated processes and autoregressive conditional heteroskedasticity ((G)ARCH) methods (the second sketch after this list simulates one). Being generally a micro person, I am less familiar with these.
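A minimal sketch of point 1 on simulated data (my illustration, not from the answer): OLS is inconsistent when the regressor is correlated with the error, while two-stage least squares with a valid instrument recovers the structural coefficient. In the just-identified case shown here, 2SLS is also the simplest GMM estimator, built on the moment condition $E[z(y - x\beta)] = 0$.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
z = rng.normal(0, 1, n)                      # instrument: relevant and excluded
u = rng.normal(0, 1, n)                      # structural error
x = 0.8 * z + 0.6 * u + rng.normal(0, 1, n)  # endogenous regressor: corr(x, u) > 0
y = x + u                                    # true coefficient is 1

beta_ols = (x @ y) / (x @ x)                 # inconsistent: about 1.3 here
x_hat = z * ((z @ x) / (z @ z))              # first stage: project x on z
beta_2sls = (x_hat @ y) / (x_hat @ x_hat)    # second stage: about 1.0
print(beta_ols, beta_2sls)
```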
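And a short sketch of point 3, simulating a GARCH(1,1) process (parameter values are mine, chosen only to make the effect visible) to show the volatility clustering these models were built to capture:

```python
import numpy as np

rng = np.random.default_rng(4)
n, omega, a, b = 2000, 0.1, 0.1, 0.85     # a + b < 1 keeps the variance finite
r = np.zeros(n)                           # "returns"
h = np.full(n, omega / (1 - a - b))       # conditional variances
for t in range(1, n):
    h[t] = omega + a * r[t - 1] ** 2 + b * h[t - 1]
    r[t] = np.sqrt(h[t]) * rng.normal()

# The ARCH signature: returns are serially uncorrelated, but squared
# returns are not, i.e. volatility clusters.
def acf1(v):
    v = v - v.mean()
    return (v[1:] @ v[:-1]) / (v @ v)
print(acf1(r), acf1(r ** 2))              # near 0 vs. clearly positive
```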

Overall, economists tend to look for strong interpretation of the coefficients in their models. Statisticians would take a logistic model as a way to get to the probability of the positive outcome, often as a simple predictive device, and may also note the GLM interpretation with the nice exponential family properties that it possesses, as well as connections with discriminant analysis. Economists would think about the utility interpretation of the logit model, and be concerned that only $\beta/\sigma$ is identified in this model, and that heteroskedasticity can throw it off. (Statisticians will be wondering what $\sigma$ the economists are talking about, of course.) Of course, a utility that is linear in its inputs is a very funny thing from the perspective of Microeconomics 101, although some generalizations to semi-concave functions are probably done in Mas-Colell.
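For statisticians wondering where that $\sigma$ comes from, the standard latent-utility derivation (textbook material, not specific to this answer) runs as follows. Posit

$$ y^* = x'\beta + \sigma\varepsilon, \qquad \varepsilon \sim \mathrm{Logistic}(0,1), \qquad y = \mathbf{1}\{y^* > 0\}, $$

so that

$$ \Pr[y = 1 \mid x] = \Pr\left[\varepsilon > -\tfrac{x'\beta}{\sigma}\right] = \frac{1}{1+\exp(-x'\beta/\sigma)}. $$

The likelihood depends on $(\beta, \sigma)$ only through the ratio $\beta/\sigma$: doubling both leaves every choice probability unchanged, which is why only $\beta/\sigma$ is identified, and why heteroskedasticity ($\sigma$ varying with $x$) contaminates the estimated "$\beta$" even when the true $\beta$ is constant.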

What economists generally tend to miss, but, IMHO, would benefit from, are aspects of multivariate analysis (including latent variable models as a way to deal with measurement errors and multiple proxies... statisticians are oblivious to these models too, though), regression diagnostics (all these Cook's distances, Mallows' $C_p$, DFBETA, etc.), analysis of missing data (Manski's partial identification is surely fancy, but the mainstream MCAR/MAR/NMAR breakdown and multiple imputation are more useful), and survey statistics.

A lot of other contributions from mainstream statistics have been entertained by econometrics and either adopted as standard methodology or passed by as a short-term fashion: ARMA models of the 1960s are probably better known in econometrics than in statistics, as some graduate programs in statistics fail to offer a time series course these days; shrinkage estimators/ridge regression of the 1970s have come and gone; the bootstrap of the 1980s is a knee-jerk reaction to any complicated situation, although economists need to be better aware of its limitations; the empirical likelihood of the 1990s has seen more methodological development from theoretical econometricians than from theoretical statisticians; and computational Bayesian methods of the 2000s are being entertained in econometrics, but my feeling is that they are just too parametric, too heavily model-based, to be compatible with the robustness paradigm I mentioned earlier. (EDIT: that was the view of the scene in 2012; by 2020, Bayesian models have become standard in empirical macro, where people probably care a little less about robustness, and are making their presence heard in empirical micro as well. They are just too easy to run these days to pass by.) Whether economists will find any use for the statistical learning/bioinformatics or spatio-temporal methods that are extremely hot in modern statistics remains an open question.