Regression Models – Understanding Left-Hand & Right-Hand Side Nomenclature in Regression

Tags: regression, terminology

$$y_i = \beta_{0} + \beta_{1}x_{1i} + \varepsilon_{0i}$$
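For concreteness, here is a minimal sketch of fitting the model above in plain Python. The equation does not say how the parameters are estimated, so ordinary least squares is an assumption here; the data values and the function name `fit_simple_ols` are made up for illustration.

```python
from statistics import mean


def fit_simple_ols(x, y):
    """Return (b0, b1) minimizing the sum of squared residuals of y = b0 + b1*x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, with at least two points")
    x_bar, y_bar = mean(x), mean(y)
    # Sum of squared deviations of the right-hand-side variable.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must vary for the slope to be identifiable")
    # Sum of cross-products of deviations between the two sides.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx            # estimate of beta_1
    b0 = y_bar - b1 * x_bar   # estimate of beta_0
    return b0, b1


# Hypothetical data, invented purely for this illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]   # right-hand-side variable
y = [2.1, 3.9, 6.2, 7.8, 10.1]  # left-hand-side variable

b0, b1 = fit_simple_ols(x, y)
# Residuals play the role of the error term in the equation.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
```

Nothing in the arithmetic depends on what `x` and `y` are called, which is why the question here is purely one of language.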

The language used to describe regression models, such as the very simple linear regression specified above, often varies, and such variations often carry subtle shifts in meaning. For example, the part of the model on the left-hand side of the equation may be termed any of the following (among others I am ignorant of), with connotations and denotations in parentheses:

  • Dependent variable (hints at causal dependence)
  • Predicted variable (implies the model forecasts/makes predictions)
  • Response variable (implies causality, or at least temporal sequencing)
  • Outcome variable (implies causality)

The nomenclature also varies on the right-hand side of the equation (same disclaimer: I am an ignoramus about other terms):

  • Independent variable (implies causal priority, hints at experimental design)
  • Predictor variable (implies forecasts, implies that the variable has a non-zero parameter estimate associated with it)

In the course of proposing, vetting, or communicating research, I have had occasion not only to be called out on my use of one term or another, but subsequently to be called out on the term I chose to replace it with. The people calling me out were of course being pedantic (NB: I am a professional pedant, so I sympathize), since we all understood what was being communicated, but I still wonder:

Are there commonly used terms for the left-hand and right-hand variables in regression models that are agnostic with respect to (a) the external uses of the model, (b) causal relationships between the variables, and (c) aspects of the study designs used to produce the variables themselves?

NB: I am not asking about the important issues of proper modeling and proper interpretation (i.e. I care very much about causality, study design, etc.), but am more interested in a language for talking about such models generally.

(I realize that "left-hand variables" and "right-hand variables" might, I suppose, be construed as a credible answer, but these terms seem clunky… maybe this is a clunky question. 🙂)

Best Answer

This is an excellent question. Actually, it is so good that there is no answer to it. To the best of my knowledge, there is no truly "agnostic" term for describing Y.

In my experience and reading, I have found that the terminology is specific to both the domain and the objective of the model.

Econometricians will use the term Dependent variable when building a model that is explanatory. They may use the terms Predicted, Fitted, or Estimated variable when they are building a forecasting model that is more focused on accurate estimation/prediction than on theoretical explanatory power.

The Big Data/Deep Learning crowd uses a completely different language. They will typically use the terms Response variable or Target variable. Their models are such black boxes that they typically attempt not so much to explain a phenomenon as to predict and estimate it accurately. Yet somehow they wouldn't be caught using the term Predicted; they far prefer Response or Target.

I am less familiar with the term Outcome variable. It may be prevalent in areas I am less exposed to, such as the social sciences (including psychology), medicine, clinical trials, and epidemiology.

In view of the above, I cannot offer you any "agnostic" terminology for describing Y. Instead, I have provided a bit of information on which terms to use when catering to different audiences and reflecting the objective of your model. In summary, I don't think anyone gets hurt if you talk about the Dependent variable with econometricians and the Response or Target variable with Deep Learning types. Hopefully you can keep those crowds apart; otherwise you could have a verbal food fight on your hands.