Versus (vs.): how to properly use this word in data analysis

descriptive statistics, terminology

This question is probably more about the English language than statistics, but I have decided to ask it here anyway.

When we compare two groups of samples, say Treatment vs. Control, and calculate not only p-values but also an effect size (or fold change), we need to know which group is the baseline. For Treatment vs. Control it is fairly obvious: if the fold change is positive, then on average values in the Treatment group are larger than in the Control group. But what if one writes "Group A vs. Group B"? Can we infer the baseline just from the order of the groups in the statement: the right side of *vs* (Group B) or the left side (Group A)?
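To make the concern concrete, here is a minimal sketch of why the baseline matters. It assumes fold change is taken as the log2 ratio of group means (only one of several possible definitions); the function name `log2_fold_change` and the data values are hypothetical and purely illustrative.

```python
from math import log2
from statistics import mean


def log2_fold_change(group, baseline):
    """Return log2(mean(group) / mean(baseline)).

    A positive value means the group mean exceeds the baseline mean.
    """
    return log2(mean(group) / mean(baseline))


# Hypothetical measurements, for illustration only.
treatment = [8.0, 10.0, 12.0]
control = [4.0, 5.0, 6.0]

# "Treatment vs. Control", with Control named explicitly as the baseline.
lfc = log2_fold_change(treatment, baseline=control)

# Swapping the roles flips the sign, which is exactly the ambiguity in
# a bare "Group A vs. Group B".
lfc_swapped = log2_fold_change(control, baseline=treatment)

print(f"Treatment vs. Control (baseline: Control): {lfc:+.2f}")
print(f"Control vs. Treatment (baseline: Treatment): {lfc_swapped:+.2f}")
```

Passing the baseline as a named argument, and stating it in the label, avoids relying on word order alone.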

Another example: I measured two variables, X and Y, for a sample and drew a scatter plot, one dot per observation, with X on the x-axis and Y on the y-axis. What is the proper way to describe the plot: "Y vs. X" or "X vs. Y"? Or are the two statements equivalent?

I didn't find a good tag for this question and tried to create a "statistical-language" tag but don't have enough reputation. If you think it would be good and you can help, please do.

Best Answer

On plotting: I regard it as natural and conventional to say -- for scatter plots, line plots, and so forth -- that I plot Y versus X and in each case always to mention the response first and the other variable second. Thus I (say that I) plot temperature versus or against time, and wheat yield versus or against rainfall.

Why natural? Whenever you assert that such a relationship exists, the idea is that (in the examples given) temperature depends on, or is a function of, time, rather than vice versa; and wheat yield depends on, or is a function of, rainfall, rather than vice versa. (Relationships involving feedback loops may be an exception to this principle without undermining it.)

Thus the distinction is tied up with a strong convention that response (outcome, result, effect, dependent variable) is plotted on the vertical or $y$ axis and the other variable on the horizontal or $x$ axis. It is also tied up with a strong convention in mathematical discussions to use wording such as $y$ is a function of $x$, where the outcome is mentioned first.
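The convention described so far can be encoded in a small helper, so that the title and the axis labels of a plot always agree. This is only a sketch: the function `plot_labels` and its dictionary keys are hypothetical, chosen to resemble the label arguments most plotting libraries accept.

```python
def plot_labels(response, explanatory):
    """Build labels under the 'response versus explanatory' convention."""
    return {
        # The response is mentioned first, as in "Y versus X".
        "title": f"{response} versus {explanatory}",
        # The response goes on the vertical (y) axis.
        "ylabel": response,
        # The other variable goes on the horizontal (x) axis.
        "xlabel": explanatory,
    }


labels = plot_labels("temperature", "time")
print(labels["title"])
print("y axis:", labels["ylabel"])
print("x axis:", labels["xlabel"])
```

Because the title and both axis labels come from the same two arguments, they cannot contradict each other.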

However, we are, admittedly, at least in part talking about conventions here, rather than questions on which an inescapable logic can be identified. I was surprised to start hearing the opposite usage of versus about a decade ago. I have no precise recollection of when I first heard versus being used in the sense identified here, but I suspect it was in secondary school (high school) science in the 1960s: as with many such usages, my science teachers tended to use language as was natural to them, rather than to reflect on usage or to explain it. This is the way that much scientific language is handed down, despite the thousands and thousands of textbooks.

Also on plotting: There are many exceptions even with scatter and line plots to the convention of response on $y$ axis. In the Earth and environmental sciences, it is common that depth below or height above the surface is on the $y$ axis: what could be more vertical? This is the way that people in those fields think about cores, bores and similar traces below ground or in the atmosphere.

Detail: vs for versus is a contraction, not an abbreviation; many (British) English style guides advise not using a stop or period in such cases.

EDIT 12 April 2018/14 May 2020 Wild and Seber (2000, pp.107-108) in their outstandingly good introductory text explain it in this way: 'In plotting it is conventional to use the vertical axis to represent the response variable $Y$ and the horizontal axis to represent the explanatory variable $X$. (This is what is conventionally meant when we say that "We plot $Y$ versus $X$.")'

Yet in the same chapter they use the opposite convention for versus in captions on p.102 and p.111 and the convention they urge on p.109. See also pp.140, 527, 534, 537.

From this I take three points: (a) There are explanations of the convention I urge in the literature. (b) We are talking conventions, not rules. (c) First-rate authors can be just as inconsistent as anyone else over minor details.

Wild, C.J. and Seber, G.A.F. 2000. Chance Encounters: A First Course in Data Analysis and Inference. New York: John Wiley.
