Terminology – Mathematical Term for ‘De-Meaning’ or ‘Differencing the Mean’

inference, mean, terminology

In the standard regression literature, the following terms are used loosely and almost interchangeably:

  1. "De-meaning the equation gives…"
  2. "Differencing the mean of the outcome eliminates…."
  3. "The mean-difference provides…"

Is there a rigorous and unequivocal way to define these terms using the conditional expectation from probability theory?

Best Answer

It is hard to comment without context. Terms like these can be ambiguous, and procedures that look the same at first sight can differ in their technical details. Such phrases should always be accompanied by definitions of the terms and a description of the methodology actually used. If they are not, the reader is left guessing.

Going by the quotes, the three phrases would usually mean different things.

1. "De-meaning the equation gives..."

De-meaning usually means subtracting the mean from all the values. If the mean is

$$ \bar x = \frac{1}{N} \sum_{i=1}^N x_i $$

then de-meaning is the operation that produces $x_i' = x_i - \bar x$ for all observations $i = 1, \dots, N$.
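As a minimal sketch of this definition, the following Python snippet subtracts the sample mean from each observation. The function name `demean` and the data are made up for illustration, and only the standard library is assumed.

```python
from statistics import fmean


def demean(values):
    """Return a new list with the sample mean subtracted from every value."""
    mean = fmean(values)  # x-bar = (1/N) * sum of x_i
    return [v - mean for v in values]


# Made-up data, chosen so that the mean is exactly 5.0.
x = [2.0, 4.0, 9.0]
x_demeaned = demean(x)

print(x_demeaned)       # each entry is x_i minus the mean
print(sum(x_demeaned))  # de-meaned values sum to zero (up to rounding)
```

By construction, the de-meaned values always sum to zero, which is a quick sanity check on any implementation.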

2. "Differencing the mean of the outcome eliminates...."

This one can mean different things. For example, similar language has been used to describe the difference-in-differences (DID) method:

The DID strategy relies on two differences. The first is a difference across time periods. Separately for the treatment group and the control group, we compute the difference of the outcome mean before and after the treatment. This across-time difference eliminates time-invariant unobserved group characteristics that confound the effect of the treatment on the treated group. But eliminating group-invariant unobserved characteristics is not enough to identify an effect.
[...] The ATET is then consistently estimated by differencing the mean outcome for the treatment and control groups over time to eliminate time-invariant unobserved characteristics and also differencing the mean outcome of these groups to eliminate time-varying unobserved effects common to both groups. [...]

Here it is about the difference between means.
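To make the "difference between means" reading concrete, here is a rough sketch of the two differences described in the quote, computed on invented numbers. The function name `did_estimate` and all four samples are hypothetical, and the snippet only illustrates the arithmetic, not a full estimator with standard errors.

```python
from statistics import fmean

# Hypothetical outcomes, invented purely for illustration.
treated_before = [10.0, 12.0]
treated_after = [15.0, 17.0]
control_before = [8.0, 10.0]
control_after = [9.0, 11.0]


def did_estimate(t_before, t_after, c_before, c_after):
    """Difference-in-differences of group means."""
    # First difference: change in the mean outcome over time, per group.
    change_treated = fmean(t_after) - fmean(t_before)
    change_control = fmean(c_after) - fmean(c_before)
    # Second difference: change in the treated group net of the control group.
    return change_treated - change_control


did = did_estimate(treated_before, treated_after, control_before, control_after)
print(did)
```

Each step here takes means first and then differences them, which is the opposite order from computing differences observation by observation and averaging afterwards.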

3. "The mean-difference provides..."

"Mean-difference" would usually mean the mean of differences. For example, given the data points $z_i = x_i - y_i$ for $i = 1, \dots, N$, the mean difference is the mean of the $z_i$, i.e. $\bar z = \frac{1}{N} \sum_{i=1}^N z_i$.
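A small sketch of the mean of differences, assuming paired observations of equal length. The name `mean_difference` and the data are made up for illustration.

```python
from statistics import fmean


def mean_difference(xs, ys):
    """Mean of the pairwise differences z_i = x_i - y_i."""
    if len(xs) != len(ys):
        raise ValueError("paired samples must have the same length")
    z = [x - y for x, y in zip(xs, ys)]
    return fmean(z)


# Made-up paired data.
x = [5.0, 7.0, 9.0]
y = [1.0, 2.0, 6.0]

md = mean_difference(x, y)
print(md)
```

For paired samples like these, the result coincides numerically with the difference of the two sample means, by linearity of the mean. The two notions come apart when the observations are not paired, for instance when the two groups have different sizes, so the pairwise differences $z_i$ are not even defined.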

So the terms are not interchangeable.