Variance and covariance in the context of deterministic variables

Tags: covariance, deterministic, random variable, terminology, variance

Questions:

  1. Can we talk about:
    the variance of a deterministic variable;
    the covariance between a deterministic variable and a stochastic variable;
    the covariance between two deterministic variables?
  2. Are these concepts well defined in a sample? In a population?

Motivation
Take a simple regression

$$y = \beta_0 + \beta_1 x + \varepsilon.$$

Suppose the regressor $x$ is stochastic. The OLS estimate of $\beta_1$ will be

$$\hat{\beta}_1=\frac{\widehat{\text{Cov}}(x,y)}{\widehat{\text{Var}}(x)}$$

where hats denote sample counterparts of the population concepts. No problem here.

Now suppose $x$ is deterministic. I am not sure if I can use terms like variance and covariance in this context. Should I exchange $\hat{\beta}_1=\frac{\widehat{\text{Cov}}(x,y)}{\widehat{\text{Var}}(x)}$ for something like

$$\hat{\beta}_1=\frac{\frac{1}{n-1}\sum(x_i-\bar{x})(y_i-\bar{y})}{\frac{1}{n-1}\sum(x_i-\bar{x})^2}$$

to be correct? But then again, how meaningful is $\bar{x}$ when $x$ is deterministic? So should I go all the way to

$$\hat{\beta}_1=\frac{\frac{1}{n-1}\sum_{i=1}^n(x_i-\frac{1}{n}\sum_{j=1}^n x_j)(y_i-\frac{1}{n}\sum_{j=1}^n y_j)}{\frac{1}{n-1}\sum_{i=1}^n(x_i-\frac{1}{n}\sum_{j=1}^n x_j)^2}?$$

I am picking on details here and this may not be too important; my main questions are listed at the top of the post.
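
As a quick numerical check on the formulas above: the factor in front of the sums ($\frac{1}{n-1}$, or $\frac{1}{n}$) appears in both numerator and denominator, so it cancels and the slope does not depend on it. The following is only a minimal sketch in Python with NumPy; the design points, coefficients, noise scale and seed are made-up values chosen for illustration, not anything taken from the question.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility

# Hypothetical fixed design: x is chosen by the analyst, not drawn at random.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
# Hypothetical data-generating process (illustrative coefficients).
y = 2.0 + 0.5 * x + rng.normal(scale=0.1, size=x.size)

n = x.size
dx = x - x.mean()  # deviations of x from its arithmetic mean
dy = y - y.mean()  # deviations of y from its arithmetic mean

# Slope with 1/n in numerator and denominator.
slope_n = (np.sum(dx * dy) / n) / (np.sum(dx**2) / n)
# Slope with 1/(n-1) in numerator and denominator.
slope_n_minus_1 = (np.sum(dx * dy) / (n - 1)) / (np.sum(dx**2) / (n - 1))
# Slope with no normalising factor at all.
slope_plain = np.sum(dx * dy) / np.sum(dx**2)
# Least-squares fit of a degree-1 polynomial; the first coefficient is the slope.
slope_polyfit = np.polyfit(x, y, 1)[0]

print(slope_n, slope_n_minus_1, slope_plain, slope_polyfit)
```

All four numbers agree up to floating-point error, whether or not $x$ is regarded as random.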

Best Answer

All five questions have "yes" answers--but we have to be careful about what they mean.

  1. "Variance of a deterministic variable."

    Let's understand a "deterministic variable" to be a univariate dataset. It's just a bunch of values $X=x_1, x_2, \ldots, x_n$, with no probability model. By definition its variance is

    $$\text{Var}(X) = \frac{1}{n}\sum_{i=1}^n \left(x_i - \bar X\right)^2$$

    where $$\bar X = \frac{1}{n}\sum_{i=1}^n x_i$$ is its mean. There is no justification whatsoever to use $n-1$ instead of $n$ in any of these fractions--and this is never legitimately done--because no estimates are being made.

    We may always think of $X$ as defining a "population." This is the definition of a population variance.

  2. "Covariance between a deterministic variable and a stochastic variable."

    One way to understand this is to assume it refers to a sequence of the form $(x_1, Y_1), (x_2,Y_2), \ldots, (x_n,Y_n)$ where the $x_i$ are numbers and the $Y_i$ are random variables. Then we may define the random variable $$\bar Y = \frac{1}{n}\sum_{i=1}^n Y_i,$$ via which the covariance of $x$ and $Y$ is defined as

    $$\text{Cov}(x,Y) = \frac{1}{n}\sum_{i=1}^n (x_i - \bar x)(Y_i - \bar Y).$$

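    Because $\sum_{i=1}^n (x_i - \bar x) = 0$, the terms involving $\bar Y$ drop out and $\text{Cov}(x,Y) = \frac{1}{n}\sum_{i=1}^n (x_i - \bar x)Y_i$. It is a linear combination of the $Y_i$ and consequently is itself a random variable. This notation is frequently used as a shorthand in linear regression calculations.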

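  3. "Covariance between two deterministic variables."

    "Two deterministic variables" can be considered a dataset of ordered pairs $(x_1, y_1), (x_2,y_2), \ldots, (x_n,y_n)$. The covariance can be defined exactly as in (2), with the numbers $y_i$ in place of the random variables $Y_i$, and interpreted similarly, except that it is now a number rather than a random variable. In fact, this is a direct consequence of (1): after all, covariances can be written in terms of variances, since $\text{Cov}(x,y) = \frac{1}{4}\left(\text{Var}(x+y) - \text{Var}(x-y)\right)$.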

  4. "Are these concepts well defined in samples?"

    Because they are well-defined for any dataset, they are well-defined for a sample. Note that similar expressions with $n-1$ in the (outer) denominator are estimators: they are not the sample variance or sample covariance.

  5. "Are these concepts well defined in populations?"

    Because they are well-defined for any dataset, and a population can be considered a dataset (when fully enumerated), they are well-defined for a population.
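
The definitions in (1) to (3) can be checked numerically. The sketch below uses Python with NumPy; the two small datasets, the helper names `var_n` and `cov_n`, and the simulated $Y$ are all hypothetical and exist only for this illustration. It relies on `np.var` and `np.cov` accepting a `ddof` argument, where `ddof=0` gives the $1/n$ versions defined above and `ddof=1` gives the $n-1$ versions.

```python
import numpy as np

# Two small made-up datasets, treated as plain lists of numbers.
x = np.array([2.0, 4.0, 4.0, 5.0, 7.0, 8.0])
y = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 6.0])
n = x.size


def var_n(a):
    """Variance of a dataset with 1/n, as in item (1)."""
    return np.mean((a - a.mean()) ** 2)


def cov_n(a, b):
    """Covariance of paired data with 1/n, as in items (2) and (3)."""
    return np.mean((a - a.mean()) * (b - b.mean()))


# (1) Variance of a dataset: np.var uses ddof=0 (that is, 1/n) by default.
var_x = var_n(x)
var_x_numpy = np.var(x)
var_x_n_minus_1 = np.var(x, ddof=1)  # the n-1 version, a different number

# (3) Covariance of two datasets, and the same value recovered from variances.
cov_xy = cov_n(x, y)
cov_xy_numpy = np.cov(x, y, ddof=0)[0, 1]
cov_from_var = (var_n(x + y) - var_n(x - y)) / 4

# (2) Fixed x with a random Y: the result changes from draw to draw,
# and equals a linear combination of the Y_i with weights (x_i - mean(x)) / n.
rng = np.random.default_rng(1)  # arbitrary seed
Y = 1.0 + 0.5 * x + rng.normal(size=n)  # one hypothetical realisation of Y
cov_xY = cov_n(x, Y)
cov_xY_linear = np.mean((x - x.mean()) * Y)

print(var_x, var_x_n_minus_1)
print(cov_xy, cov_from_var)
print(cov_xY, cov_xY_linear)
```

Drawing a new `Y` changes `cov_xY`, which is the sense in which the covariance in (2) is a random variable, while `var_x` and `cov_xy` are fixed numbers.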