Solved – covariate

descriptive statistics, predictor, terminology

I'm confused by this term: covariate. What is it? Is it just the observed values of some random variables that carry information which could help us improve our prediction of another random variable that we haven't observed yet? Why is it named so?

There also seems to be a related term: independent variable. Independent of what? Why is it named so?

Best Answer

From Wikipedia:

Depending on the context, an independent variable is sometimes called a "predictor variable", regressor, covariate, "controlled variable", "manipulated variable", "explanatory variable", exposure variable (see reliability theory), "risk factor" (see medical statistics), "feature" (in machine learning and pattern recognition) or "input variable." In econometrics, the term "control variable" is usually used instead of "covariate".

Answering (some of) your questions:

  • Assume that you are fitting a linear regression model, where you are trying to find a relation $\textbf{y} = f(\textbf{X})$. In this case, the columns of $\textbf{X}$ are the independent variables and $\textbf{y}$ is the dependent variable.
  • Typically, $\textbf{X}$ consists of multiple variables, each of which varies together with $\textbf{y}$ and which may also be related to one another, i.e. they "co-vary", hence the term "covariate".
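For the linear case, one common way to write $f$ explicitly is shown below. The coefficients $\beta_j$, the error term $\varepsilon_i$ and the sample size $n$ are standard regression notation added here for illustration, not part of the original answer:

$$
y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p x_{ip} + \varepsilon_i, \qquad i = 1, \dots, n,
$$

or, in matrix form, $\textbf{y} = \textbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$, where each column of $\textbf{X}$ holds one covariate (plus a column of ones for the intercept $\beta_0$) and $\textbf{y}$ holds the dependent variable.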

Let's take a concrete example. Suppose you wish to predict the price of a house in a neighborhood, $\textbf{y}$, using the following "co-variates", $\textbf{X}$:

  • Width, $x_1$
  • Breadth, $x_2$
  • Number of floors, $x_3$
  • Area of the house, $x_4$
  • Distance to downtown, $x_5$
  • Distance to hospital, $x_6$

In a linear regression problem, $\textbf{y} = f(\textbf{X})$, the price of the house depends on all the co-variates, i.e. $\textbf{y}$ is dependent on $\textbf{X}$. Does any of the co-variates depend on the price of the house? In other words, is $\textbf{X}$ dependent on $\textbf{y}$? The answer is NO. Hence, $\textbf{X}$ is the independent variable and $\textbf{y}$ is the dependent variable. The naming reflects an assumed direction of influence, usually read as cause and effect: if the independent variable changes, its effect is seen on the dependent variable. (A regression fit by itself only shows association; the causal reading comes from how the data were collected or from domain knowledge.)
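To make the direction of dependence concrete, here is a minimal Python sketch (standard library only) that simulates hypothetical houses: the price is computed from the co-variates plus noise, never the other way round, and an ordinary least squares fit then recovers that relation from the data. The value ranges, the "true" coefficients, the noise level and the helper names `simulate_houses` and `ols` are all made up for illustration.

```python
import random


def simulate_houses(n, seed=0):
    """Generate n hypothetical houses; all ranges and coefficients are made up."""
    rng = random.Random(seed)
    rows, prices = [], []
    for _ in range(n):
        width = rng.uniform(5, 20)        # x1 (metres)
        breadth = rng.uniform(5, 20)      # x2 (metres)
        floors = rng.randint(1, 3)        # x3
        area = width * breadth * floors   # x4, derived from x1, x2, x3
        d_downtown = rng.uniform(0, 30)   # x5 (km)
        d_hospital = rng.uniform(0, 15)   # x6 (km)
        # Assumed data-generating rule: the price is computed FROM the
        # covariates (plus noise); no covariate is computed from the price.
        price = (50_000 + 1_500 * area - 3_000 * d_downtown
                 - 1_000 * d_hospital + rng.gauss(0, 1_000))
        rows.append([width, breadth, floors, area, d_downtown, d_hospital])
        prices.append(price)
    return rows, prices


def ols(rows, y):
    """Least-squares coefficients [b0, b1, ..., bp] via the normal equations."""
    X = [[1.0] + [float(v) for v in r] for r in rows]  # column of ones = intercept
    n, p = len(X), len(X[0])
    # Augmented matrix [X^T X | X^T y]
    A = [[sum(X[k][i] * X[k][j] for k in range(n)) for j in range(p)]
         + [sum(X[k][i] * y[k] for k in range(n))]
         for i in range(p)]
    # Forward elimination with partial pivoting
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution
    beta = [0.0] * p
    for i in reversed(range(p)):
        rhs = A[i][p] - sum(A[i][j] * beta[j] for j in range(i + 1, p))
        beta[i] = rhs / A[i][i]
    return beta


if __name__ == "__main__":
    rows, prices = simulate_houses(500)
    names = ["intercept", "width", "breadth", "floors",
             "area", "d_downtown", "d_hospital"]
    for name, b in zip(names, ols(rows, prices)):
        print(f"{name:>10}: {b:12.1f}")
```

Solving the normal equations by hand keeps the sketch dependency-free, but it can be numerically fragile when co-variates are strongly related (as area is to width, breadth and floors here); in practice a QR- or SVD-based solver from a numerical library is the more robust choice.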

Now, are all the co-variates independent of each other? Not necessarily. A better answer is: well, it depends!

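The area of the house ($x_4$) depends on the width ($x_1$), breadth ($x_2$) and number of floors ($x_3$), whereas the distances to downtown ($x_5$) and to the hospital ($x_6$) are independent of the area of the house ($x_4$). So the "independent" in "independent variable" means "not dependent on $\textbf{y}$"; it does not mean that the co-variates are independent of one another.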

Hope that helps!