Regression – Why Does $\overline{y} = \hat \beta_{0} + \hat \beta_{1} \overline{x}$ Hold in Simple Linear Regression?

Tags: expected value, linear model, regression

Today, once again, I noticed that in simple linear regression, setting the independent variable to its mean gives a predicted value of the dependent variable equal to its mean.

  1. Let $y$ and $x$ be vectors of observations with means $\overline{y}$ and $\overline{x}$. Does the equation in the title hold in general for a simple linear regression of $y$ on $x$?

  2. What are the mathematical reasons for this?

EDIT: I ask because I read (Willett & Stampfer. Total energy intake: Implications for epidemiologic analyses. Am J Epidemiol 1986;124:17-27) that to adjust the intake of a given nutrient for total caloric intake, one can take the residuals from a simple linear regression with that nutrient as the DV and total caloric intake as the IV, and then add the "expected nutrient intake for a person with mean caloric intake" (so that the adjusted values are not centered at 0 and frequently negative, which would be strange for a quantity that is physically non-negative).

So my question is: why did the authors not simply say that one should add the mean nutrient intake?
It seems odd that two such distinguished researchers would be unaware of this equivalence.
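The residual method described in the edit can be sketched as follows. This is a minimal illustration, not the authors' code: the calorie and nutrient values are simulated, hypothetical numbers, and the fit assumes ordinary least squares with an intercept via `numpy.polyfit`. The sketch computes the adjusted values both ways (adding the expected intake at mean calories, and adding the mean nutrient intake) so they can be compared.

```python
import numpy as np

# Hypothetical simulated data: total energy intake (kcal/day) and one nutrient (g/day).
rng = np.random.default_rng(1)
calories = rng.normal(2000, 400, size=200)
nutrient = 0.03 * calories + rng.normal(0, 8, size=200)

# Simple linear regression of nutrient (DV) on calories (IV), with an intercept.
slope, intercept = np.polyfit(calories, nutrient, deg=1)
residuals = nutrient - (intercept + slope * calories)

# Residual method: add the expected nutrient intake at mean caloric intake.
expected_at_mean = intercept + slope * calories.mean()
adjusted = residuals + expected_at_mean

# Alternative: add the mean nutrient intake instead.
adjusted_alt = residuals + nutrient.mean()

print(expected_at_mean, nutrient.mean())
print(np.allclose(adjusted, adjusted_alt))
```

Because the fitted line passes through the point of means, `expected_at_mean` and `nutrient.mean()` agree up to floating-point rounding, so the two versions of the adjusted values coincide.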


Best Answer

The point $[E(X), E(Y)]$ always falls on the least squares regression line when fitting $Y=AX+B$, where $E(X)$ and $E(Y)$ denote the sample averages. In this notation, the statement reads $E(Y\vert X=E(X))=E(Y)$. This is an interesting property of the least squares estimate. Suppose the model is $Y_i=A X_i + B +e_i$ for $i=1,2,\ldots,n$.
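A quick numerical check of this property can be sketched as below. It assumes ordinary least squares with an intercept, fitted with `numpy.polyfit` (which returns the slope first, then the intercept, for `deg=1`); the datasets are arbitrary simulated values, including a skewed one, chosen only for illustration.

```python
import numpy as np

def mean_point_on_line(x, y):
    """Fit y = A*x + B by least squares; return (prediction at mean(x), mean(y))."""
    A, B = np.polyfit(x, y, deg=1)  # slope A, intercept B
    return A * x.mean() + B, y.mean()

rng = np.random.default_rng(0)
results = []
for _ in range(5):
    x = rng.exponential(scale=2.0, size=50)          # skewed predictor
    y = 3.0 * x + 1.0 + rng.normal(scale=2.0, size=50)
    pred, ybar = mean_point_on_line(x, y)
    results.append(bool(np.isclose(pred, ybar)))
    print(f"prediction at mean(x) = {pred:.6f}, mean(y) = {ybar:.6f}")
```

The two printed numbers agree in every run, regardless of how the data were generated.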

The least squares estimates for $A$ and $B$ are obtained by taking the partial derivatives of $\sum e_i^2$ with respect to $A$ and $B$ and setting them equal to zero. This gives two equations in two unknowns, and the equation from the derivative with respect to $B$ reduces to $Y_b=A X_b +B$, where $X_b$ and $Y_b$ are the sample means of $X$ and $Y$ respectively.
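Written out, the intercept equation gives the result directly (here $A$ and $B$ stand for the least squares estimates). Only the derivative with respect to $B$ is needed, which is why the property relies on the model containing an intercept:

$$\frac{\partial}{\partial B}\sum_{i=1}^{n}(Y_i - A X_i - B)^2 = -2\sum_{i=1}^{n}(Y_i - A X_i - B) = 0$$

$$\Longrightarrow \quad \sum_{i=1}^{n} Y_i = A\sum_{i=1}^{n} X_i + nB \quad \Longrightarrow \quad Y_b = A X_b + B.$$

Dividing the middle equation by $n$ gives the last step, since $Y_b = \frac{1}{n}\sum_{i=1}^{n} Y_i$ and $X_b = \frac{1}{n}\sum_{i=1}^{n} X_i$.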

To answer the new question from the edit: you do not add the overall mean nutrient intake for every subject, because an individual subject's total caloric intake is not always at the mean total caloric intake. All the first result says is that if a subject's total caloric intake is at the sample mean, then the expected nutrient intake is at the sample mean. The authors, however, want to adjust each individual based on his or her own total caloric intake.
