Statistics – How to Understand Regression to the Mean


In my statistics book there is the following question:

In studies dating back over 100 years, it's well established that regression toward the mean occurs between the heights of fathers and the heights of their adult sons. Indicate whether the following statement is true or false: Fathers of tall sons will tend to be taller than their sons.

The regression to the mean suggests that sons of tall fathers will be shorter than their fathers but they would still be above average, so they would be in the group of tall sons. So it seems that tall sons would have even taller fathers (otherwise there would be no regression to the mean effect), so the answer to the question should be "True" in my opinion, but according to the solution sheet it should be "False". What is the error in my reasoning and what is the valid way to answer the question?

Thank you in advance for your help.

Best Answer

Francis Galton was the first to use "regression" in this sense. If you consider his original height data, as shown in his 1875 chart below (taken from Wikipedia), it may be clearer.

Taking tall parents as the top two quarters of the chart, you see how the average heights of their children are pulled to the left of the major axis, i.e. on average the children of tall parents are shorter than their parents but taller than average children. (Galton illustrated this with what he called the locus of horizontal tangential points.)

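But taking tall children as the two right-hand quarters of the chart, you see how the average heights of their parents are pulled below the major axis, i.e. on average the parents of tall children are shorter than their children but taller than average parents. (Galton illustrated this with what he called the locus of vertical tangential points.) So the statement in the question is false. The error in your reasoning is treating "sons of tall fathers" and "tall sons" as the same group: many tall sons have fathers of only moderate height.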
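The same asymmetry can be written down under a simple model. This is a sketch under assumptions that are not part of Galton's chart: fathers' heights $X$ and sons' heights $Y$ share a common mean $\mu$ and standard deviation $\sigma$, have correlation $0 < \rho < 1$, and are jointly normal, so that both conditional means are linear.

$$
\begin{aligned}
E[Y \mid X = x] &= \mu + \rho\,(x - \mu),\\
E[X \mid Y = y] &= \mu + \rho\,(y - \mu).
\end{aligned}
$$

For a tall father ($x > \mu$) the first line gives $\mu < E[Y \mid X = x] < x$, and for a tall son ($y > \mu$) the second line gives $\mu < E[X \mid Y = y] < y$. Regression to the mean works in both directions; which variable is "pulled in" depends only on which one you condition on.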

[Image: Galton's chart of parent and child heights, from Wikipedia]
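If the chart is hard to read, a small simulation may help. The following is a minimal sketch, not Galton's data: the function names `simulate_heights` and `conditional_means`, the mean of 69 inches, the standard deviation of 2.5 inches, the correlation of 0.5, and the 72 inch cutoff for "tall" are all illustrative assumptions. It draws father and son heights from a bivariate normal model and compares the group averages when you select on tall fathers versus when you select on tall sons.

```python
import random
import statistics


def simulate_heights(n=100_000, mean=69.0, sd=2.5, rho=0.5, seed=0):
    """Draw (father, son) height pairs from a bivariate normal model.

    All parameter values are illustrative assumptions, not Galton's data.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        father = mean + sd * z1
        # The son inherits a fraction rho of the father's standardized
        # deviation; the remainder is independent noise scaled so that
        # sons have the same standard deviation as fathers.
        son = mean + sd * (rho * z1 + (1.0 - rho**2) ** 0.5 * z2)
        pairs.append((father, son))
    return pairs


def conditional_means(pairs, threshold):
    """Return (mean father height, mean son height) for two selections:
    pairs with a tall father, and pairs with a tall son."""
    tall_fathers = [(f, s) for f, s in pairs if f > threshold]
    tall_sons = [(f, s) for f, s in pairs if s > threshold]

    def group_means(group):
        return (
            statistics.fmean(f for f, _ in group),
            statistics.fmean(s for _, s in group),
        )

    return {
        "tall_fathers": group_means(tall_fathers),
        "tall_sons": group_means(tall_sons),
    }


if __name__ == "__main__":
    result = conditional_means(simulate_heights(), threshold=72.0)
    for label, (father_mean, son_mean) in result.items():
        print(f"{label}: fathers {father_mean:.2f}, sons {son_mean:.2f}")
```

Under these assumptions, the tall-father group should show sons who are shorter than their fathers on average, while the tall-son group should show fathers who are shorter than their sons on average, with both group averages staying above the overall mean.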
