[Math] Regression Calculation, missing data

regression statistics

In a regression calculation for five pairs of observations one pair of values was lost when data were filed. For the regression of $y$ on $x$ the equation was calculated as

$y=2x-0.1$

The four recorded pairs of values are:

$x: 0.1, 0.2, 0.4, 0.3$

$y: 0.1, 0.3, 0.7, 0.4$

Find the missing pair of values, using the following data for the four pairs above: $\sum x=1,\sum x^2 =0.3, \sum xy = 0.47, \sum y =1.5 $

The regression line was calculated before the pair was lost, i.e. from all five pairs. For the four remaining pairs, the value of $b$ in the equation $y=a+bx$ is $1.9$, and $a$ remains unaltered.
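As a quick check of the quoted sums and of the four-pair fit, here is a minimal Python sketch. The helper name `fit_line` is made up for illustration, and exact fractions are used only to avoid rounding noise; it applies the standard least-squares sum formulas.

```python
from fractions import Fraction as F

def fit_line(xs, ys):
    """Least-squares (slope, intercept) of y on x from the usual sum formulas."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    denom = n * sxx - sx * sx
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy * sxx - sx * sxy) / denom
    return slope, intercept

# The four recorded pairs, as exact fractions.
xs = [F(1, 10), F(2, 10), F(4, 10), F(3, 10)]
ys = [F(1, 10), F(3, 10), F(7, 10), F(4, 10)]

# (sum x, sum x^2, sum xy, sum y) for the four pairs.
sums = (sum(xs), sum(x * x for x in xs),
        sum(x * y for x, y in zip(xs, ys)), sum(ys))
slope4, intercept4 = fit_line(xs, ys)

print([float(s) for s in sums])          # [1.0, 0.3, 0.47, 1.5]
print(float(slope4), float(intercept4))  # 1.9 -0.1
```

The slope of $1.9$ and the unchanged intercept of $-0.1$ agree with the values stated above.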

Further Mathematics Advanced Level, Statistics.

Thanks for help!

Best Answer

HINT :

Before losing one pair, the linear regression fitting the function $y=ax+b$ (with 5 pairs) leads to the values $a=2$ and $b=-0.1$. Note that here $a$ is the slope and $b$ the intercept, the reverse of the question's $y=a+bx$.

$$b=\frac{\sum y\sum x^2-\sum x\sum xy}{5\sum x^2-\left(\sum x\right)^2}$$

$$a=\frac{5\sum xy-\sum x\sum y}{5\sum x^2-\left(\sum x\right)^2}$$

Let $(X,Y)$ be the missing pair. In the formulas above each $\sum$ runs over all 5 pairs, so

$$\sum x=1+X\quad;\quad \sum x^2=0.3+X^2\quad;\quad \sum xy=0.47+XY\quad;\quad \sum y=1.5+Y$$

Putting these into the above equations:

$$\begin{cases} -0.1=\frac{(1.5+Y)(0.3+X^2)-(1+X)(0.47+XY)}{5(0.3+X^2)-\left(1+X\right)^2}\\ 2=\frac{5(0.47+XY)-(1+X)(1.5+Y)}{5(0.3+X^2)-\left(1+X\right)^2} \end{cases}$$

Solve this system of two equations for the two unknowns $X,Y$.
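One way to solve the system without heavy algebra is to go back to the normal equations from which the two formulas are derived: $\sum y = 5b + a\sum x$ and $\sum xy = b\sum x + a\sum x^2$, with slope $a=2$ and intercept $b=-0.1$. This is an equivalent route, not the substitution written above, and the variable names below are chosen for illustration. After eliminating $Y$ the $X^2$ terms cancel, so the system is effectively linear.

```python
from fractions import Fraction as F

# Sums over the four recorded pairs, as given in the question.
n = 5
sx, sxx, sxy, sy = F(1), F(3, 10), F(47, 100), F(3, 2)
slope, intercept = F(2), F(-1, 10)  # five-pair line y = 2x - 0.1

# Normal equation 1: sy + Y = n*intercept + slope*(sx + X)
#   =>  Y = k + slope*X  with the constant k below.
k = n * intercept + slope * sx - sy

# Normal equation 2: sxy + X*Y = intercept*(sx + X) + slope*(sxx + X**2).
# Substituting Y = k + slope*X cancels the slope*X**2 terms, leaving
#   (k - intercept) * X = intercept*sx + slope*sxx - sxy.
X = (intercept * sx + slope * sxx - sxy) / (k - intercept)
Y = k + slope * X

print(float(X), float(Y))  # 0.3 0.6
```

Here $k$ comes out as $0$, so the first normal equation reduces to $Y=2X$ and the second to $0.1X=0.03$.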

LATER ADDITION :

The result of solving is $X=0.3$ , $Y=0.6$
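This result can be checked by refitting the line with the candidate pair appended to the four recorded ones. A minimal sketch, assuming the standard least-squares sum formulas (the helper `fit_line` is a name made up for this example):

```python
from fractions import Fraction as F

def fit_line(points):
    """Least-squares (slope, intercept) of y on x for a list of (x, y) pairs."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    denom = n * sxx - sx * sx
    return (n * sxy - sx * sy) / denom, (sy * sxx - sx * sxy) / denom

recorded = [(F(1, 10), F(1, 10)), (F(2, 10), F(3, 10)),
            (F(4, 10), F(7, 10)), (F(3, 10), F(4, 10))]
candidate = (F(3, 10), F(6, 10))  # the proposed missing pair (0.3, 0.6)

slope5, intercept5 = fit_line(recorded + [candidate])
print(float(slope5), float(intercept5))  # 2.0 -0.1
```

The five-pair fit reproduces $y=2x-0.1$, as required.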

A short way :

One observes that the pairs $(0.1,0.1)$ , $(0.2,0.3)$ , $(0.4,0.7)$ lie exactly on the regression line $y=2x-0.1$

On the other hand, the pair $(0.3,0.4)$ is not on the regression line: at $x=0.3$ the line gives $2\times 0.3-0.1=0.5$, so the point would have to be $(0.3,0.5)$ to lie on it.
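These observations amount to computing the residuals $y-(2x-0.1)$ of the four recorded pairs about the five-pair line. A short sketch (variable names are illustrative):

```python
from fractions import Fraction as F

recorded = [(F(1, 10), F(1, 10)), (F(2, 10), F(3, 10)),
            (F(4, 10), F(7, 10)), (F(3, 10), F(4, 10))]

# Residual of each recorded pair about the line y = 2x - 0.1.
residuals = [y - (2 * x - F(1, 10)) for x, y in recorded]
print([float(r) for r in residuals])  # [0.0, 0.0, 0.0, -0.1]
```

Only the pair at $x=0.3$ is off the line, lying $0.1$ below it.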

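The missing pair $(X,Y)$ must compensate for this. Taking $X=0.3$, the two $y$ values at $x=0.3$ must average to the value on the line: $(0.4+Y)/2=0.5 \quad\to\quad Y=0.6$. The choice $X=0.3$ is in fact forced: the residuals $e$ about a least-squares line satisfy $\sum e=0$ and $\sum xe=0$, so the missing point has residual $+0.1$ and $0.3\times(-0.1)+X\times 0.1=0$ gives $X=0.3$.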

So, the result is $X=0.3$ , $Y=0.6$
