Is it possible to perform a regression where you have an unknown / unknowable feature variable

Say I have $y_n = a_0 + a_1 x_1 + a_2 x_2 + a_3 x_3$ but I do not / cannot measure the value of the feature variable $x_3$. Can I still perform a regression to ascertain the coefficients $a_i$?

What if I have some knowledge of how $x_3$ is distributed? If I know that $x_3$ is drawn from a Gaussian distribution $\mathcal{N}(0, \sigma^2)$ with known $\sigma$, does this allow me to perform the regression and determine the values of $a_i$?

Best Answer

In (quasi-)matrix form, the complete linear model is

$$Y = X\beta + \epsilon$$

So $\beta$ contains one coefficient for each variable we control for, and $\epsilon$ contains everything else, i.e. whatever our included variables do not explain.
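Applied to the question's model: no observed variable multiplies $a_3$, so the whole term $a_3 x_3$ lands in the error. Under the question's assumption $x_3 \sim \mathcal{N}(0, \sigma^2)$, and assuming nothing else is left over, this reads:

$$y = \underbrace{a_0 + a_1 x_1 + a_2 x_2}_{\text{explained by the included variables}} + \underbrace{a_3 x_3}_{\epsilon}, \qquad \epsilon \sim \mathcal{N}\left(0,\, a_3^2 \sigma^2\right)$$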

The error term contains every variable we did not include, either because we have no data on it or because we do not even know it exists (random deviation).
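A quick simulation makes this concrete. It is a minimal sketch under assumptions that are mine, not the question's: the coefficient values are hypothetical, $x_1$ and $x_2$ are standard normal, $x_3 \sim \mathcal{N}(0, \sigma^2)$ is independent of $x_1$ and $x_2$, and there is no noise beyond $a_3 x_3$. Regressing $y$ on the observed columns only, the omitted term simply becomes the residual. If $x_3$ were correlated with $x_1$ or $x_2$, the estimates of $a_1, a_2$ would instead be biased (omitted-variable bias).

```python
import numpy as np

# Hypothetical "true" coefficients, chosen only for illustration.
a0, a1, a2, a3 = 1.0, 2.0, -3.0, 0.5
sigma = 2.0          # assumed known standard deviation of the unobserved x3
n = 100_000          # large n so the estimates sit close to their limits

rng = np.random.default_rng(0)
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = rng.normal(0.0, sigma, size=n)   # unobserved; independent of x1 and x2

y = a0 + a1 * x1 + a2 * x2 + a3 * x3  # no extra noise beyond a3 * x3

# Regress y on the columns we can actually observe: intercept, x1, x2.
X = np.column_stack([np.ones(n), x1, x2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta_hat

print("estimated a0, a1, a2:", beta_hat)
print("residual variance:", residuals.var(), "vs a3^2 * sigma^2 =", a3**2 * sigma**2)
# Only possible because we simulated x3: the residuals are essentially a3 * x3.
print("corr(residuals, x3):", np.corrcoef(residuals, x3)[0, 1])
```

The estimates land close to $a_0, a_1, a_2$ and the residual variance close to $a_3^2\sigma^2$, but in real data the last line cannot be computed, because $x_3$ is never seen.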

So there is no way to tell which part of $\epsilon$ comes from which omitted variable.
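On the second part of the question (known $\sigma$): under the same assumptions as before, the residual variance estimates $a_3^2\sigma^2$, so you can back out $|a_3|$ at best. The sign is lost because $x_3$ and $-x_3$ have the same distribution, and as soon as anything else sits in $\epsilon$ the estimate absorbs it too. The helper `abs_a3_from_residual_variance` and its `noise_sd` parameter are hypothetical, written only to illustrate this:

```python
import numpy as np

def abs_a3_from_residual_variance(a3, sigma=2.0, noise_sd=0.0, n=100_000, seed=1):
    """Simulate y = 1 + 2*x1 - 3*x2 + a3*x3 (+ optional extra noise), regress y
    on (1, x1, x2) and return sqrt(residual variance) / sigma.

    This equals |a3| only if x3 ~ N(0, sigma^2) is independent of x1, x2 AND
    a3 * x3 is the only thing left in the error term (noise_sd == 0).
    """
    rng = np.random.default_rng(seed)
    x1, x2 = rng.normal(size=(2, n))
    x3 = rng.normal(0.0, sigma, size=n)      # the unobserved feature
    y = 1.0 + 2.0 * x1 - 3.0 * x2 + a3 * x3
    if noise_sd > 0:
        y = y + rng.normal(0.0, noise_sd, size=n)   # some other unknown influence

    X = np.column_stack([np.ones(n), x1, x2])
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid_var = np.var(y - X @ beta_hat)
    return np.sqrt(resid_var) / sigma        # magnitude only, never the sign

print(abs_a3_from_residual_variance(0.5))                # about 0.5
print(abs_a3_from_residual_variance(-0.5))               # same value: sign not identifiable
print(abs_a3_from_residual_variance(0.5, noise_sd=1.0))  # inflated by the extra noise
```

With `noise_sd=1.0` the result comes out near $\sqrt{2}/2 \approx 0.71$ rather than $0.5$, and nothing in the observed data tells you how much of the residual variance belongs to $a_3 x_3$ and how much to the other noise.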
