The answer to 1 is no, which makes the answers to all the others not applicable.
Let me start with your last equation:
\begin{align}
y_i = \alpha + \beta w_i + \epsilon_i
\end{align}
Now, let's assume that your earlier equations for $y$ and $w$ are valid classical linear regression models, so that $Cov(x,\epsilon_1)=0$ and $Cov(x,\epsilon_2)=0$. I'm not sure what SLR stands for---Simple Linear Regression?
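To fix notation (I'm assuming, consistent with the Monte Carlo code at the end of this answer, that the earlier equations are $y_i = \alpha_1 + \beta_1 x_i + \epsilon_{1i}$ and $w_i = \alpha_2 + \beta_2 x_i + \epsilon_{2i}$), solving the $w$ equation for $x_i$ and substituting into the $y$ equation gives
\begin{align}
y_i = \left(\alpha_1 - \frac{\beta_1}{\beta_2}\alpha_2\right) + \frac{\beta_1}{\beta_2} w_i + \left(\epsilon_{1i} - \frac{\beta_1}{\beta_2}\epsilon_{2i}\right),
\end{align}
so in your last equation $\beta = \beta_1/\beta_2$ and the error is $\epsilon_i = \epsilon_{1i} - \frac{\beta_1}{\beta_2}\epsilon_{2i}$.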
Anyway, now let's calculate $Cov(w,\epsilon)$ in order to verify whether your new equation is part of a valid classical linear regression model (recall that we need this to be zero):
\begin{align}
Cov(w,\epsilon) &= Cov\left(w,\epsilon_1-\frac{\beta_1}{\beta_2}\epsilon_2\right) \\
&= Cov(w,\epsilon_1) - \frac{\beta_1}{\beta_2}Cov(w,\epsilon_2) \\
&= Cov(\epsilon_2,\epsilon_1) - \frac{\beta_1}{\beta_2}V(\epsilon_2)
\end{align}
The second term is not zero unless $\beta_1=0$, and that would make the example pretty silly. Even the first term is not likely to be zero in most physical applications. For that term to be zero, you would have to make the additional assumption that the errors made by the two instruments were completely uncorrelated. You could get wildly lucky (in a stopped-clock-is-right-twice-a-day kind of sense) and the two terms could magically cancel out, but there is no systematic tendency of the two terms to cancel out.
The bias in estimating $\beta$ will be:
\begin{align}
\frac{Cov(\epsilon_2,\epsilon_1) - \frac{\beta_1}{\beta_2}V(\epsilon_2)}{V(w)}
\end{align}
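(That expression is just the usual least-squares result: the estimated slope converges to $Cov(w,y)/V(w)$, and substituting $y = \alpha + \beta w + \epsilon$ gives
\begin{align}
\frac{Cov(w,y)}{V(w)} = \beta + \frac{Cov(w,\epsilon)}{V(w)},
\end{align}
so the second term, with $Cov(w,\epsilon)$ as computed above, is the bias.)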
Below, I attach a bit of R code which makes a toy Monte Carlo to demonstrate the effect. The theoretical bias in the Monte Carlo is -0.25, and the slope we get from the simulation is too low by about 0.23, so it demonstrates the point pretty well.
In general, even if you can't see how to evaluate the bias in an example like this, you can always run a little Monte Carlo to see what is going on. This is one of the great things about statistical software languages: Monte Carlo simulations are amazingly powerful tools for getting feedback on whether your ideas are really good or really not.
# This program written in response to a Cross Validated question
# http://stats.stackexchange.com/questions/74527/simple-linear-regression-with-a-random-predictor
# The program is a toy monte carlo.
# It generates a "true" but unobservable-to-the-analyst physical state x.
# Then it generates two measurements of that state from different instruments.
# Then it regresses one measurement on the other.
set.seed(12344321)
# True state, 1000 runs of the experiment
x <- rnorm(1000)
# Set the various parameters of the monte carlo
# Play with these for fun and profit:
alpha_1 <- 0
alpha_2 <- 0
beta_1 <- 1
beta_2 <- 1
stddev_e1 <- 1
stddev_e2 <- 1
corr_e1e2 <- 0.5
# Fallible measurements
e_1 <- stddev_e1*rnorm(1000)
e_2 <- stddev_e2*(corr_e1e2*e_1+sqrt(1-corr_e1e2^2)*rnorm(1000))
y <- alpha_1 + beta_1*x + e_1
w <- alpha_2 + beta_2*x + e_2
var(data.frame(e_1,e_2))
var(data.frame(x,w,y))
lm(y~x)
lm(w~x)
# By the bias formula in the answer, this regression should have a bias of
# -0.25 = (0.5-1*1)/2. That is, the coefficient should not be close to 1,
# the correct value of beta_1/beta_2. Instead, it should be close
# to 0.75 = 1 - 0.25
lm(y~w)
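If you want to check the bias formula directly against the simulated data, a couple of extra lines like these (my addition here, not part of the original script) should both come out near -0.25, matching the roughly 0.23 shortfall mentioned above:
# Plug-in version of the bias formula, computed from the simulated errors
(cov(e_1, e_2) - (beta_1/beta_2)*var(e_2)) / var(w)
# Shortfall of the estimated slope relative to the "correct" value beta_1/beta_2
coef(lm(y ~ w))[2] - beta_1/beta_2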
If the predictor variables in a linear regression are dependent, then the significance of the individual predictors is undermined and not-so-important predictors might be included in your model.
Suppose you include 2 predictor variables, diet and stress, that are dependent on each other. Your model would be:
weight = diet + stress
The apparent influence of stress on weight gain might really be due to the amount of diet.
So here the significance of diet is undermined: you might pick up stress as a significant variable when it actually is not, as the sketch below illustrates.
You can read about multicollinearity to learn more.
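If you want to see this in action, here is a small simulation sketch in R (the numbers and variable names are invented for illustration; they are not from any real diet/stress data). Weight truly depends only on diet, but stress is strongly correlated with diet, so the regression has trouble attributing the effect:
set.seed(4321)
n      <- 200
diet   <- rnorm(n)
stress <- 0.9*diet + sqrt(1 - 0.9^2)*rnorm(n)  # stress is strongly correlated with diet
weight <- 2*diet + rnorm(n)                    # weight truly depends only on diet
summary(lm(weight ~ diet + stress))  # collinearity inflates the standard errors,
                                     # so the fit struggles to attribute the effect
summary(lm(weight ~ diet))           # for comparison: diet alone is estimated precisely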
The quote the OP links to starts with a mistake: it refers to the "residuals", while all these assumptions refer to the errors (the residuals are the estimated errors).
Apart from that, when we specify a regression equation we state that, as a variable, $Y$ is a function of the $X$'s and of the error term. It is then natural to say that the distribution of $Y$ will be influenced by the distribution of the $X$'s and of the error term, since they determine $Y$ itself.
As a simple example, assume that $Y = a + bX + u$, where $u$ follows a normal distribution but $X$ follows, say, a Gamma distribution. Then the distribution of $Y$ cannot be normal, and what it will be depends on the distribution of $X$ as well, and on how it "mingles" with the distribution of $u$. Etc.
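A quick simulation makes the point concrete (my own illustration, with arbitrary parameter values):
# X is Gamma, u is normal; Y = 1 + 2*X + u is visibly right-skewed, not normal
set.seed(4321)
n <- 5000
x <- rgamma(n, shape = 2, rate = 1)
u <- rnorm(n)
y <- 1 + 2*x + u
hist(y, breaks = 50)   # right-skewed: the shape of Y is inherited from X
shapiro.test(y)        # normality of Y is rejected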
Even if the regressors are "deterministic", meaning that they cannot be said to follow a statistical distribution, they still affect the parameters of the distribution of $Y$: in the previous example with deterministic regressors, the distribution of $Y$ will be normal with modified mean (but same variance).
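In the notation of the previous example, with a fixed value $x$ (and writing $\sigma^2_u$ for the variance of $u$, a symbol not used above), this just says $Y \sim N(a + bx,\, \sigma^2_u)$: the mean depends on $x$, the variance does not.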
In the "conditional expectation function" approach, in principle we consider the joint distribution of $\{Y,X\}$ and the resulting conditional one, and the distribution of the conditional expectation function error springs from these (i.e. here the error is not treated as a separate variable but is defined as $u\equiv Y- E(Y\mid X)$ )
So in all cases, the distribution of $Y$ is influenced by $X$, in one way or the other.