The answer to 1 is no, which makes the answers to all the others not applicable.
Let me start with your last equation:
\begin{align}
y_i = \alpha + \beta w_i + \epsilon_i
\end{align}
Now, let's assume that your earlier equations for $y$ and $w$ are valid classical linear regression models, so that $Cov(x,\epsilon_1)=0$ and $Cov(x,\epsilon_2)=0$. I'm not sure what SLR stands for---Simple Linear Regression?
Anyway, now let's calculate $Cov(w,\epsilon)$ in order to verify whether your new equation is part of a valid classical linear regression model (recall that we need this to be zero):
\begin{align}
Cov(w,\epsilon) &= Cov\left(w,\;\epsilon_1-\frac{\beta_1}{\beta_2}\epsilon_2\right) \\
&= Cov(w,\epsilon_1) - \frac{\beta_1}{\beta_2}Cov(w,\epsilon_2) \\
&= Cov(\alpha_2+\beta_2 x+\epsilon_2,\;\epsilon_1) - \frac{\beta_1}{\beta_2}Cov(\alpha_2+\beta_2 x+\epsilon_2,\;\epsilon_2) \\
&= Cov(\epsilon_2,\epsilon_1) - \frac{\beta_1}{\beta_2}V(\epsilon_2)
\end{align}
The third line substitutes $w=\alpha_2+\beta_2 x+\epsilon_2$, and the last line uses $Cov(x,\epsilon_1)=Cov(x,\epsilon_2)=0$. The second term is not zero unless $\beta_1=0$, and that would make the example pretty silly. Even the first term is not likely to be zero in most physical applications: for it to vanish, you would have to make the additional assumption that the errors made by the two instruments are completely uncorrelated. You could get wildly lucky (in a stopped-clock-is-right-twice-a-day kind of sense) and the two terms could magically cancel out, but there is no systematic tendency for them to do so.
The bias in estimating $\beta$ will be:
\begin{align}
\frac{Cov(\epsilon_2,\epsilon_1) - \frac{\beta_1}{\beta_2}V(\epsilon_2)}{V(w)}
\end{align}
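To spell out the step behind this formula (added here for completeness), the OLS slope from regressing $y$ on $w$ converges to
\begin{align}
\hat\beta \;\longrightarrow\; \frac{Cov(w,y)}{V(w)} = \frac{Cov(w,\;\alpha+\beta w+\epsilon)}{V(w)} = \beta + \frac{Cov(w,\epsilon)}{V(w)}
\end{align}
so the bias is $Cov(w,\epsilon)/V(w)$, with the numerator being exactly the covariance computed above.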
Below, I attach a bit of R code which runs a toy Monte Carlo to demonstrate the effect. The theoretical bias in this setup is -0.25, and the slope we get in the Monte Carlo is too low by 0.23, so it demonstrates the point pretty well.
In general, even if you can't see how to evaluate the bias in an example like this, you can always run a little Monte Carlo to see what is going on. This is one of the great things about statistical software languages: Monte Carlo simulations are amazingly powerful tools for telling you whether your ideas really work or really don't.
# This program written in response to a Cross Validated question
# http://stats.stackexchange.com/questions/74527/simple-linear-regression-with-a-random-predictor
# The program is a toy monte carlo.
# It generates a "true" but unobservable-to-the-analyst physical state x.
# Then it generates two measurements of that state from different instruments.
# Then it regresses one measurement on the other.
set.seed(12344321)
# True state, 1000 runs of the experiment
x <- rnorm(1000)
# Set the various parameters of the monte carlo
# Play with these for fun and profit:
alpha_1 <- 0
alpha_2 <- 0
beta_1 <- 1
beta_2 <- 1
stddev_e1 <- 1
stddev_e2 <- 1
corr_e1e2 <- 0.5
# Fallible measurements
e_1 <- stddev_e1*rnorm(1000)
e_2 <- stddev_e2*(corr_e1e2*e_1+sqrt(1-corr_e1e2^2)*rnorm(1000))
y <- alpha_1 + beta_1*x + e_1
w <- alpha_2 + beta_2*x + e_2
var(data.frame(e_1,e_2))
var(data.frame(x,w,y))
lm(y~x)
lm(w~x)
# By the bias formula in the answer, this regression should have a bias of
# -0.25 = (0.5-1*1)/2. That is, the coefficient should not be close to 1,
# the correct value of beta_1/beta_2. Instead, it should be close
# to 0.75 = 1 - 0.25.
lm(y~w)
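As a small extra check (my addition, not part of the original script), you can plug the simulation parameters straight into the bias formula above and compare it with the fitted slope; this reuses the objects already defined, and takes $V(x)=1$ since x is standard normal:
# Extra check: theoretical bias vs. the empirical bias of the slope in lm(y ~ w).
theoretical_bias <- (corr_e1e2*stddev_e1*stddev_e2 - (beta_1/beta_2)*stddev_e2^2) /
                    (beta_2^2*1 + stddev_e2^2)   # V(w) = beta_2^2*V(x) + V(e_2), with V(x) = 1
theoretical_bias                                 # -0.25 with the parameters above
coef(lm(y ~ w))["w"] - beta_1/beta_2             # empirical bias, close to -0.25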
Go to the same site on the following sub-page:
https://onlinecourses.science.psu.edu/stat414/node/278
You will see more clearly that they specify the simple linear regression model with the regressor centered on its sample mean. And this explains why they subsequently say that $\hat \alpha$ and $\hat \beta$ are independent.
For the case when the coefficients are estimated with a regressor that is not centered, their covariance is
$$\text{Cov}(\hat \alpha,\hat \beta) = -\sigma^2\frac{\bar x}{S_{xx}}, \qquad S_{xx} = \sum_i (x_i-\bar x)^2 $$
So if we use a regressor centered on $\bar x$, call it $\tilde x$, the covariance expression above involves the sample mean of the centered regressor, $\tilde{\bar x}$, which is zero; the covariance is therefore zero as well, and the coefficient estimators are independent.
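To see this numerically, here is a small R sketch (mine, with simulated data purely for illustration) comparing the estimated covariance matrix of $(\hat \alpha, \hat \beta)$ from vcov() before and after centering the regressor:
# Sketch: covariance of intercept and slope estimates, raw vs. centered regressor.
set.seed(1)
x2 <- rnorm(100, mean = 5)              # regressor with a clearly nonzero mean
y2 <- 2 + 3*x2 + rnorm(100)
vcov(lm(y2 ~ x2))                       # off-diagonal term is nonzero (about -sigma^2*xbar/Sxx)
vcov(lm(y2 ~ I(x2 - mean(x2))))         # off-diagonal term is numerically zero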
This post contains more on the OLS algebra of simple linear regression.
Best Answer
The problem of weights in regression is a really vast domain.
The traditional problem is to minimize $$SSQ=\sum_{i=1}^n \Big(y_i^{(calc)}-y_i^{(exp)}\Big)^2$$ and, as you know, this gives a large influence to the largest values of the $y_i^{(exp)}$. This corresponds to the sum of squares of the absolute errors of the $y$'s $(w_i=1)$.
If instead you consider $$SSQ=\sum_{i=1}^n \Big(\frac{y_i^{(calc)}-y_i^{(exp)}}{y_i^{(exp)}}\Big)^2$$ this corresponds to the sum of squares of the relative errors of the $y$'s $(w_i=\frac 1 {y_i^2})$.
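As a quick R illustration of these two criteria (my own sketch, not part of the original answer), lm() takes a weights argument, so the relative-error criterion can be obtained by weighting with $1/y_i^2$ evaluated at the observed values:
# Sketch: absolute-error vs. relative-error least squares via the weights argument of lm().
set.seed(2)
xw <- 1:50
yw <- (2 + 0.5*xw)*(1 + 0.05*rnorm(50))   # roughly constant relative error
lm(yw ~ xw)                               # w_i = 1: minimizes squared absolute errors
lm(yw ~ xw, weights = 1/yw^2)             # w_i = 1/y_i^2: minimizes squared relative errors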
But there is another situation where the weights can be important. Suppose that the model is $y=Ae^{Bx}$, which is nonlinear. You can linearize it by taking logarithms, $\log(y)=\alpha+\beta x$, but ordinary least squares can then lead to very different results compared to nonlinear regression, since the transformation gives greater weight to small $y$ values. For this very specific case, it has been found that $$SSQ=\sum_{i=1}^n y_i\Big(\alpha+\beta x_i-\log(y_i)\Big)^2$$ is very acceptable.
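Here is a brief R sketch of that comparison (my addition; the simulated data and starting values are just for illustration), fitting $y=Ae^{Bx}$ by nls(), by an unweighted regression of $\log(y)$ on $x$, and by the same regression weighted by $y_i$:
# Sketch: nonlinear fit vs. unweighted and y-weighted log-linearized fits of y = A*exp(B*x).
set.seed(3)
xe <- seq(0, 5, length.out = 60)
ye <- 2*exp(0.8*xe) + rnorm(60, sd = 2)              # additive noise on the original scale
ye <- pmax(ye, 0.1)                                  # keep y positive so log(y) is defined
nls(ye ~ A*exp(B*xe), start = list(A = 1, B = 1))    # nonlinear least squares
lm(log(ye) ~ xe)                                     # plain log-linearization
lm(log(ye) ~ xe, weights = ye)                       # log-linearization weighted by y_i
On data like these, the weighted log fit typically lands much closer to the nls() estimates than the unweighted one.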
I suggest you have a look at http://www.itl.nist.gov/div898/handbook/pmd/section1/pmd143.htm