Solved – Determine weights in weighted least squares regression

generalized-least-squares, least-squares, regression, weighted-regression

Assume we have a cross-section of $N$ stocks. $Y_i$ is a sample variance estimate of the returns of stock $i$, computed from $T_i$ observations. The $T_i$ are not necessarily equal, i.e. the sample size used to estimate $Y_i$ differs across $i = 1, 2, \ldots, N$.

Now I want to run a cross-sectional weighted least squares regression:

$Y_i = \beta X_i + \epsilon_i$

What is the best choice of weights here, given that the weights should be based on the $T_i$ behind each $Y_i$? In other words, I want to assign a smaller weight to stock $i$ if $T_i$ is small.

Best Answer

I don't think there's a single optimal weighting scheme here. I'd first try $w_i=\frac{NT_i}{\sum_j T_j}$. This way $\sum_i w_i=N$, and if all the $T_i$ are equal then every $w_i=1$, which are nice properties.
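A minimal sketch of this scheme in R, using simulated data: the values of `T_obs`, `X`, and `Y` below are made up for illustration and are not taken from the question. It uses `lm()` with its `weights` argument and no intercept, matching $Y_i = \beta X_i + \epsilon_i$.

```r
set.seed(1)

# Hypothetical data: N stocks with differing sample sizes T_obs[i].
N     <- 50
T_obs <- sample(20:500, N, replace = TRUE)
X     <- runif(N, 0.5, 2)
# Noise shrinks as T_obs grows (assumed here purely for illustration).
Y     <- 0.8 * X + rnorm(N, sd = 1 / sqrt(T_obs))

# Normalized weights from the answer: w_i = N * T_i / sum(T).
w <- N * T_obs / sum(T_obs)

# Weighted least squares through the origin with the normalized weights.
fit_w <- lm(Y ~ 0 + X, weights = w)

# Same regression using the raw T_obs as weights.
fit_T <- lm(Y ~ 0 + X, weights = T_obs)

# The weights sum to N (up to floating-point error).
all.equal(sum(w), N)

# Both fits give the same slope estimate.
all.equal(coef(fit_w), coef(fit_T))
```

Rescaling all weights by a common constant leaves the coefficient estimate unchanged, so the normalization is a matter of convenience and interpretation rather than of fit.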

Related Solutions

Solved – How to handle very different weights in a least squares fit

You can try applying a function to the weights to compress their range. Reasonable choices would be a log or a sigmoid.
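As a rough sketch of what such a transformation could look like, the R snippet below compresses a hypothetical vector of raw weights with `log1p()` and with the logistic function `plogis()`. The raw weights, and the centering and scaling used before the sigmoid, are assumptions chosen for illustration; the answer above does not specify them.

```r
# Hypothetical raw weights spanning several orders of magnitude.
w_raw <- c(1, 5, 20, 300, 10000)

# Log transform: log1p(w) = log(1 + w) stays non-negative for w >= 0.
w_log <- log1p(w_raw)

# Sigmoid transform: center and scale on the log scale (an arbitrary choice),
# then map into (0, 1) with the logistic function.
lw    <- log(w_raw)
z     <- (lw - median(lw)) / mad(lw)
w_sig <- plogis(z)

# Ratio of largest to smallest weight before and after each transform.
max(w_raw) / min(w_raw)
max(w_log) / min(w_log)
max(w_sig) / min(w_sig)
```

Both transforms preserve the ordering of the weights while shrinking the gap between the largest and smallest, so no single observation dominates the fit.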

Solved – Interpretation of weights in non-linear least squares regression

The weights should equal the counts, because those will be inversely proportional to the variances of the errors. Specifically, the model for the data $(x_i, y_i, n_i)$ is

$$y_i \sim \lambda \Phi((\log(x_i) - \mu)/\sigma) + \varepsilon_i$$

where $\mu$, $\sigma \gt 0$, and $\lambda \gt 0$ are the parameters and the $\varepsilon_i$ are independent random variables with zero means and variances

$$\text{Var}(\varepsilon_i) = \sigma^2 / n_i$$

where $n_i$ are the counts.
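To spell out why the counts serve as weights, write $f(x_i;\theta)$ as a shorthand (introduced here, not in the original answer) for the mean curve, and treat the $\sigma^2$ in the variance formula as a fixed error scale. Least squares with inverse-variance weights then minimizes

$$\sum_i \frac{(y_i - f(x_i;\theta))^2}{\text{Var}(\varepsilon_i)} = \frac{1}{\sigma^2} \sum_i n_i \, (y_i - f(x_i;\theta))^2 .$$

Under that assumption the factor $1/\sigma^2$ does not depend on $\theta$, so minimizing the count-weighted sum of squares $\sum_i n_i (y_i - f(x_i;\theta))^2$ gives the same estimate.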

The fit to the logarithm of $x$ is visually ok:

In this figure the x-axis is on a logarithmic scale, the point symbols have areas proportional to the counts (so that large circles will have more influence in the fitting than small ones), and the red line is a least-squares fit. It is clear the model is not really appropriate: the residuals for smaller values of $y$ tend to be small, regardless of the counts. Possibly the sum of squares of relative errors should be minimized to obtain an appropriate fit.

It is evident that the fit is poor for the largest $x$, but those also have small counts.

The R code with (my version of) the data and the fitting and plotting procedures follows.

y <- c(1, 1, 2, 1, 2, 1, 3, 4, 22, 30, 44, 58, 68, 69, 
       71, 72, 75, 72, 80, 78, 87, 86, 80, 82, 92, 90, 85, 61, 38, 36) / 100
x <- ceiling(exp(seq(log(20), log(500), length.out=length(y))))
counts <- c( 10, 3, 17, 20, 38, 31, 44, 55, 58, 68, 77, 
             82, 86, 82, 77, 75, 70, 65, 68, 51, 47, 41, 38, 30, 22, 14, 9, 4, 2, 1)
#
# The least-squares criterion.
# theta[1] is a location, theta[2] an x-scale, and theta[3] a y-scale.
#
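# x, y, and n are always supplied by the caller; self-referential defaults
# such as x=x would fail if they were ever evaluated.
f <- function(theta, x, y, n)
  sum(n * (y - pnorm(x, theta[1], theta[2]) * theta[3])^2) / sum(n)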
#
# Perform a count-weighted least-squares fit.
#
xi <- log(x)
fit <- optim(c(median(xi), sd(xi), max(y) * sd(xi)), f, x=xi, y=y, n=counts)
#
# Plot the result.
#
par(mfrow=c(1,1))
plot(x, y, log="x", pch=19, col="Gray", cex=sqrt(counts/12))
points(x, y, cex=sqrt(counts/10))
curve(fit$par[3] * pnorm(log(x), fit$par[1], fit$par[2]), 
          from=10, to=1000, col="Red", add=TRUE)
