Convergence of $\hat \beta_N - \hat \beta_{N-1}$ where $\hat \beta_N$ is the least squares solution of $Y_N = X_N\beta_N + \varepsilon_N$

asymptotics, data analysis, probability theory, regression, statistics

Suppose we are given a set of random observations $\{y_i,x_{i1},\dots,x_{ip}\}_{i=1}^N$. Based on these observations, we can form the multiple linear regression model in matrix form
$$
Y_N = X_N\beta + \varepsilon_N,
$$

where the error $\varepsilon_N$, conditioned on $X_N$, is normally distributed with mean zero and finite variance $\sigma^2$. I have used the subscript $N$ to explicitly indicate that we have used $N$ observations.
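For concreteness, the matrix form stacks the $N$ observations as
$$
Y_N = \begin{pmatrix} y_1 \\ \vdots \\ y_N \end{pmatrix} \in \mathbb{R}^N, \qquad
X_N = \begin{pmatrix} x_{11} & \cdots & x_{1p} \\ \vdots & & \vdots \\ x_{N1} & \cdots & x_{Np} \end{pmatrix} \in \mathbb{R}^{N\times p},
$$
with coefficient vector $\beta \in \mathbb{R}^p$.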

When we solve this model with least squares we find the estimated regression coefficient vector $\hat \beta_N$. We could also leave out the last observation and solve the model
$$
Y_{N-1} = X_{N-1}\beta + \varepsilon_{N-1},
$$

to find the estimated regression coefficients $\hat \beta_{N-1}$.

The explicit expressions for the estimated regression coefficients are well known. We have
$$
\begin{align}
\hat \beta_N &= \beta + (X_N^TX_N)^{-1}X_N^T \varepsilon_N, \\
\hat \beta_{N-1} &= \beta + (X_{N-1}^TX_{N-1})^{-1}X_{N-1}^T \varepsilon_{N-1}.
\end{align}
$$
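These follow from the closed-form least squares solution: substituting the model equation into $\hat\beta_N = (X_N^TX_N)^{-1}X_N^TY_N$ gives
$$
\hat\beta_N = (X_N^TX_N)^{-1}X_N^T(X_N\beta + \varepsilon_N) = \beta + (X_N^TX_N)^{-1}X_N^T\varepsilon_N,
$$
and likewise for $\hat\beta_{N-1}$.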

As $N$ gets large, intuitively there should be a negligible difference between the estimated regression coefficients $\hat \beta_N$ and $\hat \beta_{N-1}$. But can we quantify this precisely? Is it possible to show that $E(N) := \hat \beta_N - \hat \beta_{N-1}$ converges to zero in probability or almost surely as $N \to \infty$? And in particular, how fast does it converge to zero?

Best Answer

(From the discussion on MO) I guess that the subscript $N$ in $\beta_N$ is a lapse.

It is well known that, conditionally on $X_N$, $\hat \beta_N$ is unbiased and $\mathrm{Cov}(\hat \beta_N) = \sigma^2 \big(X_N^T X_N\big)^{-1}$.
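For completeness, both facts follow from the decomposition $\hat\beta_N = \beta + (X_N^TX_N)^{-1}X_N^T\varepsilon_N$ above: since $\mathrm{E}[\varepsilon_N \mid X_N] = 0$ and $\mathrm{Cov}(\varepsilon_N \mid X_N) = \sigma^2 I$,
$$
\mathrm{E}[\hat\beta_N \mid X_N] = \beta, \qquad
\mathrm{Cov}(\hat\beta_N \mid X_N) = (X_N^TX_N)^{-1}X_N^T(\sigma^2 I)X_N(X_N^TX_N)^{-1} = \sigma^2\big(X_N^TX_N\big)^{-1}.
$$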

Now, in order to have some convergence, one naturally needs assumptions additional to the usual Gauss–Markov ones. Say, if the rows $x_i = (x_{i1},\dots,x_{ip})^T$ are iid and square integrable with nonsingular second-moment matrix $\mathrm{E}[x_1 x_1^T]$, then by the strong law of large numbers $$ X_N^T X_N = \sum_{i=1}^N x_i x_i^T \sim N\,\mathrm{E}[x_1 x_1^T], \quad N\to\infty, $$ almost surely. As a result, $\hat \beta_N \overset{\mathrm P}\longrightarrow \beta$ as $N\to\infty$; moreover, $\sqrt{N}\,|\hat\beta_N - \beta|$ is bounded in probability. (So the convergence rate is loosely $1/\sqrt{N}$.) In particular, since $|E(N)| \le |\hat\beta_N - \beta| + |\hat\beta_{N-1} - \beta|$ by the triangle inequality, $E(N)$ also converges to zero in probability, with $\sqrt{N}\,|E(N)|$ bounded in probability.
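One quick way to see this behaviour is a small simulation. Here is a minimal sketch, assuming NumPy, a made-up true $\beta$, and iid standard normal covariates and errors, that tracks $|\hat\beta_N - \hat\beta_{N-1}|$ as $N$ grows:

```python
import numpy as np

rng = np.random.default_rng(0)
p = 3
beta = np.array([1.0, -2.0, 0.5])  # hypothetical true coefficients
sigma = 1.0                        # error standard deviation

# One long sample path, so that beta_hat_{N-1} and beta_hat_N share
# their first N-1 observations, exactly as in the question.
N_max = 100_000
X = rng.standard_normal((N_max, p))             # iid, square-integrable rows
y = X @ beta + sigma * rng.standard_normal(N_max)

for N in (100, 1_000, 10_000, 100_000):
    b_N = np.linalg.lstsq(X[:N], y[:N], rcond=None)[0]
    b_prev = np.linalg.lstsq(X[:N - 1], y[:N - 1], rcond=None)[0]
    print(f"N = {N:>7}:  |b_N - b_(N-1)| = {np.linalg.norm(b_N - b_prev):.2e}")
```

The printed norms should shrink toward zero as $N$ grows, consistent with $E(N) \overset{\mathrm P}\longrightarrow 0$.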
