Solved – A modeling technique combining $k$ nearest neighbors and multiple linear regression

Tags: k-nearest-neighbour, modeling, multiple-regression, references

I have been modeling data using a hybrid of $k$-nearest neighbors (kNN) and multiple linear regression (MLR), and have found the technique to be (at least with my data) much more accurate than either method alone.
Before describing the technique, I will quickly review kNN and MLR.

In both cases, we are given $n$ observations, say $(x_1, y_1, z_1),…,(x_n, y_n, z_n)$, where $x_i$ and $y_i$ are independent variables (not random) and $z_i$ is a dependent variable subject to some random noise. We want to predict the value of $z$ at some (unobserved) point $(x,y)$.

kNN regression:

Find the $k$ nearest observations to $(x,y)$ and average the corresponding $z$ values. Specifically, if the $k$ closest points to $(x,y)$ are $(x_1, y_1),…,(x_k, y_k)$ (relabeling as needed), then we predict $z = \frac{1}{k}\sum_{i=1}^{k} z_{i}$.
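For concreteness, here is a minimal NumPy sketch of this averaging step; the function name and the choice of Euclidean distance are my own, not part of any standard API:

```python
import numpy as np

def knn_predict(points, z, query, k):
    """Plain kNN regression: average the z-values of the k
    observations nearest to `query` (Euclidean distance)."""
    dists = np.linalg.norm(points - query, axis=1)  # distance from query to every observation
    nearest = np.argsort(dists)[:k]                 # indices of the k closest observations
    return z[nearest].mean()
```

Here `points` is an $n \times 2$ array of the $(x_i, y_i)$ pairs and `z` is the length-$n$ array of responses.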

MLR:

We fit a best-fit surface to all $n$ observations (using least squares) and use the equation of the surface to predict $z$. For example, we may want to model the surface with a degree-2 polynomial in two variables, i.e.
$$z = \beta_1 + \beta_2 x + \beta_3 y + \beta_4 x^2 + \beta_5 xy + \beta_6 y^2.$$
In that case, we build the $n \times 6$ matrix $X$ whose $i$th row is $X_i = \begin{bmatrix} 1 & x_i & y_i & x_i^2 & x_i y_i & y_i^2\end{bmatrix}$ and the $n \times 1$ column matrix $Z$ of the $z_i$ values. Then we determine the coefficients $\beta_i$ by solving the normal equations, which gives $\beta = (X^{T}X)^{-1}X^{T}Z$.
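A small sketch of this fit, assuming the degree-2 basis above (it solves the normal equations directly to mirror the formula; `np.linalg.lstsq` is more stable numerically):

```python
import numpy as np

def design_matrix(x, y):
    """Row i is [1, x_i, y_i, x_i^2, x_i*y_i, y_i^2]."""
    x = np.atleast_1d(np.asarray(x, float))
    y = np.atleast_1d(np.asarray(y, float))
    return np.column_stack([np.ones_like(x), x, y, x**2, x * y, y**2])

def mlr_fit(x, y, z):
    """Least-squares coefficients beta for the degree-2 surface."""
    X = design_matrix(x, y)
    # Solve the normal equations X^T X beta = X^T Z.
    return np.linalg.solve(X.T @ X, X.T @ np.asarray(z, float))

def mlr_predict(beta, x, y):
    """Evaluate the fitted surface at (x, y)."""
    return design_matrix(x, y) @ beta
```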

Hybrid Method:

One can view the kNN method as fitting a degree-0 polynomial (a constant) using only the $k$ nearest data points. The technique I've used is to build an MLR model from the $k$ nearest neighbors, without the constraint that the polynomial be degree 0. We also don't require that the same model be used for every region (a region being a set of points with the same $k$ nearest neighbors). We use $k$-fold cross validation (this $k$ is unrelated to the number of neighbors) to determine which polynomial degree is optimal; naturally, when the number of neighbors is small, the degree of the polynomial must also be small. This method feels like a natural relaxation/generalization of traditional kNN.
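Here is a rough sketch of the hybrid predictor as described above; the names, the Euclidean metric, and the monomial basis are my own illustrative choices:

```python
import numpy as np

def poly_basis(x, y, degree):
    """Monomials x^a * y^b with a + b <= degree (degree 0 -> constant)."""
    return np.column_stack([x**a * y**b
                            for a in range(degree + 1)
                            for b in range(degree + 1 - a)])

def hybrid_predict(points, z, query, k, degree=1):
    """Fit a least-squares polynomial of the given degree to the k
    nearest neighbors of `query`, then evaluate it at `query`.
    With degree=0 this reduces to ordinary kNN averaging."""
    dists = np.linalg.norm(points - query, axis=1)
    idx = np.argsort(dists)[:k]   # the k nearest observations
    X = poly_basis(points[idx, 0], points[idx, 1], degree)
    # k must be at least the number of basis terms for a well-posed fit.
    beta, *_ = np.linalg.lstsq(X, z[idx], rcond=None)
    q = poly_basis(np.atleast_1d(query[0]), np.atleast_1d(query[1]), degree)
    return (q @ beta)[0]
```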

Question: Does this hybrid approach have a name? Is it used frequently? If so, I would love any references that you can provide. I'm in the process of writing a paper that uses this modeling technique and I'd like to call it by the "correct" name (and cite original authors) or be able to honestly claim that it is novel.

Additional Notes:

The surface that is constructed with this method is akin to splines in that it is a piecewise polynomial model. However, it differs in that it is not differentiable (and, in most cases, probably not even continuous).

Best Answer

There was a paper,

D. Ruppert and M. P. Wand, "Multivariate Locally Weighted Least Squares Regression," Ann. Statist. 22(3), 1994, pp. 1346–1370. https://projecteuclid.org/euclid.aos/1176325632

discussing a multiple regression in which the weights on the observations are not 0 or 1, as in your method, but vary with the distance from the case you are trying to predict. Multivariate local regression is also discussed in Section 2.1.2 of the book Local Regression and Likelihood (1999) by Clive Loader. The R package locfit implements univariate and multivariate local regression.
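For illustration, here is a rough NumPy sketch of such a distance-weighted local polynomial fit; the Gaussian kernel, the bandwidth parameter, and the function name are placeholder choices of mine, not the specific estimator analyzed in the paper:

```python
import numpy as np

def weighted_local_predict(points, z, query, bandwidth, degree=1):
    """Locally weighted least squares: every observation enters the
    fit with a weight that decays smoothly with distance from `query`,
    rather than the hard 0/1 neighborhood weights of the kNN hybrid."""
    d = np.linalg.norm(points - query, axis=1)
    w = np.exp(-(d / bandwidth) ** 2)   # Gaussian kernel weights (placeholder choice)
    x, y = points[:, 0], points[:, 1]
    X = np.column_stack([x**a * y**b
                         for a in range(degree + 1)
                         for b in range(degree + 1 - a)])
    sw = np.sqrt(w)
    # Weighted least squares via the usual sqrt-weight rescaling of rows.
    beta, *_ = np.linalg.lstsq(X * sw[:, None], z * sw, rcond=None)
    q = np.array([query[0]**a * query[1]**b
                  for a in range(degree + 1)
                  for b in range(degree + 1 - a)])
    return q @ beta
```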
