Solved – Locally weighted regression VS kernel linear regression

k-nearest-neighbour, kernel-trick, kernel-smoothing, weighted-regression

I am trying to clarify the relationship between the three methods listed above.

According to my understanding, kernel regression means that the weight vector $\mathbf{w}$ lies in the space spanned by the training data:

$$ \alpha = (\mathbf{X}\mathbf{X}^\intercal + \lambda I)^{-1}\mathbf{y} $$
$$ g(\mathbf{x}) = \mathbf{x}^\intercal\mathbf{w} = \mathbf{x}^\intercal\mathbf{X}^\intercal\alpha = \sum\limits_{i=1}^m \alpha_{i}\langle \mathbf{x}, \mathbf{x}_{i} \rangle $$
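(To make the formulation concrete, here is a small sketch in Python/NumPy of what I mean. The Gaussian kernel, regularisation value, and toy data are just placeholders of mine; replacing the kernel with a plain inner product recovers the linear case above.)

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Gaussian kernel matrix between the rows of A and the rows of B
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * sq)

def fit_alpha(X, y, lam=0.1, kernel=rbf_kernel):
    # alpha = (K + lambda I)^{-1} y, with K the Gram matrix of the training data
    K = kernel(X, X)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def predict(X_train, alpha, X_new, kernel=rbf_kernel):
    # g(x) = sum_i alpha_i k(x, x_i): a weighted sum of kernel evaluations
    return kernel(X_new, X_train) @ alpha

# toy usage with made-up data
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(50, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(50)
alpha = fit_alpha(X, y, lam=0.1)
print(predict(X, alpha, X[:5]))
```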

My questions:

1. Is this already locally weighted regression? I get the intuition that the nearer the input vector is to a training vector, the more weight that training point receives. Does this already mean "locally weighted"?
2. I understand the kernel trick here, but must locally weighted methods always have an explicitly defined kernel? Is there any way to build a locally weighted model other than kernel methods?
3. If so (I am not sure), does a particular type of locally weighted model correspond to a particular kind of kernel function (like the locally weighted polynomial regression in http://water.columbia.edu/files/2011/11/Lall2006Locally.pdf)?

I see kernel methods as adding a kind of space-time dependency to certain existing models, but I do not know exactly how an existing model should correspond to its kernel part.

Many thanks!

Best Answer

Here's how I understand the distinction between the two methods (I am not sure which third method you are referring to - perhaps locally weighted polynomial regression, given the linked paper).

Locally weighted regression is a general non-parametric approach, based on linear and non-linear least squares regression. Kernel linear regression is, IMHO, essentially an adaptation (variant) of general locally weighted regression in the context of kernel smoothing. It seems that the main advantage of kernel linear regression is that it automatically eliminates the bias at the domain boundaries associated with the locally weighted approach (Hastie, Tibshirani & Friedman, 2009; for that, as well as a general overview, see sections 6.1-6.3, pp. 192-201). This phenomenon is called automatic kernel carpentry (Hastie & Loader, 1993; Hastie et al., 2009; Müller, 1993). More details on locally weighted regression can be found in the paper by Ruppert and Wand (1994).
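To make the boundary-bias point concrete, here is a small sketch (in Python/NumPy; the Gaussian kernel, bandwidth, and toy data are my own choices rather than anything from the references) comparing a locally weighted constant fit (a Nadaraya-Watson kernel smoother) with a locally weighted linear fit at the edge of the domain. The locally linear fit is the kind of correction the automatic-kernel-carpentry discussion refers to.

```python
import numpy as np

def gaussian_weights(x0, x, bandwidth):
    # kernel weights of the training points x around the query point x0
    return np.exp(-0.5 * ((x - x0) / bandwidth) ** 2)

def nadaraya_watson(x0, x, y, bandwidth=0.3):
    # locally weighted constant fit: a kernel-weighted average of y
    w = gaussian_weights(x0, x, bandwidth)
    return np.sum(w * y) / np.sum(w)

def local_linear(x0, x, y, bandwidth=0.3):
    # locally weighted linear fit: weighted least squares on [1, x], evaluated at x0
    w = gaussian_weights(x0, x, bandwidth)
    X = np.column_stack([np.ones_like(x), x])
    W = np.diag(w)
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    return beta[0] + beta[1] * x0

# toy linear trend on [0, 1]; query the left boundary, where the true value is 0
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 1, 100))
y = 2 * x + 0.05 * rng.standard_normal(100)
print(nadaraya_watson(0.0, x, y), local_linear(0.0, x, y))
```

On such a toy linear trend, the kernel-smoother estimate at the boundary should be pulled away from the true value (all of its neighbours lie on one side), while the locally weighted linear fit should stay close to it.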

Given the different presentation styles, some other sources on the topic might also be helpful: for example, this page (the link is dead; the material is now in this book, Chapter 20.2 on linear smoothing), this class-notes presentation (slides) on kernel methods, and this class-notes page on local learning approaches. I also like this blog post and this blog post, as they are relevant and nicely blend theory with examples in R and Python, respectively.

References

Hastie, T., & Loader, C. (1993). Local regression: Automatic kernel carpentry. Statistical Science, 8(2), 120-143. Retrieved from http://projecteuclid.org/download/pdf_1/euclid.ss/1177011002

Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning: Data mining, inference and prediction (2nd ed.). New York: Springer-Verlag. Retrieved from http://statweb.stanford.edu/~tibs/ElemStatLearn/printings/ESLII_print10.pdf

Müller, H.-G. (1993). [Local Regression: Automatic Kernel Carpentry]: Comment. Statistical Science, 8(2), 134-139.

Ruppert, D., & Wand, M. (1994). Multivariate locally weighted least-squares regression. The Annals of Statistics, 22(3), 1346–1370. Retrieved from http://projecteuclid.org/download/pdf_1/euclid.aos/1176325632
