Regression – Difference Between Kernel Linear Regression and Non-Parametric Regression

linear, nonparametric, regression

A quick question came to mind while I was reading about non-parametric linear regression.

In linear regression, we model our response as $\textbf{y} \sim \mathcal{N}(X\beta, \sigma^2I)$, so we try to estimate a linear function of the form

$$f_\beta(\textbf{x}_i) = \textbf{x}_{i,1}\beta_1 + \dots + \textbf{x}_{i,p}\beta_p$$

while in non-parametric regression we allow more possibilities for the structure of $f$, and the response is modeled as

$$\textbf{y} \sim \mathcal{N}(f(x), \sigma^2I)$$

with $f$ respecting some smoothness assumptions.

What is not clear to me is the main difference between kernel linear regression and non-parametric regression. It is well known that the word "linear" in linear regression refers to the parameters, so one typically applies a non-linear feature transformation $\phi : \mathbb{R}^p \rightarrow \mathbb{R}^d$ to the features and then searches for a hyperplane fitting the data (mapped into a higher-dimensional space by $\phi$).

Best Answer

A parametric model has a fixed number of parameters, whereas in a non-parametric model the number of parameters grows with the size of the data. It follows that with a parametric model we need to make stronger assumptions about the distribution of the data, while a non-parametric model is "learned from the data" to a greater degree, although the practical differences may be blurry in some cases. That is why models such as Gaussian processes are considered non-parametric, even though they make distributional assumptions and have parameters.
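
To make the parameter-counting argument concrete, here is a rough side-by-side. It assumes that "kernel linear regression" in the question means kernel ridge regression, with a kernel $k(\textbf{x}, \textbf{x}') = \phi(\textbf{x})^\top \phi(\textbf{x}')$ and a ridge penalty $\lambda > 0$; the symbols $k$, $K$, $\alpha$ and $\lambda$ are notation introduced here for illustration and do not appear in the question.

$$f_\beta(\textbf{x}) = \textbf{x}^\top \beta, \quad \beta \in \mathbb{R}^p \qquad \text{versus} \qquad \hat f(\textbf{x}) = \sum_{i=1}^n \alpha_i \, k(\textbf{x}_i, \textbf{x}), \quad \alpha = (K + \lambda I)^{-1}\textbf{y} \in \mathbb{R}^n, \quad K_{ij} = k(\textbf{x}_i, \textbf{x}_j)$$

The linear model keeps $p$ coefficients however many observations arrive, while the kernel form carries one coefficient $\alpha_i$ per observation. Under this reading, the fit is still linear in $\phi(\textbf{x})$, but when $d$ is very large or infinite it behaves like a non-parametric model; when $d$ is small and fixed, it is just a parametric model with $d$ coefficients written in a different form.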

Kernel regression is itself a non-parametric regression model, so it is an instance of non-parametric regression rather than an alternative to it. It uses kernels to approximate the conditional expected value of the response given the features. Other non-parametric models achieve this in different ways: for example, in $k$-NN regression the predicted mean is simply the average of the responses of the $k$ nearest neighbors of the data point.
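
The two estimators mentioned above can be sketched in a few lines. This is a minimal illustration, assuming one-dimensional features, a Gaussian kernel, and the Nadaraya-Watson form of kernel regression; the function names, the bandwidth value and the toy data are all made up for this example.

```python
import math


def gaussian_kernel(u):
    """Unnormalized Gaussian kernel; the constant cancels in the weighted mean."""
    return math.exp(-0.5 * u * u)


def nadaraya_watson(x_train, y_train, x0, bandwidth=0.5):
    """Kernel regression: kernel-weighted average of all training responses."""
    weights = [gaussian_kernel((x0 - xi) / bandwidth) for xi in x_train]
    total = sum(weights)
    return sum(w * yi for w, yi in zip(weights, y_train)) / total


def knn_regression(x_train, y_train, x0, k=3):
    """k-NN regression: plain average of the responses of the k nearest points."""
    nearest = sorted(zip(x_train, y_train), key=lambda pair: abs(pair[0] - x0))[:k]
    return sum(yi for _, yi in nearest) / k


# Toy data (hypothetical): y = x^2 on a small grid.
x_train = [0.0, 1.0, 2.0, 3.0, 4.0]
y_train = [0.0, 1.0, 4.0, 9.0, 16.0]

nw_pred = nadaraya_watson(x_train, y_train, 2.0)
knn_pred = knn_regression(x_train, y_train, 2.0, k=3)
print(nw_pred, knn_pred)
```

Neither function estimates a fixed coefficient vector: both keep the whole training set and compute a local average at prediction time, which is what makes them non-parametric. The only difference is the weighting scheme, smooth kernel weights in one case and equal weights on the $k$ nearest points in the other.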
