Solved – Should I include non-linear features in the linear regression model

feature-selection, feature-engineering, multiple-regression

I'm building my first linear regression model with multiple features (predicting house prices in a specific city). While reading up on ways to improve the model, I saw people recommending that you plot the relationship between the target variable and each feature. I then realized that one of my features, the construction year of the house, is quite "jumpy", which probably distorts its coefficient.

My question: How does one handle features like this one? Drop them? Transform them somehow? Turn them into categorical variables?

Chart below. The y-axis is the mean house price (in Swedish kronor) per construction year.

Mean price by construction year

Edit: Added plot of residuals below.
Residuals plot

Edit 2: Added residual histogram below.
Histogram of residuals

Best Answer

Your residual plot looks close enough to normal that fitting a linear regression with an ordinary least squares (OLS) loss is reasonable.

The "linear" in linear regression refers to the form of the model, which is linear in its parameters:

$$ Y_i = \beta_0 + \beta_1 X_i + \epsilon_i $$

The model is linear in each coefficient $\beta_j$; it does not require a linear relationship between the dependent variable and the raw independent variables. You can therefore regress on transformed features (for example $X^2$ or $\log X$) and still have a linear model.
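
As a minimal sketch (using scikit-learn and made-up year/price numbers, not your data), this is what adding a squared construction-year term looks like while the fit itself remains ordinary linear regression:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical data: construction year and mean price (SEK), for illustration only.
year = np.array([1950, 1960, 1970, 1980, 1990, 2000, 2010, 2020]).reshape(-1, 1)
price = np.array([2.1e6, 2.3e6, 2.0e6, 2.4e6, 2.8e6, 3.1e6, 3.6e6, 4.0e6])

# Add a squared term: the model stays linear in its coefficients,
# even though it is non-linear in the original feature.
poly = PolynomialFeatures(degree=2, include_bias=False)
X = poly.fit_transform(year)          # columns: year, year^2

model = LinearRegression().fit(X, price)
print(model.intercept_, model.coef_)  # one coefficient per (transformed) column
```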

If you are looking for a regression model that fits a genuinely non-linear function, check out support vector machines (SVMs) with a polynomial kernel.
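
A minimal sketch of that idea, again with hypothetical year/price numbers rather than your data:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Hypothetical construction-year / mean-price data (SEK), for illustration only.
year = np.array([1950, 1960, 1970, 1980, 1990, 2000, 2010, 2020]).reshape(-1, 1)
price = np.array([2.1e6, 2.3e6, 2.0e6, 2.4e6, 2.8e6, 3.1e6, 3.6e6, 4.0e6])

# SVR with a polynomial kernel fits a non-linear curve in the original feature;
# standardizing the input helps the SVM optimizer.
svr = make_pipeline(StandardScaler(), SVR(kernel="poly", degree=3, C=1.0))
svr.fit(year, price)
print(svr.predict(np.array([[1995]])))  # predicted mean price for a 1995 house
```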
