Machine Learning – Linear vs. Nonlinear Algorithms

cart, logistic, machine learning, neural networks, regression

Three linear machine learning algorithms: Linear Regression, Logistic Regression and Linear Discriminant Analysis.

Five nonlinear algorithms: Classification and Regression Trees, Naive Bayes, K-Nearest Neighbors, Learning Vector Quantization and Support Vector Machines.

Can someone please explain, for each of these algorithms specifically, why it is linear or nonlinear?

Also what would a neural network be and why?

Best Answer

To start, you are mixing classification and regression here, which complicates the answer a bit, but here is the extremely short version. For classification, the model is linear if you can plot all n features in n-dimensional space and there is an (n-1)-dimensional "line" (or plane, or hyperplane) that separates (or mostly separates) the different classes. For example, plot height and weight on x and y, and draw a straight line where most men are on one side and most women are on the other.
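As a minimal sketch of that idea, a linear classifier just checks which side of the line w·x + b = 0 a point falls on. The weights, bias, units (inches and pounds) and sample points below are made up for illustration and are not fitted to any real data.

```python
import numpy as np

# Hypothetical line in (height_in, weight_lb) space; values are illustrative only.
w = np.array([0.1, 0.02])  # one weight per feature
b = -10.0                  # offset of the line

def linear_classify(x):
    """Return 1 if x lies on the positive side of w.x + b = 0, else 0."""
    return int(np.dot(w, x) + b > 0)

points = np.array([[74.0, 200.0], [63.0, 130.0]])
labels = [linear_classify(p) for p in points]
print(labels)
```

Whatever algorithm produces w and b (logistic regression, LDA, a linear SVM), the decision rule at prediction time has this same form.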

In regression, a linear model means that if you plotted all the features PLUS the (numeric) outcome variable, there is a line (or hyperplane) that roughly estimates the outcome. Think of the standard line-of-best-fit picture, e.g., predicting weight from height.
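A rough sketch of the line-of-best-fit picture, using ordinary least squares via NumPy. The five height/weight pairs are invented for the example.

```python
import numpy as np

# Made-up data: height in inches, weight in pounds.
height = np.array([60.0, 64.0, 68.0, 72.0, 76.0])
weight = np.array([112.0, 125.0, 143.0, 157.0, 175.0])

# Design matrix with a column of ones so the fit includes an intercept.
A = np.column_stack([height, np.ones_like(height)])

# Least-squares solution of A @ [slope, intercept] ~= weight.
(slope, intercept), *_ = np.linalg.lstsq(A, weight, rcond=None)

def predict_weight(h):
    """Predicted weight is a straight-line function of height."""
    return slope * h + intercept

print(slope, intercept, predict_weight(70.0))
```

The model is "linear" because the prediction is a weighted sum of the features plus a constant, no matter how the weights were found.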

All other models are "nonlinear". This comes in two flavors. In the first, you have the same basic construct, but the "line" doesn't have to be straight. In trees, the discriminating line is a stair-step shape, e.g., if you are over 6 ft and over 250 lb, there is a 90-whatever percent chance you are male. Neural nets and several other algorithms are similar, but with potentially very complex, curvy boundaries or "lines of best fit".
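To make the stair-step shape concrete, here is a hand-written stand-in for a small classification tree. The thresholds are hypothetical, chosen only to show that each split is an axis-aligned cut; a real tree would learn them from data.

```python
def tree_predict(height_in, weight_lb):
    """Toy two-level tree; every threshold here is an illustrative assumption."""
    if height_in > 72:            # first split: taller than 6 ft
        if weight_lb > 160:       # second split within the "tall" branch
            return "male"
        return "female"
    else:
        if weight_lb > 200:       # different weight cut in the "short" branch
            return "male"
        return "female"

# The weight cutoff jumps from 160 to 200 as height crosses 72 inches,
# so the boundary between classes is a staircase, not one straight line.
print(tree_predict(74, 170), tree_predict(70, 170), tree_predict(70, 210))
```

No single straight line in the height/weight plane reproduces these three predictions along with the rest of the rule, which is why a tree counts as nonlinear here.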

The second flavor of nonlinear model is non-parametric. K-nearest-neighbors is an example of this. It doesn't look for a discriminating line or curve at all; instead, it just looks at the classes of a point's nearest neighbors.
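A bare-bones sketch of k-nearest-neighbors, assuming Euclidean distance and a simple majority vote. The training points are invented for illustration. Note that nothing is fitted: the "model" is just the stored data.

```python
from collections import Counter

import numpy as np

# Made-up training set: (height_in, weight_lb) with class labels.
X_train = np.array([
    [74.0, 200.0], [72.0, 190.0], [70.0, 180.0],
    [63.0, 130.0], [65.0, 140.0], [62.0, 120.0],
])
y_train = ["male", "male", "male", "female", "female", "female"]

def knn_predict(x, k=3):
    """Label x by majority vote among its k closest training points."""
    dists = np.linalg.norm(X_train - x, axis=1)   # distance to every stored point
    nearest = np.argsort(dists)[:k]               # indices of the k smallest
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

print(knn_predict(np.array([71.0, 185.0])), knn_predict(np.array([64.0, 135.0])))
```

In practice the features would usually be rescaled first, since raw pounds dominate raw inches in the distance calculation.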

Hope that helps!