Solved – Will non-linear data always become linearly separable in higher dimensions?

data transformation, generalized linear model, logistic, svm

I was reading the Hands-On ML book and I'm on the SVM and Logistic Regression chapters. I started reading more about these algorithms, and apparently they are "linear" classifiers, i.e., the decision boundary is linear (the classifier needs the inputs to be linearly separable).

Now, the book mentions that since data is not linearly separable in most cases, we have to increase the dimensionality of the features to make it linearly separable.
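
To make the book's idea concrete, here is a minimal sketch using XOR, the standard example of data that no straight line can separate in 2-D. It assumes NumPy and uses a plain perceptron as a separability check (a swapped-in choice for illustration, not something taken from the book): given enough epochs, the perceptron is guaranteed to reach zero training errors on linearly separable data, and it can never do so on non-separable data. Adding the single product feature x1 * x2 lifts XOR into 3-D, where a plane does separate it.

```python
import numpy as np

def perceptron_separates(X, y, max_epochs=1000):
    """Run the classic perceptron; return True if it reaches zero training errors.

    y must be in {-1, +1}. The bias is handled by appending a constant-1 column.
    On linearly separable data the perceptron is guaranteed to converge;
    on non-separable data every epoch contains at least one mistake.
    """
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append bias feature
    w = np.zeros(Xb.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(Xb, y):
            if yi * (w @ xi) <= 0:  # misclassified or exactly on the boundary
                w += yi * xi
                mistakes += 1
        if mistakes == 0:
            return True
    return False

# XOR: the textbook example of data that is not linearly separable in 2-D
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, 1, 1, -1])

# Lift to 3-D by adding the product feature x1 * x2
X_lifted = np.column_stack([X, X[:, 0] * X[:, 1]])

print(perceptron_separates(X, y))         # False
print(perceptron_separates(X_lifted, y))  # True
```

Here `max_epochs` is only a safety cap: on the raw XOR data the loop exhausts it and reports failure, while on the lifted data it stops as soon as an epoch passes with no mistakes.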

But is it always true that some transformation exists that converts every non-linearly separable data set into a linearly separable one? If not, what would be an example of a data set where this is impossible?

Best Answer

In theory, it is always possible to make an arbitrary dataset linearly separable in higher dimensions. In fact, you need only one additional dimension: one whose value is the true class label. No matter what the data looks like in the other dimensions, if you have a way to add a dimension that encodes the true class, a single threshold on that axis (for example, 0.5 when the labels are 0 and 1) separates the classes perfectly. The only case where such a dimension cannot be added is when two samples have identical features but different classes, since no deterministic function of the features alone can map them to different values.
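
The label-dimension construction can be sketched as follows, assuming NumPy; `add_label_dimension` is a hypothetical helper written only for illustration. The labels are drawn at random, so the original features carry no information about the class at all, yet one threshold on the appended coordinate still classifies every training point correctly. The helper also refuses inputs where identical samples carry different labels, which is exactly the case described above.

```python
import numpy as np

def add_label_dimension(X, y):
    """Append the true class label as an extra feature (hypothetical helper).

    Raises ValueError if two identical feature vectors carry different labels,
    because then no function of the features alone can produce the new column.
    """
    seen = {}
    for xi, yi in zip(map(tuple, X), y):
        if seen.setdefault(xi, yi) != yi:
            raise ValueError(f"conflicting labels for identical sample {xi}")
    return np.column_stack([X, y])

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = rng.integers(0, 2, size=200)  # random labels: no structure in X at all

X_aug = add_label_dimension(X, y)
# One linear rule on the new coordinate, "predict 1 if z > 0.5", is perfect.
pred = (X_aug[:, -1] > 0.5).astype(int)
print((pred == y).mean())  # 1.0

# Identical samples with different labels cannot be handled.
try:
    add_label_dimension(np.array([[1.0, 2.0], [1.0, 2.0]]), np.array([0, 1]))
except ValueError as err:
    print("rejected:", err)
```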

Otherwise, such a mapping always exists in theory, but in practice it is usually difficult to construct that extra class-label dimension in a way that generalizes instead of overfitting. The simplest transformation is to look at all your data points and assign each one's true class as its value on the new dimension, but this completely fails to generalize to points that are not in the original data. It's trivial to overfit a mapping that linearly separates the training data; it's much harder to find one that accurately separates data you didn't train on.
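
The gap between memorizing and generalizing can be illustrated on a toy dataset, assumed here purely for illustration: points in the square [-2, 2] x [-2, 2], labeled 1 inside the unit circle and 0 outside. Both candidate extra dimensions below are used with a single linear threshold. The memorized lookup table is perfect on the training set, but every test point is unseen, so it falls back to a default class and its test accuracy is just the share of that class (roughly 1 - π/16 ≈ 0.80 with this sampling). The squared radius x1² + x2² separates the test data perfectly, but only because we already know the true structure; finding such a feature is the hard part in practice.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_disk_data(n):
    """Toy data: class 1 inside the unit circle, class 0 outside (an assumption)."""
    X = rng.uniform(-2, 2, size=(n, 2))
    y = (X[:, 0] ** 2 + X[:, 1] ** 2 < 1).astype(int)
    return X, y

X_train, y_train = make_disk_data(500)
X_test, y_test = make_disk_data(500)

# Mapping 1: memorize each training point's label in a lookup table.
table = {tuple(x): label for x, label in zip(X_train, y_train)}

def memorized_feature(X, default=0):
    # Unseen points get an arbitrary default, since the table has no entry for them.
    return np.array([table.get(tuple(x), default) for x in X])

# Mapping 2: the squared radius, a feature that reflects the real structure.
def radius_feature(X):
    return X[:, 0] ** 2 + X[:, 1] ** 2

for name, X, y in [("train", X_train, y_train), ("test", X_test, y_test)]:
    acc_memo = ((memorized_feature(X) > 0.5).astype(int) == y).mean()
    acc_radius = ((radius_feature(X) < 1.0).astype(int) == y).mean()
    print(f"{name}: memorized {acc_memo:.2f}, radius {acc_radius:.2f}")
```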
