Solved – How to know whether the data is linearly separable

Tags: data mining, logistic, machine learning, separation, svm

The data has many features (e.g. 100), about 100,000 instances, and is sparse. I want to fit it using logistic regression or an SVM. How can I tell whether the data is linearly separable, so that I can use the kernel trick if it is not?

Best Answer

There are several methods for testing whether data is linearly separable; some of them are described in this paper (1). Assuming the dataset has two classes, the following are a few of these methods:

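  1. Linear programming: Define an objective function subject to constraints that encode linear separability. The data is linearly separable if the program has a feasible solution. You can find details about the implementation here.

  2. Perceptron method: The perceptron learning algorithm is guaranteed to converge in a finite number of updates if the data is linearly separable. If the data is not separable it never converges, so in practice the number of iterations is capped, and reaching the cap is inconclusive.

  3. Quadratic programming: Set up a quadratic programming problem with the same constraints as a hard-margin SVM. The problem is feasible only if the data is linearly separable.

  4. Computational geometry: If the convex hulls of the two classes are disjoint, the data is linearly separable.

  5. Clustering method: If a clustering method such as k-means finds two clusters with a cluster purity of 100%, the data is linearly separable. This is a sufficient condition only: separable data will not always produce pure clusters.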
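As an illustration of the linear programming test (method 1), here is a minimal sketch. It assumes NumPy and SciPy are available and that labels are coded as -1 and +1. The function name `lp_linearly_separable` is made up for this example. The idea is to ask the solver whether any `w` and `b` satisfy `y_i * (w . x_i + b) >= 1` for every point; the objective is zero because only feasibility matters.

```python
import numpy as np
from scipy.optimize import linprog


def lp_linearly_separable(X, y):
    """Return True if some hyperplane satisfies y_i * (w . x_i + b) >= 1 for all i.

    X: array-like of shape (n_samples, n_features); y: labels in {-1, +1}.
    (Hypothetical helper written for this example.)
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n_samples, n_features = X.shape

    # Unknowns are [w_1, ..., w_d, b]. linprog expects A_ub @ z <= b_ub,
    # so the constraint is rewritten as -y_i * (x_i . w + b) <= -1.
    A_ub = -y[:, None] * np.hstack([X, np.ones((n_samples, 1))])
    b_ub = -np.ones(n_samples)
    c = np.zeros(n_features + 1)  # zero objective: pure feasibility check

    # Variables must be unbounded; linprog's default lower bound is 0.
    bounds = [(None, None)] * (n_features + 1)
    result = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return bool(result.success)


# Toy check: AND is linearly separable, XOR is not.
points = [[0, 0], [0, 1], [1, 0], [1, 1]]
and_labels = [-1, -1, -1, 1]
xor_labels = [-1, 1, 1, -1]
and_separable = lp_linearly_separable(points, and_labels)
xor_separable = lp_linearly_separable(points, xor_labels)
```

For a large sparse dataset like the one in the question, the dense `np.hstack` above would probably need to be replaced with a sparse constraint matrix; that part is left out of the sketch.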

    (1): Elizondo, D., "The linear separability problem: some testing methods," IEEE Transactions on Neural Networks, vol. 17, no. 2, pp. 330-344, March 2006. doi: 10.1109/TNN.2005.860871
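The perceptron test (method 2) can be sketched in plain Python with no external libraries. The function name `perceptron_separates` and the `max_epochs` cap are assumptions made for this example, and labels are assumed to be -1 and +1. A full pass with no mistakes proves the data is linearly separable. Running out of epochs proves nothing by itself, because the cap may simply be too small.

```python
def perceptron_separates(X, y, max_epochs=1000):
    """Run the perceptron rule; return (w, b) if an error-free pass occurs, else None.

    X: list of feature lists; y: labels in {-1, +1}.
    None is inconclusive: the data may be non-separable or max_epochs too small.
    (Hypothetical helper written for this example.)
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            activation = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * activation <= 0:  # misclassified or exactly on the boundary
                w = [wj + yi * xj for wj, xj in zip(w, xi)]
                b += yi
                mistakes += 1
        if mistakes == 0:
            return w, b  # every point is strictly on its correct side
    return None


# Toy check: AND is linearly separable, XOR is not.
points = [[0, 0], [0, 1], [1, 0], [1, 1]]
and_labels = [-1, -1, -1, 1]
xor_labels = [-1, 1, 1, -1]
and_result = perceptron_separates(points, and_labels)
xor_result = perceptron_separates(points, xor_labels)
```

With 100,000 instances the number of passes needed can be large when the margin is small, so a linear programming or SVM based check is likely the more practical choice at that scale.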
