Solved – Are N-1 distinct points with two classes Linearly Separable in an N Dimensional Space

machine learning

I have a conjecture whose verification I haven't been able to find anywhere online. If you have N-1 distinct points, each belonging to one of two classes, can you always find a linear decision boundary in an N-dimensional space that perfectly separates the classes?

EDITS
Changed N to N-1. If you are in R^3 with the points (1, 0, 0), (2, 0, 0), (3, 0, 0) colored Red, Blue, Red respectively, you won't be able to draw a plane that separates them.
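
If you want to check such an example numerically, here is a quick sketch: it tests separability as a linear-programming feasibility problem (the helper `separable` and the margin-1 formulation are mine, not anything standard to this question).

```python
import numpy as np
from scipy.optimize import linprog

def separable(X, y):
    """Is there an affine hyperplane with y_i * (w . x_i + b) > 0 for all i?
    Strict separation can be rescaled to margin 1, so we test feasibility
    of y_i * (w . x_i + b) >= 1 as a linear program with zero objective."""
    n, d = X.shape
    A_ub = -y[:, None] * np.hstack([X, np.ones((n, 1))])   # -(y_i) * (x_i, 1)
    res = linprog(c=np.zeros(d + 1), A_ub=A_ub, b_ub=-np.ones(n),
                  bounds=[(None, None)] * (d + 1))
    return res.status == 0          # 0 = feasible, 2 = infeasible

X = np.array([[1, 0, 0], [2, 0, 0], [3, 0, 0]], dtype=float)
y = np.array([+1, -1, +1])          # Red, Blue, Red
print(separable(X, y))              # False: no plane puts Blue alone on one side
```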

Best Answer

Short answer

In general this is not true. Consider $N-1$ points in $\mathbb{R}^N$ and a line $d$. Place the $N-1$ points any way you want on $d$. The intersection of any affine hyperplane of $\mathbb{R}^N$ with $d$ is either empty, a single point, or the whole line, so each of the two half-spaces it defines meets $d$ in a ray (possibly empty or all of $d$). A linear decision boundary can therefore only split the collinear points into two contiguous runs, and as soon as $N-1 \ge 3$ an alternating labelling (such as Red, Blue, Red) is impossible.

Longer answer

This being said, let's get to the interesting part. It is nevertheless always possible to place $N-1$ points in $\mathbb{R}^N$ such that they can be classified in any way you want by an affine hyperplane. In fact, you can place up to $N+1$ points in $\mathbb{R}^N$ and still have that property.

To prove this, we'll explicitly place $N+1$ points in $\mathbb{R}^N$ in a particular way and count the number of ways we can classify those points with an affine hyperplane. If we get $2^{N+1}$ ways of classifying them, it will mean that we can make any classification we want. (Make sure you see why.)

I say that the points $$ x_1 = (1^1, \dots, 1^N), x_2 = (2^1, \dots, 2^N), \dots, x_{N+1} = ((N+1)^1, \dots, (N+1)^N) $$ will do the job.


Recall that our classification function is $\operatorname{sign}(h(x_i))$, where $h(x_i)$ is the dot product between a weight vector $w=(w_0, w_1, \dots, w_N)$ and our vector $x_i$ augmented with a new coordinate fixed to $1$: $$ h(x_i) = \langle (w_0, w_1, \dots, w_N), (1, x_{i,1}, \dots, x_{i,N}) \rangle. $$ If we let $P$ be the polynomial $P = w_0 + w_1 X + \dots + w_N X^N$ in $\mathbb{R}[X]$, then, since $x_{i,k} = i^k$, we find $$ h(x_i) = P(i). $$
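
As a quick numerical sanity check of that identity (a sketch; the weight vector below is an arbitrary one of my choosing, only the moment-curve points come from the construction above):

```python
import numpy as np

N = 4
points = np.array([[i**k for k in range(1, N + 1)] for i in range(1, N + 2)],
                  dtype=float)                  # x_i = (i^1, ..., i^N)
w = np.array([2.0, -1.0, 0.5, 3.0, -0.25])      # arbitrary (w_0, ..., w_N)

for i, x in enumerate(points, start=1):
    h = w @ np.concatenate(([1.0], x))          # <w, (1, x_{i,1}, ..., x_{i,N})>
    P = sum(w[k] * i**k for k in range(N + 1))  # w_0 + w_1 i + ... + w_N i^N
    assert np.isclose(h, P)                     # h(x_i) = P(i)
```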

Now, fix $h(x_1) = P(1) > 0$ (you could also take $P(1) < 0$). Choose any $d \le N$ and $u_1, \dots, u_d \in \{2, 3, \dots, N+1\}$ such that $u_1 < u_2 < \dots < u_d$. By polynomial interpolation we can choose $P$ (that is, choose the weights $w_i$) such that $$ \begin{array}{l} P(1) > 0, \dots, P(u_1 - 1) > 0\\ P(u_1) < 0, \dots, P(u_2 - 1) < 0\\ P(u_2) > 0, \dots, P(u_3 - 1) > 0\\ \vdots\\ (-1)^d P(u_d) > 0, \dots, (-1)^d P(N+1) > 0 \end{array} $$

We thus get an alternating classification over the blocks $[1, u_1), [u_1, u_2), \dots, [u_d, N+1]$.
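
Here is a small sketch of that interpolation step in Python. It simply interpolates a degree-$N$ polynomial through the target signs $\pm 1$ at $i = 1, \dots, N+1$ (which in particular realises any of the block patterns above); the variable names are mine.

```python
import numpy as np

N = 4
points = np.array([[i**k for k in range(1, N + 1)] for i in range(1, N + 2)],
                  dtype=float)                  # x_i = (i^1, ..., i^N)
target = np.array([+1, -1, -1, +1, -1])         # desired classification of x_1..x_{N+1}

# Interpolate a degree-N polynomial with P(i) = target_i; its coefficients
# (constant term first) are exactly the weights (w_0, ..., w_N).
w = np.polynomial.polynomial.polyfit(np.arange(1, N + 2), target, deg=N)

augmented = np.hstack([np.ones((N + 1, 1)), points])   # prepend the fixed 1
print(np.sign(augmented @ w))                   # reproduces `target`
```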


It's now time to count. Classifications obtained from two different choices of $d$, of the $u_i$, or of the sign of $P(1)$ are all distinct. We had two choices for the sign of $P(1)$ and ${N \choose d}$ possibilities for $u_1, \dots, u_d$. Therefore we can classify our $N+1$ points in (at least) $$ 2\cdot \sum\limits_{d=0}^{N} {N\choose d} = 2^{N+1} $$ ways. We won!
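
A brute-force check of that count for a small $N$, rerunning the same interpolation idea for every one of the $2^{N+1}$ labellings (a sketch, nothing more):

```python
import itertools
import numpy as np

N = 3
xs = np.arange(1, N + 2)
points = np.array([[i**k for k in range(1, N + 1)] for i in xs], dtype=float)
augmented = np.hstack([np.ones((N + 1, 1)), points])

achieved = 0
for labelling in itertools.product([-1.0, 1.0], repeat=N + 1):
    # Weights from interpolating P(i) = label_i, as in the construction above.
    w = np.polynomial.polynomial.polyfit(xs, labelling, deg=N)
    if np.array_equal(np.sign(augmented @ w), np.array(labelling)):
        achieved += 1

print(achieved, 2 ** (N + 1))   # 16 16 -- every labelling of the 4 points is realised
```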

Generalisation

I went through that long argument in order to get this number: $2\cdot \sum\limits_{i=0}^{N} {N\choose i}$. It turns out that $N$ points in $\mathbb{R}^d$ can be chosen so that they can be classified in $$ m(N) := 2\cdot \sum\limits_{i=0}^{d} {N-1\choose i} $$ ways. Furthermore, no set of $N$ points admits more than $m(N)$ classifications.
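
For reference, a tiny helper computing $m(N)$ (the function and the check below are just mine):

```python
from math import comb

def m(n_points, d):
    """2 * sum_{i=0}^{d} C(n_points - 1, i): the claimed maximum number of
    linearly separable labellings of n_points points in R^d."""
    return 2 * sum(comb(n_points - 1, i) for i in range(d + 1))

# With N+1 points in R^N this recovers the 2^(N+1) from the argument above.
N = 3
print(m(N + 1, N), 2 ** (N + 1))    # 16 16
```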

You can use the proof above to build $N$ points that can be classified in $m(N)$ ways. I found it harder to prove that $m(N)$ is also a maximum (I'd be interested in a simple proof, if someone has one).


Note: I think I got carried away.
