[Math] The VC dimension of convex d-gons

combinatorial-geometrycombinatorial-proofsmachine learning

The VC dimension of convex $d$-gons is $2d+1$. To show that, I can prove the lower bound is $2d+1$. however, I don't know how to prove the upper bound in a rigorous way.

  1. For low bound, I construct a set of $2d+1$ points, they are all lie on a circle. For a labeling, there are two cases, 1) number of positive labels larger than number of negative labels; 2) number of negative labels larger than number of positive labels. For case 1, using the positive points as the vertices of polygon. For case 2, using the tangent lines of circle on positive points as the edges of polygon. Then this set of points is shattered by convex $d$-gon.
  2. For upper bound, My intuitive idea is, for any set of $2d+2$ points and they are labeled as $d+1$ positive, $d+1$ negative. If all the positive points lie on a polygon, then there is at least a pair of points are co-linear, but this is not the case in our assumption, that is "for any set of $2d+2$ points".

I thought the proof of upper bound is not rigorous enough, so I need some advice and help to make it complete.

Best Answer

I'll assume that you mean the one-sided classifiers which assign $+$ to every point inside the (closed) polygon and $-$ outside.

Note that the proof idea you suggest in your question is a little bit flawed - 1. shattered sets are allowed to be 'special,' since the VC dimension is defined as the largest set of shattered points, and not the largest set of shattered points in general position. For example, in your lower bound, you arranged points on a circle! 2. every pair of points is collinear - you can draw the line joining them! Collinearity is only a non-trivial property for $\ge 3$ distinct points.


For the proof - consider any collection of $2d + 2$ points. There are two cases:

Suppose the points do not lie on a convex $2d + 2$-gon. Then there must be some point $p$ that lies in the convex hull of the remaining points. But then any polygon that contains the rest of the points must also contain $p,$ and so the labelling $p = -,$ everything else = $+$ cannot be achieved.

This leaves the case when the points do lie on a convex $2d+2$-gon. For this, consider the alternating labelling. Travelling around the polygon, we will have $d+1$ instances of the pattern $+ - +$ (in order). But each such pattern requires a line to separate, and the convexity of the arrangement ensures that no line will separate two patterns. Thus, we need at least $d+1$ lines to achieve this labeling - ergo, we cannot do it with $d$ lines.

Related Question