Upper bound on VC-dimension of partitioned class

co.combinatoricslearning-theorymachine learningpartitionsst.statistics

Fix $n,k\in \mathbb{N}_+$.
Let $\mathcal{H}$ be a set of functions from $\mathbb{R}^n$ to $\mathbb{R}$ with finite VC-dimension $d\in \mathbb{N}$. Let $\mathcal{H}_k$ denote the set of maps of the form
$$
f = \sum_{n=1}^{k_0}\, f_n\, I_{C_n}
$$

where $k_0\in \mathbb{N}$ with $k_0\le k$, $f_1,\dots,f_{k_0}\in \mathcal{H}$, $I_A$ is the indicator function of a measurable set $A\subseteq \mathbb{R}^n$, and $C_1,\dots,C_{k_0}$ be the Voronoi cells corresponding to some distinct $p_1,\dots,p_{k_0}\in \mathbb{R}^n$; i.e. for $i=1,\dots,k_0$ we recursively define
$$
C_i = \hat{C}_i\setminus \bigcup_{j<i} C_j
$$

where for $i=1,\dots,k_0$
$$
\hat{C}_i = \{x\in \mathbb{R}^n:\, \|x-p_i\|= \min_{j=1,\dots,k_0}\,\|x-p_j\|\}.
$$

Note: One may reorder the sets WLOG (breaking ties in a different manner when assigning equidistance points to one of the Voronoi cells).

Update: As pointed out by Gabe Conant, if the $C_i$ are not bonafide partitions, then the VC dimension of this class may degenerate.

Is it true that the VC-dimension of $\mathcal{H}_k$ must be finite (I'm looking for any, possibly loose, but finite bound)?



Note: I define the VC-dimension of a set of real-valued regressors as in 1 – Definition 2.3 in Shen et al. (2023). Namely, for any class $\mathcal{H}\subseteq \mathbb{R}^{\mathbb{R}^n}$ $$
\operatorname{VCdim}(\mathcal{H}):=\operatorname{VC-dim}(\{I_{(0,\infty)}\circ f:\,f\in \mathcal{H}\})
,$$

where $\operatorname{VC-dim}(\{I_{(0,\infty)}\circ f:\,f\in \mathcal{H}\})$ denotes the standard VC dimension for classifiers.; so one considers all binary classifiers obtained by post-composing with the indicator function of the positive real line and considers the classical VC-dimension thereof.

Note, if $\mathcal{H}\subseteq \{0,1\}^{\mathbb{R}^n}$ then this coincides with the classical definition of VC-dimension.

  1. Shen et al. (2023) – Optimal approximation rate of ReLU networks in terms of width and depth

Best Answer

As noted by Iosif Pinelis, we can reformulate the problem in terms of classical set systems in the following way:

Let $\mathcal{H}$ be a set system on $\mathbb{R}^n$ with VC-dimension $d$. Fix $k\geq 1$ and define $\mathcal{H}_k$ to be the collection of all sets of the form

$$ \bigcup_{i=1}^{k_0}H_i\cap C_i $$ where $k_0\leq k$, each $H_i$ is in $\mathcal{H}$, and $(C_1,\ldots,C_{k_0})$ is the Voronoi partition of cells corresponding to some $p_1,\ldots,p_{k_0}$ in $\mathbb{R}^n$.

Theorem. $\mathcal{H}_k$ has finite VC-dimension bounded in terms of $k$, $d$, and $n$.

The proof relies on the following tools.

Fact. The set system of half spaces in $\mathbb{R}^n$ has VC-dimension $n+1$.

Lemma. Suppose $\mathcal{S}_1,\ldots,\mathcal{S}_m$ are set systems on some ground set $X$, and assume each $\mathcal{S}_i$ has VC-dimension at most $\ell$. Consider the set systems: $$ \mathcal{S}_{\cap}=\{S_1\cap\ldots\cap S_m:S_i\in\mathcal{S}_i\} $$ $$ \mathcal{S}_{\cup}=\{S_1\cup\ldots\cup S_m:S_i\in\mathcal{S}_i\} $$ Then $\mathcal{S}_{\cap}$ and $\mathcal{S}_{\cup}$ both have VC-dimension at most $2\ell m\log (3m)$.

The Fact is a standard exercise, and the Lemma is Lemma 3.2.3 in Learnability and the Vapnik-Chervonenkis Dimension by Blumer, Ehrenfeucht, Haussler, and Warmuth. The proof also is sketched in my answer here.

Proof of the Theorem. Let $\mathcal{C}$ be the collection consisting of the empty set and all subsets of $\mathbb{R}^n$ obtained as intersections of at most $k-1$ half spaces. Then every cell $C_i$ obtained as above is in $\mathcal{C}$. Moreover, $\mathcal{C}$ has VC-dimension at most $d_1=2(n+1)(k-1)\log(3k-3)$ by the Lemma and the Fact.

Next, let $\mathcal{H}'=\{H\cap C:H\in \mathcal{H},~C\in \mathcal{C}\}$. Applying the Lemma with $m=2$, $\mathcal{S}_1=\mathcal{H}$, and $\mathcal{S}_2=\mathcal{C}$, we conclude $\mathcal{H}'$ has VC-dimension at most $d_2=(4\log 6)\max\{d,d_1\}$.

Finally, let $\mathcal{H}''=\{H_1\cup\ldots\cup H_k:H_i\in \mathcal{H}'\}$. Then $\mathcal{H}_k\subseteq \mathcal{H}''$ and $\mathcal{H}''$ has VC-dimension at most $2d_2k\log(3k)$ by the Lemma.


Some additional remarks.

  1. The bound is certainly not tight since there is quite a bit of inefficiency in the proof; especially since $\mathcal{H}''$ is much bigger than $\mathcal{H}_k$. In particular, the proof is not actually using the partition feature of the $(C_1,\ldots,C_k)$ sequences (this is only used in checking that the Theorem accurately represents the original question about functions).

  2. In response to ABIM's comments, the bound cannot be improved to $kd$ in general. See Example 1 below, which also shows that the bound must depend on $n$.

  3. If one does not refine the original sequences $(\hat{C}_1,\ldots,\hat{C}_k)$ into partitions, then the Theorem is still true, but it does not accurately capture the original question. In fact, in this case the VC-dimension of the set systems from the question can be infinite, even when one just filters over a single sequence. See Example 2 below.

  4. Example 2 is not very satisfying since it is just exploiting the signal loss in the definition of VC-dimension for a class of real-valued functions from the cited reference (which I find a bit peculiar, but perhaps I'm not the right one to judge). There are different notions of VC-dimension for sets of functions which seem more likely to preserve finiteness in this situation (namely, not requiring the cells to partition and not even requiring the points to be pairwise distinct).

Example 1. Consider $\mathbb{R}^n$. Let $\mathcal{H}=\{\emptyset,\mathbb{R}^n\}$, which has VC-dimension $1$. Then $\mathcal{H}_k$ consists of all sets obtained as a union of some subset of a Veronoi sequence $(C_1,\ldots,C_{k_0})$ for some $k_0\leq k$. So, in particular, $\mathcal{H}_2$ is the set system of all half-spaces in $\mathcal{R}^n$ (together with $\emptyset$ and $\mathbb{R}^n$). By the Fact, $\mathcal{H}_2$ has VC-dimension $n+1$.

In the next example, I will let $\hat{\mathcal{H}}_k$ denote the set system of functions as defined in the original question, but using the non-partitioned sequences $(\hat{C}_1,\ldots,\hat{C}_k)$.

Example 2. Consider $\mathbb{R}^2$. Define $$ \mathcal{H}=\{I_S+1:S\subseteq\mathbb{R}^2\}\cup\{-I_{\mathbb{R}^2}\}. $$ Then VCdim$(\mathcal{H})=1$ since $I_{(0,\infty)}\circ f$ is either $I_{\mathbb{R}^2}$ or $I_\emptyset$ for any $f\in \mathcal{H}$. Now choose the sequence $(\hat{C}_1,\hat{C}_2)$ obtained from the points $(0,1)$ and $(0,-1)$. So $\hat{C}_1$ is the closed upper-half plane, $\hat{C}_2$ is the closed lower-half plane, and their intersection is the $x$-axis. Define $$ \mathcal{H}^*=\{fI_{\hat{C}_1}+gI_{\hat{C}_2}:f,g\in\mathcal{H}\}. $$ Then $\mathcal{H}^*\subseteq\hat{\mathcal{H}}_2$. We claim that $\mathcal{H}^*$ shatters the $x$-axis, which we call $A$ (as a subset of $\mathbb{R}^2$). To see this, fix $S\subseteq A$, and consider $g_S=(I_S+1)I_{\hat{C}_1}-I_{\mathbb{R}^2}I_{\hat{C}_2}\in\mathcal{H}^*$. Then $g_S$ simplifies to $I_{S\cup U}-I_{L}$ where $U$ is the open upper-half plane and $L$ is the open lower-half plane. So $$ (I_{(0,\infty)}\circ g_S)I_A=I_{S\cup U}I_A=I_S, $$ and we have cut out $S$ from $A$ using $\mathcal{H}^*$.

Note that $n\geq 2$ is necessary in the previous example since in dimension $1$ there are only $k-1$ intersection points in any given sequence (defined using pairwise distinct points), and hence the perturbation to the VC-dimension contributed by these points is finite. On the other hand, if one considers a more degenerate notion in which the $p_i$'s are not assumed to be pairwise distinct, then an example similar to the above can be built to obtain infinite VC-dimension (with a single sequence) even in dimension $1$.

Related Question