Meant by “class of indicator functions”

In section 5 VC-Dimension of the Set of Functions of the paper Principles of Risk Minimization for Learning Theory by V. Vapnik, the author says the following:

The theory of uniform convergence of empirical risk to actual risk developed in the 70's and 80's, includes a description of necessary and sufficient conditions as well as bounds for the rate of convergence (Vapnik, 1982). These bounds, which are independent of the distribution function $P(x, y)$, are based on a quantitative measure of the capacity of the set of functions implemented by the learning machine: the VC-dimension of the set.

For simplicity, these bounds will be discussed here only for the case of binary pattern recognition, for which $y \in \{0, 1\}$ and $f(x, w), w \in W$ is the class of indicator functions. The loss function takes only two values $L(y, f(x, w)) = 0$ if $y = f(x, w)$ and $L(y, f(x, w)) = 1$ otherwise. In this case, the risk functional (2) is the probability of error, denoted by $p(w)$). The empirical risk functional (3), denoted by $v(w)$, is the frequency of error in the training set.

It is this part that I am confused about:

For simplicity, these bounds will be discussed here only for the case of binary pattern recognition, for which $y \in \{0, 1\}$ and $f(x, w)$, $w \in W$ is the class of indicator functions.

What exactly is meant by "class of indicator functions", and which one of $y \in \{0, 1\}$ or $f(x, w)$ is this "class of indicator functions"? $f$ is the only function I see here, but I'm not sure how it's a "class". Could this be in reference to "equivalence classes"?

Best Answer

Here $w$ is a parameter that indexes the class of functions. As $w$ ranges over $W$ we obtain the class of functions.

The notation $f(x,w)$ should be read as "the function identified by parameter $w$, evaluated at $x$". A more typical notation would be $f_w$ for the function identified by parameter $w$, and $f_w(x)$ would denote the value at $x$ of the function $f_w$, and the class of functions might be denoted $\mathcal{F}:=\{f_w : w\in W\}$. Perhaps the author wanted to avoid excessive subscripting.

To say "$f(x,w)$, $w\in W$ is the class of indicator functions" means that each function returns either $0$ or $1$ when given any input $x$. This returned value is to be compared with the true value $y\in\{0,1\}$.

Best Answer

Related Solutions

[Math] Connection between Vapnik-Chervonenkis dimension and regularization

An “induction principle”

Related Question