I am reading the book "Understanding Machine Learning" by Shalev-Shwartz and Ben-David.
On page 48, theorem 6.7 (The fundamental theorem of statistical learning, FTSL) says: Let $H$ be a hypothesis class… the following properties are equivalent… 4. $H$ is PAC learnable. 5. Any ERM rule is a successful PAC learner for $H$.
I could not find the definition of an ERM rule in this book or anywhere else. What is it?
It cannot be just any algorithm that minimizes the empirical risk, can it?
What conditions must the algorithm satisfy in order to find the PAC solution?
Best Answer
In section 6.4 of Understanding Machine Learning by Shai Shalev-Shwartz and Shai Ben-David, Theorem 6.7 lists six properties of a hypothesis class and states that they are equivalent.
When the authors write "the following are equivalent", what they mean is $$ \text{statement 1} \implies \text{statement 2} \implies \text{statement 3} \implies \text{statement 4} \implies \text{statement 5} \implies \text{statement 6} \implies \text{statement 1} $$
So, if statement 1 is true, then statement 2 is true, and if statement 2 is true, then statement 3 is true, and so on, until we form a loop back to statement 1. The proof of these implications is given in section 6.5.
Response to first comment
Unfortunately, it seems that the wording of Theorem 6.7 is a bit vague. Following the setup the authors describe in section 8.2, every ERM rule depends on four choices:

- the hypothesis class $\mathcal{H}$,
- the domain set $Z$,
- the loss function $\ell$, and
- the finite input sample (training dataset) $S \in Z^m$.
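For completeness, here is the ERM rule itself as I recall it from chapter 2 of the book (the notation below is my paraphrase, not a verbatim quote): given a sample $S = (z_1, \dots, z_m) \in Z^m$, an ERM rule outputs any hypothesis in $\mathcal{H}$ that minimizes the empirical risk,
$$ \mathrm{ERM}_{\mathcal{H}}(S) \in \arg\min_{h \in \mathcal{H}} L_S(h), \qquad L_S(h) = \frac{1}{m} \sum_{i=1}^{m} \ell(h, z_i). $$
Note that the argmin need not be unique; an ERM rule is any map that returns some minimizer.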
Theorem 6.7 begins "Let $H$ be a hypothesis class…", which means that the choices of the hypothesis class $\mathcal{H}$, the domain set $Z$, and the loss function $\ell$ have already been made. This leaves the choice of the finite input sample (training dataset) $S \in Z^m$. This suggests that when the authors write in Theorem 6.7 that "any ERM rule is a successful PAC learner for $H$", what they mean is that any rule that outputs some empirical-risk minimizer over $\mathcal{H}$ (no matter which minimizer it returns, or how it breaks ties) is a successful PAC learner for $\mathcal{H}$.
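To make "any ERM rule" concrete, here is a minimal Python sketch (my own illustration, not code from the book) of an ERM rule over a hypothetical finite class of threshold classifiers on $[0,1]$. The point is that `min` may return any empirical-risk minimizer, and the theorem says every such choice is a successful PAC learner:

```python
import random

def empirical_risk(h, S):
    """Average 0-1 loss of hypothesis h on the sample S = [(x, y), ...]."""
    return sum(1 for x, y in S if h(x) != y) / len(S)

def erm(H, S):
    """An ERM rule: return ANY hypothesis in H minimizing the empirical risk.

    Here `min` happens to break ties by list order, but Theorem 6.7 makes
    no assumption about the tie-breaking strategy.
    """
    return min(H, key=lambda h: empirical_risk(h, S))

# Hypothetical finite class: threshold functions h_t(x) = 1[x >= t].
H = [(lambda t: (lambda x: int(x >= t)))(t) for t in [i / 10 for i in range(11)]]

# Labels generated by the true threshold 0.5, which lies in H (realizable case).
random.seed(0)
S = [(x, int(x >= 0.5)) for x in [random.random() for _ in range(100)]]

h_hat = erm(H, S)
print(empirical_risk(h_hat, S))  # 0.0: a consistent hypothesis exists in H
```

Any other implementation that returns a minimizer (e.g. scanning `H` in reverse, or picking a random minimizer) is an equally valid ERM rule.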
Response to second comment
Yes, assuming that you have made your choice of the finite input sample (training dataset) $S \in Z^m$, this is just $$ \text{statement 4} \implies \text{statement 5} $$ in theorem 6.7.
Response to third comment
I think section 4.2 complements section 4.1, so see section 4.1 as well. In my original answer I had hedged on this point, since I wasn't sure whether section 4.2 is what they were referring to. Regardless, I've removed that statement from my original answer.
Response to fourth comment
I am not sure what you mean by "infinite classes" here. If you mean that $\mathcal{H}$ is infinite in size, then we may not be interested in infinitely sized hypothesis classes anyway: in section 2.3, the authors introduce the idea of restricting the learner to a restricted hypothesis class, and in section 2.3.1 they analyze finite hypothesis classes, showing that every finite class is learnable by any ERM rule.
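For finite classes the guarantee is quantitative. Roughly, as I recall it (the realizable case, with $\log$ the natural logarithm), the sample complexity satisfies
$$ m_{\mathcal{H}}(\epsilon, \delta) \le \left\lceil \frac{\log(|\mathcal{H}|/\delta)}{\epsilon} \right\rceil, $$
so any ERM rule over a finite class needs only on the order of $\log |\mathcal{H}|$ samples to succeed.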