Inclusion-Exclusion Principle – Combinatorics Explanation and Exact-$m$ Properties Formula

combinatorial-proofscombinatoricsinclusion-exclusionsolution-verification

I'm trying to "explain" (I think this would not be a formal proof because I use a special case of the formula itself when I was "proving" it. So the tag I put might need to be edited.) this formula: ($E_m$ counts those elements with exactly $m$ properties. $S_j$ is explained in the quoted text below)

$$
E_m=\sum_{j=m}^n(-1)^{j-m}\color{blue}{{j\choose m}}S_j
$$

The following is the final version I just finished editing on one of my old answers. Let me cite the text below. My main focus is on the correctness of steps 3. and 4. (but any error identified is welcome!) because it's still very counterintuitive for me. Link to the original answer: https://math.stackexchange.com/a/3804796/390226

If my argument would work, I would also like to know how to turn this "proof" into a formal one. Thanks in advance!

Terms used in the answer:

  1. $A_k$: The set of elements has (at least) property indexed $k$.
  2. IEP for Inclusion-Exclusion Principle.
  3. Exactly-none IEP: exactly none of the indexed properties got selected via IEP. i.e.

Long answer to understand the two coefficients:

  1. Let's define $S_m$: It's a shorthand to simplify the formula of IEP. Assume there are $n$ properties in total, and $I$ is an index set, then we can pick any $m\le n$:

$$\begin{align}
S_m&=\sum\left|\textrm{an intersection of }m\textrm{ sets}\right|\\
&=\sum_{|I|=m}\left|\bigcap_{j\in I}A_j\right|
\end{align}$$

  1. $E_0$ gives exactly-none IEP: (Let $S_0=U$):

$$\begin{align}E_0 &=\sum_{j=0}^n(-1)^{j-0}{j\choose 0}S_j\\
&=\sum_{j=0}^{n}(-1)^{j}S_j\\
&=\left|\bigcap_{j=1}^n\bar{A_j}\right|.\\
\end{align}$$

for each $S_j$, the coefficient is $(-1)^j$:

$$\begin{align}
E_0&=\sum_{j=m}^n(-1)^{j-m}{j\choose m}S_j, m=0\\
&=\sum_{j=0}^n(-1)^{j-0}{j\choose 0}S_j\\
&=\sum_{j=0}^{n}(-1)^{j}S_j.
\end{align}
$$

The ${j\choose m}$ disappears in $E_0$. But for $E_m$, we got more than that. So maybe there are more than one exactly-none IEPs in $E_m$?

  1. My attempt to explain $E_m$:

$$
E_0=\sum_{j=m}^n(-1)^{j-m}{j\choose m}S_j\\
$$

3.1. Observe the first term, $j=m$. Now $S_m$ gives:

$$\begin{align}
S_m&=\sum_{|I|=m}\left|\bigcap_{j\in I}A_j\right|\\
&=\left|A_{1,2,…,m}\right| + \left|A_{1,2,…,(m-1),(m+1)}\right| + … + \left|A_{(n-m+1),(n-m+2),…,n}\right|\\
\end{align}\\
$$

3.2. Now we calculate $E_0$ once for each term $A_{…}$ as the universe. So we have: (The notation $E_{0,U}$ where $U$ means the universe defined for the calculation)

$$\begin{align}
E_{0,A_{1,2,…,m}}&=\color{blue}{\left|A_{1,2,…,m}\right|} + \sum_{j=m+1}^n(-1)^{j-m}S_j\\
E_{0,A_{1,2,…,(m-1),(m+1)}}&=\color{blue}{\left|A_{1,2,…,(m-1),(m+1)}\right|} + \sum_{j=m+1}^n(-1)^{j-m}S_j\\
\vdots\\
E_{0,A_{(n-m+1),(n-m+2),…,n}}&=\color{blue}{\left|A_{(n-m+1),(n-m+2),…,n}\right|} + \sum_{j=m+1}^n(-1)^{j-m}S_j\\
\end{align}
$$

Notice that these $\left|A_{…}\right|$ have been provided by $E_0$. And the sum is $\color{blue}{S_m}$, i.e. the (common) first term of $E_m$.

you might have many questions at this stage:

Q1:

Those $A_{…}$ might overlap some of the others?

Yes. But if each $E_{0, A_{\dots}}$ work, no overlapping. Because $E_m$ means exactly-m properties when the calculation is complete.

Q2:

Each $E_0$, ignoring those $(-1)^{j-m}$, have $1$ for each term. Now you're trying to convince me that applying it many times will create ${j\choose m}$, a variable for each term in the resulting $E_m$? By intuition, if you apply it, say $5$ times, you should have a $5$ for each term?

Yes. I don't know how to explain this either for now.

  1. Now, the strangest step, I cannot believe it too:

Let's try to calculate how many $S_j$ are needed when $j$ is given. This is equivalent to finding how many universes include each of them. So here is the term:

$$
{j\choose m}
$$

given any $j$ properties, you choose $m$ of them, you find one universe, i.e. one left-hand-side on the list of 3.2. that includes it on the right-hand-side. This explains the mysterious ${j\choose m}$ of $E_m$.

Best Answer

Here we follow closely the proof of Exercise 3, chapter 2: Sieve methods from Enumerative Combinatrics, Vol. I by R. P. Stanley. To ease comparison I'll try to use OP's notation where appropriate.

Setting: We consider a set $A=\{P_1,\ldots,P_n\}$ of $n$ properties. Given a finite set $X$ we denote

  • with $E_{m}$ the number of elements in $X$ where each element has exactly $m$ properties from $A$.

  • with $E_{\ge m}$ the number of elements in $X$ where each element has at least $m$ properties from $A$.

The claim: The following is valid \begin{align*} \color{blue}{E_m=\sum_{j=m}^n(-1)^{j-m}\binom{j}{m}S_j}\tag{1} \end{align*} given \begin{align*} \color{blue}{S_j=\sum_{{T\subseteq A}\atop{|T|=j}}E_{\geq}(T)}\tag{2} \end{align*} where $E_{\geq}(T)$ means the number of elements in $X$ where each element has at least the properties from $T\subseteq A$.

Proof: We start with the right-hand side from (1) and put (2) into it. We so obtain \begin{align*} \color{blue}{\sum_{j=m}^n}&\color{blue}{(-1)^{j-m}\binom{j}{m}S_j}\\ &=\sum_{j=m}^n(-1)^{j-m}\binom{j}{m}\sum_{{T\subseteq A}\atop{|T|=j}}E_{\geq}(T)\tag{3.1}\\ &=\sum_{j=m}^n(-1)^{j-m}\binom{j}{m} \sum_{{T\subseteq R\subseteq A}\atop{|T|=j}}E_{=}(R)\tag{3.2}\\ &=\sum_{R\subseteq A}E_{=}(R)\sum_{T\subseteq R}(-1)^{|T|-m}\binom{|T|}{m}\tag{3.3}\\ \end{align*}

Comment:

  • In (3.1) we use (2).

  • In (3.2) we introduce a new set $R$ of properties in between $T$ and $A$ which enables us to use $E_{=}(R)$ instead of $E_{\geq}(T)$.

  • In (3.3) we rearrange the sums using $|T|=j$.

Now we look at the inner sum of (3.3) which can be considerably simplified. We obtain setting $|R|=r$ \begin{align*} \color{blue}{\sum_{T\subseteq R}(-1)^{|T|-m}\binom{|T|}{m}}& =\sum_{j=0}^r\sum_{{T\subseteq R}\atop{|T|=j}}(-1)^{|T|-m}\binom{|T|}{m}\tag{3.4}\\ &=\sum_{j=0}^r(-1)^{j-m}\binom{j}{m}\sum_{{T\subseteq R}\atop{|T|=j}}1\tag{3.5}\\ &=\sum_{j=0}^r(-1)^{j-m}\binom{j}{m}\binom{r}{j}\tag{3.6}\\ &=\binom{r}{m}\sum_{j=0}^r(-1)^{j-m}\binom{r-m}{r-j}\tag{3.7}\\ &\,\,\color{blue}{=\delta_{r,m}}\tag{3.8} \end{align*}

Comment:

  • In (3.4) we rearrange the sum according to the size $j$ of $T$, $0\leq j\leq r$.

  • In (3.5) we factor out the sign and the binomial coefficient leaving a summand $1$.

  • In (3.6) we note that $\binom{r}{j}$ is the number of subsets with $j$ elements of a set with size $r$.

  • In (3.7) we use the binomial identity $\binom{j}{m}\binom{r}{j}=\binom{r}{m}\binom{r-m}{r-j}$.

  • In (3.8) we use $\sum_{j=0}^{r}(-1)^j\binom{r-m}{r-j}=(1-1)^{r-m}=\delta_{r,m}$.

Putting finally (3.8) into (3.3) we obtain \begin{align*} \color{blue}{\sum_{j=m}^n(-1)^{j-m}\binom{j}{m}S_j} &=\sum_{R\subseteq A}E_{=}(R)\delta_{|R|,m}\\ &=\sum_{{R\subseteq A}\atop{|R|=m}}E_{=}(R)\\ &\,\,\color{blue}{=E_{m}} \end{align*} and the claim follows.



Addendum: The proof revisited

Here we look at the two key ingredients which R. P. Stanley used in the proof.

  • We start with a comparison of the claim (1) and the binomial expression in (3.6) \begin{align*} E_m=&\sum_{j=m}^n\color{blue}{(-1)^{j-m}\binom{j}{m}}S_j\tag{1}\\ &\sum_{j=m}^n\color{blue}{(-1)^{j-m}\binom{j}{m}}\binom{r}{j}=\delta_{r,m}\tag{3.6, 3.8}\\ \end{align*}

    It is the binomial identity (3.6,3.8) which is the reason for choosing the coefficient $(-1)^{j-m}\binom{j}{m}$ in (1). Since this cancels all unwanted terms leaving finally those of $E_m$.

  • The second ingredient is the rearrangement of $S_j$ from a representation with $E_{\geq}(\cdot)$ summands into a representation with $E_{=}(\cdot)$ summands. We have in steps (3.3) to (3.8) the following rearrangement of $S_j$: \begin{align*} S_j=\sum_{{T\subseteq A}\atop{\color{blue}{|T|=j}}}\color{blue}{E_{\geq}(T)} &=\sum_{{T\subseteq R\subseteq A}\atop{|T|=j}}E_{=}(R)\\ &=\sum_{R\subseteq A}E_{=}(R)\sum_{{T\subseteq R}\atop{|T|=j}}1\\ &\,\,=\sum_{R\subseteq A}\color{blue}{E_{=}(R)\binom{|R|}{j}}\tag{3.9} \end{align*}

It is this representation (3.9) which admits the usage of the binomial identity from above, since putting (3.9) into (1) we obtain \begin{align*} \color{blue}{\sum_{j=m}^n(-1)^{j-m}\binom{j}{m}S_j}& =\sum_{R\subseteq A}E_{=}(R)\sum_{j=m}^n(-1)^{j-m}\binom{j}{m}\binom{|R|}{j}\\ &=\sum_{R\subseteq A}E_{=}(R)\,\delta_{|R|,m}\\ &=\sum_{{R\subseteq A}\atop{|R|=m}}E_{=}(R)\\ &\,\,\color{blue}{=E_m} \end{align*} which completes the proof.