The principle of inclusion/exclusion is actually based on the equality:
$$\mathbf{1}_{\bigcup_{i=1}^{n}A_{i}}=\sum_{k=1}^{n}\left(-1\right)^{k-1}\sum_{1\leq i_{1}<\cdots<i_{k}\leq n}\mathbf{1}_{A_{i_{1}}\cap\cdots\cap A_{i_{k}}}\tag1$$
Note that LHS and RHS both give $0$ if an argument $\omega\notin\bigcup_{i=1}^nA_i$ is substituted.
Now suppose that $\omega\in A_{j}$ iff $j\in\left\{ j_{1},\dots,j_{m}\right\} $
where $\left\{ j_{1},\dots,j_{m}\right\} \subseteq\left\{ 1,\dots,n\right\} $
has cardinality $m>0$.
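Then substituting $\omega$ as argument gives $1$ on the LHS. On the RHS, the inner sum for a given $k$ counts the $k$-element subsets of $\left\{ j_{1},\dots,j_{m}\right\}$, which is $\binom{m}{k}$ (and $0$ for $k>m$), so the RHS also gives $1$: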
$$\sum_{k=1}^{m}\left(-1\right)^{k-1}\binom{m}{k}=1-\sum_{k=0}^{m}\binom{m}{k}\left(-1\right)^{k}1^{m-k}=1-\left(\left(-1\right)+1\right)^{m}=1-0=1$$
This proves $(1)$. Taking expectations on both sides then gives:
$$P\left(\bigcup_{i=1}^{n}A_{i}\right)=\sum_{k=1}^{n}\sum_{1\leq i_{1}<\cdots<i_{k}\leq n}\left(-1\right)^{k-1}P(A_{i_{1}}\cap\cdots\cap A_{i_{k}})\tag2$$
So this gives a direct proof without induction, although it does not help if you insist on using induction.
Induction can be applied and your setup is okay, but if it is not required then I would certainly choose this approach.
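For readers who like to see $(1)$ in action, here is a minimal brute-force check in Python. The sample space `Omega` and the sets in `A` are made-up illustrations, not part of the question; the code only evaluates both sides of $(1)$ at every point of a small finite space.

```python
from itertools import combinations

def indicator(subset, omega):
    """Value of the indicator function 1_subset at the point omega."""
    return 1 if omega in subset else 0

def rhs_of_identity(sets, omega):
    """Right-hand side of (1) evaluated at omega."""
    n = len(sets)
    total = 0
    for k in range(1, n + 1):
        sign = (-1) ** (k - 1)
        # Each combination corresponds to one choice i_1 < ... < i_k.
        for chosen in combinations(sets, k):
            intersection = set.intersection(*chosen)
            total += sign * indicator(intersection, omega)
    return total

# Hypothetical example data (an assumption for illustration only).
Omega = set(range(10))
A = [{0, 1, 2, 3}, {2, 3, 4, 5}, {3, 5, 6}]
union = set().union(*A)

# Both sides of (1) agree at every point of Omega.
for omega in sorted(Omega):
    assert indicator(union, omega) == rhs_of_identity(A, omega)
```

A check like this is of course no proof, but it makes the pointwise nature of $(1)$ concrete: both sides are functions on $\Omega$, compared one $\omega$ at a time.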
Addendum
First note that for the sake of consistency I replaced every $x$ above by $\omega$.
" I don't know how $x$ can be an argument..."
So this question translates to:
" I don't know how $\omega$ can be an argument..."
As you said in your comment, $A_1,A_2,\dots$ are events. That means that, if you work on a probability space $(\Omega,\mathcal A,P)$, they are elements of $\mathcal A$, where the $\sigma$-algebra $\mathcal A$ is a subcollection of the power set $\wp(\Omega)$. So the $A_i$ are subsets of $\Omega$ in this context.
If $B\subseteq\Omega$ then $\mathbf1_B:\Omega\to\mathbb R$ is the function prescribed by $\omega\mapsto1$ if $\omega\in B$ and $\omega\mapsto0$ otherwise.
So elements of $\Omega$ serve as arguments for functions like $\mathbf1_B$ where $B\subseteq\Omega$.
" I don't get why the binomial coefficient appears..."
Fix $\omega$ and let $\{j_1,\dots,j_m\}$ with $1\leq j_1<\cdots<j_m\leq n$ be exactly the set of indices $j$ with $\omega\in A_j$, so that $\omega\in A_{j_1}\cap\cdots\cap A_{j_m}$ and $\omega\notin A_i$ for every $i\in\{1,\dots,n\}\setminus\{j_1,\dots,j_m\}$. For a fixed $k$ take a look at the summation:$$\sum_{1\leq i_{1}<\cdots<i_{k}\leq n}\mathbf{1}_{A_{i_{1}}\cap\cdots\cap A_{i_{k}}}(\omega)$$
Every term equals $1$ or $0$ so the summation equals the number of terms that equal $1$.
Now note that the term $\mathbf{1}_{A_{i_{1}}\cap\cdots\cap A_{i_{k}}}(\omega)$ equals $1$ if and only if $\{i_1,\dots,i_k\}\subseteq\{j_1,\dots,j_m\}$.
So selections of $k$ elements from $\{j_1,\dots,j_m\}$ correspond to the terms that equal $1$, and there are exactly $\binom{m}{k}$ such selections.
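The counting argument can be illustrated with a short Python sketch. The values of `n` and `containing` (the index set $\{j_1,\dots,j_m\}$) are hypothetical choices for the example; the function just enumerates all $k$-subsets of $\{1,\dots,n\}$ and counts those lying inside `containing`.

```python
from itertools import combinations
from math import comb

def count_terms_equal_to_one(n, containing, k):
    """Count index tuples i_1 < ... < i_k in {1..n} whose term equals 1.

    `containing` is the set of indices j with omega in A_j, so a term
    equals 1 exactly when all of its indices lie in `containing`.
    """
    return sum(
        1
        for indices in combinations(range(1, n + 1), k)
        if set(indices) <= containing
    )

# Hypothetical example: omega lies in A_2, A_3 and A_5 out of n = 6 sets.
n = 6
containing = {2, 3, 5}
m = len(containing)

for k in range(1, n + 1):
    # comb(m, k) is 0 when k > m, matching the vanishing terms.
    assert count_terms_equal_to_one(n, containing, k) == comb(m, k)
```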
" Also we did not introduce the concept of expectation yet.."
I am not going to give a lecture in order to introduce expectation, but I trust that you will become familiar with it within a short while.
For this argument only two facts matter, and both are brief enough to state:
- If $B\in\mathcal A$ then the expectation of $\mathbf1_B$ exists with $\mathbb E\mathbf1_B=P(B)$
- Linearity of expectation is applied on the RHS.
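On a finite sample space with all outcomes equally likely (an assumption made here only to keep the example small), expectation is just an average, and both facts can be checked directly. The names `Omega`, `A`, `prob` and `expectation` below are illustrative, not notation from the question.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical finite sample space with the uniform probability measure.
Omega = list(range(12))
A = [{0, 1, 2, 3, 4}, {3, 4, 5, 6}, {4, 6, 7, 8, 9}]

def prob(event):
    """P(event) under the uniform measure on Omega."""
    return Fraction(len(event), len(Omega))

def expectation(f):
    """E[f] under the uniform measure: the average of f over Omega."""
    return sum(Fraction(f(w)) for w in Omega) / len(Omega)

def indicator(event):
    """Return the indicator function of `event` as a Python function."""
    return lambda w: 1 if w in event else 0

# First fact: the expectation of an indicator is the probability.
for B in A:
    assert expectation(indicator(B)) == prob(B)

# Second fact in use: by linearity, the expectation of the RHS of (1)
# is the signed sum of probabilities on the RHS of (2).
rhs = Fraction(0)
for k in range(1, len(A) + 1):
    for chosen in combinations(A, k):
        rhs += (-1) ** (k - 1) * prob(set.intersection(*chosen))

lhs = prob(set().union(*A))
assert lhs == rhs
```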
I hope things are clearer now.
Often the principle of inclusion/exclusion is taught without mentioning the underlying $(1)$. That is really a pity, and this is an effort to save you (and hopefully also others) from that.
Best Answer
You’re quite right that the two-set case is used in the induction step, but it’s still a little tricky to work through.
Suppose that you know it for every collection of $n$ sets. Let $B=A_1\cup\ldots\cup A_n$. Then
$$\begin{align*} |A_1\cup\ldots\cup A_{n+1}|&=|B\cup A_{n+1}|\\ &=|B|+|A_{n+1}|-|B\cap A_{n+1}|\\ &=\sum_{k=1}^n(-1)^{k+1}\left(\sum_{1\le i_1<\ldots<i_k\le n}|A_{i_1}\cap\ldots\cap A_{i_k}|\right)+|A_{n+1}|-|B\cap A_{n+1}|\;.\tag{1} \end{align*}$$
Now
$$\begin{align*} |A_{n+1}|-|B\cap A_{n+1}|&=|A_{n+1}|-\left|\left(\bigcup_{k=1}^nA_k\right)\cap A_{n+1}\right|\\ &=|A_{n+1}|-\left|\bigcup_{k=1}^n(A_k\cap A_{n+1})\right|\\ &=|A_{n+1}|-\sum_{k=1}^n(-1)^{k+1}\left(\sum_{1\le i_1<\ldots<i_k\le n}|(A_{i_1}\cap A_{n+1})\cap\ldots\cap(A_{i_k}\cap A_{n+1})|\right)\\ &=|A_{n+1}|+\sum_{k=1}^n(-1)^{k+2}\left(\sum_{1\le i_1<\ldots<i_k\le n}|A_{i_1}\cap\ldots\cap A_{i_k}\cap A_{n+1}|\right)\\ &=\sum_{k=1}^{n+1}(-1)^{k+1}\left(\sum_{1\le i_1<\ldots<i_{k-1}\le n<i_k=n+1}|A_{i_1}\cap\ldots\cap A_{i_k}|\right)\tag{2} \end{align*}$$
by the induction hypothesis applied to the $n$ sets $A_1\cap A_{n+1},\ldots,A_n\cap A_{n+1}$.
To see what’s really going on here, it’s helpful to realize that the sum $(2)$ covers all of the intersections of $A_i$s that include the new $A_{n+1}$, while the first term of $(1)$ covers the ones that don’t include $A_{n+1}$. Combining results, we now have
$$\begin{align*} \left|\,\bigcup_{k=1}^{n+1}A_k\,\right|&=\sum_{k=1}^n(-1)^{k+1}\left(\sum_{1\le i_1<\ldots<i_k\le n}|A_{i_1}\cap\ldots\cap A_{i_k}|\right)\\ &\qquad+\sum_{k=1}^{n+1}(-1)^{k+1}\left(\sum_{1\le i_1<\ldots<i_{k-1}\le n<i_k=n+1}|A_{i_1}\cap\ldots\cap A_{i_k}|\right)\\ &=\sum_{k=1}^{n+1}(-1)^{k+1}\left(\sum_{1\le i_1<\ldots<i_k\le n+1}|A_{i_1}\cap\ldots\cap A_{i_k}|\right)\;, \end{align*}$$
as desired.
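The split used in the induction step can be checked numerically with a small Python sketch. The sets below are an arbitrary made-up example, and `signed_sum` is a helper introduced only for this illustration: it computes the alternating sum restricted either to the intersections that avoid the last set or to those that include it.

```python
from itertools import combinations

def signed_sum(sets, must_include_last):
    """Alternating sum of intersection sizes over index subsets.

    Only index subsets that contain the last index (if must_include_last
    is True) or avoid it (if False) are included.
    """
    last = len(sets) - 1
    total = 0
    for k in range(1, len(sets) + 1):
        for indices in combinations(range(len(sets)), k):
            if (last in indices) == must_include_last:
                inter = set.intersection(*(sets[i] for i in indices))
                total += (-1) ** (k + 1) * len(inter)
    return total

# Hypothetical example: the last set plays the role of A_{n+1}.
sets = [{1, 2, 3, 4}, {3, 4, 5}, {4, 5, 6, 7}, {1, 7, 8}]
new_set = sets[-1]
B = set().union(*sets[:-1])

without_new = signed_sum(sets, must_include_last=False)
with_new = signed_sum(sets, must_include_last=True)

# Intersections avoiding A_{n+1} give |B| (the first term of (1)).
assert without_new == len(B)
# Intersections including A_{n+1} give |A_{n+1}| - |B ∩ A_{n+1}|, as in (2).
assert with_new == len(new_set) - len(B & new_set)
# Together they give the size of the full union.
assert without_new + with_new == len(set().union(*sets))
```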