I think $(3)$ is misleading. Since $E_k$ is the set of all sample points in exactly $k$ of the $A_i$ sets, $(3)$ makes no sense at all to me.
For $(3)$ I suggest we let each $E_k$ be a disjoint union:
$$E_k = \bigcup_{i=1}^{\binom{n}{k}} D_{k,i}$$
where each $D_{k,i}$ is the subset of $E_k$ where the $k$ $A_i$'s are a distinct $k$ of the sets $A_1,\ldots,A_n$.
Then, for $(3),\;$ we assume WLOG, for any given $k,i,\;$ that $D_{k,i}$ is contained in $A_1,\ldots,A_k$ (and in no others) and proceed as I think the author intends, with $D_{k,i}$ instead of $E_k$. Note that, by the additivity of probability:
$$P(E_k) = \sum_{i=1}^{\binom{n}{k}} P(D_{k,i}).$$
The idea of $(5)$ is that we have
\begin{eqnarray*}
P\left(\bigcup_{i=1}^{n}A_i\right) &=& \sum_{k=1}^{n} P(E_k) \qquad\qquad\text{by $(1)$} \\
&=& \sum_{k=1}^{n} \sum_{i=1}^{\binom{n}{k}} P(D_{k,i}) \qquad\text{by $(3)$}\qquad\qquad\qquad\text{(*)} \\
\end{eqnarray*}
By $(3),\;$ any $P(D_{k,i})$ appears in the expression $P_1-P_2+P_3-\cdots\pm P_k,\;$ exactly $\left(k-\binom{k}{2}+\binom{k}{3}-\cdots\pm\binom{k}{k}\right)$ times, and this equals $1$ by $(4)$.
So, from $(*),\;$ we can conclude that $P(\cup_{i=1}^{n}A_i) = P_1-P_2+P_3-\cdots\pm P_n,\;$ as required.
That seems quite opaque: It's a way of computing a quantity rather than telling what exactly it is or even motivating it. It also leaves completely open the question of why such a function exists and is well-defined. The properties you give are sufficient if you're trying to put a matrix in upper-triangular form, but what about other computations? It also gives no justification for one of the most important properties of the determinant, that $\det(ab) = \det a \det b$.
I think the best way to define the determinant is to introduce the wedge product $\Lambda^* V$ of a finite-dimensional space $V$. Given that, any map $f:V \to V$ induces a map $\bar{f}:\Lambda^n V \to \Lambda^n V$, where $n = \dim V$. But $\Lambda^n V$ is a $1$-dimensional space, so $\bar{f}$ is just multiplication by a scalar (independent of a choice of basis); that scalar is by definition exactly $\det f$. Then, for example, we get the condition that $\det f\not = 0$ iff $f$ is an isomorphism for free: For a basis $v_1, \dots, v_n$ of $V$, we have $\det f\not = 0$ iff $f(v_1\wedge \cdots \wedge v_n) = f(v_1) \wedge \cdots \wedge f(v_n) \not = 0$; that is, iff the $f(v_i)$ are linearly independent. Furthermore, since $h = fg$ has $\bar{h} = \bar{f}\bar{g}$, we have $\det(fg) = \det f \det g$. The other properties follow similarly. It requires a bit more sophistication than is usually assumed in a linear algebra class, but it's the first construction of $\det$ I've seen that's motivated and transparently explains what's otherwise a list of arbitrary properties.
Best Answer
Let $A=(a_{ij})=(\langle x_{i}, x_{j} \rangle)$ be the Gram matrix of the $x_{i}$'s. Since $A$ is symmetric and has entries in $\{0,1\}$, you can think of $A$ as the adjacency matrix of a graph on vertex set $V=\{1,2,\dots,n\}$ and edge set $E=\{\{i,j\}:x_{i}=x_{j}\}$ (i.e., vertices $i$ and $j$ are adjacent iff $x_{i}=x_{j}$). Call this graph $G_{A}.$ It's worth noting that because the diagonal of $A$ is all $1$'s, each vertex has a self-loop, but we can disregard these since the Hafnian never looks at the diagonal entries of $A.$
Now, suppose $k$ is odd and that $x_{i_{1}},x_{i_{2}},\dots,x_{i_{k}}$ are all copies of the standard basis vector $e_{i}.$ The Hafnian of $A$ counts perfect matchings of $G_{A}$, where self-loops are not allowed in a matching. The vertices $i_{1},i_{2},\dots,i_{k}$ are a clique of size $k$ in the graph, and none of the $i_{j}$'s have neighbors outside of this clique. Since $k$ is odd, there is no way to match the vertices in this clique, and thus $G_{A}$ has no perfect matchings.
You can also see this from the definition of the Hafnian, $\text{haf}(A)=\frac{1}{(n/2)!2^{(n/2)}}\sum_{\sigma \in S_{n}}\prod_{i=1}^{n-1}A_{\sigma(i)\sigma(i+1)}$, with $S_{n}$ the set of permutations of $\{1,2,\dots,n\}.$ For any $\sigma \in S_{n}$, the product $\prod_{i=1}^{n-1}A_{\sigma(i)\sigma(i+1)}$ must be $0$ since $\sigma$ must map an $i_{j}$ to some element not in $\{i_{1},i_{2},\dots,i_{k}\}.$