Here is a simple proof that is tediously long. As with many proofs dealing
with measure, it proceeds by showing an equivalent result for indicator
functions, then simple functions and then measurable function.
I think the essence of the result is 1), the rest is generalisation.
I am assuming that you are using the Lebesgue measure $m$.
Since we can extend $f$ by letting $f(x)=0$ for $x \notin E$, there is no loss of generality in taking $E= \mathbb{R}$.
1) If $A$ has finite measure and $\epsilon>0$, we can find a finite collection of disjoint open intervals $I_1,...,I_n$ such that $m (A \triangle \cup_{k \le n} I_k) \le \epsilon$.
Suppose $A$ is measurable set of finite measure. Then for any $\epsilon>0$ there is a (possibly countable) collection of open intervals $\{I_k\}$ such that $A \subset \cup_k I_k$ and $mA \le \sum_k m I_k \le mA+ {\epsilon \over 2}$. Since $m (\cup_k I_k) = m A + m (\cup_k I_k \setminus A)$, we have
$m ( \cup_{k \le n} I_k \setminus A) \le m ( \cup_{k} I_k \setminus A) \le { \epsilon \over 2}$, for any $n$.
We also have $mA = m (A \cap \cup_{k \le n} I_k ) + m ( A \setminus \cup_{k \le n} I_k)$, hence by continuity of measure, we can find some $n$ such that
$m ( A \setminus \cup_{k \le n} I_k) \le { \epsilon \over 2}$. Hence for some $N$ we have $m (A \triangle \cup_{k \le n} I_k) \le \epsilon$.
To obtain disjoint intervals, suppose two of the open intervals $I_1,...,I_n$ overlap, say $I_{k_1}$ and $I_{k_2}$. Then $I_{k_1} \cup I_{k_2}$ is also an open interval that contains both intervals and
$m (I_{k_1} \cup I_{k_2}) \le m I_{k_1} + m I_{k_2}$. So, we remove
$I_{k_1}$, $I_{k_2}$ from the collection and replace them by the combined
interval $I_{k_1} \cup I_{k_2}$. We continue until there are no overlapping intervals. Let $\{ I_k' \}_{k=1}^{n'}$ be the resulting collection of disjoint intervals. Since $\cup_{k=1}^n I_k = \cup_{k=1}^{n'} I_k'$, all of the above estimates are still valid and we have
$m (A \triangle \cup_{k \le n'} I_k') \le \epsilon$.
2) If $B = I_1 \cup \cdots \cup I_n$ where the $I_k$ are disjoint intervals, then $1_{B} = \sum_{k=1}^n 1_{I_k}$ is a step function.
3) Let $A$ have finite measure and $\epsilon>0$, then there exists a step function $s$ and a set $\Delta$ such that $1_A(x) = s(x)$ for $x \notin \Delta$ and $m \Delta \le \epsilon$. Furthermore, if $A \subset J$, where $J$ is an bounded interval, we can take $\Delta \subset J$ as well (in particular, we have
$s(x) = 0$ for $x \notin J$).
From 1), we have disjoint open intervals $I_1,...,I_n$ such that
$m (A \triangle \cup_{k \le n} I_k) \le \epsilon$. Let $s=\sum_{k=1}^n 1_{I_k}$ and $\Delta = A \triangle \cup_{k \le n} I_k$.
Then we see that $m \Delta \le \epsilon$ and
if $x \notin \Delta$, we have $s(x) = 1_A(x)$.
If $A \subset J$, a bounded interval, let $s'=s \cdot 1_J$, and note that $s'$
is a step function. Notice that if $x \notin J$, then $s'(x) = 0 = 1_A(x)$ and so we have $s'(x) = 1_A(x)$ for $x \notin \Delta' = \Delta \cap J$, with $m \Delta' \le m \Delta \le \epsilon$.
4) If $A$ is measurable and $\epsilon>0$, then there exists a step function $s$ and a set $\Delta$ such that $1_A(x) = s(x)$ for $x \notin \Delta$ and $m \Delta \le \epsilon$.
(Note that the finite sum of step functions is a step function.)
Let $B_n = A \cap [n,n+1)$ and $\epsilon>0$. Let $s_n$ and $\Delta_n \subset [n,n+1)$ be the step function and set such that $1_{B_n}(x) = s_n(x)$ for all $x \notin \Delta_n$ with $m \Delta_n \le {1 \over 2^{|n|+2}} \epsilon$. (Note that we can take $J=[n,n+1)$, so that $\Delta_n \subset [n,n+1)$. Consequently, $s_n(x) = 0$ when $x \notin [n,n+1)$.)
Let $s = \sum_n s_n$ and $\Delta = \cup_n \Delta_n$. Then if $x \notin \Delta$, we have $s(x) = \sum_n s_n(x) = \sum_n 1_{B_n}(x) = 1_{B_{\lfloor x \rfloor}} (x) = 1_A(x)$ and $m \Delta \le { 3 \over 4} \epsilon$.
5) If $\sigma$ is a real valued simple function and $\epsilon>0$, there exists a step function $s$ and a set $\Delta$ such that $m \Delta \le \epsilon$ and
$\sigma(x) = s(x) $ for $x \notin \Delta$.
Suppose $\sigma = \sum_{k=1}^n \alpha_k 1_{A_k}$, where the $A_k$ are measurable. Using 4), choose $s_k, \Delta_k$ such that $1_{A_k}(x) = s_k(x)$ for $x \notin \Delta_k$, and $m \Delta_k \le {1 \over n} \epsilon$. If we let
$s = \sum_k \alpha_k s_k$ (note that $s$ is a step function)
and $\Delta = \cup_k \Delta_k$, we have $m \Delta \le \epsilon$ and if
$x \notin \Delta$, then $\sigma(x) = s(x)$.
And finally:
6) If $f$ is measurable, there are step functions $s_n$ such that $s_n(x) \to f(x)$ for ae. $x$.
There are real valued simple functions $\sigma_n$ such that $\sigma_n(x) \to f(x) $ for all $x$. Using 5), choose step functions $s_n$ and sets $\Delta_n$
such that $s_n(x) = \sigma_n(x)$ for $x \notin \Delta_n$, and
$m \Delta_n \le {1 \over n} {1 \over 2^{n+1}}$.
Note that if $s_n(x)$ does not converge to $f(x)$, then we must have $s_n(x) \neq \sigma_n(x)$ for infinitely many $n$. This means $x \in \Delta_n$ infinitely often. Equivalently, we have
$x \in Q=\cap_n \cup_{k \ge n} \Delta_k$.
Since $m Q \le m (\cup_{k \ge n} \Delta_k)
\le \sum_{k \ge n} {1 \over k 2^{k+1}} \le {1 \over n}$ for all $n$, we see that $m Q = 0$.
Consequently, for any $x \notin Q$, there is some $N$ such that
$s_n(x) = \sigma_n(x)$ for all $n \ge N$, and so
$s_n(x) \to f(x)$.
Best Answer
I just want to drill into your second question: why is this theorem significant. Postmortes already gave a summary of that in this answer, but I want to give a bit more detail. Without understanding the proof of this theorem, even looking at what it's saying, one can be amazed. To see why, let's start with a classic general rule involving quantifiers. Suppose we have some predicate $P(x, y)$ with $x \in X$ and $y \in Y$. Then the following general rules holds:
$$\exists x \in X \;\forall y \in Y \;P(x, y) \implies \forall y \in Y \; \exists x \in X \;P(x, y)$$
In other words, if there is some fixed $x$, over which the predicate holds for all $y$, then it is certainly true that at each $y$, there is some $x$ such that the predicate holds: just choose the same fixed $x$ at each $y$. However, and this is the key point: the converse is not true in general: just because for each $y$ we can find some $x$, dependent on that $y$, that satisfies the predicate, it doesn't mean we can find a single $x$ satisfying the predicate across all $y$.
At this point, let's step into a concrete counterexample which will serve to show why the converse of the above is not true and will also get us closer to the significance of Egorov's result. Let's forget about measures for the moment and just consider uniform vs. pointwise convergence. Suppose we have a set $Z$ and a sequence of functions $f_n$ on that set. Pointwise convergence of $f_n \to f$ on $Z$ is defined as:
$$\forall \epsilon \in \Bbb R \; \forall z \in Z \; \exists m \in \Bbb N \; \forall i > m \; |f_i(z) - f(z)| < \epsilon$$
Whereas uniform convergence is defined as:
$$\forall \epsilon \in \Bbb R \; \exists m \in \Bbb N \; \forall z \in Z \; \forall i > m \; |f_i(z) - f(z)| < \epsilon$$
Now, while there surely are more quantifiers at play, if we look closely enough we can see that the two definitions differ only in terms of the ordering of a consecutive existential and universal quantifier. This immediately lets us conclude, using the general rule above, that uniform convergence implies pointwise convergence. Now let's use this concrete example to see why pointwise convergence doesn't imply uniform convergence. Let $Z = \Bbb R$ and let $f_n(z) = z/n$.
Clearly for any given $z$, $f_n(z) \rightarrow 0$ as $n \rightarrow \infty$. That means, for any $\epsilon$ and $z$ we can pick an $m$ such that for all functions $f_i$ with $i > m$, the difference with $f$ is bounded i.e.
$$|f_i(z) - f(z)| < \epsilon$$
However, given that same $\epsilon$, there is no fixed $m$ that will give us such a bound for all $z$. You can show that by contradiction, assuming there is some such $m$. But since $f_i(z) = z/i$, we can always choose some $z$ to make $f_i(z)$ as large as we want, for any given $i$. So for any $i > m$, we can choose such a $z$, contradicting the assumption that there is some $m$ giving uniform convergence.
So now moving on to Egorov's result. Why is it significant? Well, to begin with, it's not immediately intuitive, because it might appear to go against the above counterexample, which shows that we can't in general pull an existential quantifier outside of a universal quantifier. Any theorem that seems to defy pre existing intuition is in itself useful, because it strengthens our understanding further and allows us to reset our intuition to better understand the underlying theory: in this case measure theory.
As Postmortes argues, it's also practically very useful. At a very cheap cost, it manages to strengthen pointwise convergence to uniform convergence. The price we have to pay is minimal: just remove a tiny little set that we can make as small as we want and boom! We get uniform convergence on the rest of the set. In other words, in this special case, we are allowed to pull the existential quantifier outside of the universal quantifier- once we remove a tiny bit of the domain.
Why else this is significant is that it provides great motivation for using measure theory. Without measure theory, we wouldn't be able to conclude the result of Egorov. In particular, the key ingredient that measure theory brings into the mix is continuity of the measure- this continuity, combined with the fact that the sequence of functions is countable, allows us to shrink the set we are removing to be as small as we like, and leave a remaining set where uniform convergence holds.