Well, this is a basic fact of the exponential function $e^x$.
One definition of $e$ is the limit $\lim_{n\to\infty}(1+\frac1n)^n$. By a monotonicity argument one can prove $\lim_{x\to\infty}(1+\frac1x)^x=e$, where $x$ now ranges over the real numbers.
Also note that $1-\frac1x=\frac{x-1}x=1/\frac x{x-1}=1/(1+\frac1y)=(1+\frac1y)^{-1}$ where $y=x-1$.
So, one has the following:
$$\begin{aligned}
\lim_{x\to\infty}(1-\frac1x )^x &= \lim_{y\to\infty}(1+\frac1y )^{-(y+1)}
\\
&=\lim_{y\to\infty}(1+\frac1y)^{-y}\times\lim_{y\to\infty}(1+\frac1y)^{-1}
\\
&=e^{-1}\times1=e^{-1}\,.
\end{aligned} $$
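As a quick numerical sanity check of this limit (the sample values of $x$ are my own choice), one can compare $(1-\frac1x)^x$ against $e^{-1}$ for increasingly large $x$:

```python
import math

# Numerically check that (1 - 1/x)^x approaches e^{-1} as x grows.
for x in [10, 100, 10_000, 1_000_000]:
    print(x, (1 - 1/x) ** x)

print("e^-1 =", math.exp(-1))
```

The printed values converge to $e^{-1}\approx 0.36788$, with the error shrinking roughly like $1/(2x)$ times $e^{-1}$.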
From here, assuming $\lambda>0$,
$$\begin{aligned}
e^{-\lambda}=(e^{-1})^\lambda &= \lim_{x\to\infty}(1-\frac1x)^{\lambda x} && (\text{substitute } z:=\lambda x)
\\
&= \lim_{z\to\infty}(1-\frac\lambda z)^z\,.
\end{aligned} $$
Consequently, the same limit holds with any sequence $z_n\to\infty$ written in place of $z$, taking the limit over the naturals $n\to\infty$. In particular, it holds for $z_n=n$.
Note that we had to pass through arbitrary real exponents, rather than integers only, because of the exponent $\lambda$.
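A short numerical check of the final limit $(1-\frac\lambda n)^n\to e^{-\lambda}$ (the values of $\lambda$ and $n$ below are my own choices):

```python
import math

# Check that (1 - lam/n)^n approaches e^{-lam} for a few lam > 0.
n = 1_000_000
for lam in [0.5, 1.0, 2.5]:
    approx = (1 - lam/n) ** n
    print(lam, approx, math.exp(-lam))
```

For each $\lambda$ the two printed numbers agree to several decimal places, as expected.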
You are on the right track with the binomial thinking. I'll just expand on the notation and definitions.
I will refer to the positive values $u, d$ as the up and down values in a coin toss for ease of thinking. An element of $Ω$ then looks like a $T$-tuple with either a $u$ or $d$ in each component, which could be interpreted as the outcome of $T$ consecutive coin tosses (Bernoulli trials).
I suppose $ω_j$ is the $j$-th component projection of the vector $ω = (x_1, \dots, x_T)$:
$$ω_j(ω) = ω_j((x_1, \dots, x_T)) = x_j.$$
The function $P$ is defined for singleton sets of outcomes $ω$, each of which is a list of $T$ possible coin toss outcomes $ω_1(ω), \dots, ω_T(ω)$. Each $ω_j(ω)$ here corresponds to an outcome of a single toss.
The definition of $U(ω)$ just says "count the number of ups in the $T$ tosses". Reading the definition of $P$ and comparing it to the binomial distribution makes it apparent that $p$ is supposed to be the probability of getting an up and $(1-p)$ the one of getting a down in each independent toss, making it a $B(T, p)$ distribution.
An event $A$ will be a collection of outcomes $ω$, each comprising $T$ trials. For example, the event that you get up in the first trial contains several $ω$: all possible combinations of $T$ trials that have up as the first outcome. That would be written like this: $\{ω \in \Omega : ω_1(ω) = u\}.$
The way to compute the probability of an event is given in the definition: just sum all probabilities corresponding to each outcome $ω$. In order to prove that $P$ is a probability it will be necessary to prove that $P(\Omega) = 1$.
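Both facts can be verified by brute-force enumeration of $\Omega$. The sketch below uses hypothetical parameter values ($T=4$, $p=0.3$) of my own choosing; it checks that $P(\Omega)=1$ and computes the probability of the example event $\{ω \in \Omega : ω_1(ω) = u\}$, which comes out to $p$:

```python
from itertools import product

# Hypothetical parameters: T tosses, up-probability p; u and d are labels.
T, p = 4, 0.3
u, d = "u", "d"

def P(omega):
    """Probability of a single outcome: factor p per up, (1 - p) per down."""
    ups = omega.count(u)
    return p**ups * (1 - p)**(T - ups)

Omega = list(product([u, d], repeat=T))

total = sum(P(w) for w in Omega)                   # P(Omega), should be 1
first_up = sum(P(w) for w in Omega if w[0] == u)   # P(first toss is up)
print(total, first_up)
```

Summing over all $2^T$ outcomes gives $1$, and restricting the sum to outcomes with $ω_1 = u$ gives exactly $p$.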
Afterwards, the expectation of $ω_1$ can be computed by definition:
$$E_P(ω_1) = \sum_{ω \in \Omega} ω_1(ω) P(\{ω\}) = \sum_{x_1, ..., x_T \in \{u, d\}} x_1 P(\{(x_1, \dots, x_T)\}).$$
The best way I can think of to evaluate this $T$-fold sum is to split it depending on the value of $x_1$:
$$\sum_{x_2, ..., x_T \in \{u, d\}} u \cdot P(\{(u, x_2, \dots, x_T)\}) + \sum_{x_2, ..., x_T \in \{u, d\}} d \cdot P(\{(d, x_2, \dots, x_T)\}).$$
Call these two sums $S_u$ and $S_d$, respectively. Now you can use what you know about $x_1$ in each to evaluate them. For example, for $S_u$ you know that you already have one $u$, so $P(\{(u, x_2, \dots, x_T)\}) = p \cdot p^\alpha\cdot (1-p)^\beta$, where $\alpha$ and $\beta$ are the number of ups and downs in $\{x_2, \dots, x_T\}$.
The $p^\alpha\cdot (1-p)^\beta$ bit is actually the probability mass of a $B(T-1, p)$ distribution for outcomes $\{x_2, \dots, x_T\}$, so summing over all possible combinations of $T-1$ coin toss outcomes should give $1$ (good thing they asked us to prove it first for arbitrary $T$). It follows that $S_u = pu$ and, similarly, $S_d = (1-p)d$.
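The conclusion $E_P(ω_1) = S_u + S_d = pu + (1-p)d$ can also be confirmed by evaluating the $T$-fold sum directly. The numeric values below ($T=4$, $p=0.3$, $u=2.0$, $d=0.5$) are my own illustrative choices:

```python
from itertools import product

# Hypothetical numeric up/down values and parameters.
T, p, u, d = 4, 0.3, 2.0, 0.5

def P(omega):
    """Probability of one outcome tuple: p per up, (1 - p) per down."""
    ups = sum(1 for x in omega if x == u)
    return p**ups * (1 - p)**(T - ups)

# Brute-force the T-fold sum defining E_P(omega_1).
E = sum(omega[0] * P(omega) for omega in product([u, d], repeat=T))
print(E, p*u + (1 - p)*d)
```

Both printed values coincide, matching the closed form $pu + (1-p)d$.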
Your first paragraph about random variables, distributions and image measures is correct.
It is important to distinguish between a binomial random variable, i.e. a random variable $X$ whose distribution $\mathbb{P}_X$ is the binomial distribution, and the binomial distribution itself, which is just a measure.
Suppose that $X:\Omega \rightarrow \mathbb{R}$ is a random variable with distribution $$\mathbb{P}_X = \sum_{k=0}^n {n \choose k } p^k(1-p)^{n-k}\delta_k.$$ What can we then say about $\mathbb{P}(X=j)$ for $j\in \{0,\dots,n\}$? We compute using the formula \begin{align*} \mathbb{P}(X=j) &= \mathbb{P}_X(\{j\}) \\ &= \sum_{k=0}^n {n \choose k } p^k(1-p)^{n-k}\delta_k(\{j\}) \\ &= {n \choose j } p^j(1-p)^{n-j}, \end{align*} where we have used that $$\delta_k(\{j\}) = \begin{cases} 1 & k=j \\ 0 & k\neq j.\end{cases}$$ We see that the two definitions agree with each other. The advantage of the second definition is that $\sum_{k=0}^n {n \choose k } p^k(1-p)^{n-k}\delta_k(A)$ is well defined for all measurable sets $A \subseteq \mathbb{R}$, not just singletons. Do note that some random variables (normally distributed ones, for instance) have $\mathbb{P}(X=j) = 0$ for all $j$.
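The computation above can be checked numerically: evaluating $\sum_{k=0}^n \binom{n}{k} p^k(1-p)^{n-k}\delta_k(\{j\})$ term by term should reproduce the direct binomial pmf for every $j$. The parameters $n=6$, $p=0.4$ below are my own illustrative choices:

```python
from math import comb

# Dirac measure of a point mass at k, evaluated on the singleton {j}.
def delta(k, j):
    return 1.0 if k == j else 0.0

n, p = 6, 0.4

for j in range(n + 1):
    # P_X({j}) computed from the measure-sum definition...
    via_measure = sum(comb(n, k) * p**k * (1 - p)**(n - k) * delta(k, j)
                      for k in range(n + 1))
    # ...versus the direct binomial pmf.
    direct = comb(n, j) * p**j * (1 - p)**(n - j)
    assert abs(via_measure - direct) < 1e-12
```

All $n+1$ assertions pass, since $\delta_k(\{j\})$ kills every term except $k=j$.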