$\newcommand{\Cov}{\mathrm{Cov}}$
The real need for a notion of a random variable, as opposed to a distribution, arises because one wants to have a single mathematical object that contains all of the information necessary to make statements or formulate questions about a given random quantity.
Suppose I want to ask whether two real-valued random variables $X$ and $Y$ are independent. Without random variables, I cannot answer this using only the distribution of $X$ and the distribution of $Y$. I instead need to appeal to another mathematical object, the joint distribution of $X$ and $Y$ on $\mathbb R\times \mathbb R$.
So (considering real-valued quantities for the moment) every statement about a family of random quantities, say indexed by a set $S$, would first need to specify a joint distribution on the product space $\mathbb R^S$. Further statements involving other random quantities, say indexed by a set $T$ which might or might not intersect $S$, would need to re-specify a new joint distribution, this time on $\mathbb R^T$, in a way that is compatible with the distribution on $\mathbb R^S$.
It becomes much simpler just to assume, once and for all, an underlying sample space, and then a random quantity has a precise formulation as a random variable, i.e., a measurable function on that sample space.
Example
Suppose we flip a fair coin twice and record the outcome of each flip as a Bernoulli variable: $X_1$ and $X_2$ are the number of heads on the first and second flips, respectively. Let $X_3=1-X_1$ be the number of tails on the first flip, and likewise $X_4=1-X_2$.
I can define these all in the obvious way as random variables on the sample space of outcomes $\Omega =\{HH,HT,TH,TT\}$, with probability measure $\mu(A)=\frac{\#A}{4}$.
Treating these as random variables on a sample space, I can define independence of $X_i$ and $X_j$ in terms of independence of the events $\{\omega\mid X_i(\omega)\leq x_i\}$ and $\{\omega\mid X_j(\omega)\leq x_j\}$ for all $x_i,x_j\in \mathbb R$. With this definition, $X_1$ and $X_2$ are independent, while $X_1$ and $X_3$ are not, for example.
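Here is a minimal Python sketch of this setup (the names `prob` and `independent` are my own, not standard; and since these variables take only the values $0$ and $1$, checking that the joint pmf factors is equivalent to the CDF-based condition above):

```python
from itertools import product

# Sample space of two fair coin flips; each outcome has probability 1/4.
omega = [a + b for a, b in product("HT", repeat=2)]  # ['HH', 'HT', 'TH', 'TT']
p = {w: 0.25 for w in omega}

# Random variables are just functions on the sample space.
X1 = lambda w: 1 if w[0] == "H" else 0   # heads on first flip
X2 = lambda w: 1 if w[1] == "H" else 0   # heads on second flip
X3 = lambda w: 1 - X1(w)                 # tails on first flip

def prob(event):
    """Probability of the event {w in omega : event(w) is True}."""
    return sum(p[w] for w in omega if event(w))

def independent(X, Y):
    """Check P(X=x, Y=y) == P(X=x) * P(Y=y) for all values x, y."""
    xs, ys = {X(w) for w in omega}, {Y(w) for w in omega}
    return all(
        abs(prob(lambda w: X(w) == x and Y(w) == y)
            - prob(lambda w: X(w) == x) * prob(lambda w: Y(w) == y)) < 1e-12
        for x in xs for y in ys
    )

print(independent(X1, X2))  # True
print(independent(X1, X3))  # False
```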
However, the distributions of all four $X_i$'s are identical, so there is no way to define independence in terms of their individual distributions. We would need to know the joint distribution of every pair separately in order to answer that question, or else a single joint distribution on $\mathbb R^4$ from which we could derive the pairwise distributions.
Note that the latter joint distribution on $\mathbb R^4$ would effectively function as an alternative sample space, with the projections onto each coordinate functioning as the given random variables. But it would be quite a bit more cumbersome to describe the joint distribution of four random variables, not all of which are independent. Moreover, suppose we wished to consider other random variables like $Y=\frac{X_1-X_2-X_3}{3}$. How would we easily define something like $\Cov(X_1,Y)$? Do we really want to derive yet another joint distribution just for this?
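With a common sample space, there is nothing to derive: continuing the sketch above (and assuming its definitions), $Y$ is just another function on $\Omega$, and the covariance is computed pointwise:

```python
# New random variables are just new functions on the same sample space.
X4 = lambda w: 1 - X2(w)                      # tails on second flip
Y = lambda w: (X1(w) - X2(w) - X3(w)) / 3

def E(X):
    """Expectation of X under the measure p on omega."""
    return sum(p[w] * X(w) for w in omega)

def cov(X, Y):
    """Cov(X, Y) = E[XY] - E[X] E[Y], computed pointwise on omega."""
    return E(lambda w: X(w) * Y(w)) - E(X) * E(Y)

print(cov(X1, Y))  # 1/6 ≈ 0.1667
```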
Best Answer
$\{X=x, Y=y\}$ means that $X=x$ and $Y=y$.
From equation $(3)$ to equation $(4)$, the trick is
$$\sum_y P(X=x, Y=y)=P(X=x)$$
That is, we sum over all the possible values that $Y$ can take.
This works because $$\bigcup_y \{X=x, Y=y\}=\{X=x\}$$
and the sets on the left are pairwise disjoint.
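For a concrete instance, take the coin-flip example from the question with $X=X_1$, $Y=X_2$, and $x=1$: the events $\{X_1=1,X_2=0\}=\{HT\}$ and $\{X_1=1,X_2=1\}=\{HH\}$ are disjoint and their union is $\{X_1=1\}=\{HH,HT\}$, so
$$P(X_1=1)=\sum_{y\in\{0,1\}}P(X_1=1,X_2=y)=\tfrac14+\tfrac14=\tfrac12.$$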