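There is no need to show the opposite inclusion. In fact, the opposite inclusion is false in general: $\mathcal{B} \supset \Sigma$ would mean that every set whose preimage is measurable is Borel. Take as an example $f=id_{\mathbb{R}}$ with the Lebesgue measurable sets on the domain: then $\Sigma$ is the collection of all Lebesgue measurable sets, which strictly contains $\mathcal{B}$.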
What you proved is that the following are equivalent:
1. For every open $U \subseteq Y$, $f^{-1}(U)$ is measurable.
2. For every Borel $E \subseteq Y$, $f^{-1}(E)$ is measurable.
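Clearly 2 implies 1, since every open set is Borel. Conversely, if 1 holds, then $\Sigma$ contains every open set. Since $\Sigma$ is a $\sigma$-algebra and $\mathcal{B}$ is the smallest $\sigma$-algebra containing the open sets, $\mathcal{B} \subset \Sigma$, so 2 holds.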
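For completeness, here is a sketch of why $\Sigma$ is a $\sigma$-algebra. The notation is an assumption, since the post does not name these objects: write $\mathcal{M}$ for the $\sigma$-algebra on the domain of $f$, and $\Sigma = \{E \subseteq Y : f^{-1}(E) \in \mathcal{M}\}$. Preimages commute with complements and countable unions:

$$
f^{-1}(E^c) = \left(f^{-1}(E)\right)^c, \qquad f^{-1}\Big(\bigcup_{n=1}^{\infty} E_n\Big) = \bigcup_{n=1}^{\infty} f^{-1}(E_n).
$$

So if $E, E_1, E_2, \dots \in \Sigma$, the right-hand sides lie in $\mathcal{M}$, which puts $E^c$ and $\bigcup_n E_n$ in $\Sigma$. Also $Y \in \Sigma$, because $f^{-1}(Y)$ is the whole domain.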
The best intuition might come from the applications of measure theory to probability. In probability theory, you take a measure space $(\Omega, \mathcal{A}, P)$ such that $P(\Omega) = 1$. You can think of $\Omega$ as the set of all possible worlds. $P$ is a probability measure that specifies the probability of any measurable subset of possible worlds.
A random variable is then defined as a measurable function $X : \Omega \rightarrow \mathbb{R}$. That is: as an argument, it takes whatever possible world is the case, and tells us one number about the world.
For simplicity, think of $X$ as a coin flip. There is some set of possible worlds $A \in \mathcal{A}$ such that $X(\omega) = 1$ for all $\omega \in A$ and $X(\omega) = 0$ for all $\omega \in A^c$. Then $A$ is the set of all possible worlds where the coin lands heads, and $A^c$ is the set of all possible worlds where the coin lands tails.
Now, we want to talk about the probability that this coin lands heads. However, in our construction of probability, we only really have a probability measure on $\Omega$. How do we state the probability that the coin landed heads? We look at $P(X^{-1}(\{1\}))$, which equals $P(A)$ because $X^{-1}(\{1\}) = A$.
This is why you'd want the inverse images to be measurable: you want to define probability distributions of random variables, and you do so based on the probability measure on this underlying probability space $\Omega$.
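As a rough illustration of the idea above, here is a minimal Python sketch on a finite sample space. Everything in it is a hypothetical toy: the four-element `omega`, the uniform `P`, and the helper names `preimage`, `prob`, and `distribution` are assumptions made for the example. On a finite space with the power set as $\sigma$-algebra every function is measurable, so the sketch shows only the mechanics of computing probabilities through inverse images, not the measurability condition itself.

```python
from fractions import Fraction

# Hypothetical sample space: the "possible worlds" are the outcomes of two fair flips.
omega = ["HH", "HT", "TH", "TT"]

# Probability measure on omega, given by its value on each single world (uniform here).
P = {w: Fraction(1, 4) for w in omega}


def X(w):
    """Random variable: 1 if the first flip lands heads, 0 otherwise."""
    return 1 if w[0] == "H" else 0


def preimage(rv, B):
    """Inverse image rv^{-1}(B) = {w in omega : rv(w) in B}."""
    return {w for w in omega if rv(w) in B}


def prob(event):
    """P(event) for a subset of omega, summing the weights of its worlds."""
    return sum((P[w] for w in event), Fraction(0))


def distribution(rv, B):
    """Probability that rv takes a value in B, computed as P(rv^{-1}(B))."""
    return prob(preimage(rv, B))


heads = preimage(X, {1})        # the set A of worlds where the coin lands heads
p_heads = distribution(X, {1})  # P(X^{-1}({1})) = P(A)
```

The only probability measure in the sketch lives on `omega`. Statements about the values of `X` get a probability solely through `preimage`, which is why the inverse images need to be sets that `P` can measure.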
Hopefully that provides some intuition!
Best Answer
This definition isn't any more or less general; it's just the way to define a measurable function from a measurable space into a topological space. A measurable space has no topology until you give it one, and conversely, a topological space is not a measurable space until you define a $\sigma$-algebra on it. Rudin uses this definition because he needs topological structure (and not measurable structure) on his target space.
Also, it's always good practice to go through each theorem and figure out which structures were necessary in the proof.