The central idea (as is true of much of Mathematics) is to generalize some very familiar concepts to more abstract notions. In particular, there are some things that the notions of length, area, volume, etc. have in common. For one example, if you have nothing, then what you have has no length, area, or volume. For another example, if you have two (or more, up to countably-infinitely-many) disjoint objects, each with a clearly-defined length/area/volume, then what you have together should also have a clearly-defined length/area/volume, found by simply adding the lengths/areas/volumes of the individual objects.
The purpose of Measure Theory is to capture these common characteristics, and allow them to be applied to versions of "measure" that are unspecified or less intuitive/visualizable. In particular, it lets us (attempt to) answer certain questions, such as: "Can we measure every subset of the real line in a way that corresponds with length for intervals and acts like measurements 'should'?"
It turns out that the answer is: "Well...maybe...if we take certain things for granted." So what seems like a simple question turns out to be very deep, indeed! In fact, for certain principles of measure theory to make sense, one must take certain things for granted. For example, if one doesn't assume that a countable union of countable sets is countable (a consequence of the Axiom of Choice), then it is completely consistent with the rest of Measure Theory that the real number line has length $0$! If you're curious about the details of these crazy-sounding claims, let me know, and I will explain and/or provide links to clear it up.
The best intuition might come from the applications of measure theory to probability. In probability theory, you take a measure space $(\Omega, \mathcal{A}, P)$ such that $P(\Omega) = 1$. You can think of $\Omega$ as the set of all possible worlds. $P$ is a probability measure that specifies the probability of any measurable subset of possible worlds.
A random variable is then defined as a measurable function $X : \Omega \rightarrow \mathbb{R}$. That is: it takes as its argument whatever possible world is the case, and tells us one number about that world.
For simplicity, think of it as a coin flip. So there is some set of possible worlds $A \in \mathcal{A}$, namely $A = \{\omega \in \Omega \mid X(\omega) = 1\}$: these are all the possible worlds where the coin lands heads. Then $A^c$ is the set of all possible worlds where the coin lands tails.
Now, we want to talk about the probability that this coin lands heads. However, in our construction, we only really have a probability measure on $\Omega$. How do we state the probability that the coin lands heads? We look at $P(X^{-1}(\{1\}))$, which is just $P(A)$; more generally, the distribution of $X$ is the pushforward measure $P \circ X^{-1}$ on $\mathbb{R}$.
This is why you want the inverse images of Borel sets to be measurable: you want to define the probability distributions of random variables, and you do so based on the probability measure on the underlying probability space $\Omega$.
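To make this concrete, here is a small Python sketch with a finite $\Omega$ (a die roll standing in for the set of possible worlds) and a $\{0,1\}$-valued $X$; the particular choices of $\Omega$, $P$, and $X$ below are purely illustrative, not anything fixed by the discussion above.

```python
from fractions import Fraction

# A toy finite probability space: Omega is the set of "possible worlds",
# here the outcomes of rolling one fair die.
Omega = {1, 2, 3, 4, 5, 6}

def P(event):
    """Uniform probability measure on subsets of Omega."""
    return Fraction(len(event), len(Omega))

def X(omega):
    """A {0,1}-valued random variable: the coin lands heads (1) exactly
    in the worlds where the die shows at least 5 (an arbitrary choice)."""
    return 1 if omega >= 5 else 0

def preimage(B):
    """X^{-1}(B): the set of worlds omega with X(omega) in B."""
    return {omega for omega in Omega if X(omega) in B}

# The distribution of X is the pushforward P o X^{-1}:
# P(X = 1) = P(X^{-1}({1})).
A = preimage({1})      # worlds where the coin lands heads
print(A)               # {5, 6}
print(P(A))            # 1/3 -- the probability of heads
print(P(Omega - A))    # 2/3 -- the probability of tails, P(A^c)
```

On a finite $\Omega$ every subset is measurable, so the measurability requirement is invisible here; it only starts to bite once $\Omega$ is uncountable.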
Hopefully that provides some intuition!
Best Answer
If $X$ is a set and $\mathcal X$ is a $\sigma$-algebra on it, then for a function $f:X\to\mathbb R$ the following statements are equivalent (where $\mathcal B(\mathbb R)$ denotes the Borel $\sigma$-algebra on $\mathbb R$, i.e. the smallest $\sigma$-algebra that contains all open subsets of $\mathbb R$):

- $f$ is measurable (with respect to $\mathcal X$ and $\mathcal B(\mathbb R)$);
- $f^{-1}(B)\in\mathcal X$ for every $B\in\mathcal B(\mathbb R)$;
- $\{x\in X\mid f(x)>\alpha\}\in\mathcal X$ for every $\alpha\in\mathbb R$.
Note that here $\{x\in X\mid f(x)>\alpha\}=f^{-1}((\alpha,\infty))$ while $(\alpha,\infty)\in\mathcal B(\mathbb R)$ so actually the condition under the second bullet implies the condition under the third bullet directly.
Conversely, the condition under the third bullet is enough to prove that the condition under the second bullet is satisfied. This is because it can be proved that $$\sigma(\{(\alpha,\infty)\mid\alpha\in\mathbb R\})=\mathcal B(\mathbb R)$$ and, secondly, that in general $$f^{-1}(\sigma(\mathcal V))=\sigma(f^{-1}(\mathcal V))$$ for every collection $\mathcal V\subseteq\mathcal P(\mathbb R)$ (so in particular for $\mathcal V:=\{(\alpha,\infty)\mid\alpha\in\mathbb R\}$).
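For the first equality, a standard sketch (not spelled out in the answer above): each ray $(\alpha,\infty)$ is open, so $\sigma(\{(\alpha,\infty)\mid\alpha\in\mathbb R\})\subseteq\mathcal B(\mathbb R)$; conversely, the rays already produce all open intervals, since $$(a,b]=(a,\infty)\setminus(b,\infty)\qquad\text{and}\qquad(a,b)=\bigcup_{n\ge1}\left(a,\,b-\tfrac1n\right]\quad(a<b),$$ and every open subset of $\mathbb R$ is a countable union of open intervals, which gives the reverse inclusion.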
Which one (the second or the third bullet) to use as the definition is a matter of choice.
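As a small sanity check of the second identity, here is a toy computation on a finite universe (everything here — the die-valued domain, the function `f`, and the generating collection `V` — is invented for illustration; on a finite universe a $\sigma$-algebra is just a family of subsets closed under complements and unions).

```python
def generated_sigma_algebra(universe, generators):
    """The sigma-algebra on a *finite* universe generated by `generators`:
    close the generators under complement and (finite) union."""
    universe = frozenset(universe)
    sets = {frozenset(), universe} | {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        snapshot = list(sets)
        for a in snapshot:
            for new in [universe - a] + [a | b for b in snapshot]:
                if new not in sets:
                    sets.add(new)
                    changed = True
    return sets

def preimage(f, s, domain):
    """f^{-1}(s) = {x in domain : f(x) in s}."""
    return frozenset(x for x in domain if f(x) in s)

# Toy model: the domain is a six-sided die, f reports "low"/"high".
X = {1, 2, 3, 4, 5, 6}
f = lambda x: "high" if x >= 4 else "low"
codomain = {"low", "high"}
V = [{"high"}]  # a small generating collection on the codomain

# f^{-1}(sigma(V)) computed directly ...
lhs = {preimage(f, s, X) for s in generated_sigma_algebra(codomain, V)}
# ... versus sigma(f^{-1}(V)) generated on the domain.
rhs = generated_sigma_algebra(X, [preimage(f, s, X) for s in V])
print(lhs == rhs)  # True
```

Of course this finite check proves nothing about $\mathcal B(\mathbb R)$; it is only meant to make the identity $f^{-1}(\sigma(\mathcal V))=\sigma(f^{-1}(\mathcal V))$ feel concrete.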