It sounds like you've solved the problem; if not, I'll help out. For the question concerning Riemann vs. Lebesgue integration, I'll see if I can give you some motivation. Let's just pretend we're working with a smooth nonnegative function $f$ over $\mathbb{R}.$
In Riemann integration, we start by partitioning the $x$-axis, and then we capture the area under the curve by measuring how much of the $y$-axis can fit under the curve above a given element of our partition.
Conversely, with Lebesgue integration, we start to approximate $f$ by approximating the range of $f$ with the $\alpha_i$ used in the definition. In some sense, we're partitioning the $y$-axis into chunks that describe the $y$-behavior of $f$. Then, once we've approximately partitioned the range of $f$, we get a similar notion of area by measuring the $A_i$, which describe the sets of all $x\in \mathbb{R}$ for which the corresponding $\alpha_i$ is a 'good' approximation of $f$. You can see how, in a rough sense, Riemann integration gets area by chopping up $x$ and measuring $y$, and Lebesgue integration gets area by chopping up $y$ and measuring $x$. Only, in the latter case, we have better tools for describing measure.
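To make the two chopping strategies concrete, here is a minimal numerical sketch. It assumes the specific example $f(x)=x^2$ restricted to $[0,1]$ (my choice, not part of the question), where the set $\{x : f(x) > t\}$ is the interval $(\sqrt{t}, 1]$ and so has length $1-\sqrt{t}$. The function names are hypothetical helpers, not anything standard.

```python
import math

def riemann_lower_sum(f, a, b, n):
    """Chop the x-axis into n equal pieces and use the left-endpoint height.

    For an increasing f this is a Riemann lower sum.
    """
    dx = (b - a) / n
    return sum(f(a + i * dx) * dx for i in range(n))

def layer_sum(measure_above, y_max, n):
    """Chop the y-axis into n equal levels and measure {f > level} for each.

    This is the integral of the simple function
    s = sum_i dy * chi_{f > i*dy}, which satisfies 0 <= s <= f.
    """
    dy = y_max / n
    return sum(dy * measure_above(i * dy) for i in range(1, n + 1))

f = lambda x: x * x

# Assumption for this example only: f(x) = x^2 on [0, 1],
# so {x : f(x) > t} = (sqrt(t), 1] has length 1 - sqrt(t).
measure_above = lambda t: max(0.0, 1.0 - math.sqrt(t))

riemann = riemann_lower_sum(f, 0.0, 1.0, 1000)   # chop x, measure y
lebesgue = layer_sum(measure_above, 1.0, 1000)   # chop y, measure x
```

Both numbers approach $1/3$ from below as the number of pieces grows; the difference is only in which axis gets partitioned and which one gets measured.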
The standard example is something like $f = \chi_{\mathbb{Q}\cap [0,1]}$. For every partition of $[0,1]$, the Riemann upper and lower sums are 1 and 0 respectively, so $f$ is not Riemann integrable, and partitioning the $x$-axis seems unfruitful. On the other hand, if we're using Lebesgue measure, we let $\alpha_0 = 0$ and $\alpha_1 = 1$ and apply your result to get $\int f = 0\cdot\mu(A_0) + 1\cdot\mu(\mathbb{Q}\cap[0,1]) = 0$, since $\mathbb{Q}\cap[0,1]$ is countable and hence has measure zero. So this $y$-chopping and $x$-measuring lets us handle a wider variety of functions (in general).
The Wikipedia page on Lebesgue integration has a section for motivation/intuition, in case that's helpful.
They are in fact equal and you can set up the theory of the Lebesgue integral starting from either one and arrive at the other. We just had this last semester as part of calc III and the proof is not trivial (depending on how far you are into the topic), but very easy to follow once you have the basic tools to prove it assembled. We did it with measure theoretic induction, monotone convergence etc. In fact, we started from the definition you quoted from the book.
I have not read that particular piece, but I would not be surprised to see the author prove that equality somewhere in his book.
This explanation may not directly give intuition for how to prove that both expressions are equal. But maybe it is convincing enough to suggest that both arrive at the same value, which is "computing the integral for functions and sets not as well-behaved as in the Riemann case". The integral of a simple function basically means you go through all values in that function's range and sum up the corresponding areas, i.e. per $\alpha_{i}$ you compute the area as $\alpha_{i}\cdot \mu(\{s=\alpha_{i}\})$ and take the sum to arrive at $\int s$. Then $\sup_{0\leq s\leq f} \int s$ basically equates to going through all approximations of $f$ by simple functions s.t. $s \leq f$ (similar to the Riemann lower sum), computing their integrals, and taking the supremum.
Edit:
If you restrict this to the Lebesgue integral, this follows quickly from Fubini's theorem and another result which states $\int_{\mathbb{R}^{n}}f d\mathcal{L}^{n}=\mathcal{L}^{n+1}(\{(x,y)\in \mathbb{R}^{n}\times\mathbb{R} \mid 0\leq y < f(x)\})$.
For general measure spaces the proof goes like this. If $\exists k\in\mathbb{N}\setminus{\{0\}}:\mu(f^{-1}((\frac{1}{k},\infty]))=\infty$, then both sides are infinite.
So we can assume $\forall k \in\mathbb{N}\setminus{\{0\}}:\mu(f^{-1}((\frac{1}{k},\infty]))<\infty$.
Thus, $f^{-1}((0,\infty])=\bigcup_{k}f^{-1}((\frac{1}{k},\infty])$ is $\sigma$-finite (will be relevant in Step 2). For functions where this is the case, the integral (defined as the sup of $\int s$ over simple $s$ s.t. $0\leq s\leq f$) can be written as follows:
$\int_{E} f \, d\mu = \sup\big\{\int_{F} \varphi \, d\mu \mid F \subseteq E,\ F \in \mathcal{S},\ \mu(F)<\infty,\ \varphi : F\rightarrow [0,\infty) \text{ measurable},\ \#\varphi(F)<\infty,\ 0\leq\varphi\leq f\big\}$.
The proof for the above is lengthier, so I am skipping that, but it's a useful result for proving your equality.
The proof of the equality itself proceeds by measure theoretic induction (starting with simple functions, then working your way up to arbitrary measurable functions).
Step 1: let $f$ be a linear combination of indicator functions with nonnegative coefficients. Then $f=\sum_{j=1}^{J}c_{j}\chi_{F_j}$ with $c_j\geq 0$ and $F_j \cap F_k=\emptyset$ for $j \neq k$. For $t>0$, $f^{-1}((t,\infty])$ is the union of the $F_j$ s.t. $c_j > t$, therefore $\mu(f^{-1}((t,\infty]))=\sum_{j} \mu(F_{j})\chi_{[0,c_{j})}(t)$. Linearity of the integral implies $\int_{(0,\infty)}\mu(f^{-1}((t,\infty]))d\mathcal{L}^{1}(t)=\int_{(0,\infty)}\sum_{j}\mu(F_{j})\chi_{[0,c_{j})}(t)d\mathcal{L}^{1}(t)=\sum_{j}\mu(F_{j})c_{j}=\int_{E}f d\mu$.
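The identity in Step 1 can be checked exactly on a small example. The sketch below uses made-up values $c_j$ and measures $\mu(F_j)$ (assumptions chosen only for illustration) and exact rational arithmetic; the function names are hypothetical.

```python
from fractions import Fraction

def integral_simple(values, measures):
    """Integral of f = sum_j c_j * chi_{F_j} for disjoint F_j: sum_j c_j * mu(F_j)."""
    return sum(c * m for c, m in zip(values, measures))

def layer_cake_integral(values, measures):
    """Integrate t -> mu({f > t}) over (0, inf) exactly.

    The distribution function is constant between consecutive distinct
    values c_j, so the integral is a finite sum of rectangle areas.
    """
    levels = sorted(set(values) | {Fraction(0)})
    total = Fraction(0)
    for lo, hi in zip(levels, levels[1:]):
        # For t in [lo, hi), {f > t} is the union of the F_j with c_j > lo.
        mass = sum(m for c, m in zip(values, measures) if c > lo)
        total += (hi - lo) * mass
    return total

# Illustrative data (assumed): f takes the values 1/2, 2, 3 on disjoint
# sets of measure 4, 1, 1/3 respectively.
values = [Fraction(1, 2), Fraction(2), Fraction(3)]
measures = [Fraction(4), Fraction(1), Fraction(1, 3)]

direct = integral_simple(values, measures)       # sum_j c_j * mu(F_j)
layered = layer_cake_integral(values, measures)  # integral of mu({f > t}) dt
```

Here both `direct` and `layered` come out to $5$: the first sums $\tfrac12\cdot 4 + 2\cdot 1 + 3\cdot\tfrac13$, the second sums $\tfrac12\cdot\tfrac{16}{3} + \tfrac32\cdot\tfrac43 + 1\cdot\tfrac13$.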
Step 2: let $f$ be nonnegative. Let $\{\varphi_{k}\}_{k\in\mathbb{N}}$ be a monotone increasing sequence of integrable functions s.t. $\#\varphi_{k}(E)<\infty$ and $\varphi_{k} \rightarrow f$ pointwise. This sequence exists because $f^{-1}((0,\infty])$ is $\sigma$-finite (a result that is proven as part of the theory). Monotone convergence (another important result) now implies
$\int_{E}f \, d\mu=\lim_{k\rightarrow \infty}\int_{E} \varphi_{k} \, d\mu$ [1]
Let $t\in(0,\infty)$. Then $\varphi_{k}\leq\varphi_{k+1}$ implies $\{x\mid \varphi_{k}(x)>t\} \subseteq \{x\mid \varphi_{k+1}(x)>t\}$, and pointwise convergence leads to $\{x\mid f(x)>t\}=\bigcup_{k\in\mathbb{N}}\{x\mid \varphi_{k}(x)>t\}$. Hence, by continuity of $\mu$ from below, $\mu(\{x\mid \varphi_{k}(x)>t\})\rightarrow \mu(\{x\mid f(x)>t\})$ monotonically. Therefore, by monotone convergence again, $\lim_{k\rightarrow \infty}\int_{(0,\infty)}\mu(\{x\mid \varphi_{k}(x)>t\})d\mathcal{L}^{1}(t)=\int_{(0,\infty)}\mu(\{x\mid f(x)>t\})d\mathcal{L}^{1}(t)$.
The equality immediately follows by applying Step 1 to every $\varphi_{k}$ and equation [1]
Best Answer
Break it into pieces:
$1).\ S$ is the collection of nonnegative simple functions $s$ such that $s(x)\le f(x)$ for all $x\in X$.
$2).\ I:= \{\int_X s:s\in S\}$ is a set of numbers, with each $\int_X s$ defined as in your question.
$3).\ $ Since $I$ is a set of numbers, we can take $\sup I=\sup\{\int_Xs:s\in S\}.$ The result is a number in the extended reals, and this is the integral of $f$ by definition.
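As a rough illustration of these three pieces, the sketch below assumes $X=[0,1]$ with Lebesgue measure and $f(x)=x$ (an example of my choosing), and looks only at one particular family inside $S$: the dyadic step functions $s_n(x)=\lfloor 2^n x\rfloor/2^n$. It does not enumerate all of $S$; it just shows a few elements of $I$ creeping up toward the supremum.

```python
from fractions import Fraction

def dyadic_lower_integral(n):
    """Integral over [0, 1] of s_n(x) = floor(2^n x) / 2^n, a simple function with 0 <= s_n <= f for f(x) = x.

    s_n takes the value k / 2^n on [k / 2^n, (k + 1) / 2^n), a set of
    Lebesgue measure 1 / 2^n (the single point x = 1 has measure zero
    and is ignored).
    """
    N = 2 ** n
    return sum(Fraction(k, N) * Fraction(1, N) for k in range(N))

# A few elements of the set I = { integral of s : s in S }.
I_sample = [dyadic_lower_integral(n) for n in range(1, 8)]

# The largest value seen so far; the true sup over all of S is 1/2.
best_so_far = max(I_sample)
```

Each value equals $\tfrac12 - 2^{-(n+1)}$, so the sampled integrals increase and stay strictly below $\tfrac12$, which is $\sup I = \int_X f$ in this example.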