Since $\phi$ is a function (and not a subset of the measure space) we can't really speak of its measure.
Simple functions are a kind of "step function" in the following sense: the measurable sets $\{A_{i}\}_{i=1}^{n}$ form a partition of the measure space $X$, and $\phi$ takes a constant value on each $A_{i}$, i.e. $\phi|_{A_{i}}=\alpha_{i}$ for all $i$. The most economical way of writing this is through the indicator functions of the $A_{i}$, namely $\phi=\sum_{i=1}^{n}\alpha_{i}\chi_{A_{i}}$.
A basic example of a simple function is the indicator function $\chi_{A}$ of any measurable $A\subset X$: it takes the value $1$ on $A$ and $0$ on the complement $A^{c}$.
The significance of simple functions is that the integral with respect to a measure is defined through them. In fact, one can show that for any non-negative measurable function $f$ there exists a nondecreasing sequence of simple functions $(\phi_{i})$ such that $\phi_{i}\to f$ pointwise.
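To make the approximating sequence concrete, here is a minimal Python sketch of the usual dyadic construction: $\phi_n$ rounds $f$ down to the nearest multiple of $2^{-n}$ and caps it at $n$. The helper name `dyadic_simple` and the test function $f(x)=x^2$ are assumptions chosen for illustration; they do not come from the question.

```python
import math

def dyadic_simple(f, n):
    """Return phi_n, a simple function approximating a nonnegative f from below.

    phi_n takes only finitely many values: the multiples of 2**-n in [0, n].
    (Hypothetical helper, written for illustration only.)
    """
    def phi(x):
        y = f(x)
        if y >= n:
            return float(n)                    # cap large values at n
        return math.floor(y * 2**n) / 2**n     # round down to a multiple of 2**-n
    return phi

f = lambda x: x * x          # example nonnegative function (an assumption)
x = 0.7
values = [dyadic_simple(f, n)(x) for n in range(1, 6)]
```

At a fixed point $x$ the list `values` is nondecreasing and never exceeds $f(x)$, which is the pointwise behaviour described above.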
It sounds like you've solved the problem; if not, I'll help out. For the question concerning Riemann vs. Lebesgue integration, I'll see if I can give you some motivation. Let's just pretend we're working with a smooth nonnegative function $f$ over $\mathbb{R}.$
In Riemann integration, we start by partitioning the $x$-axis, and then we capture the area under the curve by measuring how much of the $y$-axis can fit under the curve above a given element of our partition.
Conversely, with Lebesgue integration, we start to approximate $f$ by approximating the range of $f$ with the $\alpha_i$ used in the definition. In some sense, we're partitioning the $y$-axis into chunks that describe the $y$-behavior of $f$. Then, once we've approximately partitioned the range of $f$, we get a similar notion of area by measuring the $A_i$, which describe the sets of all $x\in \mathbb{R}$ for which the corresponding $\alpha_i$ is a 'good' approximation of $f$. You can see how, in a rough sense, Riemann integration gets area by chopping up $x$ and measuring $y$, and Lebesgue integration gets area by chopping up $y$ and measuring $x$. Only, in the latter case, we have better tools for describing measure.
The standard example is something like $f = \chi_{\mathbb{Q}\cap [0,1]}$. On $[0,1]$, every Riemann upper sum is $1$ and every lower sum is $0$, so $f$ is not Riemann integrable, and partitioning the $x$-axis seems unfruitful. On the other hand, if we're using Lebesgue measure $m$, we let $\alpha_0 = 0$ and $\alpha_1 = 1$ and apply your result to get $\int f = 0\cdot m([0,1]\setminus\mathbb{Q}) + 1\cdot m(\mathbb{Q}\cap[0,1]) = 0$, since the countable set $\mathbb{Q}\cap[0,1]$ has measure zero. So this $y$-chopping and $x$-measuring lets us handle a wider variety of functions (in general).
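For a well-behaved function the two ways of chopping give the same area, and this can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses $f(x)=x^2$ on $[0,1]$, and the function names `riemann_lower` and `lebesgue_lower` are made up for this example. Because this $f$ is increasing, each level set $A_k=\{x : k/m \le f(x) < (k+1)/m\}$ is an interval whose length can be written down exactly.

```python
import math

def riemann_lower(n):
    # Chop the x-axis into n equal pieces. f(x) = x^2 is increasing on [0, 1],
    # so its minimum on each piece is attained at the left endpoint.
    return sum((i / n) ** 2 * (1 / n) for i in range(n))

def lebesgue_lower(m):
    # Chop the y-axis into levels alpha_k = k/m. For f(x) = x^2 the set
    # A_k = {x in [0,1] : k/m <= f(x) < (k+1)/m} is the interval
    # [sqrt(k/m), sqrt((k+1)/m)), so its measure is the difference of endpoints.
    total = 0.0
    for k in range(m):
        alpha = k / m
        measure = math.sqrt((k + 1) / m) - math.sqrt(k / m)
        total += alpha * measure
    return total

r = riemann_lower(1000)
l = lebesgue_lower(1000)
```

Both `r` and `l` approach $\int_0^1 x^2\,dx = 1/3$ from below as the partitions are refined. The difference between the two theories only shows up for functions like $\chi_{\mathbb{Q}\cap[0,1]}$, where the sets $A_i$ are not intervals.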
The Wikipedia page on Lebesgue integration has a section for motivation/intuition, in case that's helpful.
Best Answer
The concept of the simple function becomes clearer if you take the following definition.
Definition: A simple function is a real-valued function $f$ whose image is a finite set, i.e. $\operatorname{Im}(f)=\{\alpha_0,\alpha_1, \cdots, \alpha_n\}$.
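Now let $(X,\Sigma)$ be a measurable space. You need to prove the following: a function $f\colon X\to\mathbb{R}$ is a simple measurable function if and only if it can be written as a finite linear combination of characteristic functions of measurable sets.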
Suppose first that $f$ is a simple measurable function. Then we can write \begin{equation} f=\sum_{i=0}^{n}\alpha_{i}\mathcal{X}_{f^{-1}(\alpha_i)} \end{equation} where $\alpha_i\not=\alpha_j$ if $i\not=j$, which implies that $f^{-1}(\alpha_i)\cap f^{-1}(\alpha_j)=\emptyset$. And since $f$ is measurable: $$f^{-1}(\alpha_i)=\{x\in X: f(x)\geq \alpha_i\}\cap\{x\in X: f(x)\leq \alpha_i\}\in\Sigma$$
Now, suppose that $f$ is written as a finite linear combination of characteristic functions of measurable sets: $$f=\sum_{i=0}^{m}c_{i}\mathcal{X}_{E_i}$$ where the $E_i$ are not necessarily disjoint.
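Fix an arbitrary $x\in X$. The value $f(x)$ depends only on which of the sets $E_0,\dots,E_m$ contain $x$, and there are at most $2^{m+1}$ such combinations. Hence $f$ takes at most $2^{m+1}$ distinct values, so its image is finite. Since each characteristic function of a measurable set is measurable, $f$ is measurable as a finite linear combination of them, and therefore $f$ is a simple measurable function.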
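The counting argument can be tried out on a toy case. The following Python sketch assumes a finite set $X=\{0,\dots,9\}$ with two overlapping sets $E_0, E_1$ and coefficients $c_0, c_1$; all of these are hypothetical choices made for illustration. It groups the points of $X$ by the value of $f$, which recovers the disjoint preimages $f^{-1}(\alpha_i)$ used in the first half of the proof.

```python
from collections import defaultdict

X = set(range(10))
E = [set(range(0, 6)), set(range(4, 10))]   # overlapping sets E_0 and E_1
c = [1.0, 2.0]                              # coefficients c_0 and c_1

def f(x):
    # f = c_0 * indicator(E_0) + c_1 * indicator(E_1)
    return sum(ci for ci, Ei in zip(c, E) if x in Ei)

# Group points by the value f takes there: these are the preimages f^{-1}(alpha).
preimages = defaultdict(set)
for x in X:
    preimages[f(x)].add(x)
```

Here `preimages` has three keys, fewer than the bound $2^{2}=4$, and its sets are pairwise disjoint and cover $X$: on the overlap $E_0\cap E_1=\{4,5\}$ the function takes the value $c_0+c_1=3$.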
So regarding your questions