In what follows
$(\Omega ,\mathcal F, \mathbb P)$ is a probability space,
$X$ is a random variable defined on $\Omega $,
$\mathcal G\subseteq \mathcal F$ is a sub-$\sigma $-algebra generated by a countable partition $\{A_i\}_{i\in I}$
of $\Omega $ into $\mathcal F$-measurable sets, each with $\mathbb P(A_i)>0$.
The first thing to bear in mind is that
the conditional expectation $\mathbb E_{\mathbb P}(X|\mathcal G)$ is another random
variable (rather than just a number). Since random variables are nothing but functions defined on $\Omega $,
the notation
$$
\mathbb E_{\mathbb P}(X|\mathcal G)(\omega )
$$
makes perfect sense as it indicates the value taken by the function
$\mathbb E_{\mathbb P}(X|\mathcal G)$ on a given point $\omega $ of $\Omega $.
The second important point is that $\mathbb E_{\mathbb P}(X|\mathcal G)$ is constant on every set $A_i$
(although its constant value may change among the various $A_i$).
By the definition of
conditional expectation, the constant value taken on $A_i$ by $\mathbb E_{\mathbb P}(X|\mathcal G)$ is
the expected value of $X$ on $A_i$, namely
$$
\mathbb E_{\mathbb P}(X|A_i ) := \frac {\mathbb E_{\mathbb P}(X1_{A_i} )}{\mathbb P(A_i)}.
$$
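For a concrete illustration (a toy example of my own, not taken from the question): if $\Omega =\{1,\dots ,6\}$ models a fair die with the uniform measure, $X(\omega )=\omega $, and $A_1=\{2,4,6\}$ is the set of even outcomes, then
$$
\mathbb E_{\mathbb P}(X|A_1)=\frac {\mathbb E_{\mathbb P}(X1_{A_1})}{\mathbb P(A_1)}=\frac {(2+4+6)/6}{1/2}=4,
$$
namely the average value of $X$ over $A_1$.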
Denoting the expected value $\mathbb E_{\mathbb P}(X|A_i)$ simply by $e_i$, the question is how we should express the function
that takes the value $e_i$ on each $A_i$. I believe the best way to do this is simply to write
$$
\sum_{i\in I}e_i1_{A_i},
\tag 1
$$
observing that if $\omega $ lies in some $A_j$, then $1_{A_i}(\omega )$ vanishes for every $i$, except for $i=j$, in
which case $1_{A_i}(\omega )=1$, so the above sum comes out as $e_j$, which is precisely what we expect.
Notice that there is no $\omega $ in expression (1), for the same reason many people consider it
incorrect to say
"consider the function $\sin(x)$".
In fact, the function is called simply "$\sin$", whereas "$\sin(x)$" denotes the value
of the function $\sin$ at the given real number $x$.
Accordingly, we have
$$
\mathbb E_{\mathbb P}(X|\mathcal G)= \sum_{i\in I}e_i1_{A_i},
\tag 2
$$
which, upon substituting the appropriate value for $e_i$, is exactly what the OP writes in the first question.
If we want to explicitly indicate the dependency of these functions on a variable $\omega $, I'd write
$$
\mathbb E_{\mathbb P}(X|\mathcal G) (\omega ) = \sum_{i\in I}e_i1_{A_i}(\omega ), \quad\forall \omega \in \Omega ,
\tag 3
$$
noticing that now $\omega $ shows up on both sides.
On the other hand, I believe mixing the LHS of (3) with the RHS of (2) is incorrect, so I agree 100%
with the OP that their second formula is preferable to the first one.
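To see formulas (1)-(3) in action, here is a minimal Python sketch (my own illustration, not part of the original discussion), reusing the die example from above; all names are illustrative.

```python
# Minimal sketch of formulas (1)-(3) on the toy die example:
# Omega = {1,...,6} with uniform P, X(w) = w, and G generated by
# A_1 = {evens} and A_2 = {odds}.

omega_space = [1, 2, 3, 4, 5, 6]
P = {w: 1 / 6 for w in omega_space}   # uniform probability on Omega
partition = [{2, 4, 6}, {1, 3, 5}]    # the sets A_i generating G

def X(w):
    return w

def conditional_expectation(X, partition, P):
    """Return E(X | G) as a function on Omega: the value on each A_i is
    e_i = E(X 1_{A_i}) / P(A_i), assembled as in formula (3)."""
    e = [sum(X(w) * P[w] for w in A) / sum(P[w] for w in A)
         for A in partition]
    # sum_i e_i 1_{A_i}(w): exactly one indicator is nonzero at each w
    return lambda w: sum(e_i for e_i, A in zip(e, partition) if w in A)

Y = conditional_expectation(X, partition, P)
print([Y(w) for w in omega_space])    # [3.0, 4.0, 3.0, 4.0, 3.0, 4.0]
```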
Regarding the OP's second question, I see a problem in the sense that
$$
\mathbb{E}_\mathbb{P}(X|\mathcal{G})
$$
is supposed to be a random variable, as already discussed, while
$$
\sum_{i\in I}\mathbb{E}_\mathbb{P}(X|A_i)
$$
strikes me as a number.
Inserting the missing $1_{A_i}$ in the last expression would make it a function, which is more in line
with the functional nature of the conditional expectation.
The $\sigma $-algebra generated by the intervals $\left(\frac {j-1} n, \frac j n\right]$, $1\leq j \leq n$, is simply the collection of all possible unions of these intervals. (This is true of the $\sigma $-algebra generated by any countable partition of $\Omega $.) To verify that the formula given in the example for $E(X\mid\mathcal G)$ works, you only have to check that $\int_A X\,dP=\int_A E(X\mid\mathcal G)\,dP$ when $A$ is one of these intervals, because it will then hold for all possible unions of these intervals as well. And when $A$ is one of these intervals the verification is very simple.
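To spell out why: on $A_j:=\left(\frac {j-1} n, \frac j n\right]$ the function $E(X\mid\mathcal G)$ equals the constant $e_j=E(X\mid A_j)$, so
$$
\int_{A_j} E(X\mid\mathcal G)\,dP = e_j\,P(A_j) = E(X1_{A_j}) = \int_{A_j} X\,dP.
$$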
If $X(\omega)=\omega^{2}$ (with $P$ the uniform measure on $\Omega=(0,1]$, as in the example), then $E(X\mid\mathcal G)$ is the random variable taking the constant value
$$
E(X\mid A_j)=\frac {\int_{(j-1)/n}^{j/n}\omega^{2}\,d\omega}{1/n}=\frac n 3\left(\left(\frac j n\right)^{3}-\left(\frac {j-1} n\right)^{3}\right)=\frac {3j^{2}-3j+1}{3n^{2}}
$$
on the interval $\left(\frac {j-1} n, \frac j n\right]$, for $1 \leq j \leq n$.
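For readers who like to double-check numerically, here is a small Monte Carlo sketch of the values above; it assumes $P$ is the uniform measure on $(0,1]$, uses numpy, and all variable names are of my own choosing.

```python
import numpy as np

# Monte Carlo sanity check of the constant values derived above,
# assuming P is the uniform (Lebesgue) measure on Omega = (0, 1].
rng = np.random.default_rng(0)
n = 4
omega = rng.uniform(0.0, 1.0, size=1_000_000)    # samples from P
X = omega ** 2

j = np.floor(omega * n).astype(int) + 1          # index of the interval containing omega
for k in range(1, n + 1):
    mc = X[j == k].mean()                        # estimate of E(X | A_k)
    exact = (3 * k**2 - 3 * k + 1) / (3 * n**2)  # closed form from the display above
    print(k, round(mc, 4), round(exact, 4))      # the two columns should agree
```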
To begin with, it is worth making some comments. Given a probability space $(\Omega,\mathcal{U},\textbf{P})$, we can think of $\mathcal{U}$ as the information we have at hand about the random phenomenon we are interested in. More precisely, the $\sigma$-algebra $\mathcal{U}$ tells us which events we can observe the occurrence of. So, when one considers a sub-$\sigma$-algebra $\mathcal{V}\subseteq\mathcal{U}$, one is restricting the information available about the random phenomenon under study.
Based on this interpretation, we can view the conditional expectation $\textbf{E}[X\mid\mathcal{V}]$ as the random variable which best approximates $X$ based on the knowledge encoded in $\mathcal{V}\subseteq\mathcal{U}$. This means that $Y := \textbf{E}[X\mid\mathcal{V}]$ should be $\mathcal{V}$-measurable, and $Y$ and $X$ should coincide on average over every measurable set $A\in\mathcal{V}$. That is why $\textbf{E}[X\mid\mathcal{U}]$ equals $X$: the best approximation of $X$ given full knowledge of $X$ is the random variable $X$ itself.
To make this clearer, let us consider the particular case where $Y$ is a simple random variable. This means that we can express $Y$ as a linear combination of indicator functions of measurable sets that partition the sample space $\Omega$: \begin{align*} Y(\omega) = \sum_{i=1}^{n}y_{i}1_{D_{i}}(\omega) \end{align*}
In this context, if we let $\mathcal{D}_{Y} = \{D_{1},D_{2},\ldots,D_{n}\}$, then the conditional expectation is given by: \begin{align*} \textbf{E}[X\mid Y](\omega) = \textbf{E}[X \mid \mathcal{D}_{Y}](\omega) = \sum_{i=1}^{n}\textbf{E}[X\mid D_{i}]1_{D_{i}}(\omega) \end{align*}
In other words, we are approximating $X$ by the constant $\textbf{E}[X\mid D_{i}]$ on each $D_{i}$. This need not be a good approximation, since we replace $X$ by a constant on each $D_{i}$, but it is the best one among approximations of this type.
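Here is a brief sketch of the displayed formula in code (my own illustration; the random variables and names are hypothetical, not from the question). When $Y$ is simple, $\textbf{E}[X\mid Y]$ is obtained by averaging $X$ over each level set $D_{i}=\{Y=y_{i}\}$:

```python
import numpy as np

# Sketch: E[X | Y] for a simple Y, built by averaging X over each
# level set D_i = {Y = y_i}. Omega is sampled uniformly from [0, 1).
rng = np.random.default_rng(1)
omega = rng.uniform(size=100_000)
X = omega ** 2
Y = np.floor(3 * omega)              # simple r.v. taking values 0, 1, 2

cond_exp = np.empty_like(X)
for y in np.unique(Y):
    D = Y == y                       # the level set D_i
    cond_exp[D] = X[D].mean()        # constant value E[X | D_i] on D_i

print(np.unique(cond_exp.round(3)))  # approx. [1/27, 7/27, 19/27]
```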
Generally speaking, given a probability space $(\Omega,\mathcal{U},\textbf{P})$ where $X$ is $\mathcal{U}$-measurable, $Y$ is $\mathcal{V}$-measurable and $\mathcal{V}\subseteq\mathcal{U}$, we can define the conditional expectation as follows: \begin{align*} \textbf{E}[X\mid Y] = \textbf{E}[X\mid\sigma(Y)] \end{align*} where $\sigma(Y)$ is the $\sigma$-algebra generated by $Y$. (For the simple $Y$ above, $\sigma(Y)$ is precisely the $\sigma$-algebra generated by the partition $\{D_{1},\ldots,D_{n}\}$, which recovers the displayed formula.) Based on this definition, you can recover the usual definition of conditional expectation that you are acquainted with.
Finally, as @OliverDíaz has mentioned, you can formalize what has been discussed in terms of best approximation in the quadratic mean: among $\mathcal{V}$-measurable random variables $Y$, the conditional expectation $\textbf{E}[X\mid\mathcal{V}]$ minimizes $\textbf{E}[(X-Y)^{2}]$ (for square-integrable $X$).