Recall that $U$ is bounded, so $W^{1,n}(U) \subset W^{1,p}(U)$ for all $1 \leq p < n$ by Hölder's inequality. Since $p^*\to \infty$ as $p\to n$, we can choose a fixed $p<n$ close enough to $n$ so that $p^*>n$. Then by the Rellich-Kondrachov compactness theorem
$$W^{1,n}(U) \subset W^{1,p}(U) \subset\subset L^n(U).$$
Arguing along the same lines, we actually have $W^{1,n}(U) \subset\subset L^q(U)$ for all $1 \leq q < \infty$.
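To make the choice of $p$ explicit, here is a short worked computation. It assumes the usual Sobolev conjugate $p^* = \frac{np}{n-p}$, which the argument above uses without writing out. For a given finite $q$,
$$p^* > q \iff np > q(n-p) \iff p > \frac{nq}{n+q}.$$
Since $\frac{nq}{n+q} < n$ for every finite $q$, any $p$ with $\max\{1, \frac{nq}{n+q}\} < p < n$ works; for $q = n$ the condition reads $p > n/2$. The first inclusion comes with an explicit constant from Hölder's inequality with exponents $n/p$ and $n/(n-p)$:
$$\|u\|_{L^p(U)} \leq |U|^{\frac{1}{p}-\frac{1}{n}}\,\|u\|_{L^n(U)},$$
applied to $u$ and to each of its weak first derivatives.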
It's really a definition following from the inclusion $L^2(U) \subset H^{-1}(U)$. Perhaps it is better to write out the inclusion in more detail:
$$L^2(U) \sim (L^2(U))^* \subset (H^1_0(U))^* =: H^{-1}(U).$$
The inclusion above is true because the inclusion $H^1_0(U) \subset L^2(U)$ is continuous, i.e., $\|u\|_{L^2(U)} \leq \|u\|_{H^1_0(U)}$. So when we write $v \in L^2(U) \subset H^{-1}(U)$, what we really mean is that we are associating $v$ with the bounded linear functional on $L^2(U)$ given by
$$u \mapsto (v,u)_{L^2(U)} \ \text{for } u \in L^2(U),$$
which is also a bounded linear functional on $H^1_0(U)$ given by the restriction
$$u \mapsto (v,u)_{L^2(U)} \ \text{for } u \in H^1_0(U).$$
This is why we can write $\langle v,u\rangle = (v,u)_{L^2(U)}$ when $v \in L^2(U) \subset H^{-1}(U)$.
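For completeness, the boundedness of this functional on $H^1_0(U)$ can be checked in one line, using Cauchy-Schwarz and the norm inequality quoted above (with the operator norm on $H^{-1}(U)$ taken as the standard dual norm, which is an assumption about the convention in use):
$$|\langle v,u\rangle| = |(v,u)_{L^2(U)}| \leq \|v\|_{L^2(U)}\|u\|_{L^2(U)} \leq \|v\|_{L^2(U)}\|u\|_{H^1_0(U)},$$
so that $\|v\|_{H^{-1}(U)} \leq \|v\|_{L^2(U)}$.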
This is at least the canonical way to embed $L^2(U)$ into $H^{-1}(U)$. We could instead, for instance, define $\langle v,u\rangle := 2(v,u)_{L^2(U)}$ for $v \in L^2(U)$ and $u \in H^1_0(U)$. This also defines a bounded linear functional on $H^1_0(U)$, but it is not canonical in the sense I described above.
Best Answer
In Evans' book, $\eta_\varepsilon$ is the standard mollifier, and in my edition it is defined and discussed in appendix C.4.
Here was my thought process in thinking about your question:
Looking at the definition in the appendix, $$f^\varepsilon(x)=(f*\eta_\varepsilon)(x)=\int_{B(0,\varepsilon)}\eta_\varepsilon(y)f(x-y)dy=\int_{B(x,\varepsilon)}\eta_\varepsilon(x-y)f(y)dy.$$ We are essentially taking the mollifier $\eta_\varepsilon$, whose support sits inside the ball $B(0,\varepsilon)$, and moving it around $U$ so that we can weight $f$ by it at each point $x$, with the goal of smoothing out $f$. Thus in order for the integrand $\eta_\varepsilon(y)f(x-y)$ to even make sense, the ball of radius $\varepsilon$ centered at $x$ has to lie completely inside $U$. This motivates the notation/definition $U_\varepsilon=\{x\in U \ | \ \mathrm{dist}(x,\partial U)>\varepsilon\}$, and explains why the (basically) direct proof of Theorem 1 only gives local approximation, and why "dealing with the boundary" is the main goal of this result.
Thus, the convolutions $u^i$ are defined for any $x\in U_{\varepsilon_i}$. Since $\zeta_iu\in W^{k,p}(U)$, Theorem 1 then gives a local approximation of $\zeta_iu$ by smooth functions. But since $\mathrm{spt}(\zeta_iu)\subset V_i\subset\subset U$, we may take the $\varepsilon_i$'s small enough that the convolution is defined on all of $V_i$, so we may drop the "$\mathrm{loc}$" from the approximation. This gives us the sequence $u^i$ satisfying $(3)$ without loss of generality.
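One way to quantify "small enough" is the following sketch. The symbol $\delta_i$ is my own notation, not Evans', and $\zeta_iu$ is understood to be extended by zero outside its support. Since $\eta_\varepsilon$ is supported in $B(0,\varepsilon)$, convolution enlarges supports by at most $\varepsilon$:
$$\mathrm{spt}(\eta_\varepsilon * f) \subset \{x \ | \ \mathrm{dist}(x,\mathrm{spt}\,f)\leq\varepsilon\}.$$
So if $\delta_i := \mathrm{dist}(\mathrm{spt}(\zeta_iu),\partial U) > 0$ and $\varepsilon_i < \delta_i$, then $u^i=\eta_{\varepsilon_i}*(\zeta_iu)$ is smooth with compact support inside $U$, and shrinking $\varepsilon_i$ further keeps that support inside any prescribed open neighborhood of $\mathrm{spt}(\zeta_iu)$.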
Hope this helped clarify the proof!