Answer a.1: Because $f = f^+ - f^-$ where $f^+ , f^- \geq 0$ and so one only needs the integral on non-negative functions.
Okay, but why exactly does it suffice to deal with $f^+$ and $f^-$ separately? The triangle inequality should be mentioned: for a given $\varepsilon \gt 0$ choose $s^+$ and $s^-$ such that $\|f^\pm-s^\pm\| \lt \varepsilon/2$ and then
$$\|f-(s^+-s^-)\| \leq \|f^+-s^+\|+\|f^--s^-\|\lt \varepsilon$$
by the triangle inequality and the choice of $s^\pm$.
Answer a.2: To prove that one uses that for any non-negative $f$ there exists $s_n$ such that $$\forall \varepsilon > 0 \exists n_0 : n > n_0 \implies |f(x) - s_n(x) | < \frac{\varepsilon}{\mu(\Omega)} \forall x \in \Omega$$ where $s_n \leq s_{n+1}$ and $\displaystyle s_n(x) = \sum_{i=1}^N \alpha_i \chi_{Y_{i,n}}(x)$ and $Y_n := \cup Y_{i,n}$.
Here's a better and more direct way to do it (you don't say how to do it, after all!). Note that the following argument doesn't use $\sigma$-compactness of $\Omega$.
The idea (the basic idea of all of Lebesgue's integration theory!) is to slice the set of values of $f$ into fine strips of height $2^{-n}$:
Assume $f \geq 0$ and put $A_{k,n} = \{x\in \Omega : 2^{-n} k \leq f(x) \lt 2^{-n}(k+1)\}$ and consider $s_n = 2^{-n} \sum\limits_{k=0}^{2^{2n}}k \cdot[A_{k,n}]$. Then $s_n$ approximates $f$ on $\{x \in \Omega\,:\,0 \leq f \lt 2^n + 2^{-n}\}$ up to a precision of $2^{-n}$ pointwise.
Thus $s_n \nearrow f$ pointwise a.e. and since $f \in L^1$ we have $s_n \in L^1$, in particular $s_n$ is supported on a set of finite measure.
By the dominated convergence theorem, then, we have $s_n \to f$ in $L^1$: note that $\|f-s_n\| = \int f-s_n$ and $0 \leq f-s_n \leq f$ while $s_n \to f$ pointwise a.e.
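A quick numerical sanity check of this convergence, offered as a sketch rather than as part of the proof. The test function $f(x) = 1/\sqrt{x}$ on $(0,1]$, the midpoint rule, and the helper names `dyadic_step` and `l1_error` are all assumptions chosen for illustration; none of them come from the exercise.

```python
import math

def dyadic_step(value, n):
    """Value of s_n at a point where f equals `value` (value >= 0)."""
    k = math.floor(value * 2**n)       # index k of the strip A_{k,n} containing value
    if k > 2**(2 * n):                 # above the top strip: s_n is 0 there
        return 0.0
    return k / 2**n

def l1_error(f, n, a=0.0, b=1.0, cells=100_000):
    """Midpoint-rule estimate of the integral of f - s_n over [a, b]."""
    h = (b - a) / cells
    total = 0.0
    for i in range(cells):
        x = a + (i + 0.5) * h
        fx = f(x)
        total += (fx - dyadic_step(fx, n)) * h
    return total

# Hypothetical test function: integrable on (0, 1] but unbounded near 0.
f = lambda x: 1.0 / math.sqrt(x)
errors = [l1_error(f, n) for n in range(1, 7)]
for n, err in enumerate(errors, start=1):
    print(n, err)
```

The printed estimates of $\|f - s_n\|$ should shrink as $n$ grows. The part of the error coming from the region where $f \geq 2^n + 2^{-n}$ is exactly what the domination $0 \leq f - s_n \leq f$ controls.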
But, as was stated in the exercise, you may assume this fact, so there would be no need to spell it out.
Here I think I need to assume $\mu(\Omega) < \infty$ but maybe that follows from what is given in the question.
The space $\mathbb{R}$ with Lebesgue measure is $\sigma$-compact (it is the union of the countably many compact intervals $[-n,n]$, for example) but it is certainly not of finite measure, so no, this does not follow from the assumptions.
Now one has a measurable function with finite (and therefore compact) support.
What? I hope this is a typo. Finite measure certainly doesn't imply compactness, not even boundedness: the set $\bigcup_{n=1}^{\infty} (n, n+2^{-n})$ has Lebesgue measure $1$, is open, and is unbounded, so...
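To check the measure claim: the intervals $(n, n+2^{-n})$ are pairwise disjoint because $n + 2^{-n} < n+1$, so countable additivity gives

$$\lambda\Big(\bigcup_{n=1}^{\infty} (n, n+2^{-n})\Big) = \sum_{n=1}^{\infty} 2^{-n} = 1.$$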
Next one wants to make that into a continuous function.
b) Recall the Theorem of Lusin and Tietze’s Extension Theorem:
Theorem (Lusin’s Theorem). Let $\Omega$ and $\mu$ be as above and let $f\colon\Omega\to\mathbb R$ be a $\mu$-measurable function with finite support $E$. Then for any $\delta > 0$ there exists a closed set $K\subset E$ such that $\mu(E\setminus K) < \delta$ and $f$ is continuous on $K$.
Added: Note that "finite support" should read "support of finite measure" here.
Theorem (Tietze’s Extension Theorem for LCH spaces). If $\Omega$ is a locally compact Hausdorff space and $K \subset \Omega$ is compact, then any $f \in C(K, \mathbb R)$ can be extended to a function in $C_c(\Omega,\mathbb R)$ whose supremum norm is bounded by $\| f \|_\infty$.
Combine these theorems to show that $C_c(\Omega)$ is dense in $L_1$. You may need that $\mu$ is Radon.
Well, it is certainly a good idea to recall Lusin's theorem and Tietze's extension theorem from time to time, but they are actually not needed here.
We have already shown that each integrable $f \geq 0$ can be approximated in the $L^1$-norm by simple functions with finite support. It remains to show that a characteristic function can be approximated by continuous functions. I'm not spelling the reduction to that fact out, because I should leave something to you.
So let $A \subset \Omega$ be a set of finite measure. Since $\mu$ is a Radon measure on a $\sigma$-compact space (hence it is inner and outer regular on Borel sets of finite measure), we can find a compact subset $K \subset A$ and an open set $U \supset A$ such that $\mu(U \setminus K) \lt \varepsilon$. By Urysohn's lemma we can find a continuous function of compact support $g$ such that $0 \leq g \leq 1$, $g = 1$ on $K$ and $g =0$ outside $U$. This gives that $\int |[A] - g| \leq \mu(U \setminus K) \lt \varepsilon$, hence every characteristic function can be approximated arbitrarily well by continuous functions of compact support.
Answer b:
From a) one has a measurable function $s_n$ with finite support $Y_n$. To make it into a continuous function one can set it to be $0$ on $Y_n \backslash K$ where $K$ is a closed set such that $\mu(Y_n \backslash K) < \delta$ for some $\delta$. Let's call this modified function $\tilde{s_n}$. Then $$ \lim_{n \rightarrow \infty} || \tilde{s_n}(x) - f(x)||_1 = \lim_{n \rightarrow \infty} \int_{Y_n \backslash K} |s_n(x) - f(x)| d\mu = \lim_{n \rightarrow \infty} \int_{Y_n} |s_n(x) - f(x)| d\mu = 0$$
I don't understand this argument at all, I'm afraid. What exactly is $\tilde{s}_n$ and how exactly does that limiting argument work?
Here's a suggestion: use Lusin's theorem to find a closed set of finite measure $K$ such that $\mu(E\setminus K) \lt \delta$ on which $s_n$ is continuous. Use inner regularity of $\mu$ to find a compact set $C \subset K$ with $\mu(K \setminus C) \lt \delta$. Apply Tietze's extension theorem to extend the restriction $s_n|_{C}$ to a continuous function of compact support $\tilde{s}_{n}$, close to $s_n$ in the $L^1$-norm.
We can ignore the information that $J_\varepsilon$ is a mollifier. All we need is a smooth function with integral one. $J_\varepsilon$ is such a function as proven in a) in the question above.
We will use that $C_c(X)$ is dense in $L^1$ to show that $C_c^\infty(X)$ is also dense in $L^1$, where $X$ is an open subset of $\mathbb{R}$. Let $\epsilon > 0$ and $f \in L^1$. Then by density of $C_c(X)$ there is a $g$ in $C_c(X)$ such that $\| f - g \|_{L^1} < \frac{\epsilon}{2}$.
Now we need to turn $g$ into a smooth function by convolving it with $J_\varepsilon$. Let $$g_\varepsilon (x) := (J_\varepsilon \ast g ) (x) = \int_\mathbb{R} J_\varepsilon(x - y) g(y) dy$$
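Then $g_\varepsilon$ is smooth because derivatives can be moved onto the smooth factor, $\left ( J_\varepsilon \ast g \right )^\prime = J_\varepsilon^\prime \ast g$, and $J_\varepsilon$ is infinitely differentiable. (The identity $\left ( f \ast g \right )^\prime = f \ast g^\prime$ would require $g$ to be differentiable, which is not assumed here.)
$g_\varepsilon$ has compact support: if the support of $g$ is contained in $[-S,S]$ and the support of $J_\varepsilon$ is contained in $[-R,R]$, then the support of $J_\varepsilon \ast g$ is contained in $[-S - R, S + R]$, which is compact. (For $g_\varepsilon$ to lie in $C_c^\infty(X)$ one also needs $R$ to be smaller than the distance from the support of $g$ to the complement of $X$.)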
To finish the proof we claim that $\| f - g_\varepsilon \|_{L^1} < \epsilon$:
$$ \| f - g_\varepsilon \| \leq \| f - g \| + \|g - g_\varepsilon \| < \epsilon$$
where $\| f - g \| < \frac{\epsilon}{2}$ holds because $C_c(X)$ is dense in $L^1$, and $\|g - g_\varepsilon \| < \frac{\epsilon}{2}$ holds because:
$$\begin{align}
\|g - g_\varepsilon \|_{L^1} = \int_X \left | g(z) - g_\varepsilon (z)\right | dz
&= \int_X \left | g(z) - \int_\mathbb{R} J_\varepsilon(z -y) g(y) dy \right | dz \\
&= \int_X \left | g(z)\int_\mathbb{R}J_\varepsilon(y)dy - \int_\mathbb{R}J_\varepsilon(z -y) g(y) dy \right | dz\\
&\stackrel{(*)}{=} \int_X \left | g(z)\int_\mathbb{R}J_\varepsilon(z - y)dy - \int_\mathbb{R}J_\varepsilon(z -y) g(y) dy \right | dz \\
&= \int_X \left | \int_\mathbb{R} g(z) J_\varepsilon(z - y)dy - \int_\mathbb{R}J_\varepsilon(z -y) g(y) dy \right | dz \\
&\leq \int_X \int_\mathbb{R} \left | g(z) J_\varepsilon(z - y) - J_\varepsilon(z -y) g(y) \right | dy \, dz \\
&= \int_X \int_\mathbb{R} |g(z) - g(y)| J_\varepsilon (z -y) \, dy \, dz
\end{align}$$
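The equality marked with (*) holds because the integral is over all of $\mathbb{R}$, so the shift by the constant $z$ doesn't change the integral, and $J_\varepsilon$ is even, hence $J_\varepsilon (y) = J_\varepsilon (-y)$. The last step uses that $J_\varepsilon \geq 0$.
$g$ is continuous and compactly supported, hence uniformly continuous, so there exists a $\delta > 0$ such that $|g(z) - g(y)| < \frac{\epsilon}{2 \lambda(X)}$ for all $z,y$ with $|z - y| < \delta$. (If $\lambda(X) = \infty$, replace $\lambda(X)$ by the measure of a bounded set containing the supports of $g$ and $g_\delta$; outside such a set the integrand in $z$ vanishes.) Assuming $J_\varepsilon$ is supported in $[-\varepsilon, \varepsilon]$, as for the standard mollifier, the factor $J_\delta(z - y)$ vanishes unless $|z - y| < \delta$. Hence by choosing $\varepsilon := \delta$ and using that $J_\delta$ has integral one we get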
$$ \int_X \int_\mathbb{R} |g(z) - g(y)| J_\delta (z -y) dy dz < \frac{\epsilon}{2} $$
Note that $\epsilon$ and $\varepsilon$ are not the same.
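The estimate $\|g - g_\varepsilon\| \to 0$ can also be observed numerically. The following is a minimal sketch under stated assumptions: the tent function stands in for $g$, the bump $\exp(-1/(1-t^2))$ stands in for the mollifier profile, and the kernel is normalised on the grid (a discrete analogue of "integral one"). The names `bump`, `tent` and `l1_distance_to_mollification` are hypothetical and not taken from the exercise.

```python
import math

def bump(t):
    """Unnormalised smooth bump supported in (-1, 1) (assumed mollifier profile)."""
    return math.exp(-1.0 / (1.0 - t * t)) if abs(t) < 1.0 else 0.0

def tent(x):
    """Continuous, compactly supported test function playing the role of g."""
    return max(0.0, 1.0 - abs(x))

def l1_distance_to_mollification(g, eps, h=0.01, half_width=2.0):
    """Riemann-sum estimate of ||g - J_eps * g||_1 on [-half_width, half_width]."""
    m = int(round(eps / h))                      # kernel radius in grid steps
    offsets = range(-m, m + 1)
    weights = [bump(j * h / eps) for j in offsets]
    total_w = sum(weights)
    weights = [w / total_w for w in weights]     # weights sum to one on the grid
    n = int(round(half_width / h))
    err = 0.0
    for i in range(-n, n + 1):
        z = i * h
        # Discrete convolution: weighted average of g over [z - eps, z + eps].
        g_eps = sum(w * g(z - j * h) for w, j in zip(weights, offsets))
        err += abs(g(z) - g_eps) * h
    return err

eps_values = [0.4, 0.2, 0.1]
errors = [l1_distance_to_mollification(tent, e) for e in eps_values]
for e, err in zip(eps_values, errors):
    print(e, err)
```

Since the tent function is Lipschitz with constant $1$, the pointwise difference is at most $\varepsilon$ and lives on an interval of length about $2 + 2\varepsilon$. That is the same mechanism as the uniform-continuity bound above, and the printed values should stay below $\varepsilon(2 + 2\varepsilon)$ up to grid effects.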
Best Answer
Another approach uses the representation of the dual $\left[ L^1(\mathbb{R})\right]^\star$ as $L^\infty(\mathbb{R})$ and the Hahn-Banach separation theorem. Namely, to prove that a vector subspace of a Banach space is dense we only need to show that the only continuous linear functional that vanishes on it is the null one.
To do this fix $\phi \in L^\infty(\mathbb{R})$ and suppose that $$\tag{1}\int_{-\infty}^\infty \phi(x)f(x)\, dx=0, \quad \forall f\in C_c(\mathbb{R}).$$ We claim that $\phi=0$ almost everywhere. Indeed, let $a<b$ be fixed numbers. Approximate the characteristic function $\chi_{[a,b]}$ with a family $\chi^{(\varepsilon)}_{[a, b]}$ of "trapezoidal-like" functions:
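One explicit family of this kind, given here as an assumption since the text does not fix a formula, uses linear ramps of width $2\varepsilon$ on each side of $[a,b]$:

$$\chi^{(\varepsilon)}_{[a,b]}(x) = \begin{cases} 1, & a \leq x \leq b, \\ 1 - \dfrac{a - x}{2\varepsilon}, & a - 2\varepsilon \leq x < a, \\ 1 - \dfrac{x - b}{2\varepsilon}, & b < x \leq b + 2\varepsilon, \\ 0, & \text{otherwise}. \end{cases}$$

With this choice $\chi^{(\varepsilon)}_{[a,b]}$ is continuous with compact support $[a - 2\varepsilon, b + 2\varepsilon]$, and it differs from $\chi_{[a,b]}$ only on the two ramps, each of which contributes a triangle of area $\tfrac12 \cdot 2\varepsilon \cdot 1 = \varepsilon$.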
We have $$\int_{-\infty}^\infty \left\lvert \chi_{[a,b]}(x)-\chi_{[a,b]}^{(\varepsilon)}(x)\right\rvert\, dx = 2\varepsilon,$$ so \begin{align} \left\lvert \int_{-\infty}^\infty \phi(x)\chi_{[a,b]}(x)\, dx - \int_{-\infty}^\infty \phi(x)\chi_{[a, b]}^{(\varepsilon)}(x)\, dx \right\rvert & \le \lVert \phi\rVert_{\infty} \int_{-\infty}^\infty \left\lvert \chi_{[a,b]}(x)-\chi_{[a,b]}^{(\varepsilon)}(x)\right\rvert\, dx \\ &=2\varepsilon \lVert \phi\rVert_\infty. \end{align} In particular, $$\int_a^b \phi(x)\, dx=\lim_{\varepsilon \to 0} \int_{-\infty}^\infty \phi(x)\chi_{[a,b]}^{(\varepsilon)}(x)\, dx,$$ and the last limit is $0$ due to our assumption (1): indeed, every $\chi_{[a, b]}^{(\varepsilon)}$ is a continuous function with compact support. We have thus shown that $$\tag{2} \int_a^b\phi(x)\, dx=0, \qquad \forall a<b.$$ It is intuitively clear that this can happen only if $\phi=0$ almost everywhere: for a rigorous proof of this you can apply the Lebesgue differentiation theorem or Lemma 1 of this post. This proves the claim.
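For the route via the Lebesgue differentiation theorem, the step can be sketched as follows. Since $\phi \in L^\infty(\mathbb{R})$ is locally integrable, for almost every $x$ we have

$$\phi(x) = \lim_{h \to 0^+} \frac{1}{2h} \int_{x-h}^{x+h} \phi(t)\, dt = 0,$$

where every average on the right vanishes by (2) applied with $a = x - h$ and $b = x + h$.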
To conclude we only need to recall that every continuous linear functional $\Lambda \in \left[ L^1(\mathbb{R})\right]^\star$ is of the form $$\Lambda f= \int_{-\infty}^\infty \phi(x)f(x)\, dx,\qquad f \in L^1(\mathbb{R}),$$ for a unique $\phi\in L^\infty(\mathbb{R})$, and then apply the Hahn-Banach separation theorem. $\square$
A final remark: Even if we did not mention convolutions explicitly, the present proof is not that different in nature from the ones presented above. Both rely on the possibility of approximating "rough" functions (like our $\chi_{[a, b]}$) with "smooth" ones.