This is a bit late, but I see that the main points in this question have not been completely addressed. I'll set
\begin{equation}
\sigma = 1
\end{equation}
for this answer.
The definition of white noise may be context-dependent: How you define it depends on what you want to do with it. There's nothing inherently wrong with saying that white noise (indexed by a set $T$) is just the process of iid standard normal random variables indexed by $T$, i.e. $E[X(t)X(s)] = \begin{cases} 1 & t = s \\ 0 & t \neq s \end{cases}.$ However, as cardinal noted here, Example 1.2.5 of Kallianpur's text shows that this process is not measurable (as a function of $(t, \omega)$). This is why, as Did commented above, $Y$ is undefined (with this definition of $X$). Thus, this definition of white noise is not appropriate for defining objects like $Y$.
Rather, you want $X$ to have covariance given by the Dirac delta. But the $\delta$ function is not a function at all; it is a measure, and the best framework for understanding it is the theory of distributions (also called generalized functions---not to be confused with "probability distributions"). Likewise, the appropriate framework for white noise is the theory of random distributions.
Let's warm up with a heuristic explanation: We'll think of white noise as the "derivative" of Brownian motion: "$dB_t/dt = X_t$". So ignoring rigor for a moment, we could write
\begin{equation}
\int_0^T h(t) X(t) dt = \int_0^T h(t) \frac{dB_t}{dt} dt = \int_0^T h(t) dB_t.
\end{equation}
The reason this isn't rigorous is that Brownian motion is nowhere differentiable. However, the theory of distributions allows us to "differentiate" non-differentiable functions. First of all, a distribution is a continuous linear functional (a linear map taking values in the real numbers) on a space of "test functions" (usually smooth functions of compact support). A continuous function $F$ can be viewed as a distribution via the pairing
\begin{equation}
(F, f) = \int_0^\infty F(t) f(t) dt.
\end{equation}
The distributional derivative of $F$ is the distribution $F'$ whose pairing with a test function $f$ is defined by
\begin{equation}
(F', f) = -(F, f').
\end{equation}
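To see this definition in action, here is a standard worked example (my own illustration, not part of the original question): take the step function $H_1(t) = \mathbf{1}_{[1,\infty)}(t)$, which is not differentiable in the classical sense. For any test function $f$,
\begin{equation}
(H_1', f) = -(H_1, f') = -\int_1^\infty f'(t) \, dt = f(1),
\end{equation}
so the distributional derivative of $H_1$ is exactly the Dirac delta at $1$.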
Thinking of Brownian motion as a random function, we can define white noise $X$ as its distributional derivative. Thus, $X$ is a random distribution whose pairing with a test function $f$ is the random variable
\begin{equation}
(X, f) = -(B, f') = -\int_0^\infty B(t) f'(t) dt.
\end{equation}
By stochastic integration by parts,
\begin{equation}
(X, f) = \int_0^\infty f(t) dB_t;
\end{equation}
this is the Itô integral of $f$ with respect to $B$.
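As a sanity check (a simulation sketch of my own, not from the original answer), one can verify this integration-by-parts identity numerically on a single discretized Brownian path; the test function $f(t) = t^2(1-t)^2$ and the grid size are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 100_000, 1.0
dt = T / N
t = np.linspace(0.0, T, N + 1)

# A smooth function vanishing (with its derivative) at both endpoints,
# standing in for a compactly supported test function.
f = t**2 * (1 - t)**2
fp = 2 * t * (1 - t) * (1 - 2 * t)   # f'

# One sample path of Brownian motion on [0, T].
dB = rng.normal(0.0, np.sqrt(dt), N)
B = np.concatenate(([0.0], np.cumsum(dB)))

# Left-point Ito sum for int f dB.
ito = np.sum(f[:-1] * dB)

# Riemann sum for -(B, f') = -int B f' dt.
parts = -np.sum(B[:-1] * fp[:-1]) * dt

print(abs(ito - parts))  # small discretization error
```

The two numbers agree up to discretization error, illustrating that the pairing $(X, f)$ can be computed either way on the same path.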
Now a well-known fact in stochastic calculus is that $M_T = \int_0^T f(t) dB_t$ is a martingale starting at $M_0 = 0$, so $E (X, f) = 0$. Moreover, by the Itô isometry,
\begin{equation}
\mathrm{Var}((X, f)) = E (X, f)^2 = \int_0^\infty f(t)^2 dt.
\end{equation}
It can also be verified that $(X, f)$ is Gaussian.
My main point is that a more appropriate definition of $Y$ might be
\begin{equation}
Y = \int_0^T h(t) dB_t.
\end{equation}
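To make this concrete, here is a minimal Monte Carlo sketch (my own illustration; $h(t) = t$ and $T = 1$ are arbitrary choices) that simulates $Y$ and checks $E[Y] = 0$ and $\mathrm{Var}(Y) = \int_0^T h(t)^2 \, dt = 1/3$, in line with the Itô isometry above:

```python
import numpy as np

rng = np.random.default_rng(1)
M, N, T = 20_000, 200, 1.0                   # paths, time steps, horizon
dt = T / N
t = np.linspace(0.0, T, N, endpoint=False)   # left endpoints of subintervals
h = t                                        # example integrand h(t) = t

# M independent paths of Brownian increments, shape (M, N).
dB = rng.normal(0.0, np.sqrt(dt), size=(M, N))

# Y = int_0^T h(t) dB_t, approximated by a left-point Ito sum per path.
Y = dB @ h

print(Y.mean())   # approx 0
print(Y.var())    # approx int_0^1 t^2 dt = 1/3
```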
As a last note, because of the way $X$ was defined above, $X_t$ is not defined but $(X, f)$ is. That is, $X$ is a stochastic process but whose index set is given by $T = \{ \text{test functions} \}$ rather than $T = [0, \infty)$. Moreover, again by the Itô isometry,
\begin{equation}
E (X, f) (X, g) = \int_0^\infty f(t) g(t) dt.
\end{equation}
Abandoning rigor again, this becomes
\begin{equation}
E (X, f) (X, g) = \int_0^\infty \int_0^\infty f(s) \delta(s - t) g(t) ds dt
\end{equation}
and it is in this sense that the covariance of $X$ is the Dirac delta.
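This covariance formula is also easy to check by simulation (again a sketch of my own; $f \equiv 1$ and $g(t) = t$ on $[0,1]$ are arbitrary choices, for which $\int_0^1 f(t) g(t) \, dt = 1/2$):

```python
import numpy as np

rng = np.random.default_rng(2)
M, N, T = 40_000, 200, 1.0
dt = T / N
t = np.linspace(0.0, T, N, endpoint=False)

f = np.ones(N)   # f(t) = 1
g = t            # g(t) = t

# Each row of dB is one Brownian path's increments.
dB = rng.normal(0.0, np.sqrt(dt), size=(M, N))
Xf = dB @ f      # (X, f) = int f dB, one sample per path
Xg = dB @ g      # (X, g) = int g dB

print(np.mean(Xf * Xg))   # approx int_0^1 1 * t dt = 0.5
```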
Edit: Note that we could leave the definition of $(X, f)$ in terms of the ordinary integral and do all the above calculations using Fubini's theorem and (ordinary) integration by parts (it's just a bit messier).
This is not true. Restricting to $\big([0,1], \mathcal{B}, \lambda \big)$ is crucial in the cited problem.
On the other hand, it is straightforward to check from the Kolmogorov existence theorem that:
If $\ (\mu_{t})_{t \in T} \ $ is any family of distributions on $\ \mathbb{R}, \ $ then there exists an independent family $\ (X_{t})_{t\in T} \ $ of random variables such that $\ X_{t} \ $ is distributed according to $\ \mu_{t} \ $ for all $\ t\in T. \ $
In the above statement, $\ T \ $ is an interval contained in $\ \mathbb{R}_{+}. \ $
By taking $\ \mu_{t}=\mathcal{N}(0,1) \ $ for all $ \ t\in \mathbb{R}_{+} \ $ we see that:
- $(X_{t})_{t\in \mathbb{R}_{+}}$ is Gaussian.
- $\mathrm{Cov}(X_{t},X_{s})=0 \ $ for all $\ s\not= t.$
This approach has nothing to do with generalized functions.
The disparity arises from the fact that your discretization of the continuous process does not assign the appropriate variance to the $X_i$. Here's the key (for heuristics, see here, Section 3.2):
(The power spectral density of the i.i.d. sequence approaches that of the continuous process that it simulates -- flat and equal to the same constant value $\sigma^2$ -- except that for the i.i.d. sequence it is "band limited", i.e. vanishing outside of a finite-width interval.)
Thus, in the present case, to simulate $$I = \int_0^L X(t) \, dt \approx \sum_{i=1}^N X(i\Delta)\,\Delta $$ where $\Delta=\frac{L}{N}$, one would use $$\hat{I} = \Delta\sum_{i=1}^N X_i $$ with $X_1,X_2,\ldots$ i.i.d. $\text{Gaussian}(\text{mean}=\mu,\text{variance}=\frac{\sigma^2}{\Delta})$. Then $$\begin{align} \text{Var}[\hat{I}] = \Delta^2\,N\,\frac{\sigma^2}{\Delta} = (N\Delta)\sigma^2 = L\sigma^2. \end{align} $$
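A short simulation (my own sketch; the values $L = 2$, $N = 100$, $\sigma = 1$, $\mu = 0$ are arbitrary) confirms the scaling and shows why unit-variance $X_i$ would fail:

```python
import numpy as np

rng = np.random.default_rng(3)
L, N, sigma, mu = 2.0, 100, 1.0, 0.0
Delta = L / N
M = 20_000  # Monte Carlo replications

# Correct discretization: Var[X_i] = sigma^2 / Delta.
X = rng.normal(mu, sigma / np.sqrt(Delta), size=(M, N))
I_hat = Delta * X.sum(axis=1)
print(I_hat.var())    # approx L * sigma^2 = 2

# Naive unit-variance discretization: variance shrinks with Delta.
X_naive = rng.normal(mu, sigma, size=(M, N))
I_naive = Delta * X_naive.sum(axis=1)
print(I_naive.var())  # approx Delta^2 * N * sigma^2 = L * Delta = 0.04
```

The naive estimator's variance vanishes as $N \to \infty$, which is exactly the disparity described above.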