As explained in Gortaur's answer a delta function cannot be the probability density function of a real random variable.
Nevertheless sums of delta functions can be viewed as the "missing link" between discrete and continuous random variables / probability distributions, in the following way:
If $X$ is a discrete random variable taking values $x_k\in{\mathbb R}$ $\ (k\in I$, $\ I$ a countable index set) with probabilities $p_k$ then one can replace the probability space $I$ with the probability space ${\mathbb R}$, provided with the probability measure
$$\mu\ :=\ \sum_{k\in I} p_k \ \delta_{x_k}\ ,$$
where $\delta_x$ denotes a unit point mass at the point $x$. In this way $X$ now has become a real random variable. If $f:\ {\mathbb R}\to {\mathbb R}$ is a reasonable function then the expectation $E\bigl(f(X)\bigr)$ may be written as an integral:
$$E\bigl(f(X)\bigr)\ =\ \int_{-\infty}^\infty f(x)\ d\mu(x)\ .$$
This is a bit late, but I see that the main points in this question have not been completely addressed. I'll set
\begin{equation}
\sigma = 1
\end{equation}
for this answer.
The definition of white noise may be context-dependent: How you define it depends on what you want to do with it. There's nothing inherently wrong with saying that white noise (indexed by a set $T$) is just the process of iid standard normal random variables indexed by $T$, i.e. $E[X(t)X(s)] = \begin{cases} 1 & t = s \\ 0 & t \neq s \end{cases}.$ However, as cardinal noted here, Example 1.2.5 of Kallianpur's text shows that this process is not measurable (as a function of $(t, \omega)$). This is why, as Did commented above, $Y$ is undefined (with this definition of $X$). Thus, this definition of white noise is not appropriate for defining objects like $Y$.
Rather, you want $X$ to have covariance given by the Dirac delta. But the $\delta$ function is not a function but rather a measure and the best context for understanding it is the theory of distributions (or generalized functions---these are not to be confused with "probability distributions"). Likewise, the appropriate context for white noise is the theory of random distributions.
Let's warm up with a heuristic explanation: We'll think of white noise as the "derivative" of Brownian motion: "$dB_t/dt = X_t$". So ignoring rigor for a moment, we could write
\begin{equation}
\int_0^T h(t) X(t) dt = \int_0^T h(t) \frac{dB_t}{dt} dt = \int_0^T h(t) dB_t.
\end{equation}
The reason this isn't rigorous is that Brownian motion is nowhere differentiable. However, the theory of distributions allows us to "differentiate" non-differentiable functions. First of all, a distribution is a linear functional (linear map taking values in the real numbers) on a space of "test functions" (usually smooth functions of compact support). A continuous function $F$ can be viewed as a distribution via the pairing
\begin{equation}
(F, f) = \int_0^\infty F(t) f(t) dt.
\end{equation}
The distributional derivative of $F$ is the distribution $F'$ whose pairing with a test function $f$ is defined by
\begin{equation}
(F', f) = -(F, f').
\end{equation}
Thinking of Brownian motion as a random function, we can define white noise $X$ as its distributional derivative. Thus, $X$ is a random distribution whose pairing with a test function $f$ is the random variable
\begin{equation}
(X, f) = -(B, f') = -\int_0^\infty B(t) f'(t) dt.
\end{equation}
By stochastic integration by parts,
\begin{equation}
(X, f) = \int_0^\infty f(t) dB_t;
\end{equation}
this is the Itô integral of $f$ with respect to $B$.
Now a well-known fact in stochastic calculus is that $M_T = \int_0^T f(t) dB_t$ is a martingale starting at $M_0 = 0$, so $E (X, f) = 0$. Moreover, by the Itô isometry,
\begin{equation}
\mathrm{Var}((X, f)) = E (X, f)^2 = \int_0^\infty f(t)^2 dt.
\end{equation}
It can also be verified that $(X, f)$ is Gaussian.
My main point is that a more appropriate definition of $Y$ might be
\begin{equation}
Y = \int_0^T h(t) dB_t.
\end{equation}
As a last note, because of the way $X$ was defined above, $X_t$ is not defined but $(X, f)$ is. That is, $X$ is a stochastic process but whose index set is given by $T = \{ \text{test functions} \}$ rather than $T = [0, \infty)$. Moreover, again by the Itô isometry,
\begin{equation}
E (X, f) (X, g) = \int_0^\infty f(t) g(t) dt.
\end{equation}
Abandoning rigor again, this becomes
\begin{equation}
E (X, f) (X, g) = \int_0^\infty \int_0^\infty f(s) \delta(s - t) g(t) ds dt
\end{equation}
and it is in this sense that the covariance of $X$ is the Dirac delta.
Edit: Note that we could leave the definition of $(X, f)$ in terms of the ordinary integral and do all the above calculations using Fubini's theorem and (ordinary) integration by parts (it's just a bit messier).
Best Answer
The variance $$ E(X^2)-E(X)^2=E(X-E(X))^2 $$ is equal to $0$ if and only if $X$ is equal to $E(X)$ in all of its support. This can only happen if $X$ is equal to some constant with probability $1$ (known as a degenerate distribution).