Hints

Since $(X_n)_n$ converges in distribution (to some random variable $X$), the family of distributions $\{\mathbb{P}_{X_n};\, n \in \mathbb{N}\}$ is tight. From this one can conclude that the sequences $(\mu_n)_n$ and $(\sigma_n^2)_n$ are bounded (see this post). Thus, there is a subsequence along which both converge, say $$\mu := \lim_{k \to \infty} \mu_{n_k} \qquad \qquad \sigma^2 := \lim_{k \to \infty} \sigma_{n_k}^2 \tag{1}$$
Moreover, the weak convergence of $(X_n)_n$ implies the convergence of the characteristic functions, $$\mathbb{E}\exp \left( \imath \, \xi \cdot X_n \right) \to \mathbb{E}\exp \left( \imath \, \xi \cdot X \right) \qquad (n \to \infty)$$
Using the explicit formula for characteristic functions of Gaussian random variables and $(1)$, we conclude $X \sim N(\mu,\sigma^2)$.
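Spelling out that last step: along the subsequence from $(1)$, $$\mathbb{E}\exp \left( \imath \, \xi \cdot X_{n_k} \right) = \exp \left( \imath \, \mu_{n_k} \xi - \tfrac{1}{2} \sigma_{n_k}^2 \xi^2 \right) \to \exp \left( \imath \, \mu \xi - \tfrac{1}{2} \sigma^2 \xi^2 \right) \qquad (k \to \infty) \text{,}$$ and the left-hand side also converges to $\mathbb{E}\exp \left( \imath \, \xi \cdot X \right)$, so the latter is the characteristic function of $N(\mu,\sigma^2)$. Since this limit law is the same for every convergent subsequence, all subsequential limits in $(1)$ coincide, and the full sequences $(\mu_n)_n$ and $(\sigma_n^2)_n$ in fact converge to $\mu$ and $\sigma^2$.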
This behaviour is an example of conditional convergence. A perhaps simpler example to start with is $$ \int_{-1}^1 \frac{\mathrm{d}x}{x} \text{.}$$ The integrand goes to infinity as it approaches $0$ from the right and to negative infinity as it approaches $0$ from the left. If we successively approximate this integral as $$ \int_{-1}^{-M} \frac{\mathrm{d}x}{x} + \int_{M}^{1} \frac{\mathrm{d}x}{x} \text{,} $$ we have arranged to carefully cancel the positive and negative contributions: no matter what positive value we pick for $M$, these two integrals are finite and their values cancel exactly.

But this is not how we evaluate improper integrals. (An integral that only gives useful answers if you sneak up on the infinities in just the right way is too treacherous to trust. Taking this very special way of sneaking up on the infinities gives the Cauchy principal value.) Instead, we require that the two endpoints can squeeze in towards zero independently and that, regardless of how they do so, we get the same answer. Contrast the above with $$ \int_{-1}^{-M} \frac{\mathrm{d}x}{x} + \int_{M/2}^{1} \frac{\mathrm{d}x}{x} \text{,} $$ which equals $\log 2$ for every $M$: always positive (because we include more of the positive part than of the negative part) and, crucially, different from the symmetric limit $0$, so the improper integral cannot exist. Letting one endpoint shrink faster, say replacing $M/2$ by $M^2$, makes the sum $-\log M$, which diverges to infinity as $M \to 0^+$; we can arrange divergence to negative infinity by moving the squaring to the other integral.
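As a quick check on those three truncation schemes, here is a minimal Python sketch using the closed-form antiderivative $\log|x|$ (no numerical quadrature needed):

```python
import math

def truncated_sum(a, b):
    """Integral of 1/x over [-1, -a] plus over [b, 1], for a, b in (0, 1).

    Closed forms: the left piece is log(a), the right piece is -log(b).
    """
    return math.log(a) - math.log(b)

for M in [1e-2, 1e-4, 1e-8]:
    sym = truncated_sum(M, M)       # symmetric endpoints: exactly 0
    half = truncated_sum(M, M / 2)  # inner edge at M/2: always log(2)
    sq = truncated_sum(M, M * M)    # inner edge at M^2: -log(M), diverges
    print(f"M={M:.0e}  symmetric={sym:+.4f}  M/2={half:+.4f}  M^2={sq:+.4f}")
```

The first column stays at $0$ (the principal value), the second stays at $\log 2 \approx 0.693$, and the third grows without bound, which is exactly the sense in which the answer depends on how you sneak up on the singularity.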
The same thing is happening in your integral for the mean, $$
\int_{-\infty}^{\infty} y f(y) \,\mathrm{d}y
$$ because for very large positive $y$ we have $1/y - 1 \approx -1$, so the density's argument is essentially constant there and the integrand $y\,f(y)$ falls off like a constant multiple of $1/y$; the same holds for very large negative $y$. If you restrict your integral to be symmetric around $1$, $\int_{1-M}^{1+M} \dots $, these two tails will cancel (not exactly, but the cancellation improves the farther from $1$ you go). However, if you use $\int_{1-M}^{1+2M} \dots $, they won't cancel as well: the uncancelled part settles near a nonzero constant (roughly the tail coefficient times $\log 2$, as sketched below), so the two ways of taking the limit disagree; stretching one endpoint faster, say to $1+M^2$, makes the uncancelled part grow like $\log M$.
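To see why, note that if $y f(y) \approx c/y$ in both tails, the unbalanced piece of $\int_{1-M}^{1+2M}$ relative to the symmetric truncation is approximately $$\int_{1+M}^{1+2M} \frac{c}{y} \, \mathrm{d}y = c \log \frac{1+2M}{1+M} \to c \log 2 \qquad (M \to \infty) \text{,}$$ a fixed nonzero offset from the principal value, while an endpoint stretched to $1+M^2$ contributes roughly $c \log M$, which is unbounded.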
This slow growth of the discrepancy has a consequence for numerical experiments. You will have to take many samples at very large values of $y$ and sum them very carefully, so that the contributions from large $y$s aren't swamped by rounding in the values from small $y$s. Your plot shows that the integrand for the mean, $y f(y)$, is about $3$ at $y=1$; it is about $10^{-23}$ at $y \approx 100$. This means that to retain the necessary precision just over the range $y \in [-100,100]$, you will have to numerically integrate with at least $23$ significant digits (plus several guard digits). Even more numerically difficult: because the right-hand and left-hand integrals diverge only logarithmically, you will have to integrate over stupendous ranges of $y$ for any imperfectly balanced cancellation to accumulate measurably. For instance, \begin{align}
\int_{-10}^{-9} y f(y) \,\mathrm{d}y + \int_{9}^{10} y f(y) \,\mathrm{d}y &\approx 1.84\dots \times 10^{-18} \\
\int_{-100}^{-9} y f(y) \,\mathrm{d}y + \int_{9}^{100} y f(y) \,\mathrm{d}y &\approx 3.13\dots \times 10^{-18} \text{.}
\end{align}
(I've ignored the contribution from $-9$ to $9$ in both integrals so that the result isn't swamped by the $18$ orders of magnitude larger contribution from that region.) These will always be positive because the contribution from the positive $y$s is closer to the peak at $y=1$ than is the contribution from the negative $y$s.
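To make the precision point concrete, here is a small Python sketch; the magnitudes are taken from the estimates above, and mpmath (a real arbitrary-precision library) stands in for whatever extended-precision tool you use:

```python
from mpmath import mp, mpf

# In IEEE double precision (~16 significant digits), a tail pair of
# order 1e-18 vanishes entirely when added to the O(1) central part.
central = 3.0        # rough size of the contribution near the peak at y = 1
tail = 1.84e-18      # rough size of the tail pair over [-10,-9] and [9,10]
print(central + tail == central)   # True: the tail is rounded away

# With enough working digits, the tail survives the addition.
mp.dps = 40                        # 40 decimal digits of working precision
print(mpf(3) + mpf('1.84e-18'))    # 3.00000000000000000184
```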
So, in your simulations, were you sampling over truly stupendous ranges of $y$ so that you might observe the divergence? And, while doing so, were you keeping truly stupendous precision so that the very slow divergence wouldn't be swamped by rounding errors in the contribution from $y$s near $1$? Doing both of these things is computationally very challenging. Once you have that ironed out, you'll need to take the limits to infinity in different ways, to avoid computing a mere principal value; when you do this, you'll discover that the integrals do not converge and your estimates are not stable. For instance, \begin{align}
\int_{1- 2 \cdot 10}^{1-9} y f(y) \,\mathrm{d}y + \int_{1+9}^{1+1 \cdot 10} y f(y) \,\mathrm{d}y &\approx 6.644\dots \times 10^{-19} \\
\int_{1- 2 \cdot 100}^{1-9} y f(y) \,\mathrm{d}y + \int_{1+9}^{1+1 \cdot 100} y f(y) \,\mathrm{d}y &\approx 1.292\dots \times 10^{-18}
\end{align} and each time we increment the power of $10$ used for the outermost endpoints, the discrepancy will approximately double. Once we reach $10^{53}$-ish, the discrepancy will finally be as large as the contribution from $y \in [1-9,1+9]$.
To sum up, numerically detecting the divergence of this integral will require some very careful programming and numerical analysis. (Alternatively, you can do what I did for the estimates above and use a big symbolic and numerics application like Maple or Mathematica.)
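If you'd like to see the instability without wrestling with $f$ itself, the standard Cauchy density is a convenient stand-in (a hypothetical substitute, not the distribution above): its mean integral fails in exactly this conditional way, and mpmath's quad makes the experiment a few lines:

```python
from mpmath import mp, quad, pi
mp.dps = 30

# Stand-in integrand: y times the standard Cauchy density.
integrand = lambda y: y / (pi * (1 + y**2))

for a in (10, 100, 1000):
    sym = quad(integrand, [-a, 0, a])        # symmetric: principal value 0
    skew = quad(integrand, [-a, 0, 2 * a])   # 2:1 endpoints: -> log(2)/pi
    grow = quad(integrand, [-a, 0, a * a])   # squared endpoint: ~ log(a)/pi
    print(f"a={a:5d}  sym={float(sym):+.6f}  "
          f"2a={float(skew):+.6f}  a^2={float(grow):+.6f}")
```

The symmetric column sits at $0$, the $2a$ column settles at $\log(2)/\pi \approx 0.2206$, and the squared-endpoint column keeps growing: three different "answers" for the same integral, so there is no answer.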
Best Answer
There is no way to produce a small random sample from a Normal distribution that is guaranteed to have a sample mean and sample variance within $0.01\%$ of the population mean and population variance.
The key insight here is that even if you have a method of producing perfect random samples from a standard Normal distribution, the samples will not have the same mean and variance as the underlying population distribution. In other words, if you are really sampling from a Normal distribution, you expect variability in your sample mean and sample variance at any given sample size. Finding a way to get rid of that variability would force you to produce something that either is not a random sample or does not come from a Normal distribution.
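A quick numpy stand-in for Matlab's `randn` makes this visible (an illustration, seeded only so the run is repeatable). The deviations below are sampling variability, not generator error, and they shrink like $1/\sqrt{N}$:

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded so the experiment is repeatable

for n in (100, 10_000, 1_000_000):
    x = rng.standard_normal(n)
    # x.var() is the 1/N formula, matching the question's definition.
    print(f"N={n:>9}  mean={x.mean():+.5f}  var={x.var():.5f}")
```

Even at a million perfect draws, the sample mean is typically off by about $10^{-3}$, which dwarfs any error attributable to the generator itself.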
The error in Matlab's `randn` comes from (1) machine precision limitations inherent to any finite computation and (2) the fact that it uses a pseudorandom number generator. Both of these limitations are inherent to any computational implementation that produces samples from a probability distribution in a repeatable way. Matlab R2016a and other modern versions default to the Mersenne Twister generator, which produces high-quality pseudorandom numbers. The error from the generator and from machine precision is much less than $0.01\%$. Almost all of the difference between your sample mean and the population mean comes from the randomness in the Normal distribution itself, so it's best to think of that difference as variability rather than error.
If you need a small set of numbers that has sample mean and sample variance close to the population values, you could for example take $\{-1.32,-0.54,0.56,1.3\}$. This set has $\frac{1}{N} \sum_{n=1}^N X_n = 0$ and $\frac{1}{N} \sum_{n=1}^N X_n^2 - \left(\frac{1}{N} \sum_{n=1}^N X_n\right)^2 = 1.0094$ $^*$. However, I chose these numbers deliberately; they were not randomly sampled from a Normal distribution.
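Verifying that claim takes a few lines (using the question's $1/N$ variance formula):

```python
# Check the handpicked set against the question's 1/N variance formula.
xs = [-1.32, -0.54, 0.56, 1.30]
n = len(xs)
mean = sum(xs) / n
var = sum(x * x for x in xs) / n - mean ** 2
print(mean, var)   # 0.0 and 1.0094, up to float rounding
```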
The main conclusion is that the variability in the sample is a good thing, because it represents the true variability of samples from a Normal distribution. If you're using the normal random numbers as an input to further simulation, consider variance reduction techniques, which address this problem while maintaining rigor in your random sampling; one such technique is sketched below.
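As a minimal sketch of one such technique (my choice of antithetic variates is illustrative; the answer doesn't prescribe a specific method): pair every draw $z$ with $-z$. The pooled sample keeps the $N(0,1)$ marginal distribution, its sample mean is $0$ by construction, and odd-moment noise cancels, at the cost of the pairs being dependent:

```python
import numpy as np

rng = np.random.default_rng(1)

half = rng.standard_normal(50_000)
z = np.concatenate([half, -half])   # antithetic pairs: each z with -z

# Sample mean is ~0 by construction; variance still fluctuates around 1.
print(f"mean={z.mean():+.2e}  var={z.var():.5f}")
```

Because the two halves are perfectly dependent, treat the effective sample size as $50{,}000$, not $100{,}000$, when reasoning about the remaining variability.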
$^*$ I'm using your $\frac{1}{N}$ formula for the variance, but you should carefully consider whether you want that or the unbiased $\frac{1}{N-1}$ sample variance.