R – How to Simulate Standard Deviation in R

Tags: r, sample-size, simulation, standard-deviation

I would like to simulate data based on real data I have captured. The real data consist of 15 observations; the simulation based on these data should have 100 observations. I have a mean and standard deviation for the 15 observations, but how do I simulate the standard deviation for a larger sample (100 observations) based on the smaller real data set? Standard deviation should generally decrease with an increase in sample size, but at what rate?

Best Answer

Standard error decreases as the sample size increases. Standard deviation is a related concept, but perhaps not related enough to warrant terminology so similar that it confuses everyone who is starting to learn statistics.

A sampling distribution is the distribution of values you would get if you repeatedly sampled from a population and calculated some statistic, say the mean, each time. The standard deviation of that sampling distribution is the standard error. For the mean, the standard error is $\sigma/\sqrt{n}$, so it shrinks by a factor of $\sqrt{n}$ as the sample size grows; you estimate it by $s/\sqrt{n}$, where $s$ is the sample standard deviation.
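A quick way to see this is to simulate it. The sketch below uses an assumed population with mean 10 and standard deviation 3 (values chosen purely for illustration), draws many samples at two sizes, and shows that the standard deviation of the sample means, i.e. the standard error, is close to $\sigma/\sqrt{n}$:

```r
set.seed(1)

# Assumed population parameters, chosen only for illustration
mu    <- 10
sigma <- 3

# Standard deviation of many sample means = standard error of the mean
se_for_n <- function(n, reps = 10000) {
  means <- replicate(reps, mean(rnorm(n, mean = mu, sd = sigma)))
  sd(means)
}

se_for_n(15)   # close to sigma / sqrt(15)  ~= 0.77
se_for_n(100)  # close to sigma / sqrt(100)  = 0.30
```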

The standard deviation of a distribution is whatever it is, and it doesn’t care how large a sample you draw or if you even sample at all.
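The same kind of simulation shows that the sample standard deviation itself does not shrink with $n$; a larger sample just estimates the population value more precisely (again using the illustrative $\mu = 10$, $\sigma = 3$ from above):

```r
set.seed(2)

# Average sample SD at each sample size: both hover around sigma = 3
mean(replicate(10000, sd(rnorm(15,  mean = 10, sd = 3))))
mean(replicate(10000, sd(rnorm(100, mean = 10, sd = 3))))
```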

It sounds like you want to simulate data from a distribution with the mean and standard deviation you’ve calculated from the sample of $15$, so do that. If you’re willing to assume a normal distribution, the R command is rnorm and the Python command is numpy.random.normal.
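For instance, if the real 15 observations had (hypothetically) a mean of 25.4 and a standard deviation of 4.2, simulating 100 observations under a normality assumption is one line of R:

```r
# Mean and SD computed from the real 15 observations
# (placeholder values; substitute your own)
x_mean <- 25.4
x_sd   <- 4.2

set.seed(42)
sim <- rnorm(100, mean = x_mean, sd = x_sd)

mean(sim); sd(sim)  # close to, but not exactly, x_mean and x_sd
```

The simulated sample's mean and standard deviation will wobble around the values you plug in, which is exactly the sampling variation described above.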