MATLAB: Simple question about Standard Deviation.

standard deviation, statistics

I have a number of data points, let's say in a vector v, and let's say there are "num" of them. If I write sd = std(v), does it assume a sample, i.e. use num-1 in the denominator, or do I get a population standard deviation, i.e. does it use num? How can I request one or the other?

Best Answer

By default, it will give the sample standard deviation. Call it as

std(x,1)

to get the population standard deviation. This is explained in the documentation for std, in the section describing the weight input argument.
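If you want to convince yourself, both normalizations can be reproduced by hand. The following is only a minimal sketch; the vector v holds made-up values chosen for illustration, not data from the question.

```matlab
% Illustrative data (hypothetical values, chosen only for this demonstration)
v = [2 4 4 4 5 5 7 9];
num = numel(v);

sd_sample = std(v);      % default: normalizes by num-1 (same as std(v,0))
sd_pop    = std(v,1);    % second argument 1: normalizes by num

% The same two quantities written out explicitly
dev2 = (v - mean(v)).^2;                  % squared deviations from the mean
manual_sample = sqrt(sum(dev2)/(num-1));  % sample form
manual_pop    = sqrt(sum(dev2)/num);      % population form

fprintf('sample:     %.4f (manual %.4f)\n', sd_sample, manual_sample);
fprintf('population: %.4f (manual %.4f)\n', sd_pop, manual_pop);
```

The two results always differ by the factor sqrt(num/(num-1)), so the distinction matters most for short vectors.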

Related Solutions

MATLAB: Standard deviation of values with standard deviation

The question you are asking is a specific instance of "propagation of uncertainty". You might want to learn more about the topic.

For this specific case, I believe the answer is

std_sum = sqrt(1^2 + 2^2 + 3^2); % error of sum = square root of (sum of squares of individual uncertainties)
std_mean = std_sum/3
std_mean = 1.2472
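The same calculation can be written for any number of values. This is a sketch that assumes the individual errors are independent (otherwise covariance terms would be needed); the variable names sigma and n are mine, introduced only for illustration.

```matlab
% Standard deviations of the individual values (1, 2 and 3, as in the example above)
sigma = [1 2 3];
n = numel(sigma);

% Assuming independent errors, the variances add when the values are summed
std_sum = sqrt(sum(sigma.^2));

% The mean is the sum scaled by 1/n, so its standard deviation scales by 1/n as well
std_mean = std_sum / n;

fprintf('std of sum = %.4f, std of mean = %.4f\n', std_sum, std_mean);
```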

Here is a simulation that suggests that this is correct:

rng default
N = 100000;
x1 = 10 + 1*randn(N,1);
x2 = 15 + 2*randn(N,1);
x3 = 20 + 3*randn(N,1);
x = [x1, x2, x3];
mean_x = mean(x,2);
mean(mean_x)
ans = 15.0007
std(mean_x,1)
ans = 1.2459
figure
subplot(4,1,1), histogram(x(:,1))
xlim([0 30])
subplot(4,1,2), histogram(x(:,2))
xlim([0 30])
subplot(4,1,3), histogram(x(:,3))
xlim([0 30])
subplot(4,1,4), histogram(mean_x)
xlim([0 30])

I would encourage you to be really careful about your terminology. Often when people write x +/- dx, dx refers to the standard error, not the standard deviation. The standard deviation is a property of the parent population. The standard error (of the mean, for example) will depend on the sampling -- specifically on sample size.
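To make the distinction concrete, here is a small sketch with made-up numbers: the population standard deviation (sigma, an assumed value of 2) stays fixed, while the standard error of the mean shrinks as the sample size n grows.

```matlab
rng default                      % reproducible random numbers
sigma = 2;                       % standard deviation of the parent population (assumed value)
for n = [10 100 1000]
    x = 15 + sigma*randn(n,1);   % draw a sample of size n
    s = std(x);                  % sample standard deviation: an estimate of sigma
    sem = s/sqrt(n);             % standard error of the mean: decreases as n increases
    fprintf('n = %4d   std = %.3f   sem = %.3f\n', n, s, sem);
end
```

The std column should hover around 2 for every n, whereas the sem column drops by roughly a factor of sqrt(10) from one row to the next.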

I think that what I have written above probably corresponds to what you wanted to know, but be careful.

MATLAB: Another simple question about standard deviation!

When you supply a weight vector, std treats the data as a complete population, NOT as a sample. When you are not sure about something, the best way is to test it! So, how can we test my claim? How might you have done so, and gotten an answer 6 hours earlier?

X = rand(1,5);

First, what does std do, with no weights employed?

std(X)
ans =
      0.32851
std(X,1)
ans =
      0.29383

So as one should expect, the two are different, by a ratio of

sqrt(5)/2
ans =
        1.118
std(X)/std(X,1)
ans =
        1.118

That is as expected. std(X) divides by sqrt(n-1) in the formula, but std(X,1) divides by sqrt(n).

Now, let's see what happens when we use weights. A very simple weight vector is sufficient here.

W = ones(1,5);
std(X,W)
ans =
      0.29383

This is the population standard deviation, as produced by std(X,1).

std(X,1)
ans =
      0.29383

The point is, it makes no sense at all to talk about a sample standard deviation when you have weights. Well, relatively little sense. Given a set of weights, we can only interpret this as the entire population.

I will concede that the documentation (both doc and help) for std should have made this fact explicitly clear, even though it seems clear to me regardless, since the alternative makes no sense. If you have weights, the points are treated as a complete population.

std(X,W,0)
Error using size
Dimension argument must be a positive integer scalar within indexing range.
Error in var (line 109)
n = size(x,dim);
Error in std (line 51)
y = sqrt(var(varargin{:})); 
std(X,0,W)
Error using size
Dimension argument must be a positive integer scalar within indexing range.
Error in var (line 109)
n = size(x,dim);
Error in std (line 51)
y = sqrt(var(varargin{:}));

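Yep. std agrees with me. The third argument is the dimension to operate along, so neither call is a way to combine a weight vector with the n-1 normalization.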
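The same test works with non-uniform weights. The sketch below uses made-up data and weights, and writes out the weighted formula as I understand it from the documentation (weights normalized to sum to 1, no n-1 correction). With integer weights, the result should also match the population standard deviation of a vector in which each point is repeated according to its weight.

```matlab
X = [0.8 0.1 0.5 0.9 0.3];   % hypothetical data
W = [1 2 3 1 2];             % hypothetical nonnegative integer weights

s_weighted = std(X,W);       % weighted standard deviation

% Written out: normalize the weights, then take the weighted mean and
% the weighted mean squared deviation (no n-1 correction appears)
w  = W/sum(W);
mu = sum(w.*X);
s_manual = sqrt(sum(w.*(X - mu).^2));

% With integer weights, repeat each point W(i) times and use the
% population form std(...,1)
s_repeated = std(repelem(X,W),1);

fprintf('%.6f  %.6f  %.6f\n', s_weighted, s_manual, s_repeated);
```

All three numbers should agree, which is another way of seeing that weights imply the population interpretation.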
