MATLAB: Some insight on mvnrnd function


Hi, I am trying to simulate a simple bivariate VAR(1) process with Gaussian errors, and I use the mvnrnd function to draw from a multivariate normal with mean [0;0] and covariance matrix Σ (sgm in the code below).

I was creating my errors outside of my for loop, but I realised the two approaches below give completely different results:

mu = [0;0]; sgm = [1 0.7; 0.7 1]; iter = 100;
rng(1)
E = mvnrnd(mu, sgm, iter);
E_e = zeros(iter,2);
rng(1)
for i = 1:iter
    E_e(i,:) = mvnrnd(mu, sgm);
end

Can you give me some insight on why these two are different – I have some ideas but cannot formalise my thoughts.

Best Answer

For once, someone asks a good question: they saw something they did not understand, and want to understand what happened. :)

(Don't feel bad, I had to think for a second about it myself.) So what did happen? The trick is to reduce what you did to a simpler problem. You had:

mu = [0;0]; sgm = [1 0.7; 0.7 1];

But let me change sgm to the identity matrix.

sgm = eye(2);
rng(1)
mvnrnd(mu, sgm,3)
ans =
     -0.64901      -1.1096
       1.1812     -0.84555
     -0.75845     -0.57266

So we get 3 sets of normal deviates, as the rows of that matrix. With an identity covariance matrix, every element is an independent draw from a standard normal N(0,1) distribution.

But now let's generate them one set at a time. Look carefully at the sequence.

>> rng(1)
>> mvnrnd(mu, sgm)
ans =
     -0.64901       1.1812
>> mvnrnd(mu, sgm)
ans =
     -0.75845      -1.1096
>> mvnrnd(mu, sgm)
ans =
     -0.84555     -0.57266

Do you see? MATLAB generates the same set of numbers, but in a different order.

>> rng(1)
>> mvnrnd(0,1,6)
ans =
     -0.64901
       1.1812
     -0.75845
      -1.1096
     -0.84555
     -0.57266

The same 6 numbers. Now, think about how MATLAB stores numbers in an array: in column-major order, going down the columns first.

>> reshape(1:6,[3,2])
ans =
     1     4
     2     5
     3     6

So, when you generate them one pair at a time, you get the same sequence of numbers, but each call takes the next two and puts them in one row, so the rows get filled first instead of the columns. Consider the difference between these next two calls, then look back at what you saw when you generated them one pair at a time.

>> rng(1)
>> reshape(mvnrnd(0,1,6),[3,2])
ans =
     -0.64901      -1.1096
       1.1812     -0.84555
     -0.75845     -0.57266
>> rng(1)
>> reshape(mvnrnd(0,1,6),[2,3])'
ans =
     -0.64901       1.1812
     -0.75845      -1.1096
     -0.84555     -0.57266

All well and good, but your sgm is not the identity, and that changes things. This gets into how you generate a set of normal deviates with a non-identity covariance matrix. You generate a set of iid N(0,1) deviates, then transform them with a matrix multiply. So effectively, mvnrnd(mu,sgm,N) calls randn to generate an N-by-2 array, then multiplies that N-by-2 matrix by a 2x2 matrix derived from the supplied covariance matrix. Each row of the result mixes only the two N(0,1) numbers that sit in that same row. (The transformation comes from a Cholesky factorization of your covariance matrix. I can go into more depth there, but I think it may only confuse things, rather than add information.)
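A minimal sketch of that transform, using only randn and chol rather than mvnrnd itself. It assumes, as described above, that mvnrnd effectively computes Z*R + mu' with R = chol(sgm), the upper-triangular Cholesky factor. It draws the same 2*iter standard normals and pairs them the two ways, batch (column-filled) and loop (row-filled):

```matlab
mu   = [0;0];
sgm  = [1 0.7; 0.7 1];
iter = 100;
R = chol(sgm);                 % upper triangular, R'*R equals sgm

% Batch: one call fills an iter-by-2 array column by column
rng(1)
Z = randn(iter, 2);
E = Z*R + mu';                 % row i is Z(i,:)*R + mu'

% Loop: each call consumes the next two numbers and fills one row
rng(1)
E_e = zeros(iter, 2);
for i = 1:iter
    z = randn(1, 2);
    E_e(i,:) = z*R + mu';
end

% The loop used the same 2*iter numbers, just paired row-wise
rng(1)
Zrow   = reshape(randn(2*iter, 1), 2, iter)';   % row i holds draws 2i-1 and 2i
E_rows = Zrow*R + mu';

disp(max(abs(E_e(:) - E_rows(:))))   % ~0, equal up to rounding
disp(max(abs(E(:)   - E_e(:))))      % clearly nonzero: different pairing
```

Only the first element agrees between E and E_e, because the first column of R is [1;0] and both versions start from the same first draw.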

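Now does it make more sense? When you generate them one pair at a time, the same underlying N(0,1) numbers get paired up differently before the transform is applied, so the correlated outputs come out completely different. Both versions are equally valid samples from the same distribution; they are just not the same sample.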
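A quick sanity check that neither approach is wrong: with many draws, the sample covariance of each pairing should be close to sgm. This sketch again substitutes randn and chol for mvnrnd, assuming (as the answer describes) that mvnrnd amounts to multiplying iid N(0,1) draws by a Cholesky factor. The sample size N is an arbitrary choice:

```matlab
sgm = [1 0.7; 0.7 1];
R = chol(sgm);
N = 1e5;                                     % arbitrary, large enough for a stable estimate

rng(1)
E_batch = randn(N, 2)*R;                     % column-filled pairing, like one big call
rng(1)
E_loop  = reshape(randn(2*N, 1), 2, N)'*R;   % row-filled pairing, like the loop

disp(cov(E_batch))   % both should be close to sgm
disp(cov(E_loop))
```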

Related Solutions

MATLAB: How to interpret the outputs of the function “runstest.m”

This is a problem of appreciating the terminology, and the double negative way things get worded. Do two negatives make a positive? Not always. But then, not, not, always either. ;-)

Let's see what happens when we throw a very non-random sequence at runstest.

runstest(cumsum(rand(1,100)))
ans =
     1
runstest(mod(1:100,2))
ans =
     1

The null hypothesis is that the sequence is truly random. Can we KNOW a sequence was a random one? Well, you can't KNOW that as fact. You can only test how random it seems to be, or, how non-random. Both cases above would be very unlikely events IF the sequences were truly random, so runstest returns h = 1: we reject the null hypothesis, i.e., the a priori assumption that the sequences really are random. (By default, h = 1 means rejection at the 5% significance level.)

But, when I do this:

runstest(rand(1,1000))
ans =
     0

runstest is not willing to say the sequence seems to be non-random. So, we cannot reject the null hypothesis. Does that mean the sequence is random? NO!!!!!! Only that we cannot comfortably decide the sequence is probably non-random.

So, now let's look at p.

[h,p] = runstest(cumsum(rand(1,100)))
h =
     1
p =
      4.04395443273876e-29

Yep. It is non-random. I'll bet some decent money on that. p is TINY: it is the probability of seeing a run count at least as extreme as this one if the sequence were truly random.
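To make p less abstract, here is a minimal sketch of a classic runs test: the Wald-Wolfowitz test for runs above and below the median, using the large-sample normal approximation. It is an illustration, not runstest's implementation; runstest's reference value, tie handling, and small-sample method may differ, so its p-values need not match these. The variable names are hypothetical:

```matlab
rng(1)                                        % arbitrary seed, for repeatability
seqs = {cumsum(rand(1,100)), mod(1:100,2), rand(1,1000)};
p = zeros(1, numel(seqs));
for k = 1:numel(seqs)
    x = seqs{k};
    m = median(x);
    x = x(x ~= m);                            % drop values equal to the median
    s = x > m;                                % true = above, false = below
    n1 = sum(s);  n2 = sum(~s);  n = n1 + n2;
    nruns = 1 + sum(s(2:end) ~= s(1:end-1));  % a new run starts at each switch
    muR  = 2*n1*n2/n + 1;                     % expected number of runs under H0
    varR = 2*n1*n2*(2*n1*n2 - n) / (n^2*(n - 1));
    z = (nruns - muR) / sqrt(varR);
    p(k) = erfc(abs(z)/sqrt(2));              % two-sided p-value, 2*(1 - Phi(|z|))
end
h = p < 0.05                                  % reject H0 at the 5% level?
```

The cumulative sum has only 2 runs and the alternating 0/1 sequence has 100, where about 51 are expected, so both give a tiny p.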

Now, let's try a sequence of sequences.

x = rand(1,100);
[h,p] = runstest(x)
h =
     0
p =
         0.838684318956303
[h,p] = runstest([x,x])
h =
     0
p =
         0.720356034109542
[h,p] = runstest([x,x,x])
h =
     0
p =
          0.64038079177353
[h,p] = runstest([x,x,x,x])
h =
     0
p =
         0.578177151383335

Because of the way I constructed those sequences, p drops as they get longer, so the runs make the later ones look somewhat less likely to be random. But runstest is not designed to detect the totally non-random way I extended them. It cannot see long-lag structure like this, where every value repeats exactly 100 samples later, and that is precisely what makes the longer versions non-random.
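One way to see the structure runstest misses is to compare the extended sequence with itself shifted by 100 samples. This lag check is a simple illustration chosen here, not something runstest does:

```matlab
x = rand(1,100);
y = [x, x, x, x];                           % the extended sequence from above
lag = 100;                                  % each copy repeats exactly 100 samples later
c = corrcoef(y(1:end-lag), y(1+lag:end));
disp(c(1,2))                                % 1 (up to rounding): perfectly predictable at lag 100
```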

If I try a sequence of length 400 directly from rand, p is larger than it was for [x,x,x,x].

[h,p] = runstest(rand(1,400))
h =
     0
p =
          0.72633274852919

So h==1 says runstest has detected something that does not seem consistent with randomness. h==0 says it did not detect significant non-randomness, but that does not let you conclude the sequence is truly random. p tells you the strength of the evidence: the smaller p is, the stronger the evidence against randomness.
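h==1 is not proof either: at the default 5% level, runstest should flag roughly 1 in 20 truly random sequences as non-random. A quick simulation sketch makes that concrete; it requires the Statistics and Machine Learning Toolbox, and the seed and trial count are arbitrary choices:

```matlab
rng(1)                              % arbitrary seed, for repeatability
ntrials = 1000;                     % arbitrary number of truly random sequences
h = false(ntrials, 1);
for k = 1:ntrials
    h(k) = runstest(rand(1, 100));  % default settings, alpha = 0.05
end
fprintf('Rejected H0 for %d of %d random sequences (%.1f%%)\n', ...
        sum(h), ntrials, 100*mean(h));
```

Because the number of runs is discrete, the observed rate can come out somewhat below 5%.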

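MATLAB: How to get the position of the highest numbers of a vector

Sort in descending order; the second output of sort gives the original positions, so take the first N of each. This assumes a is a column vector, so that [ix(1:N) mx(1:N)] is an N-by-2 matrix and num2str formats one row per line.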

>> [mx,ix]=sort(a,'descend');
>> N=4;
>> num2str([ix(1:N) mx(1:N) ],'Posn %2d Value %7.4f')
ans =
Posn  2 Value  1.1812
Posn 10 Value  0.5864
Posn  8 Value  0.1784
Posn  9 Value -0.1969
>>
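On MATLAB R2017b or later, maxk returns the N largest values and their indices in one call, without sorting the whole vector. A small sketch with made-up data, comparing it with the sort approach:

```matlab
a = [0.3; -1.2; 2.5; 0.9; 1.7];   % hypothetical example data (column vector)
N = 3;

% sort approach: sorts the whole vector, then keeps the first N
[mx, ix] = sort(a, 'descend');
mx = mx(1:N);  ix = ix(1:N);

% maxk approach (R2017b or later): only the N largest, with their positions
[mk, ik] = maxk(a, N);

disp([ik mk])   % positions 3, 5, 4 with values 2.5, 1.7, 0.9
```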
