I think I know what's happening. Apparently, diag() doesn't return it's full result until this result is later used for actual computation. The output is stored in some sparse form until that happens. So, the tic/toc for the z1 calculation accounts for both the computation time and the memory allocation of the full matrix as well. The remarkable similarity between the total times for creation of F_i + calculation of z_i supports this:
N=1e4+1;
tic
F1=diag(ones(N,1),0);
toc
tic
F2=diag(ones(N,1),0)+0;
toc
x=rand(1e4,1); x(N)=0;
tic;
z1=x.'*(F1*x)+7;
toc;
tic;
z2=x.'*(F2*x)+7;
toc;
Best Answer