From what you've told me, I would almost certainly predict that if you (1) ran a standard two-stage least squares estimator and (2) tried to compute robust standard errors, then some of the standard errors for your coefficients would either be reported as zero or correctly reported as blank/uncomputable.
My guess at what is happening is that some of your coefficients are exactly identified, hence a heteroskedasticity-robust covariance matrix cannot be computed for those coefficients. When you then go to invert that matrix to compute your optimal weighting matrix, MATLAB will tell you that the matrix is singular or near singular.
Some background
From what you're telling me, your Z matrix is full column rank. What you're also telling me is that Z .* U is less than full column rank (where U = u * ones(1, L) and u is your vector of residuals). That is:
$$ \underbrace{Z = \left[\begin{array}{ccc} z_{1,1} & z_{1,2} & z_{1,3} \\ z_{2,1} & z_{2,2} & z_{2,3} \\ \ldots & \ldots & \ldots \end{array} \right]}_{\text{full column rank}} \quad \quad \underbrace{A = \left[\begin{array}{ccc} z_{1,1}u_1 & z_{1,2}u_1 & z_{1,3}u_1 \\ z_{2,1}u_2 & z_{2,2}u_2 & z_{2,3}u_2 \\ \ldots & \ldots & \ldots \end{array} \right]}_{\text{rank deficient?}}$$
The squares of the singular values of the matrix $A$ on the right are the singular values of $Z'\Omega Z$ (where $\Omega = \operatorname{diag}(u_1^2, u_2^2, \ldots)$, so that $Z'\Omega Z = A'A$). That is, $Z' \Omega Z$ has zeros (or near zeros) among its singular values if and only if $A$ does. (Basically: effective rank = number of singular values above some threshold.)
Something I would do is take singular value decompositions of both $Z$ and $A$ (or of Z'*Z and A'*A, which is basically the same thing for these purposes). Singular values near zero indicate a matrix that, from a numerical linear algebra standpoint, is near rank deficient. Does $Z$ already have problems without MATLAB spitting out an error? Or do the singular values only go to zero (or near zero) once you multiply by the residuals $u_i$?
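As a sketch of that diagnostic, here it is in Python/NumPy (simulated Z and u standing in for your data; with well-behaved random draws, both matrices should come out full rank):

```python
import numpy as np

rng = np.random.default_rng(0)
N, L = 200, 3
Z = rng.standard_normal((N, L))    # instrument matrix (stand-in for your Z)
u = rng.standard_normal(N)         # residuals (stand-in for your u)

A = u[:, None] * Z                 # row i of Z scaled by u_i; Z .* U in MATLAB

sv_Z = np.linalg.svd(Z, compute_uv=False)
sv_A = np.linalg.svd(A, compute_uv=False)

tol = 1e-10
print("effective rank of Z:", np.sum(sv_Z > tol))
print("effective rank of A:", np.sum(sv_A > tol))
```

If the effective rank of A drops below that of Z, the problem only appears once the residuals enter, which points to the exact-identification story below.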
An educated guess at what might be happening...
Imagine you have 3 observations:
$$ Z = \left[ \begin{array}{cc} 1 & 0 \\ 1 & 1 \\ 1 & 1 \end{array} \right]$$
Z is full column rank because of observation 1. But if the residual $u_1$ for observation 1 is zero...
$$ A = \left[\begin{array}{cc} z_{11}u_1 & z_{12}u_1 \\ z_{21} u_2 & z_{22} u_2 \\ z_{31}u_3 &z_{32}u_3 \end{array}\right] = \left[\begin{array}{cc} 0 & 0 \\ u_2 & u_2 \\ u_3 & u_3 \end{array}\right]$$
Then $A$ is column rank 1 and $Z'\Omega Z = A'A$ will be rank deficient.
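You can check this toy example numerically (a NumPy sketch, with made-up residual values for observations 2 and 3):

```python
import numpy as np

Z = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 1.0]])
u = np.array([0.0, 0.7, -0.4])   # residual for observation 1 is exactly zero

A = u[:, None] * Z               # A = diag(u) @ Z
ZOZ = A.T @ A                    # = Z' * Omega * Z with Omega = diag(u.^2)

print(np.linalg.matrix_rank(Z))    # 2: Z is full column rank
print(np.linalg.matrix_rank(A))    # 1: scaling by u wipes out the identifying row
print(np.linalg.matrix_rank(ZOZ))  # 1: Z' Omega Z is rank deficient
```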
Intuitively, if some of your initial b estimates are exactly identified, the estimated residuals are zero and you can't form an estimate of the variance or covariance: there are 0 leftover degrees of freedom. Mechanically, what gets pumped out is a variance estimate of zero; then you go to invert that, and MATLAB blows up by telling you the matrix is singular or close to singular.
For reference, some MATLAB code I've written/tested to do what I imagine you're trying to do:
Warning: this code does NOT check for rank deficiency. Let endo be your endogenous regressors, exo be your exogenous regressors (the constant ones(N, 1) goes here), and instruments be your instruments. Initial setup:
X = [endo, exo]; % N x (L + K)
Z = [exo, instruments]; % N x (K + L2)
Calculate your basic ingredients so you don't need to repeat these expensive operations:
ZprimX = Z' * X;
Zprimy = Z' * y;
ZprimZ = Z' * Z;
Get your initial estimates (you will use these for the weighting matrix):
%%% Method 1 (intuitive):
% stage1_estimate = Z \ endo;
% endo_hat = Z * stage1_estimate;
% b_2sls = [endo_hat, exo] \ y;
%%% Method 2: (slick)
b_2sls = (Z * (ZprimZ \ ZprimX)) \ y;
Get residuals:
u_2sls = y - X * b_2sls;
Calculate the weighting matrix inverse:
% Basic version
% W_inv = (u_2sls' * u_2sls / N) * ZprimZ / N;
% Robust version
temp = (u_2sls * ones(1, K + L2)) .* Z;
W_inv = temp' * temp / N;
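As a sanity check on that construction: temp' * temp / N is exactly $\frac{1}{N}\sum_i u_i^2 z_i z_i' = Z'\hat\Omega Z / N$. A NumPy sketch of the equivalence (random data standing in for yours):

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 100, 4
Z = rng.standard_normal((N, K))
u = rng.standard_normal(N)

temp = u[:, None] * Z                  # (u_2sls * ones(1, K + L2)) .* Z in MATLAB
W_inv = temp.T @ temp / N              # robust weighting matrix inverse

# Equivalent "textbook" form: Z' * diag(u.^2) * Z / N
W_inv_check = Z.T @ np.diag(u**2) @ Z / N
print(np.allclose(W_inv, W_inv_check))  # True
```

The elementwise form avoids building the N x N matrix diag(u.^2) explicitly, which matters when N is large.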
Calculate your estimate b:
%%% Method 1 (intuitive)
% W = inv(W_inv);
% A = ZprimX' * W * ZprimX;
% c = ZprimX' * W * Zprimy; % = (Zprimy' * W * ZprimX)'
% results.b = inv(A) * c;
%%% METHOD 2 (slick):
% This is a way to do it without explicit matrix inversion:
% solve linear systems rather than invert W_inv.
temp = W_inv \ ZprimX; % temp = inv(W_inv) * ZprimX = W * ZprimX
A = ZprimX' * temp;
c = (Zprimy' * temp)';
results.b = A \ c;
results.b_endo = results.b(1:L);
results.b_exo = results.b(L+1:end);
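For what it's worth, here is a NumPy translation of the whole pipeline above (a sketch: I'm simulating data with one endogenous regressor, a constant, and two instruments, with true coefficients 2.0 and 0.5, so the estimate should land near those up to sampling noise):

```python
import numpy as np

rng = np.random.default_rng(42)
N, L2 = 5000, 2                       # observations, instruments

instruments = rng.standard_normal((N, L2))
v = rng.standard_normal(N)
eps = 0.8 * v + 0.6 * rng.standard_normal(N)   # error correlated with v: endogeneity
endo = (instruments @ np.array([1.0, -1.0]) + v)[:, None]
exo = np.ones((N, 1))                          # constant
y = 2.0 * endo[:, 0] + 0.5 * exo[:, 0] + eps

X = np.hstack([endo, exo])            # N x (L + K)
Z = np.hstack([exo, instruments])     # N x (K + L2)

ZprimX, Zprimy, ZprimZ = Z.T @ X, Z.T @ y, Z.T @ Z

# 2SLS first step (the "slick" method: regress y on the projection of X onto Z)
b_2sls = np.linalg.lstsq(Z @ np.linalg.solve(ZprimZ, ZprimX), y, rcond=None)[0]
u = y - X @ b_2sls

# Robust weighting matrix inverse
temp = u[:, None] * Z
W_inv = temp.T @ temp / N

# Efficient GMM step: solve linear systems rather than invert W_inv
t = np.linalg.solve(W_inv, ZprimX)    # = W * ZprimX
A = ZprimX.T @ t
c = t.T @ Zprimy
b = np.linalg.solve(A, c)
print(b)                              # approximately [2.0, 0.5]
```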
Best Answer
When you calculate the weight matrix, you need to use the value of the moment conditions observation by observation. Since the optimal weight matrix is the inverse of the variance-covariance matrix of your moments, that is what you need to estimate. Right now, you're just taking the inner product of the average value of your moments; instead, average the products of the per-observation moments.
Again the difference is in where you put the expectation. The optimal weight matrix is:
$$W = E\left[ g(X_i,\theta)g(X_i,\theta)'\right]^{-1}$$
As opposed to: $$\left(E[ g(X_i,\theta)]'E[g(X_i,\theta)]\right)^{-1}$$ which is what the current code gives you.
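To make the distinction concrete, here is a NumPy sketch with simulated moments $g_i = z_i u_i$ (random data, not yours). The correct version averages the per-observation outer products and is generically full rank; building a product of the already-averaged moment instead (here written as the matrix-valued outer product, the analogue of the inner product above) throws away the cross-sectional variation and yields a rank-one, non-invertible matrix:

```python
import numpy as np

rng = np.random.default_rng(7)
N, L = 500, 3
Z = rng.standard_normal((N, L))
u = rng.standard_normal(N)
g = u[:, None] * Z                     # g_i = z_i * u_i, one moment vector per row

# Correct: estimate E[g_i g_i'] by averaging per-observation outer products
W_inv_right = g.T @ g / N              # L x L, generically full rank

# Incorrect: product of the averaged moment E[g_i]
g_bar = g.mean(axis=0)
W_inv_wrong = np.outer(g_bar, g_bar)   # L x L but rank 1 -> not invertible

print(np.linalg.matrix_rank(W_inv_right))  # 3
print(np.linalg.matrix_rank(W_inv_wrong))  # 1
```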