From what you've told me, I would almost certainly predict that if you (1) ran a standard two-stage least squares estimator and (2) tried to compute robust standard errors, then some of the standard errors for your coefficients would either be reported as zero or correctly reported as blank/uncomputable.
My guess at what is happening is that some of your coefficients are exactly identified, hence a heteroskedasticity-robust covariance matrix cannot be computed for those coefficients. Go to invert that to compute your optimal weighting matrix and Matlab will tell you that the matrix is singular or near singular.
Some background
From what you're telling me, your Z matrix is full column rank. What you're also telling me is that Z .* U is less than full column rank (where U = u * ones(1, L) and u is your residuals). That is:
$$ \underbrace{Z = \left[\begin{array}{ccc} z_{1,1} & z_{1,2} & z_{1,3} \\ z_{2,1} & z_{2,2} & z_{2,3} \\ \ldots & \ldots & \ldots \end{array} \right]}_{\text{full column rank}} \quad \quad \underbrace{A = \left[\begin{array}{ccc} z_{1,1}u_1 & z_{1,2}u_1 & z_{1,3}u_1 \\ z_{2,1}u_2 & z_{2,2}u_2 & z_{2,3}u_2 \\ \ldots & \ldots & \ldots \end{array} \right]}_{\text{rank deficient?}}$$
The squared singular values of the matrix $A$ on the right are the singular values of $Z'\Omega Z = A'A$ (where $\Omega = \operatorname{diag}(u_1^2, \ldots, u_N^2)$); that is, $Z'\Omega Z$ has zero (or near-zero) singular values if and only if $A$ does. (Basically: effective rank = number of singular values above some threshold.)
Something I would do is compute singular value decompositions of both $Z$ and $A$ (or of Z'*Z and A'*A, which amounts to the same thing for these purposes). Singular values near zero indicate a matrix that, from a numerical linear algebra standpoint, is nearly rank deficient. Does $Z$ already have problems without Matlab spitting out an error? Or do the singular values only go to zero (or near zero) once you multiply by the residuals $u_i$?
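A minimal sketch of that check, assuming Z is your N-by-L instrument matrix and u is the N-by-1 residual vector from your first step (placeholder names matching the notation above):
A = Z .* (u * ones(1, size(Z, 2)));   % each row of Z scaled by its residual
sv_Z = svd(Z);                        % singular values of Z
sv_A = svd(A);                        % singular values of A
% Ratios of smallest to largest singular value near machine precision (eps)
% signal effective rank deficiency.
disp([sv_Z / sv_Z(1), sv_A / sv_A(1)])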
An educated guess at what might be happening...
Imagine you have 3 observations:
$$ Z = \left[ \begin{array}{cc} 1 & 0 \\ 1 & 1 \\ 1 & 1 \end{array} \right]$$
Z is full column rank because of observation 1. But if the residual $u_1$ for observation 1 is zero...
$$ A = \left[\begin{array}{cc} z_{11}u_1 & z_{12}u_1 \\ z_{21} u_2 & z_{22} u_2 \\ z_{31}u_3 &z_{32}u_3 \end{array}\right] = \left[\begin{array}{cc} 0 & 0 \\ u_2 & u_2 \\ u_3 & u_3 \end{array}\right]$$
Then $A$ is column rank 1 and $Z'\Omega Z = A'A$ will be rank deficient.
Intuitively, what would be happening is that if some of your initial b estimates are exactly identified, the estimated residual is zero and you can't form an estimate of the variance or covariance! There are 0 leftover degrees of freedom. Mechanically, what gets pumped out is an estimate of zero; then you go to invert that, and Matlab blows up by telling you the matrix is singular or close to singular.
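A tiny, standalone numerical illustration of the three-observation example above (the nonzero residual values are made up):
Z = [1 0; 1 1; 1 1];
u = [0; 0.5; -0.3];          % residual for observation 1 is exactly zero
A = Z .* (u * ones(1, 2));
rank(Z)                      % 2: full column rank
rank(A)                      % 1: rank deficient
svd(A' * A)                  % one singular value is exactly zero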
For reference, some MATLAB code I've written/tested to do what I imagine you're trying to do:
Warning, this code does NOT check for rank deficiency. Let endo be your endogenous regressors, exo be your exogenous regressors (the constant ones(N, 1) goes here), and instruments be your instruments. Initial setup:
X = [endo, exo]; % N x (L + K)
Z = [exo, instruments]; % N x (K + L2)
Calculate your basic ingredients so you don't need to repeat these expensive operations:
ZprimX = Z' * X;
Zprimy = Z' * y;
ZprimZ = Z' * Z;
Get your initial estimates (you will use these for the weighting matrix):
%%% Method 1 (intuitive):
% stage1_estimate = Z \ endo;
% endo_hat = Z * stage1_estimate;
% b_2sls = [endo_hat, exo] \ y;
%%% Method 2: (slick)
b_2sls = (Z * (ZprimZ \ ZprimX)) \ y;
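To see why the "slick" line gives the same estimator: ZprimZ \ ZprimX is $(Z'Z)^{-1}Z'X$, so Z * (ZprimZ \ ZprimX) is the first-stage fitted matrix $\hat X = P_Z X$ with $P_Z = Z(Z'Z)^{-1}Z'$, and the outer backslash then regresses $y$ on $\hat X$:
$$\hat\beta_{2SLS} = (\hat X'\hat X)^{-1}\hat X' y = (X'P_Z X)^{-1}X'P_Z y$$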
Get residuals:
u_2sls = y - X * b_2sls;
Calculate the weighting matrix inverse:
% Basic version
% W_inv = (u_2sls' * u_2sls / N) * ZprimZ / N;
% Robust version
temp = (u_2sls * ones(1, K + L2)) .* Z;
W_inv = temp' * temp / N;
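In matrix form, the robust version is the sample analogue of $E(u_i^2 z_i z_i')$, i.e. exactly the $Z'\Omega Z$ object from the first section:
$$\widehat W^{-1} = \frac{1}{N}\sum_{i=1}^N u_i^2\, z_i z_i' = \frac{1}{N} Z'\widehat\Omega Z, \qquad \widehat\Omega = \operatorname{diag}(u_1^2,\ldots,u_N^2)$$
which is what temp' * temp / N computes.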
Calculate your estimate b:
%%% Method 1 (intuitive)
% W = inv(W_inv);
% A = ZprimX' * W * ZprimX;
% c = ZprimX' * W * Zprimy; % = (Zprimy' * W * ZprimX)'
% results.b = inv(A) * c;
%%% METHOD 2 (slick):
% This is a way without matrix inversion to do it.
% Solve linear systems rather than invert W_inv
temp = W_inv \ ZprimX; % Solve inv(W_inv) * ZprimX = W * ZprimX
A = ZprimX' * temp;
c = (Zprimy' * temp)';
results.b = A \ c;
results.b_endo = results.b(1:L);
results.b_exo = results.b(L+1:end);
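The snippet above stops at the point estimate. Since your question is about the standard errors, here is a minimal sketch of the robust covariance step under the usual efficient-GMM asymptotics, reusing the objects defined above; this is exactly where a singular or near-singular W_inv will surface as a warning or as Inf/NaN entries:
G = ZprimX / N;                   % sample Jacobian of the moment conditions
V = inv(G' * (W_inv \ G)) / N;    % estimated covariance of results.b
results.se = sqrt(diag(V));       % robust standard errors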
Best Answer
The least squares estimator in the classical linear regression model is a Method of Moments estimator.
The model is
$$\mathbf y = \mathbf X\beta + \mathbf u$$
Instead of minimizing the sum of squared residuals, we can obtain the OLS estimator by noting that under the assumptions of the specific model, it holds that (the "orthogonality condition")
$$E(\mathbf X' \mathbf u)= \mathbf 0$$
$$\implies E(\mathbf X'( \mathbf y - \mathbf X\beta))=\mathbf 0 \implies E(\mathbf X'\mathbf y)=E(\mathbf X'\mathbf X)\beta$$
$$\implies \beta = \left[E(\mathbf X'\mathbf X)\right]^{-1}E(\mathbf X'\mathbf y)$$
So if we knew the true expected values (and our assumptions were correct), we could calculate the true value of the unknown coefficient.
With non-experimental data, we don't. But we know that if our sample is ergodic-stationary (and i.i.d. samples are), then expected values are consistently estimated by their sample analogues, the corresponding sample means. Hence we have an "acceptable" estimator in
$$\hat \beta = \left[(1/n)\mathbf X'\mathbf X\right]^{-1}(1/n)\mathbf X'\mathbf y = (\mathbf X'\mathbf X)^{-1}\mathbf X'\mathbf y $$
which is the same estimator we will obtain if we minimize the sum of squared residuals.
If you reverse the calculations, noting that the residuals are a function of $\hat \beta$, i.e. $\mathbf {\hat u} = \mathbf {\hat u}(\hat \beta)$, you will find that $\mathbf X' \mathbf {\hat u} (\hat \beta) = \mathbf 0$. Divide by $n$ for this to look like a sample mean.
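A quick numerical illustration of this (the regressors and outcome here are made-up placeholders):
rng(0);
X = [ones(100, 1), randn(100, 2)];   % made-up regressors with a constant
y = randn(100, 1);                   % made-up outcome
b_hat = X \ y;                       % OLS estimate
u_hat = y - X * b_hat;               % residuals
disp(X' * u_hat / 100)               % sample analogue of E(X'u): numerically zero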
So "we choose those estimates that make the sample obey what we assumed the population obeys". And we do that because we accept that the sample is representative of the population, so it should "behave like" the population (as we assumed the latter to behave).
As for GMM, each orthogonality condition is an equation. If you have $m$ equations and $k<m$ unknown coefficients, then the system of equations is "over-identified" and no exact solution exists; there is nothing more to it.
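In that case the standard GMM move is to get the sample moment vector as close to zero as possible in a weighted-norm sense, rather than exactly to zero:
$$\hat\beta_{GMM} = \arg\min_{\beta}\; \bar{\mathbf g}_n(\beta)'\,\mathbf W_n\,\bar{\mathbf g}_n(\beta), \qquad \bar{\mathbf g}_n(\beta) = \frac{1}{n}\sum_{i=1}^n \mathbf z_i(y_i - \mathbf x_i'\beta)$$
for some positive definite weighting matrix $\mathbf W_n$; the efficient choice of $\mathbf W_n$ is a consistent estimate of $\left[E(u_i^2\mathbf z_i\mathbf z_i')\right]^{-1}$, which is exactly the object being inverted in the code in the first part of this page.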
Hansen contrasted this with the original use of the "Method of Moments": if a distribution is characterized by, say, three unknown parameters, the MoM tactic is to estimate, using the sample, the first three moments of the distribution (which appear in equations involving these parameters), and so obtain an exactly identified system of equations. See how this works in this answer, as an example.