Method of Moments – Principle of Analogy and Method of Moments in Econometrics

Tags: econometrics, generalized-moments, method-of-moments

I am studying method of moments and GMM in the context of econometrics.

Can someone explain, on an intuitive level, what it means to match moments? And how does this differ from the classical linear regression model, for example?

Some terms and commonly used phrases I would appreciate gaining clarity on are:

  1. Orthogonality conditions state that a set of population moments is zero. The method of moments exploits exactly this condition.
  2. The basic principle of MoM is to choose the parameter estimate so that the corresponding sample moments are also zero. This is the "matching" of the population moments with those of the sample.
  3. The impetus for the evolution of MoM is that Hansen observed that if there are more orthogonality conditions than parameters, the system may not have a solution. The extension of MoM developed by Hansen, GMM, is precisely designed to cover this case. (My question: Why does it become a problem, i.e. why may the system have no solution, when the number of orthogonality conditions is larger than the number of parameters? Can someone explain this both in linear algebra language and in context with an example, like the log wage example used in Hayashi?)

Best Answer

The least squares estimator in the classical linear regression model is a Method of Moments estimator.

The model is

$$\mathbf y = \mathbf X\beta + \mathbf u$$

Instead of minimizing the sum of squared residuals, we can obtain the OLS estimator by noting that, under the assumptions of this specific model, the following holds (the "orthogonality condition"):

$$E(\mathbf X' \mathbf u)= \mathbf 0$$

$$\implies E(\mathbf X'( \mathbf y - \mathbf X\beta))=\mathbf 0 \implies E(\mathbf X'\mathbf y)=E(\mathbf X'\mathbf X)\beta$$

$$\implies \beta = \left[E(\mathbf X'\mathbf X)\right]^{-1}E(\mathbf X'\mathbf y)$$

So if we knew the true expected values (and our assumptions were correct), we could calculate the true value of the unknown coefficient.
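For intuition, here is the same argument in the simplest scalar case, a regression with an intercept and a single regressor $x$ (a minimal sketch in my own notation, not from the original post): the two population moment conditions pin down the two coefficients exactly.

$$E(y-\beta_0-\beta_1 x)=0, \qquad E\big(x\,(y-\beta_0-\beta_1 x)\big)=0$$

$$\implies \beta_1 = \frac{\operatorname{Cov}(x,y)}{\operatorname{Var}(x)}, \qquad \beta_0 = E(y)-\beta_1 E(x)$$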

With non-experimental data, we do not know these population expectations. But we know that if our sample is ergodic-stationary (and i.i.d. samples are), then expected values are consistently estimated by their sample analogues, the corresponding sample means. Hence we have an "acceptable" estimator in

$$\hat \beta = \left[(1/n)\mathbf X'\mathbf X\right]^{-1}\left((1/n)\mathbf X'\mathbf y\right) = (\mathbf X'\mathbf X)^{-1}\mathbf X'\mathbf y $$

which is the same estimator we will obtain if we minimize the sum of squared residuals.

If you reverse the calculations, noting that the residuals are a function of $\hat \beta$, $\mathbf {\hat u} = \mathbf {\hat u}(\hat \beta)$, you will find that $\mathbf X' \mathbf {\hat u} (\hat \beta) = \mathbf 0$. Divide by $n$ for this to look like a sample mean.
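To make this concrete, here is a small numerical sketch with simulated data (my own illustration, not from the original post): it computes $\hat \beta$ from the sample moments, confirms it matches the least-squares solution, and checks the sample orthogonality condition $\mathbf X'\mathbf{\hat u}(\hat \beta) = \mathbf 0$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: n observations, an intercept and two regressors (illustrative only)
n = 1_000
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(size=n)

# Method-of-moments estimator: solve the sample analogue of E(X'X) beta = E(X'y)
Sxx = X.T @ X / n            # sample analogue of E(X'X)
Sxy = X.T @ y / n            # sample analogue of E(X'y)
beta_hat = np.linalg.solve(Sxx, Sxy)

# Least-squares estimator: minimize the sum of squared residuals
beta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta_hat, beta_ls))                 # True: the two coincide

# Sample orthogonality condition at the estimate: (1/n) X' u_hat(beta_hat) = 0
u_hat = y - X @ beta_hat
print(np.allclose(X.T @ u_hat / n, 0.0, atol=1e-10))  # True, up to rounding error
```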

So "we choose those estimates that make the sample obey what we assumed the population obeys". And we do that because we accept that the sample is representative of the population, so it should "behave like" the population (as we assumed the latter to behave).

As for GMM, each orthogonality condition is an equation. If you have $m$ equations but only $k<m$ unknown coefficients, the system of equations is "over-identified" and in general no exact solution exists; there is nothing more to it.
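Here is a minimal sketch of that counting argument (a hypothetical single-regressor setup with two instruments, my own example): each of the two sample moment conditions determines its own value of the single coefficient, and in a finite sample the two values disagree, so no $\beta$ sets both conditions to zero at once. GMM instead minimizes a weighted quadratic form in the sample moments.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: one regressor x and two instruments z1, z2, so m = 2 moment
# conditions E(z1*u) = 0, E(z2*u) = 0 but only k = 1 unknown coefficient beta.
# (x is kept exogenous here for simplicity; only the equation/unknown count matters.)
n = 500
z = rng.normal(size=(n, 2))                       # the two instruments as columns
x = z @ np.array([1.0, 1.0]) + rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)                  # true beta = 2

# Each sample moment condition alone is one equation in one unknown:
#   (1/n) * z_j'(y - x*beta) = 0   =>   beta_j = (z_j'y) / (z_j'x)
beta_1 = (z[:, 0] @ y) / (z[:, 0] @ x)
beta_2 = (z[:, 1] @ y) / (z[:, 1] @ x)
print(beta_1, beta_2)    # two different numbers: no single beta zeroes both conditions

# GMM (identity weight): minimize g(beta)'g(beta) with g(beta) = (1/n) Z'(y - x*beta).
# In this linear case the minimizer is available in closed form.
Zx = z.T @ x / n
Zy = z.T @ y / n
beta_gmm = (Zx @ Zy) / (Zx @ Zx)
print(beta_gmm)          # a weighted compromise between beta_1 and beta_2, near 2
```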

Hansen contrasted this with the original use of the "Method of Moments": if a distribution is characterized by, say, three unknown parameters, the MoM tactic is to use the sample to estimate the first three moments of the distribution (which appear in equations involving these parameters) and so obtain an exactly-identified system of equations. See how this works in this answer, as an example.
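For concreteness, a small sketch of that exactly-identified, distribution-fitting use of MoM (my own example with a Gamma distribution and its first two moments, not the example linked above):

```python
import numpy as np

rng = np.random.default_rng(2)

# Classical MoM: Gamma(shape=k, scale=theta) has mean = k*theta and variance = k*theta**2.
# Two parameters, two moment equations: an exactly-identified system.
sample = rng.gamma(shape=3.0, scale=1.5, size=10_000)

m1 = sample.mean()        # sample analogue of the first moment
m2 = sample.var()         # sample analogue of the second central moment

theta_hat = m2 / m1       # from  variance / mean = theta
k_hat = m1 / theta_hat    # from  mean = k * theta

print(k_hat, theta_hat)   # close to the true values (3.0, 1.5)
```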