Solved – Standard errors of estimates vary in PROC REG and PROC GENMOD!

generalized linear modelrregressionsasstandard error

I am trying to match the outputs of PROC REG with PROC GENMOD. I ran a sample test on the 'iris' dataset of R.

The data set is as follows (150 rows in total):

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

My PROC REG code is:

proc reg data=iris;
    model Sepal_Length = Sepal_Width Petal_Length Petal_Width;
run;

My PROC GENMOD code is:

proc genmod data=iris;
    model Sepal_Length = Sepal_Width Petal_Length Petal_Width / dist=normal;
run;

The output for PROC REG is:

                        Parameter       Standard
Variable        DF       Estimate          Error    t Value    Pr > |t|

Intercept        1        1.85600        0.25078       7.40      <.0001
Sepal_Width      1        0.65084        0.06665       9.77      <.0001
Petal_Length     1        0.70913        0.05672      12.50      <.0001
Petal_Width      1       -0.55648        0.12755      -4.36      <.0001

The output for PROC GENMOD is:

                               Standard     Wald 95% Confidence      Chi-
Parameter      DF   Estimate      Error           Limits           Square   Pr > ChiSq

Intercept       1     1.8560     0.2474     1.3711       2.3409     56.28       <.0001
Sepal_Width     1     0.6508     0.0658     0.5220       0.7797     97.98       <.0001
Petal_Length    1     0.7091     0.0560     0.5995       0.8188    160.59       <.0001
Petal_Width     1    -0.5565     0.1258    -0.8031      -0.3098     19.56       <.0001
Scale           1     0.3103     0.0179     0.2771       0.3475

According to my understanding, the standard errors of both codes should match as the generalized linear model is run on a normal distribution.

Also, I ran both regression on R using lm() and glm(..., family=gaussian) and the standard error came out equal. Moreover, they are the same as the standard error of PROC REG.

Can anyone elaborate on why they are not matching?

Best Answer

Whenever you see small inconsistent discrepancies among standard errors or other quantities directly related to variances, suspect bias corrections.


In this case, we are given ample clues. First, consider the ratios of the standard errors:

$$\eqalign{ &(.25078, .06665, .05672, .12755) / (.2474, .0658, .056, .1258)\\& = 1.0137, 1.0129, 1.0129, 1.0139).}$$

Next, consider that the regression itself involves $150$ observations and $4$ variables. A bias-corrected estimate of a variance would therefore involve a ratio of $150$ to $150-4$. Let's see what this correction might do the squares of the standard errors. Multiplying the previous results by $146/150$ gives

$$(1.0001105, 0.9986427, 0.9985228, 1.0006017)$$

Their mean is $0.9995 \pm .0005$, which is essentially $1$. Provisionally, then, it is fair to conclude that both procedures are doing the same thing but using different estimates for the variances of the parameters. Most likely one of them is using Maximum Likelihood estimates (which involve no bias corrections) and the other is using ordinary least squares formulas (which usually include bias corrections). Given that GENMOD is generalized linear model code, and GLMs are (almost) always fit using ML, and given that REG is least squares regression, this conclusion seems well supported.


We should still be a little puzzled by the variation in residual ratios, even though it's small: these ratios differ from $1$ by about $0.1$%. Without anything to go on but experience, I would provisionally attribute these variations to numerical errors associated with incomplete convergence in the likelihood optimization procedure of GENMOD. We're getting close to the default value of the parameter convergence tolerance (XCONV) of $10^{-4}$.

Related Question