I am trying to match the outputs of PROC REG with PROC GENMOD. I ran a sample test on the 'iris' dataset of R.

The data set is as follows (150 rows in total):

```
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
```

My PROC REG code is:

```
proc reg data=iris;
model Sepal_Length = Sepal_Width Petal_Length Petal_Width;
run;
```

My PROC GENMOD code is:

```
proc genmod data=iris;
model Sepal_Length = Sepal_Width Petal_Length Petal_Width / dist=normal;
run;
```

The output for PROC REG is:

```
Parameter Standard
Variable DF Estimate Error t Value Pr > |t|
Intercept 1 1.85600 0.25078 7.40 <.0001
Sepal_Width 1 0.65084 0.06665 9.77 <.0001
Petal_Length 1 0.70913 0.05672 12.50 <.0001
Petal_Width 1 -0.55648 0.12755 -4.36 <.0001
```

The output for PROC GENMOD is:

```
Standard Wald 95% Confidence Chi-
Parameter DF Estimate Error Limits Square Pr > ChiSq
Intercept 1 1.8560 0.2474 1.3711 2.3409 56.28 <.0001
Sepal_Width 1 0.6508 0.0658 0.5220 0.7797 97.98 <.0001
Petal_Length 1 0.7091 0.0560 0.5995 0.8188 160.59 <.0001
Petal_Width 1 -0.5565 0.1258 -0.8031 -0.3098 19.56 <.0001
Scale 1 0.3103 0.0179 0.2771 0.3475
```

According to my understanding, the standard errors of both codes should match as the generalized linear model is run on a normal distribution.

Also, I ran both regression on R using `lm()`

and `glm(..., family=gaussian)`

and the standard error came out equal. Moreover, they are the same as the standard error of PROC REG.

Can anyone elaborate on why they are not matching?

## Best Answer

Whenever you see small inconsistent discrepancies among standard errors or other quantities directly related to variances, suspect bias corrections.In this case, we are given ample clues. First, consider the ratios of the standard errors:

$$\eqalign{ &(.25078, .06665, .05672, .12755) / (.2474, .0658, .056, .1258)\\& = 1.0137, 1.0129, 1.0129, 1.0139).}$$

Next, consider that the regression itself involves $150$ observations and $4$ variables. A bias-corrected estimate of a variance would therefore involve a ratio of $150$ to $150-4$. Let's see what this correction might do the squares of the standard errors. Multiplying the previous results by $146/150$ gives

$$(1.0001105, 0.9986427, 0.9985228, 1.0006017)$$

Their mean is $0.9995 \pm .0005$, which is essentially $1$. Provisionally, then, it is fair to conclude that

both procedures are doing the same thing but using different estimates for the variances of the parameters.Most likely one of them is using Maximum Likelihood estimates (which involve no bias corrections) and the other is using ordinary least squares formulas (which usually include bias corrections). Given that`GENMOD`

is generalized linear model code, and GLMs are (almost) always fit using ML, and given that`REG`

is least squares regression, this conclusion seems well supported.We should still be a little puzzled by the variation in residual ratios, even though it's small: these ratios differ from $1$ by about $0.1$%. Without anything to go on but experience, I would provisionally attribute these variations to numerical errors associated with incomplete convergence in the likelihood optimization procedure of GENMOD. We're getting close to the default value of the parameter convergence tolerance (

`XCONV`

) of $10^{-4}$.