Solved – SAS – REG vs GENMOD; OLS vs MLE

least squares, maximum likelihood

To further my understanding of GLMs, I'm working with a very simple data set from an article. I entered the data in SAS and ran both PROC REG and PROC GENMOD on it. In PROC GENMOD, I used a log link with a normal distribution; in PROC REG, I used the log of the response variable in the model.

My question is, why don't the parameter estimates of the two procedures match? My understanding is that PROC REG uses OLS to estimate the parameters, whereas PROC GENMOD uses MLE with a Newton-Raphson iterative process for estimation. But I had thought that, when the assumed distribution is normal and the relationship is linear (which, after the log transformation, it is in the GLM, right?), MLE is equal to OLS.

Here are the resulting parameters from the run:

      REG       GENMOD
A1    4.623     4.579
A2    4.688     4.730
A3    4.654     4.654
B1   (0.735)   (0.741)
B2   (0.487)   (0.436)

And here is my code:

data GLM;
 input Y A1 A2 A3 B1 B2;
 lnY = LOG(Y);
 datalines;
95 1 0 0 0 0
115 0 1 0 0 0
105 0 0 1 0 0
55 1 0 0 1 0
45 0 1 0 1 0
30 1 0 0 1 1
; 

proc genmod data=GLM;
 model Y = A1 A2 A3 B1 B2 / dist=normal link=log scale=deviance noint ;
 weight Y;
run;

proc reg data=GLM;
 model lnY = A1 A2 A3 B1 B2 / noint;
 weight Y;
run;

Any insight that anyone can contribute is greatly appreciated!

Bonus question – in my data I have 6 equations and 5 variables. Why is an iterative process needed to solve that?

Best Answer

The WEIGHT statement functions differently in the two procedures.

In PROC REG, the WEIGHT statement produces a weighted least-squares fit. From the documentation:

The normal equations used when a WEIGHT statement is present are

$(X'WX)\beta = X'WY$

where $W$ is a diagonal matrix consisting of the values of the variable specified in the WEIGHT statement.
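
Spelled out for the model in the question, these normal equations can be read as the first-order conditions of a weighted least-squares criterion on the log scale. The following is a sketch under assumed notation that does not come from the SAS documentation: $w_i$ is the WEIGHT value for observation $i$ (here $Y_i$), $x_i$ is the $i$-th row of $X$, the response vector is $\ln Y$ because the MODEL statement uses lnY, and $X'WX$ is taken to be invertible.

$$\hat\beta_{\text{REG}} = \arg\min_{\beta} \sum_{i=1}^{6} w_i \left(\ln Y_i - x_i'\beta\right)^2 = (X'WX)^{-1} X'W \ln Y$$

Because the criterion is quadratic in $\beta$, the minimizer is available in closed form and no iteration is needed.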

In PROC GENMOD, by contrast, the documentation describes the WEIGHT statement as follows:

The WEIGHT statement identifies a variable in the input data set to be used as the exponential family dispersion parameter weight for each observation. The exponential family dispersion parameter is divided by the WEIGHT variable value for each observation. This is true regardless of whether the parameter is estimated by the procedure or specified in the MODEL statement with the SCALE= option. It is also true for distributions such as the Poisson and binomial that are not usually defined to have a dispersion parameter. For these distributions, a WEIGHT variable weights the overdispersion parameter, which has the default value of 1.

The WEIGHT variable does not have to be an integer; if it is less than or equal to 0 or if it is missing, the corresponding observation is not used.
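
Applied to the normal distribution, dividing the dispersion by the weight amounts to treating observation $i$ as having variance $\phi / w_i$. The following is a sketch of the resulting log-likelihood for the log-link model in the question, under assumed notation that does not appear in the quoted documentation: $w_i$ is the WEIGHT value, $\phi$ is the dispersion, and $x_i$ is the $i$-th row of the design matrix.

$$\ell(\beta, \phi) = -\frac{1}{2} \sum_{i} \left[ \ln\frac{2\pi\phi}{w_i} + \frac{w_i \left(Y_i - e^{x_i'\beta}\right)^2}{\phi} \right]$$

Under these assumptions the mean $e^{x_i'\beta}$ is nonlinear in $\beta$, so the score equations have no closed-form solution and must be solved iteratively, which bears on the bonus question. The residuals are also measured on the scale of $Y$ rather than $\ln Y$.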
