R VS SAS – Why Does SAS PROC GLIMMIX Give Different Random Slopes Than glmer?

Tags: binomial-distribution, lme4-nlme, random-effects-model, sas

I am a user more familiar with R, and have been trying to estimate random slopes (selection coefficients) for about 35 individuals over 5 years for four habitat variables. The response variable is whether a location was "used" (1) or "available" (0) habitat ("use" below).

I am using a Windows 64-bit computer.

In R version 3.1.0 with lme4 version 1.1-7, I use the data and call below. PS, TH, RS, and HW are fixed effects (standardized distances to the four habitat types).

str(dat)
'data.frame':   359756 obs. of  7 variables:
 $ use     : num  1 1 1 1 1 1 1 1 1 1 ...
 $ Year    : Factor w/ 5 levels "1","2","3","4",..: 4 4 4 4 4 4 4 4 3 4 ...
 $ ID      : num  306 306 306 306 306 306 306 306 162 306 ...
 $ PS      : num  -0.32 -0.317 -0.317 -0.318 -0.317 ...
 $ TH      : num  -0.211 -0.211 -0.211 -0.213 -0.22 ...
 $ RS      : num  -0.337 -0.337 -0.337 -0.337 -0.337 ...
 $ HW      : num  -0.0258 -0.19 -0.19 -0.19 -0.4561 ...

glmer(use ~ PS + TH + RS + HW +
        (1 + PS + TH + RS + HW | ID/Year),
      family = binomial, data = dat,
      control = glmerControl(optimizer = "bobyqa"))
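
For reference, my understanding is that lme4's ID/Year nesting shorthand expands into two separate random-effects terms, so the call above should be equivalent to the explicit form below (a sketch; I have not re-fit the full model this way):

glmer(use ~ PS + TH + RS + HW +
        (1 + PS + TH + RS + HW | ID) +        # slopes varying by individual
        (1 + PS + TH + RS + HW | ID:Year),    # slopes varying by year within individual
      family = binomial, data = dat,
      control = glmerControl(optimizer = "bobyqa"))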

glmer gives me parameter estimates for the fixed effects that make sense to me, and the random slopes (which I interpret as selection coefficients to each habitat type) also make sense when I investigate the data qualitatively. The log-likelihood for the model is -3050.8.

However, most research in animal ecology does not use R, because with animal location data spatial autocorrelation can make model-based standard errors prone to Type I error. While R reports model-based standard errors, empirical (also called Huber-White or sandwich) standard errors are preferred.

R does not currently offer this option (to my knowledge; please correct me if I am wrong), but SAS does. Although I do not have access to SAS myself, a colleague agreed to let me borrow his computer to determine whether the standard errors change substantially when the empirical method is used.
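
(The closest thing I have found on the R side is a GEE fit via the geepack package, which reports empirical/sandwich standard errors by default. It is a marginal model with no random slopes, so it is not a drop-in replacement for glmer; this is only a sketch of the idea:)

library(geepack)

# Marginal (population-averaged) logistic model with empirical/sandwich SEs;
# clusters on individual (geeglm expects the data sorted by ID)
gee_fit <- geeglm(use ~ PS + TH + RS + HW,
                  id = ID, family = binomial,
                  corstr = "exchangeable", data = dat)
summary(gee_fit)   # the reported Std.err values are robust/sandwich estimates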

First, we wished to ensure that when using model-based standard errors, SAS would produce similar estimates to R – to be certain that the model is specified the same way in both programs. I don't care if they are exactly the same – just similar.
I tried (SAS V 9.2):

proc glimmix data=dat method=laplace;
   class year id;
   model use =  PS TH RS HW / dist=bin solution ddfm=betwithin;
   random intercept PS TH RS HW / subject = year(id) solution type=UN;
run;
title;

I also tried various other forms, such as adding lines

random intercept / subject = year(id) solution type=UN;
random intercept PS TH RS HW / subject = id solution type=UN;
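
(and, for completeness, a version with two RANDOM statements, which is my best guess at the SAS analogue of the ID/Year nesting in the R call; a sketch I have not fully verified:)

proc glimmix data=dat method=laplace;
   class year id;
   model use = PS TH RS HW / dist=bin solution;
   /* individual-level random slopes */
   random intercept PS TH RS HW / subject = id solution type=UN;
   /* year-within-individual random slopes */
   random intercept PS TH RS HW / subject = year(id) solution type=UN;
run;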

I also tried omitting

solution type=UN

and commenting out

ddfm=betwithin

No matter how we specify the model (and we have tried many ways), I cannot get the random slopes in SAS to even remotely resemble those output by R, even though the fixed effects are similar enough. And by "different" I mean that not even the signs are the same. The -2 log likelihood in SAS was 71344.94.
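
(On the fit statistics: SAS reports -2 log likelihood directly, while R's logLik() gives the log-likelihood, so to compare I doubled the R value by hand; assuming fit is the saved glmer object:)

logLik(fit)                     # -3050.8, as reported by glmer
-2 * as.numeric(logLik(fit))    #  6101.6, versus the 71344.94 that SAS reports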

I can't upload my full dataset, so I made a toy dataset with only the records from three individuals. SAS gives me output in a few minutes; in R it takes over an hour. Weird. With this toy dataset I'm now also getting different estimates for the fixed effects.

My question:
Can anyone shed light on why the random-slope estimates might be so different between R and SAS? Is there anything I can do in R or SAS to modify my code so that the two calls produce similar results? I'd rather change the code in SAS, since I "believe" my R estimates more.

I'm really concerned with these differences and want to get to the bottom of this problem!

My R and SAS output from a toy dataset (using only three of the 35 individuals in the full dataset) is included as JPEGs:

R output
SAS output 1
SAS output 2
SAS output 3


EDIT AND UPDATE:

As @JakeWestfall helped me discover, the slopes reported by SAS do not include the fixed effects. When I add the fixed effects back in, here is the result, comparing R slopes to SAS slopes for one fixed effect, "PS" (selection coefficient = random slope). Note the increased variation in SAS.

R vs SAS for PS
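
(In lme4 terms, the SAS "Solution for Random Effects" lines up with ranef(), i.e. per-subject deviations from the fixed effects, while what I had been reading off in R came from coef(), which adds the fixed effects back in. A sketch of the correspondence, assuming fit is the glmer object:)

ranef(fit)$ID[, "PS"]                       # deviations only: what SAS's random-effects solution reports
coef(fit)$ID[, "PS"]                        # fixed effect + deviation: the subject-specific selection coefficient
fixef(fit)["PS"] + ranef(fit)$ID[, "PS"]    # same values as the coef() line above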

Best Answer

It appears that I shouldn't have expected the random slopes to be similar between packages, according to Zhang et al. (2011). In their paper, On Fitting Generalized Linear Mixed-effects Models for Binary Responses using Different Statistical Packages, they write:

Abstract:

The generalized linear mixed-effects model (GLMM) is a popular paradigm to extend models for cross-sectional data to a longitudinal setting. When applied to modeling binary responses, different software packages and even different procedures within a package may give quite different results. In this report, we describe the statistical approaches that underlie these different procedures and discuss their strengths and weaknesses when applied to fit correlated binary responses. We then illustrate these considerations by applying these procedures implemented in some popular software packages to simulated and real study data. Our simulation results indicate a lack of reliability for most of the procedures considered, which carries significant implications for applying such popular software packages in practice.

I hope @BenBolker and the lme4 team will take my question as a vote for adding empirical standard errors and Gauss-Hermite quadrature capability for models with several random-slope terms to glmer, as I prefer the R interface and would love to apply some further analyses in that program. Happily, even though R and SAS do not produce comparable values for the random slopes, the overall trends are the same. Thanks, everyone, for your input; I really appreciate the time and consideration you put into this!
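
(For anyone who wonders why I couldn't simply switch to quadrature in R: as far as I know, glmer's nAGQ argument only accepts values greater than 1 when the model has a single scalar random-effects term, so adaptive Gauss-Hermite quadrature isn't available for a random-slopes model like mine:)

# Runs: adaptive Gauss-Hermite quadrature with one scalar random effect
glmer(use ~ PS + TH + RS + HW + (1 | ID),
      family = binomial, data = dat, nAGQ = 10)

# Errors in lme4 1.1-7: nAGQ > 1 is not available for vector-valued
# (random-slope) terms such as (1 + PS + TH + RS + HW | ID/Year)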