Solved – Convert SAS NLMIXED code for zero-inflated gamma regression to R

gamlssrsas

I'm trying to run a zero-inflated regression for a continuous response variable in R. I'm aware of a gamlss implementation, but I'd really like to try out this algorithm by Dale McLerran that is conceptually a bit more straightforward. Unfortunately, the code is in SAS and I'm not sure how to re-write it for something like nlme.

The code is as follows:

proc nlmixed data=mydata;
  parms b0_f=0 b1_f=0 
        b0_h=0 b1_h=0 
        log_theta=0;


  eta_f = b0_f + b1_f*x1 ;
  p_yEQ0 = 1 / (1 + exp(-eta_f));


  eta_h = b0_h + b1_h*x1;
  mu    = exp(eta_h);
  theta = exp(log_theta);
  r = mu/theta;


  if y=0 then
     ll = log(p_yEQ0);
  else
     ll = log(1 - p_yEQ0)
          - lgamma(theta) + (theta-1)*log(y) - theta*log(r) - y/r;


  model y ~ general(ll);
  predict (1 - p_yEQ0)*mu out=expect_zig;
  predict r out=shape;
  estimate "scale" theta;
run;

From: http://listserv.uga.edu/cgi-bin/wa?A2=ind0805A&L=sas-l&P=R20779

ADD:

Note: There are no mixed effects present here – only fixed.

The advantage to this fitting is that (even though the coefficients are the same as if you separately fit a logistic regression to P(y=0) and a gamma error regression with log link to E(y | y>0)) you can estimate the combined function E(y) which includes the zeroes. One can predict this value in SAS (with a CI) using the line predict (1 - p_yEQ0)*mu .

Further, one is able to write custom contrast statements to test the significance of predictor variables on E(y). For example, here is another version of the SAS code I have used:

proc nlmixed data=TestZIG;
      parms b0_f=0 b1_f=0 b2_f=0 b3_f=0
            b0_h=0 b1_h=0 b2_h=0 b3_h=0
            log_theta=0;


        if gifts = 1 then x1=1; else x1 =0;
        if gifts = 2 then x2=1; else x2 =0;
        if gifts = 3 then x3=1; else x3 =0;


      eta_f = b0_f + b1_f*x1 + b2_f*x2 + b3_f*x3;
      p_yEQ0 = 1 / (1 + exp(-eta_f));

      eta_h = b0_h + b1_h*x1 + b2_h*x2 + b3_h*x3;
      mu    = exp(eta_h);
      theta = exp(log_theta);
      r = mu/theta;

      if amount=0 then
         ll = log(p_yEQ0);
      else
         ll = log(1 - p_yEQ0)
              - lgamma(theta) + (theta-1)*log(amount) -                      theta*log(r) - amount/r;

      model amount ~ general(ll);
      predict (1 - p_yEQ0)*mu out=expect_zig;
      estimate "scale" theta;
    run; 

Then to estimate "gift1" versus "gift2" (b1 versus b2) we can write this estimate statement:

estimate "gift1 versus gift 2" 
 (1-(1 / (1 + exp(-b0_f -b1_f))))*(exp(b0_h + b1_h)) - (1-(1 / (1 + exp(-b0_f -b2_f))))*(exp(b0_h + b2_h)) ; 

Can R do this?

Best Answer

Having spent some time on this code, it appears to me as though it basically:

1) Does a logistic regression with right hand side b0_f + b1_f*x1 andy > 0 as a target variable,

2) For those observations for which y > 0, performs a regression with right hand side b0_h + b1_h*x1, a Gamma likelihood and link=log,

3) Also estimates the shape parameter of the Gamma distribution.

It maximizes the likelihood jointly, which is nice, because you only have to make the one function call. However, the likelihood separates anyway, so you don't get improved parameter estimates as a result.

Here is some R code which makes use of the glm function to save programming effort. This may not be what you'd like, as it obscures the algorithm itself. The code certainly isn't as clean as it could / should be, either.

McLerran <- function(y, x)
{
  z <- y > 0
  y.gt.0 <- y[y>0]
  x.gt.0 <- x[y>0]

  m1 <- glm(z~x, family=binomial)
  m2 <- glm(y.gt.0~x.gt.0, family=Gamma(link=log))

  list("p.ygt0"=m1,"ygt0"=m2)
}

# Sample data
x <- runif(100)
y <- rgamma(100, 3, 1)      # Not a function of x (coef. of x = 0)
b <- rbinom(100, 1, 0.5*x)  # p(y==0) is a function of x
y[b==1] <- 0

foo <- McLerran(y,x)
summary(foo$ygt0)

Call:
glm(formula = y.gt.0 ~ x.gt.0, family = Gamma(link = log))

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-2.08888  -0.44446  -0.06589   0.28111   1.31066  

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   1.2033     0.1377   8.737 1.44e-12 ***
x.gt.0       -0.2440     0.2352  -1.037    0.303    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

(Dispersion parameter for Gamma family taken to be 0.3448334)

    Null deviance: 26.675  on 66  degrees of freedom
Residual deviance: 26.280  on 65  degrees of freedom
AIC: 256.42

Number of Fisher Scoring iterations: 6

The shape parameter for the Gamma distribution is equal to 1 / the dispersion parameter for the Gamma family. Coefficients and other stuff which you might like to access programatically can be accessed on the individual elements of the return value list:

> coefficients(foo$p.ygt0)
(Intercept)           x 
   2.140239   -2.393388 

Prediction can be done using the output of the routine. Here's some more R code that shows how to generate expected values and some other information:

# Predict expected value
predict.McLerren <- function(model, x.new)
{
  x <- as.data.frame(x.new)
  colnames(x) <- "x"
  x$x.gt.0 <- x$x

  pred.p.ygt0 <- predict(model$p.ygt0, newdata=x, type="response", se.fit=TRUE)
  pred.ygt0 <- predict(model$ygt0, newdata=x, type="response", se.fit=TRUE)  

  p0 <- 1 - pred.p.ygt0$fit
  ev <- (1-p0) * pred.ygt0$fit

  se.p0 <- pred.p.ygt0$se.fit
  se.ev <- pred.ygt0$se.fit

  se.fit <- sqrt(((1-p0)*se.ev)^2 + (ev*se.p0)^2 + (se.p0*se.ev)^2)

  list("fit"=ev, "p0"=p0, "se.fit" = se.fit,
       "pred.p.ygt0"=pred.p.ygt0, "pred.ygt0"=pred.ygt0)
}

And a sample run:

> x.new <- seq(0.05,0.95,length=5)
> 
> foo.pred <- predict.McLerren(foo, x.new)
> foo.pred$fit
       1        2        3        4        5 
2.408946 2.333231 2.201889 2.009979 1.763201 
> foo.pred$se.fit
        1         2         3         4         5 
0.3409576 0.2378386 0.1753987 0.2022401 0.2785045 
> foo.pred$p0
        1         2         3         4         5 
0.1205351 0.1733806 0.2429933 0.3294175 0.4291541 

Now for coefficient extraction and the contrasts:

coef.McLerren <- function(model)
{
  temp1 <- coefficients(model$p.ygt0)
  temp2 <- coefficients(model$ygt0)
  names(temp1) <- NULL
  names(temp2) <- NULL
  retval <- c(temp1, temp2)
  names(retval) <- c("b0.f","b1.f","b0.h","b1.h")
  retval
}

contrast.McLerren <- function(b0_f, b1_f, b2_f, b0_h, b1_h, b2_h)
{
  (1-(1 / (1 + exp(-b0_f -b1_f))))*(exp(b0_h + b1_h)) - (1-(1 / (1 + exp(-b0_f -b2_f))))*(exp(b0_h + b2_h))
}


> coef.McLerren(foo)
      b0.f       b1.f       b0.h       b1.h 
 2.0819321 -1.8911883  1.0009568  0.1334845 
Related Question