Solved – rjags mixture model for a combination of normal and gamma distributions


Is it possible in rjags to create a mixture model in which one distribution is normal and includes negative values, while the other distribution has only positive support?

Our data are global satellite observations of plant activity levels. Because of noise in the satellite retrievals, small negative observations in winter are common, so there appears to be a normally distributed data cluster near zero. For data collected during the growing seasons (which are not labelled in the data), plant activity levels appear to follow a gamma distribution. We want to use a mixture model so that we can regress growing-season activity levels on predictors.

We would prefer to use Bayesian methods but would consider a frequentist approach.

What hasn’t worked so far:

There’s an unanswered question about how to do this in Matlab at EM of a Gaussian-Gamma mixture model in matlab. The bmixture package provides Bayesian mixture models of several gamma or several normal distributions, but not a combination of the two families. There are packages that define clusters and compute their likelihoods, but we need to be able to take the next step of adding regression to the mixture model.

Some mixture models in rjags are set up so that each observation is modeled against both distributions, and one distribution is then selected with a probability draw. That approach is described at How to model a mixture of finite components from different parametric families with JAGS?, at https://stats.stackexchange.com/questions/64634/modelling-zero-inflated-proportion-data-in-r-using-gamlss, and on a few other sites. It isn’t working for our data, because when rjags tries to fit a negative value of y to a gamma distribution it stops with an error.

In concept, perhaps one could route all negative observations to the normal distribution centered near zero and let rjags assign the rest, with a little fancy footwork to recombine the distribution results. The attempt in the following code is computationally circular, however: gate.final[i] is computed from y[i], while y[i] is in turn drawn from a distribution indexed by gate.final[i]. Here both distributions are normal, but only for testing.

{  sink("example.jags")
cat("model { 
  # Likelihood
  for (i in 1:n.obs) {
     y[i] ~ dnorm(distrib[gate.final[i]], tau)
     # Note the circularity: gate.final depends on y, while y depends on gate.final
     gate.final[i] <- ifelse(y[i] <= 0, 1, 2 - gate.partial[i])  # 2 - x maps dbern's {0,1} to valid indices {2,1}
     gate.partial[i] ~ dbern(chance.in.distrib.one)
  }
  distrib[1] ~ dnorm(mu.lo, tau)
  distrib[2] ~ dnorm(mu.hi, tau)

  # Priors
  chance.in.distrib.one ~ dbeta(1, 1)
  mu.lo ~ dnorm(-.5, .001)
  mu.hi ~ dnorm(0.5, .001) T(mu.lo, )
  tau ~ dgamma(.001, .001)
}", fill = TRUE)
sink() }

library(rjags) 
jags.data = list(n.obs = 50, y = c(rnorm(25, 2, 1), rnorm(25, -2, 1)))
jags.model("example.jags", data = jags.data, n.chains = 1, n.adapt = 1000)

This produces the error message: Possible directed cycle involving some or all of the following nodes: gate.final[1], gate.final[2], …, y[1], y[2], …

Thank you.

Best Answer

It is relatively easy to implement a mixture model where the different components come from the same parametric family - for normals, the dnormmix distribution in JAGS provides an inbuilt mixture of normals (see chapter 12 of the JAGS version 4.3.0 manual, which is now available from https://sourceforge.net/projects/mcmc-jags/files/Manuals/4.x/).
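For illustration, here is a minimal sketch of that inbuilt approach (not a solution to your problem, since every component must be normal; the variable names y, N and alpha, and the particular priors, are my assumptions here). The mix module ships with JAGS and must be loaded first:

library(rjags)
load.module("mix")   # dnormmix is defined in the mix module

normmix_model <- 'model {
    for (i in 1:N) {
        y[i] ~ dnormmix(mu, tau, p)
    }
    mu[1] ~ dnorm(-1, 0.001)
    mu[2] ~ dnorm(1, 0.001) T(mu[1], )   # ordering constraint to help identifiability
    tau[1] ~ dgamma(0.01, 0.01)
    tau[2] ~ dgamma(0.01, 0.01)
    p ~ ddirch(alpha)   # alpha = c(1, 1) supplied as data
}'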

When mixing different parametric families it becomes much more difficult, although it is still possible - see for example the related question at Stack Overflow: https://stackoverflow.com/questions/36609365/how-to-model-a-mixture-of-finite-components-from-different-parametric-families-w/36618622#36618622

That answer gives a solution for a mixture of a normal and a uniform, but it should be easy to change it to a mixture of a normal and a gamma. One thing that you will have to watch is providing initial values for the norm_chosen parameter, to ensure that the negative values are initialised to the normal distribution (otherwise you will get an error).
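For instance, with the norm_chosen parameterisation of that linked answer (a hedged sketch; I am assuming your observations are in a vector y and that norm_chosen[i] = 1 assigns observation i to the normal component):

# Negative observations have zero gamma density, so they must start in the normal component:
inits <- list(norm_chosen = ifelse(y < 0, 1, 0))
# Pass this via e.g. jags.model(..., inits = inits)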


Edit

JAGS returns -Inf when calculating the log density of a negative value under the gamma distribution, which is as expected (i.e. the log of a density of 0), so I'm not sure exactly what error you got when you tried it.
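You can see the same behaviour in R, where the gamma density of a negative value is zero:

dgamma(-1, shape = 5, rate = 2, log = TRUE)
# [1] -Inf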

In any case it is fairly easy to avoid even calculating these values using a small additional trick: nested indexing, so that the gamma log density is only ever evaluated at observations that could plausibly be gamma. See the following code (and specifically the comments), which is based on the example linked in my original answer above:

# Data generation:
set.seed(2017-07-05)

# Parameters (deliberately chosen for an identifiable model):
N <- 200
mu <- -1
sd <- 1
shape <- 5
rate <- 2
proportion <- 0.5

# Simulate data:
latent_class <- rbinom(N, 1, proportion)
Observation <- ifelse(latent_class, rgamma(N, shape=shape, rate=rate), rnorm(N, mean=mu, sd=sd))

# Moderate degree of separation in the simulated data:
plot(density(Observation))

# The model:
model <- "model{

    # Use nested indexing to look at only the values that make sense for the gamma distribution:
    for(i in 1:length(MaybeGamma)){
        # Log density for the gamma part:
        ld_comp[MaybeGamma[i], 1] <- logdensity.gamma(Observation[MaybeGamma[i]], shape, rate)
    }

    # Look at all values for the normal distribution (but this could also use nested indexing):
    for(i in 1:N){
        # Log density for the normal part:
        ld_comp[i, 2] <- logdensity.norm(Observation[i], mu, tau)
    }

    # A separate loop for the distribution choice:
    for(i in 1:N){
        # Select one of these two densities:
        density[i] <- exp(ld_comp[i, component_chosen[i]])

        # Generate a likelihood for the MCMC sampler using the 'ones trick'
        # (this relies on density[i] <= 1; if it could exceed 1, divide by a suitable constant):
        Ones[i] ~ dbern(density[i])

        # The latent class part:
        component_chosen[i] ~ dcat(probs)
    }

    # Priors:
    shape ~ dgamma(0.01, 0.01)
    rate ~ dgamma(0.01, 0.01)
    probs ~ ddirch(c(1,1))
    mu ~ dnorm(0, 10^-6)
    tau ~ dgamma(0.01, 0.01)

    # Specify monitors, data and initial values using runjags:
    #monitor# shape, rate, probs, mu, tau
    #data# N, Observation, MaybeGamma, Ones, component_chosen, ld_comp
}"

# Note that the model requires the MaybeGamma data:
MaybeGamma <- which(Observation > 0)   # Only strictly positive observations can be Gamma

# And also the component_chosen is partly specified as data, so it is fixed for non-Gamma observations:
component_chosen <- ifelse(Observation <= 0, as.numeric(2), as.numeric(NA))
# When the Observation is definitely Gaussian then the component_chosen variable is fixed to 2,
# otherwise it is estimated by JAGS (could be either 1 or 2)

# And finally the ld_comp is partly specified as data so that it is defined (as any arbitrary
# numeric value) for the non-Gamma observations - the value is arbitrary as it will never be used:
ld_comp <- matrix(as.numeric(NA), ncol=2, nrow=N)
ld_comp[Observation <= 0, 1] <- 0

# Run the model using runjags (or use rjags if you prefer!)
library('runjags')

Ones <- rep(1,N)

results <- run.jags(model, sample=20000, thin=1)
results

plot(results)
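For reference, a roughly equivalent rjags call might look like this (a sketch reusing the model string above; note that the #data# and #monitor# comments are runjags-specific and are ignored by JAGS itself, so everything must be passed explicitly):

library(rjags)
jags.data <- list(N = N, Observation = Observation, MaybeGamma = MaybeGamma,
                  Ones = Ones, component_chosen = component_chosen, ld_comp = ld_comp)
jm <- jags.model(textConnection(model), data = jags.data, n.chains = 2)
rjags.results <- coda.samples(jm, c("shape", "rate", "probs", "mu", "tau"), n.iter = 20000)
summary(rjags.results)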

As with all mixture models, expect autocorrelation, poor mixing, and slow convergence. In this case the model recovers the parameters reasonably well, but the data are deliberately simulated to help the model by including quite a few negative values. If your data are less easily separated then the model may be unidentifiable without stronger priors on the hyperparameters. From past experience I suggest that you think about an at least weakly informative prior for probs (otherwise the sampler is liable to run towards values of probs near 0 and 1 and stay there).
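For example (a hedged suggestion - any Dirichlet parameters above 1 will pull probs away from the extremes):

probs ~ ddirch(c(5, 5))   # instead of ddirch(c(1,1)) in the model above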

Matt

ps. This answer is probably better suited to Stack Overflow than Cross Validated, but since your question started there and was migrated here, I will refrain from suggesting that it be migrated back!