Solved – JAGS Multinomial mixture model with missing data

jags, multinomial-distribution

I am trying to fit a multinomial mixture model to data from a stream depletion survey. The data were collected by selecting a stream site of a standard length (usually 150-200 m depending on width), blocking off the upper and lower ends of the site with block nets so that closure can be assumed (i.e., no fish immigrate or emigrate during the survey), then electrofishing the entire length and collecting all fish over multiple passes. At the end of each pass the number of fish is recorded and the fish are released outside of the sampling area. In my data set the number of passes ranges from 2 to 6, and I have up to five sites sampled every two years for a total of eight years. I have the response (number of fish collected) in a three-way array (year x site x pass), which is modeled using a multinomial distribution. Below is some code to create the type of data I am modeling.

fish_fakeData<-c(4,8,9,NA,7,16,1,4,10,NA,3,5,NA,NA,5,NA,0,2)
dim(fish_fakeData)<-c(3,2,3)

The error I am getting would be thrown for element 1,1,3, where there is an NA for the last pass but values for the first two passes. The NA indicates that the site did not have three passes in that particular year.

The basic model I am fitting in JAGS is

model{
for (l in 1:speciesN){
  a[l]~dnorm(0,0.01)I(-10,10)
  for (k in 1:siteN){
   beta0[l,k] ~ dunif(0,100)
  }
}

for (i in 1:obs){

logit(p[sp[i],site[i]])<-a[sp[i]]

mu[sp[i],site[i],1]<- p[sp[i],site[i]]
mu[sp[i],site[i],2]<- p[sp[i],site[i]]*(1-p[sp[i],site[i]])
mu[sp[i],site[i],3]<- p[sp[i],site[i]]*(1-p[sp[i],site[i]])*(1-p[sp[i],site[i]])

mu_1[sp[i],site[i]]<-mu[sp[i],site[i],1]
mu_2[sp[i],site[i]]<-mu[sp[i],site[i],2]
mu_3[sp[i],site[i]]<-mu[sp[i],site[i],3]

pi0[sp[i],site[i]]<- 1 - mu[sp[i],site[i],1]-mu[sp[i],site[i],2]-mu[sp[i],site[i],3]
pcap[sp[i],site[i]]<-1-pi0[sp[i],site[i]]

for(j in 1:3){
   muc[sp[i],site[i],j]<-mu[sp[i],site[i],j]/(pcap[sp[i],site[i]]+0.0000001) #add a small offset to prevent division by 0
}

#observed counts model
ncap[year[i],site[i]]~dbin(pcap[year[i],site[i]],N[year[i],site[i]])

#Abundance model
N[year[i],site[i]] ~ dpois(lambda[year[i],site[i]])
log(lambda[year[i],site[i]])<- beta0[year[i],site[i]]

y[year[i],site[i],1:3]~dmulti(muc[year[i],site[i],1:3],ncap[year[i],site[i]])

}
}
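
For reference, the mu terms in the model above are the usual removal-sampling cell probabilities: with a constant capture probability p, a fish is first caught on pass j with probability p(1-p)^(j-1), and dividing by pcap gives the probabilities conditional on being caught at all. Below is a minimal R sketch of that arithmetic. The helper name removal_cells and the value p = 0.5 are assumptions made for illustration only; they are not part of the original model.

```r
# Hypothetical helper (not from the original post): removal-model cell
# probabilities for J passes with constant capture probability p.
removal_cells <- function(p, J) {
  pi_j <- p * (1 - p)^(0:(J - 1))  # P(first caught on pass j), j = 1..J
  pcap <- sum(pi_j)                # P(caught at all) = 1 - (1 - p)^J
  list(pi = pi_j, pcap = pcap, muc = pi_j / pcap)
}

cells3 <- removal_cells(p = 0.5, J = 3)  # a site with three passes
cells2 <- removal_cells(p = 0.5, J = 2)  # a site with only two passes

print(cells3$muc)  # conditional cell probabilities, sum to 1
print(cells2$muc)  # renormalized over two passes only
```

Note that pcap and the conditional probabilities both depend on the number of passes J, so a site with two passes does not share them with a site that has three.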

When I try to fit the model with the missing data, more specifically sites having NA's on pass 3, I get the error:
Error in jags.model("model.txt", data = dataList, inits = initsList, n.chains = nChains, :
RUNTIME ERROR:
Compilation error on line 57.
y[1,1,1:3] has missing values

I know this is referencing the NA's in the response but I thought JAGS handled missing response data automatically? Or is this not the case with the multinomial distribution? If there cannot be missing data in a multinomial distribution are there any suggestions to get around it without dropping sites or years to have a balanced design?

Thanks!

EDIT

I think I found a work-around. Rather than making the number of passes static in the line

y[year[i],site[i],1:3]~dmulti(muc[year[i],site[i],1:3],ncap[year[i],site[i]])

I added a new variable for number of passes at each year/site combination so that

y[year[i],site[i],1:npass[i]]~dmulti(muc[year[i],site[i],1:npass[i]],ncap[year[i],site[i]])

Does this fix make sense? I know I also need to modify the code so that mu values that are not needed are not estimated (e.g., mu_3 for site 1 in the data example).
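
The npass idea can be prepared on the R side roughly as follows. This is only a sketch using the fake data from the question. It assumes that missing passes occur only at the end of a survey (never in the middle) and that year/site combinations with no passes at all are dropped from the observation index; the names npass_i, year, site and trailing_only are illustrative.

```r
# Fake data from the question: year x site x pass
fish_fakeData <- c(4,8,9,NA,7,16,1,4,10,NA,3,5,NA,NA,5,NA,0,2)
dim(fish_fakeData) <- c(3, 2, 3)

# Number of passes actually made for each year/site (count of non-NA cells)
npass <- apply(!is.na(fish_fakeData), c(1, 2), sum)

# Total catch over the passes that were made
ncap <- apply(fish_fakeData, c(1, 2), sum, na.rm = TRUE)

# Long-format index of year/site combinations that were sampled at all
idx     <- which(npass > 0, arr.ind = TRUE)  # columns "row" and "col"
year    <- idx[, "row"]
site    <- idx[, "col"]
npass_i <- npass[idx]

# Sanity check: within 1:npass there must be no NA, otherwise
# y[year, site, 1:npass] would still contain missing values.
trailing_only <- all(sapply(seq_len(nrow(idx)), function(r) {
  !anyNA(fish_fakeData[year[r], site[r], seq_len(npass_i[r])])
}))

print(npass)
print(trailing_only)
```

For the multinomial to be valid, the probabilities passed to dmulti must sum to one over the first npass passes only, so pcap and muc have to be computed from the passes actually made rather than from a fixed three.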

Best Answer

You can't have missing values inside a multinomial (dmulti) response in JAGS.

See, e.g., these slides from the Patuxent folk for some relevant coding workarounds:

http://www.mbr-pwrc.usgs.gov/workshops/unmarked/Slides/Slides_Multimix.pdf

Related Solutions

Solved – jags missing data error

My understanding is that if the outcome is NA, JAGS will fill it in from the posterior predictive distribution. NA in the predictors is not allowed; missing predictor values must be imputed.

Solved – Right-censored survival fit with JAGS

I was asked to re-post this answer here from my comment at http://doingbayesiandataanalysis.blogspot.com/2012/01/complete-example-of-right-censoring-in.html. The specifics of this answer relate to the model in that comment, but the concepts apply to the topic here.

The core of the JAGS model for censored data is this:

isCensored[i] ~ dinterval( y[i] , censorLimitVec[i] )
y[i] ~ dnorm( mu , tau )

The key to understanding what JAGS is doing is that JAGS automatically imputes a random value for any variable that is not specified as a constant in the data. Thus, when y[i] is NA (i.e., a missing value, not a constant), then JAGS imputes a random value for it.

But what value should it generate?

The second line of the model, above, says that y[i] should be randomly generated from a normal distribution with mean mu and precision tau.

But the first line of the model, above, puts another constraint on the randomly generated value of y[i]. That line says that whatever value of y[i] is randomly generated, it must fall on the side of censorLimitVec[i] dictated by the value of isCensored[i].

To understand this part, let's unpack the dinterval() distribution. Suppose that censorLimitVec has 3 values in it, not just 1:

censorLimitVec = c(10,20,30)

Then randomly generated values from dinterval(y,c(10,20,30)) will be either 0, 1, 2, or 3 depending on whether $y<10$, $10 < y < 20$, $20<y<30$, or $30<y$. So, if $y=15$, dinterval(y,c(10,20,30)) has output of $1$ with 100% probability. The trick is this: We instead specify the output of dinterval, and impute a random value of y that could produce it. Thus, if we say

1 ~ dinterval(y,c(10,20,30))

then y is imputed as a random value between 10 and 20.
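
The interval logic can be mimicked in plain R to build intuition. In this sketch, findInterval() from base R stands in for dinterval(); it is not the JAGS implementation, and its handling of values exactly on a cut point may differ. The normal parameters (mean 15, sd 10) are arbitrary assumptions, and the rejection step only illustrates the idea of imputing a y that is consistent with a given interval code. It is not how JAGS samples internally.

```r
cuts <- c(10, 20, 30)

# Forward direction: which interval does each y fall in?
codes <- findInterval(c(5, 15, 25, 35), cuts)
print(codes)  # 0 1 2 3

# Reverse direction: given the code 1, keep only draws of y that would
# produce it, i.e. values between 10 and 20.
set.seed(1)
draws   <- rnorm(10000, mean = 15, sd = 10)
imputed <- draws[findInterval(draws, cuts) == 1]
print(range(imputed))
```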

Putting the two model statements together,

1 ~ dinterval( y , censorLimit )

y ~ dnorm( mu , tau )

means that y comes from a normal density and y must fall above the censorLimit.
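
On the R side, the data for such a model are typically arranged so that censored observations are NA in y and flagged in isCensored. The following is a minimal sketch under assumed values: the raw times, the censoring limit of 20, and the list names dataList and initsList are hypothetical. Supplying initial values above the limit for the censored y[i] is a common precaution so that the starting state is consistent with isCensored; whether it is needed depends on the model.

```r
# Hypothetical raw data: five times, two of them right-censored at 20
raw            <- c(12.3, 20, 17.8, 20, 9.4)
censored       <- c(FALSE, TRUE, FALSE, TRUE, FALSE)
censorLimitVec <- rep(20, length(raw))

# JAGS sees NA for censored values and imputes them
y          <- ifelse(censored, NA_real_, raw)
isCensored <- as.numeric(censored)  # 1 = true value lies above the limit

# Initial values: NA where y is observed, a value above the limit otherwise
yInit <- ifelse(censored, censorLimitVec + 1, NA_real_)

dataList  <- list(y = y, isCensored = isCensored,
                  censorLimitVec = censorLimitVec)
initsList <- list(y = yInit)

print(dataList$y)
print(initsList$y)
```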

Hope that helps!!
