I am trying to calculate the log-likelihood of some time series data given parameter sets estimated in BUGS. I can not figure out how to handle some missing values at random points in time.
For the complete data situation, such as $Y=(0.1,0.3,0.5,0.4,0.2,0.1)$, (real data is much longer) I have fitted a time series model assuming errors are normally distributed. For example, my BUGS code is something like:
for(t in 2:6){
y[t] ~ dnorm(y.mean[t], tau)
y.mean[t] <- phi0 + phi1*y[t-1]
}
i.e., the data is assumed to follow a normal distribution: $y_t \sim N(\phi_0+\phi_1 y_{t-1}, \sigma^2), 2<t<6$, where $\sigma$ is the standard deviation to the tolerance tau
in the BUGS code. In R I can derive the log-likelihood of data,
$l(y_t|\phi_0,\phi_1,\sigma,y_{t-1})=\sum_{t=2}^{t=6}P(Y_t=y_t)$
where $P(Y_t=y_t)$ is a normal probability density function, given a single MCMC sample of parameters (for example, $\phi_0=0.25$, $\phi_1=0.55$ and $\sigma=0.35$) as such:
> y <-c(0.1,0.3,0.5,0.4,0.2,0.1)
> phi0 <- 0.25
> phi1 <- 0.55
> sigma <- 0.35
>
> ymean <- phi0+phi1*y[1:5]
> ll <- sum(dnorm(y[2:6], mean = ymean , sd = sigma, log = TRUE))
> ll
[1] -0.01241878
However, I am stuck when it comes to performing the correct calculation of the log-likelihood when there are missing data, say $Y=(0.1,0.3,0.5,NA,0.2,0.1)$ and $NA$ is missing? I believe that y[4]
has to dropped in the R code/likelihood calculation. I am not sure how (or if) to estimate ymean[5]
, which is dependent on a missing $y_4$? BUGS of course provides a MCMC sample(s) for this missing data point, but should I use it, or do I keep the R code as is, adjusting for NA
in the ymean[5]
with na.rm=TRUE
when summing over probability density functions:
> y[4]<-NA
> ymean<-phi0+phi1*y[1:5]
> ymean
[1] 0.305 0.415 0.525 NA 0.360
> ll <- sum(dnorm(y[2:6], mean = ymean , sd = sigma, log = TRUE), na.rm=TRUE)
> ll
[1] 0.08714057
Best Answer
Why do you need to compute log-likelihood? Bugs computes MCMC samples (i.e., posterior distribution) of all parameters you want, and that should be all you need for bayesian inference.
If you have missing values of response variable, it will not bring new information to the model, you will only gain predictions for corresponding values of explanatory variables. So you can omit those from the computation. But if you need those missing values predicted by BUGS, I would safely include them in the computation. For more information, look here and here.