Solved – Calculating the likelihood of time series data when there are missing data

I am trying to calculate the log-likelihood of some time series data given parameter sets estimated in BUGS. I cannot figure out how to handle missing values that occur at random points in time.

For the complete-data case, such as $Y=(0.1,0.3,0.5,0.4,0.2,0.1)$ (the real series is much longer), I have fitted a time series model assuming normally distributed errors. For example, my BUGS code is something like:

for(t in 2:6){
    y[t] ~ dnorm(y.mean[t], tau)
    y.mean[t] <- phi0 + phi1*y[t-1]
}

i.e., the data are assumed to follow a normal distribution: $y_t \sim N(\phi_0+\phi_1 y_{t-1}, \sigma^2)$, $2 \le t \le 6$, where $\sigma = 1/\sqrt{\tau}$ is the standard deviation corresponding to the precision tau in the BUGS code. In R I can derive the log-likelihood of the data,

$l(\phi_0,\phi_1,\sigma \mid y)=\sum_{t=2}^{6}\log P(Y_t=y_t \mid y_{t-1})$

where $P(Y_t=y_t \mid y_{t-1})$ is the normal probability density function with mean $\phi_0+\phi_1 y_{t-1}$ and standard deviation $\sigma$, given a single MCMC sample of the parameters (for example, $\phi_0=0.25$, $\phi_1=0.55$ and $\sigma=0.35$), like this:

> y <-c(0.1,0.3,0.5,0.4,0.2,0.1)
> phi0 <- 0.25
> phi1 <- 0.55
> sigma <- 0.35
> 
> ymean <- phi0+phi1*y[1:5]
> ll <- sum(dnorm(y[2:6], mean = ymean , sd = sigma, log = TRUE))
> ll
[1] -0.01241878
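
The same calculation can be repeated for every MCMC draw rather than a single one. The following is only a minimal sketch: the function name `ar1_loglik` and the data frame `draws` (with columns `phi0`, `phi1`, `tau`) are hypothetical stand-ins for however the BUGS output is actually stored, and the second row of `draws` is made up for illustration.

```r
# Conditional Gaussian AR(1) log-likelihood, conditioning on the first observation.
ar1_loglik <- function(y, phi0, phi1, sigma) {
  n <- length(y)
  ymean <- phi0 + phi1 * y[-n]   # one-step-ahead means for y[2], ..., y[n]
  sum(dnorm(y[-1], mean = ymean, sd = sigma, log = TRUE))
}

y <- c(0.1, 0.3, 0.5, 0.4, 0.2, 0.1)

# Hypothetical posterior draws; in practice these would come from the BUGS output.
draws <- data.frame(
  phi0 = c(0.25, 0.20),
  phi1 = c(0.55, 0.60),
  tau  = c(1 / 0.35^2, 1 / 0.40^2)
)
draws$sigma <- 1 / sqrt(draws$tau)   # BUGS precision -> standard deviation

# One log-likelihood value per posterior draw.
ll <- mapply(ar1_loglik,
             phi0 = draws$phi0, phi1 = draws$phi1, sigma = draws$sigma,
             MoreArgs = list(y = y))
ll
```

The first row of `draws` uses the same parameter values as the single-sample calculation above, so `ll[1]` should agree with that result.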

However, I am stuck on the correct calculation of the log-likelihood when there are missing data, say $Y=(0.1,0.3,0.5,NA,0.2,0.1)$, where $NA$ marks a missing value. I believe that y[4] has to be dropped from the R code/likelihood calculation. I am not sure how (or whether) to compute the mean of $y_5$ (ymean[4] in the R code below), which depends on the missing $y_4$. BUGS of course provides MCMC samples for this missing data point, but should I use them, or do I keep the R code as is and drop the resulting NA terms with na.rm=TRUE when summing the log densities:

> y[4]<-NA
> ymean<-phi0+phi1*y[1:5]
> ymean
[1] 0.305 0.415 0.525    NA 0.360
> ll <- sum(dnorm(y[2:6], mean = ymean , sd = sigma, log = TRUE), na.rm=TRUE)
> ll
[1] 0.08714057

Best Answer

Why do you need to compute the log-likelihood? BUGS produces MCMC samples (i.e., the posterior distribution) of all the parameters you want, and that should be all you need for Bayesian inference.

If the response variable has missing values, they bring no new information to the model; you only gain predictions for them at the corresponding values of the explanatory variables. So you can omit them from the computation. But if you need those missing values as predicted by BUGS, I would include them in the computation. For more information, look here and here.
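
The two choices described above can be put side by side in R. This is a rough sketch rather than a definitive recipe: the helper `loglik_terms` and the value `y4_draw` are hypothetical, and `y4_draw` stands in for a draw of the missing point taken from the same MCMC iteration as the parameters.

```r
y <- c(0.1, 0.3, 0.5, NA, 0.2, 0.1)
phi0 <- 0.25
phi1 <- 0.55
sigma <- 0.35

# Per-time-step log-density terms for t = 2, ..., length(y).
loglik_terms <- function(y) {
  n <- length(y)
  dnorm(y[-1], mean = phi0 + phi1 * y[-n], sd = sigma, log = TRUE)
}

# Option 1: omit every term that involves the missing value.
# Two terms are NA here: the one for y[4] itself and the one for y[5],
# whose mean depends on y[4].
ll_omit <- sum(loglik_terms(y), na.rm = TRUE)

# Option 2: plug in a draw of y[4] from the same MCMC iteration.
y4_draw <- 0.4                       # hypothetical value, for illustration only
y_filled <- replace(y, 4, y4_draw)
ll_filled <- sum(loglik_terms(y_filled))

c(omit = ll_omit, filled = ll_filled)
```

Note that the second quantity is a log-likelihood of the series with the missing point filled in, so it includes the density of the imputed value itself and changes with every draw of `y4_draw`.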
