Solved – How to predict state probabilities or states for new data with DepmixS4 package, for Hidden Markov Models

hidden markov modelmixture-distributionr

It seems like I can learn the parameters just fine and find the posterior probabilities for the training data but I have no clue on how to make new predictions on new data. The problem in particular comes from the transition probabilities changing on covariates so it's not trivial to write code to predict new data.

The standard approach is to define the (dependent) mixture and fit the model:

mod <- depmix(EventTime ~ 1, data = data[1:40320,], nstates = 2, family
=multinomial("identity"), transition = ~ Count, instart = runif(2))

fm <- fit(mod, emcontrol=em.control(classification="soft", maxit = 60))

What we have above should function similarly to a binary HMM as it is trying to classify whether an event occurred as a 1/0 dependent variable moving through the sequence. The transition covariate is a frequency count variable that should directly affect the transition probabilities of the states which should thereafter control the emission probabilities of the 1/0 dependent variable.

It is possible to get the parameters of the model and set the parameters to another new model. However, there is no clear method of prediction, even though there should be somewhere in the guts of the library.

modNew <- depmix(EventTime~1,data=data2,transition=~Count,nstates=2,
family=multinomial("identity"))

modNew <- setpars(modNew,getpars(fm))

Note, in the documentation it says it is possible to run the viterbi algorithm to generate states for new data. However, this is not particularly useful to me and it seems to perfectly fit the data suggesting it still learns to fit the new data.

probs2 <- viterbi(modNew)

Note, I am new to this topic. This stage of implementation is difficult for me, yet it somehow seems like a basic part of an analysis.

Best Answer

Have you solved this? If not, perhaps you could try:

sum(forwardbackward(setpars(depmix(list(var~1), data=newData, nstates=3,family=list(gaussian())), getpars(originalModel)))[["alpha"]][nrow(data),])

This one-liner gets the probability of new data by running the forward algorithm on your original model. Please let me know if you arrived at a better solution, as I am tackling this problem myself.