Solved – Build a quasi Markov chain model using Deep Learning

deep-learning, lstm, markov-process, recurrent-neural-network

My problem is similar to a discrete Markov process, with one catch: it doesn't have the true Markov property. The probability of moving to the next state rarely depends only on the current state, but rather on the 3–10 previous states. These numbers (3–10) are not known and are a guess.

I want ideas on how to approach this with deep learning (an RNN, perhaps) that would discover the most robust patterns, i.e. the most statistically stable dependencies between the current state and previous states; one state can be part of many patterns simultaneously.

Then I want the model to predict the next state as a vector of probabilities over the possible values: if each state can take 5 possible values, I need 5 probability numbers for the predicted state, similar to a row of a Markov transition matrix.
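To make the desired output format concrete, here is a minimal sketch (a hypothetical illustration, not part of the question): with 5 possible values, the prediction for the next state is a length-5 vector that sums to 1, exactly like one row of a Markov transition matrix. A softmax is the standard way to turn raw model scores into such a vector:

```python
import numpy as np

NUM_VALUES = 5  # possible values per state, as in the question's example

def softmax(z):
    # turn arbitrary real-valued scores into a probability vector
    e = np.exp(z - z.max())
    return e / e.sum()

# hypothetical raw scores a model might emit for the next state
scores = np.array([2.0, 0.5, -1.0, 0.0, 1.0])
probs = softmax(scores)  # 5 probabilities, one per possible value

print(probs, probs.sum())  # the entries sum to 1
```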

How would you approach this problem from a DL perspective?

P.S. It appears to be a case of continuous unsupervised learning.

Best Answer

If you have no constraint on the programming language, it might be easiest to get started with Keras.

Roughly you want to approach the problem as follows:

  • convert your discrete input sequence into one-hot vectors (i.e. vectors where exactly one dimension is 1 and all the others are 0; the number of dimensions of a one-hot vector equals the number of possible values)
  • add an embedding layer to your network (an embedding layer converts your sparse vectors into denser vector representations that carry some semantic information)
  • feed the output of that layer into a stateful LSTM layer (it is important that the network is stateful)
  • feed the output of the LSTM layer into a time-distributed dense layer with a softmax activation function. The output dimension of this layer is again the number of possible values, so that the softmax outputs of this layer give the probabilities for the next value assignment.

If all of this seems confusing, you'll need to read up on some of these concepts. A good place to start: http://karpathy.github.io/2015/05/21/rnn-effectiveness/
