Well, yes, there is a relationship between the two terms because the draws from MCMC form a Markov chain. From Gelman, Bayesian Data Analysis (3rd ed), p. 265:
> Markov chain simulation (also called Markov chain Monte Carlo or MCMC) is a general method based on drawing values of $\theta$ from appropriate distributions and then correcting those draws to better approximate the target posterior distribution, $p(\theta|y)$. The sampling is done sequentially, with the distribution of the sampled draws depending on the last value drawn; hence, the draws form a Markov chain.
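To make the quoted point concrete, here is a minimal random-walk Metropolis sketch (a toy example of my own targeting a standard normal, not code from the book). Each proposal is generated from the last accepted draw only, which is exactly why the draws form a Markov chain:

```python
import math
import random

def log_target(theta):
    """Log-density of the target distribution; a standard normal is assumed here."""
    return -0.5 * theta ** 2

def metropolis(n_draws, step=1.0, theta0=0.0):
    """Random-walk Metropolis: each draw depends only on the previous one,
    so the sequence of draws forms a Markov chain."""
    draws = [theta0]
    for _ in range(n_draws):
        current = draws[-1]                          # only the last value matters
        proposal = current + random.gauss(0.0, step)
        log_accept = log_target(proposal) - log_target(current)
        if random.random() < math.exp(min(0.0, log_accept)):
            draws.append(proposal)                   # accept the move
        else:
            draws.append(current)                    # reject: repeat current value
    return draws

chain = metropolis(10_000)
```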
> I was always under the impression that the state formed the agent's observable world and actions taken by the agent would cause a state transition.
You are right that the agent's action $a$ causes the state transition, as defined by the transition function:
$$P(s'|s,a)$$
The state should contain enough (but not too much) information for the problem to be solved. Furthermore, this information should be fully observable (i.e. you should always be able to deduce the current state); otherwise you need a Partially Observable Markov Decision Process (POMDP).
When you have defined the MDP, i.e. the states, actions, transition function, and rewards, you can find a policy $\pi$ (through Value Iteration or Policy Iteration) that maximizes the expected reward. The policy is a function mapping states to actions:
$$\pi : S \rightarrow A$$
Thus, after having learned the policy, you apply it by passing the current state $s$ as an argument, and it returns the best action $a = \pi(s)$ according to the policy.
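As a minimal sketch of how such a policy could be computed, here is value iteration on a tiny made-up MDP (the states, actions, transitions, and rewards below are placeholders, not from the question):

```python
# Toy MDP: P[s][a] maps each next state s' to its probability P(s'|s,a).
P = {
    "s0": {"a0": {"s0": 0.5, "s1": 0.5}, "a1": {"s1": 1.0}},
    "s1": {"a0": {"s0": 1.0},            "a1": {"s1": 1.0}},
}
R = {"s0": 0.0, "s1": 1.0}   # reward for being in a state
gamma = 0.9                  # discount factor

# Value iteration: repeatedly apply the Bellman optimality update.
V = {s: 0.0 for s in P}
for _ in range(100):
    V = {s: R[s] + gamma * max(sum(p * V[s2] for s2, p in P[s][a].items())
                               for a in P[s])
         for s in P}

# Greedy policy: pi(s) = argmax_a sum_{s'} P(s'|s,a) * V(s').
pi = {s: max(P[s], key=lambda a: sum(p * V[s2] for s2, p in P[s][a].items()))
      for s in P}
print(pi)  # e.g. {'s0': 'a1', 's1': 'a1'}
```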
In the case of controlling traffic lights you could use the following state variables:
- $S_\text{time}$: the time, which could be hours {0..23} or {morning, afternoon, night}, for example.
- $S_\text{color}$: red, orange, green.
- $S_\text{traffic}$: light, dense.
The total state space would then be the Cartesian product:
$$S = S_\text{time} \times S_\text{color} \times S_\text{traffic}$$
You could now define different probabilities based on different situations:
$$P(S'=\{\text{morning, red, dense}\} \mid S=\{\text{morning, red, light}\}, a) = \ldots$$
$$P(S'=\{\text{morning, green, dense}\} \mid S=\{\text{morning, red, light}\}, a) = \ldots$$
and so on.
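A sketch of how you could enumerate this state space and store such a transition function in code (the action name and probability values below are made-up placeholders):

```python
from itertools import product

S_time = ["morning", "afternoon", "night"]
S_color = ["red", "orange", "green"]
S_traffic = ["light", "dense"]

# The full state space is the Cartesian product of the state variables.
states = list(product(S_time, S_color, S_traffic))
print(len(states))  # 3 * 3 * 2 = 18 states

# Transition probabilities P(s'|s,a), stored sparsely as a nested dict.
a = "keep_color"  # hypothetical action
P = {
    (("morning", "red", "light"), a): {
        ("morning", "red", "dense"): 0.3,   # placeholder probabilities
        ("morning", "red", "light"): 0.7,
    },
}
```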
> In addition, some authors have included things like real-time electricity pricing in the state space for planning demand-response activities in the home. No action the user takes can possibly change the price of electricity, but obviously the decision the agent makes depends on it.
Here you could include pricing in the state, so when you execute the policy, the state variable $S_\text{price}$ will be set to the current electricity price. Note that you have to discretize these values, or look into continuous-state MDPs.
To learn a good policy, however, you also need to define the probabilities of going from one state to another: the transition function. So you need to estimate beforehand the probabilities of moving from one price state to another, as sketched below.
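One simple way to obtain these, assuming you have historical price data that has already been discretized into bins, is to count observed transitions and normalize (the price history below is invented for illustration):

```python
from collections import Counter, defaultdict

# Hypothetical discretized price history, e.g. "low"/"mid"/"high" bins.
history = ["low", "low", "mid", "high", "mid", "mid", "low", "mid", "high", "high"]

counts = defaultdict(Counter)
for prev, nxt in zip(history, history[1:]):
    counts[prev][nxt] += 1

# Normalize the counts into empirical transition probabilities P(s'|s).
P_price = {prev: {nxt: c / sum(nxts.values()) for nxt, c in nxts.items()}
           for prev, nxts in counts.items()}
print(P_price["mid"])  # {'high': 0.5, 'mid': 0.25, 'low': 0.25}
```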
> In short, can I have the following state space {price, currentTime} in my MDP, or does it need to be modelled differently?
So the answer is yes, but you have to make sure that you do not have too many states: the state space grows multiplicatively with every state variable you add (with hourly time, the traffic example above already has $24 \times 3 \times 2 = 144$ states), and the cost of finding a policy grows with the size of the state space. Furthermore, to end up with a good policy, you need good approximations of the state transition probabilities $P(s'|s,a)$.
Best Answer
Technically, both of the processes you describe are Markov chains. The difference is that the first one is a first-order Markov chain, whereas the second one is a second-order Markov chain. And yes, you can transform a second-order Markov chain into a first-order Markov chain by a suitable change in the state space definition. Let me explain via an example.
Suppose that we want to model the weather as a stochastic process, and suppose that on any given day the weather can be rainy, sunny, or cloudy. Let $W_t$ be the weather on day $t$, and let us denote the possible states by the symbols $R$ (for rainy), $S$ (for sunny) and $C$ (for cloudy).
First-Order Markov Chain
$P(W_t = w \mid W_{t-1}, W_{t-2}, W_{t-3}, \ldots) = P(W_t = w \mid W_{t-1})$
Second-Order Markov Chain
$P(W_t = w \mid W_{t-1}, W_{t-2}, W_{t-3}, \ldots) = P(W_t = w \mid W_{t-1}, W_{t-2})$
The second-order Markov chain can be transformed into a first-order Markov chain by re-defining the state space as follows. Define:
$Z_{t-1,t} = (W_{t-1}, W_t)$, the weather on two consecutive days.
In other words, the state can take one of the following nine values: $RR$, $RC$, $RS$, $CR$, $CC$, $CS$, $SR$, $SC$ and $SS$. With this re-defined state space we have the following:
$P(Z_{t-1,t} = z_{t-1,t} \mid Z_{t-2,t-1}, Z_{t-3,t-2}, \ldots) = P(Z_{t-1,t} = z_{t-1,t} \mid Z_{t-2,t-1})$
The above is clearly a first-order Markov chain on the re-defined state space. The one difference from the second-order Markov chain is that the re-defined chain must be given two initial states, i.e., the chain must be started with some assumption about the weather on both day 1 and day 2.
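A small sketch of this construction (the second-order model below is invented for illustration): a pair state $(W_{t-2}, W_{t-1})$ can only move to a pair state whose first component is $W_{t-1}$, and the probability of that move is just the second-order probability.

```python
from itertools import product

weather = ["R", "S", "C"]
pairs = list(product(weather, repeat=2))  # the 9 re-defined states RR, RS, ..., CC

def p2(w, w1, w2):
    """Hypothetical second-order model P(W_t = w | W_{t-1} = w1, W_{t-2} = w2):
    persistence, with a small pull toward the weather two days back."""
    weights = {v: 0.1 for v in weather}
    weights[w1] += 0.6
    weights[w2] += 0.1
    return weights[w] / sum(weights.values())

# First-order transition matrix on pair states Z = (W_{t-1}, W_t):
# (w2, w1) -> (w1', w) is only possible when w1' == w1 (the days must overlap).
P = {(z_prev, z_next): (p2(z_next[1], z_prev[1], z_prev[0])
                        if z_next[0] == z_prev[1] else 0.0)
     for z_prev, z_next in product(pairs, repeat=2)}

# Each row sums to 1; only 3 of the 9 pair states are reachable from any state.
assert abs(sum(P[("R", "S"), z] for z in pairs) - 1.0) < 1e-9
```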