> I was always under the impression that the state formed the agent's observable world, and that actions taken by the agent would cause a state transition.
You are right that the agent's action $a$ causes the state transition, as defined by the transition function:
$$P(s'|s,a)$$
The state should contain enough (but not too much) information for the problem to be solved. Furthermore, this information should be fully observable (i.e. you should always be able to deduce the current state); otherwise you need a Partially Observable Markov Decision Process (POMDP).
When you have defined the MDP, i.e. the states, actions, transition function, and rewards, you can find the policy $\pi(s)$ (through Value Iteration or Policy Iteration) that maximizes the expected reward. The policy is a function:
$$\pi: S \rightarrow A$$
Thus, after having learnt the policy, you can apply it by passing the current state $s$ as an argument, and it gives the best action $a$ to take according to the policy.
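To make the loop from definition to policy concrete, here is a minimal Value Iteration sketch in Python. The data layout is an assumption for illustration: `P[s][a]` is a list of (probability, next-state) pairs and `R[s][a]` is the immediate reward.

```python
# Minimal value-iteration sketch for a finite MDP.
# Assumed (hypothetical) layout: P[s][a] = [(prob, next_state), ...],
# R[s][a] = immediate reward, gamma = discount factor.
def value_iteration(states, actions, P, R, gamma=0.9, tol=1e-6):
    V = {s: 0.0 for s in states}          # state values, initially 0
    while True:
        delta = 0.0
        for s in states:
            # Expected value of each action in state s.
            q = [R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                 for a in actions]
            best = max(q)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:                   # stop once values have converged
            break
    # Extract the policy: the best action per state under the final values.
    pi = {s: max(actions, key=lambda a: R[s][a] + gamma *
                 sum(p * V[s2] for p, s2 in P[s][a]))
          for s in states}
    return pi
```

After `pi = value_iteration(states, actions, P, R)`, the lookup `pi[s]` is exactly the $\pi(s)$ described above.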
In the case of controlling traffic lights you could use the following state variables:
- $S_\text{time}$: the time; could be hours {0..23} or {morning, afternoon, night}, for example.
- $S_\text{color}$: red, orange, green.
- $S_\text{traffic}$: light, dense.
The total set of states would then be the Cartesian product:
$$S = S_\text{time} \times S_\text{color} \times S_\text{traffic}$$
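As a quick sketch (using just the variable names from this example), the full state space can be enumerated directly:

```python
from itertools import product

# State variables from the traffic-light example above.
S_time = ["morning", "afternoon", "night"]
S_color = ["red", "orange", "green"]
S_traffic = ["light", "dense"]

# The state space is the Cartesian product: 3 * 3 * 2 = 18 states.
S = list(product(S_time, S_color, S_traffic))
print(len(S))   # 18
print(S[0])     # ('morning', 'red', 'light')
```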
You could now define different probabilities based on different situations:
$$P(S'=\{\text{morning, red, dense}\} \mid S=\{\text{morning, red, light}\}, a) = \ldots$$
$$P(S'=\{\text{morning, green, dense}\} \mid S=\{\text{morning, red, light}\}, a) = \ldots$$
etc.
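In code, such a transition function is often just a nested lookup table. A minimal sketch, where the action names (`keep_red`, `switch_green`) and the probabilities are made-up placeholders:

```python
# Hypothetical transition table: T[(state, action)] maps each successor
# state to its probability; the entries per (state, action) sum to 1.
s = ("morning", "red", "light")
T = {
    (s, "keep_red"): {
        ("morning", "red", "dense"): 0.3,
        ("morning", "red", "light"): 0.7,
    },
    (s, "switch_green"): {
        ("morning", "green", "dense"): 0.3,
        ("morning", "green", "light"): 0.7,
    },
}

def transition_prob(s, a, s_next):
    """P(s'|s, a); unlisted transitions have probability 0."""
    return T.get((s, a), {}).get(s_next, 0.0)
```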
> In addition some authors have included things like real-time electricity pricing in the state space for planning demand response activities in the home. No action the user takes can possibly cause a change in the price of the electrical unit, but obviously the decision the agent makes is dependent upon it.
Here you could include pricing in the state, so when you execute the policy, the state variable $S_\text{price}$ will be set to the current electricity price. Note that you have to discretize these values, or look into continuous-state MDPs.
To learn a good policy, however, you also need to define the probabilities of going from one state to another: the transition function. So you need to estimate beforehand the probabilities of going from one price state to another, e.g. from historical data (see the sketch below).
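A minimal sketch of that estimation step, assuming you have a series of historical prices (the prices and bin edges below are made-up placeholders): discretize the prices into bins, then count observed transitions and normalize.

```python
from collections import defaultdict
import numpy as np

# Hypothetical historical prices and bin edges for {low, medium, high}.
price_history = [0.11, 0.13, 0.25, 0.24, 0.12, 0.31, 0.29, 0.14]
bin_edges = [0.10, 0.20, 0.30]
states = np.digitize(price_history, bin_edges)  # discretized price states

# Count observed transitions between consecutive price states.
counts = defaultdict(lambda: defaultdict(int))
for s, s_next in zip(states, states[1:]):
    counts[s][s_next] += 1

# Normalize the counts into estimated probabilities P(s'|s).
P_price = {s: {s2: n / sum(nexts.values()) for s2, n in nexts.items()}
           for s, nexts in counts.items()}
```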
> In short, can I have the following state space {price, currentTime} in my MDP or does it need to be modelled differently?
So the answer is yes, but you have to make sure that you do not have too many states, since the complexity of finding a policy grows exponentially with the number of states. Furthermore, to end up with a good policy, you need good approximations of the state transition probabilities $P(s'|s,a)$.
Best Answer
A Markov Decision Process (MDP) indeed has to do with going from one state to another and is mainly used for planning and decision making.
The theory
Just repeating the theory quickly, an MDP is:
$$\text{MDP} = \langle S,A,T,R,\gamma \rangle$$
where $S$ are the states, $A$ the actions, $T$ the transition probabilities (i.e. the probabilities $P(s'|s,a)$ to go from one state to another given an action), $R$ the rewards (given a certain state, and possibly action), and $\gamma$ a discount factor that is used to reduce the importance of future rewards.
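As a sketch, this tuple maps directly onto a small container type; the field names and data layouts here are illustrative assumptions, not a fixed convention:

```python
from typing import NamedTuple

class MDP(NamedTuple):
    states: list    # S: the set of states
    actions: list   # A: the set of actions
    T: dict         # T[(s, a)] -> {s': P(s'|s, a)}
    R: dict         # R[(s, a)] -> immediate reward
    gamma: float    # discount factor in [0, 1)
```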
So in order to use it, you need to have predefined:
- the states $S$,
- the actions $A$,
- the transition probabilities $T$, and
- the rewards $R$ (plus a discount factor $\gamma$).
Once the MDP is defined, a policy can be learned with Value Iteration or Policy Iteration, which calculate the expected reward for each of the states. The policy then gives, per state, the best action to take (given the MDP model).
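Concretely, Value Iteration repeatedly applies the Bellman optimality update until the values converge:
$$V_{k+1}(s) = \max_{a} \left[ R(s,a) + \gamma \sum_{s'} P(s'|s,a)\, V_k(s') \right]$$
and the policy then picks the maximizing action in each state: $\pi(s) = \arg\max_a \left[ R(s,a) + \gamma \sum_{s'} P(s'|s,a)\, V(s') \right]$.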
In summary, an MDP is useful when you want to plan an efficient sequence of actions in which your actions are not always 100% effective.
Your questions
I would call it planning, not prediction (like regression, for example).
See examples.
MDPs are used to do Reinforcement Learning; to find patterns you need Unsupervised Learning. And no, you cannot handle an infinite amount of data: the complexity of finding a policy grows exponentially with the number of states $|S|$.
See examples.
Examples of Applications of MDPs
There are also quite a few more models. An even more interesting one is the Partially Observable Markov Decision Process (POMDP), in which states are not completely visible; instead, observations are used to get an idea of the current state, but this is out of the scope of this question.
Additional Information
A stochastic process is Markovian (or has the Markov property) if the conditional probability distribution of future states depends only on the current state, and not on previous ones (i.e. not on a list of previous states).
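Formally:
$$P(S_{t+1} = s' \mid S_t, S_{t-1}, \ldots, S_0) = P(S_{t+1} = s' \mid S_t)$$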