Solved – Q-Learning vs Fitted Q-Iteration

reinforcement learning

I am reading about Q-learning in the context of reinforcement learning. I understand that Q-learning is a form of online learning where we are given a sequence of tuples as input. I am following the Udacity course https://www.udacity.com/course/machine-learning-reinforcement-learning--ud820
and this survey paper: https://www.jair.org/media/301/live-301-1562-jair.pdf

I also understand Value Iteration (VI) and Fitted Value Iteration (FVI). My question is: does Fitted Q-Iteration simply mean Q-learning with some kind of state-space approximation, just as FVI is VI with a linear approximation of the state space?

Best Answer

You are right: it means that the Q-function is approximated linearly.

Let $S$ be a state space and $A$ an action space. For $s \in S$ and $a \in A$, let $\textbf{x}(s,a) = (x_1(s,a),\ldots,x_n(s,a)) \in \mathbb{R}^n$ be a vector of features of the state-action pair.

Suppose that $Q(s,a)$ is the true Q-value function. We can approximate it with the following estimator:

$$\hat{Q}(s,a,\textbf{w}) = \textbf{w} \cdot \textbf{x}(s,a) = \sum_{i=1}^n w_i x_i(s,a)$$

So you may want to construct features for state-action pairs, instead of making features for states only. To fit the weight vector $\textbf{w}$, you can use gradient-descent methods. For more on this, see Sutton & Barto's chapter on control with function approximation.
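As a minimal sketch of this idea, here is semi-gradient Q-learning with a linear $\hat{Q}(s,a,\textbf{w}) = \textbf{w} \cdot \textbf{x}(s,a)$. The 3-state chain MDP and the one-hot state-action feature map are illustrative assumptions, not part of the question (with one-hot features this reduces to tabular Q-learning, but the update rule is the general linear one):

```python
import numpy as np

# Hypothetical 3-state chain MDP: action 1 moves right, action 0 moves left;
# reaching state 2 gives reward 1 and ends the episode.
N_STATES, N_ACTIONS = 3, 2

def features(s, a):
    """One-hot feature vector x(s, a) over state-action pairs."""
    x = np.zeros(N_STATES * N_ACTIONS)
    x[s * N_ACTIONS + a] = 1.0
    return x

def step(s, a):
    """Deterministic transition and reward for the toy chain."""
    s_next = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    r = 1.0 if s_next == N_STATES - 1 else 0.0
    done = s_next == N_STATES - 1
    return s_next, r, done

def q_hat(w, s, a):
    """Linear approximation: Q_hat(s, a, w) = w . x(s, a)."""
    return w @ features(s, a)

rng = np.random.default_rng(0)
w = np.zeros(N_STATES * N_ACTIONS)
alpha, gamma, eps = 0.1, 0.9, 0.1  # step size, discount, exploration rate

for episode in range(500):
    s, done = 0, False
    while not done:
        # epsilon-greedy action selection using the current Q_hat
        if rng.random() < eps:
            a = int(rng.integers(N_ACTIONS))
        else:
            a = int(np.argmax([q_hat(w, s, b) for b in range(N_ACTIONS)]))
        s_next, r, done = step(s, a)
        target = r if done else r + gamma * max(q_hat(w, s_next, b)
                                                for b in range(N_ACTIONS))
        # semi-gradient update: w <- w + alpha * (target - Q_hat) * x(s, a)
        w += alpha * (target - q_hat(w, s, a)) * features(s, a)
        s = s_next

# The learned values should approach Q(1, right) = 1 and Q(0, right) = gamma * 1
print(q_hat(w, 1, 1), q_hat(w, 0, 1))
```

The update is exactly the gradient step mentioned above: since $\hat{Q}$ is linear in $\textbf{w}$, its gradient with respect to the weights is just the feature vector $\textbf{x}(s,a)$.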