Solved – Lookup table as a special case of linear function approximation (Reinforcement learning)

machine-learning, q-learning, reinforcement-learning

In reinforcement learning, when the state space is discrete and relatively small, a commonly used learning algorithm is Q-learning. It maintains a lookup table $Q(s,a)$ (you can think of this as a matrix with states on the rows and actions on the columns), whose entries are updated as the agent continues to learn.
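For concreteness, here is a minimal sketch of the tabular setup (the table size, learning rate, and discount factor below are arbitrary choices for illustration, not taken from the lecture):

```python
import numpy as np

n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))  # the lookup table: rows = states, columns = actions

alpha, gamma = 0.1, 0.99             # learning rate and discount factor (arbitrary values)

def q_learning_update(s, a, r, s_next):
    """Standard tabular Q-learning update for one transition (s, a, r, s_next)."""
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])
```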

In the case of a continuous state space, or an overly large discrete one, a lookup table is no longer feasible, and we turn to Q-values with linear function approximation. In this case, the Q-values are approximated by a linear combination of features.
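A minimal sketch of what that looks like, assuming a hypothetical hand-crafted feature map `features(s, a)` (the particular features and constants are made up for illustration):

```python
import numpy as np

alpha, gamma = 0.1, 0.99                 # learning rate and discount factor (arbitrary values)
n_actions = 2

def features(s, a):
    """Hypothetical feature vector phi(s, a) -- problem-specific in practice."""
    return np.array([s, a, s * a, 1.0])  # a few simple features plus a bias term

w = np.zeros(4)                          # one weight per feature

def q_hat(s, a, w):
    """Approximate Q(s, a) as the dot product phi(s, a) . w."""
    return features(s, a) @ w

def semi_gradient_update(w, s, a, r, s_next):
    """Semi-gradient Q-learning: the gradient of q_hat w.r.t. w is just phi(s, a)."""
    td_target = r + gamma * max(q_hat(s_next, b, w) for b in range(n_actions))
    return w + alpha * (td_target - q_hat(s, a, w)) * features(s, a)
```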

I was watching a lecture by David Silver on reinforcement learning, and he mentioned that the lookup table is just a special case of linear function approximation. (The slide I am referring to begins at 28:43.) This had never occurred to me, and the little 'proof' he showed was not so clear to me. Could anyone give some insight into the matter?

Originally, I simply accepted (without proof) that the lookup table and linear function approximation are two independent things. It never occurred to me that the two are related.

Best Answer

In linear approximation, we approximate the value of a state as the dot product of a feature vector and a weight vector, i.e. $\hat v(s) = \textbf{s}\cdot\textbf{w}$ for a feature vector $\textbf{s}$ and weights $\textbf{w}$.

What Silver is getting at here is that, in the discrete case, we can construct the feature vector $\textbf{s}$ as a list of dummy variables, each an indicator for one discrete state. This feature vector is zero everywhere except for a single entry equal to one. Each weight $w_i \in \textbf{w}$ then corresponds to a single state $s_i$, and $\hat v$ simply returns that weight as the value of $s_i$.

Under this representation, you can view a lookup table value function as a special case of a linear approximation.
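To make the equivalence concrete, here is a minimal sketch (the weight values are made up): with indicator features, the dot product selects exactly one weight, so the weight vector *is* the lookup table.

```python
import numpy as np

n_states = 4
w = np.array([1.5, -0.3, 0.0, 2.7])   # one weight per state (values made up)

def one_hot(s, n=n_states):
    """Indicator (dummy-variable) feature vector: all zeros except a one at position s."""
    x = np.zeros(n)
    x[s] = 1.0
    return x

def v_hat(s, w):
    """Linear approximation x(s) . w; with one-hot features this just returns w[s]."""
    return one_hot(s) @ w

assert all(v_hat(s, w) == w[s] for s in range(n_states))  # table lookup recovered exactly
```

Note also that the gradient of $\hat v$ with respect to $\textbf{w}$ is the indicator vector itself, so a gradient-based update changes only the entry for the visited state, just like the tabular update.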