In reinforcement learning, linear function approximation is often used when the state space is too large for a lookup table to be feasible.
The form of the $Q$-value with linear function approximation is given by
$$Q(s,a) = w_1 f_1(s,a) + w_2 f_2(s,a) + \cdots, $$
where $w_i$ are the weights, and $f_i$ are the features.
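In vector form this is simply $Q(s,a) = w^\top f(s,a)$; a minimal illustration in Python (the numbers are made up):

```python
import numpy as np

# Q(s, a) as a dot product between weights and features (made-up numbers)
w = np.array([0.2, -0.5, 1.0])   # weights w_i
f = np.array([1.0, 0.0, 3.0])    # features f_i(s, a)
Q = w @ f                        # Q(s, a) = sum_i w_i * f_i(s, a) = 3.2
```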
The features are predefined by the user. My question is: how are the weights assigned?
I have read/downloaded some lecture slides on $Q$-learning with function approximation. Most of them follow up with slides on linear regression, but since they are just slides, they tend to be incomplete. I wonder what the connection/relation between the two topics is.
Best Answer
Function approximation is basically a regression problem (in the general sense, i.e. as opposed to classification, where the output is discrete): one tries to learn a function mapping from an input (in your case the features $f(s,a)$) to a real-valued output $Q(s,a)$. Since we do not have a full table of all input/output values, but instead learn and estimate $Q(s,a)$ at the same time, the parameters (here: the weights $w$) cannot be calculated directly from the data. A common approach here is to use gradient descent.
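To make that concrete, here is a minimal sketch (my illustration, not from the source) of a single gradient-descent step on the squared error between the linear prediction $w^\top f$ and a fixed regression target:

```python
import numpy as np

# One stochastic-gradient-descent step on the squared error
# 0.5 * (target - w @ f)**2 -- the regression view of the weight update.
w = np.zeros(3)                  # weights, one per feature
f = np.array([1.0, 0.5, -2.0])   # feature vector f(s, a) (made-up values)
target = 1.5                     # regression target; in Q-learning: r + gamma * max_a' Q(s', a')
alpha = 0.1                      # learning rate

error = target - w @ f           # prediction error: target minus current estimate Q(s, a)
w += alpha * error * f           # descend the gradient; for a linear model dQ/dw_i = f_i(s, a)
```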
Here is the general algorithm for learning $Q(s,a)$ with Value Function Approximation:

For each episode:

1. Initialize the state $s$
2. Choose an action $a$ for $s$ (e.g. $\epsilon$-greedy with respect to the current $Q$)
3. Take action $a$, observe the reward $r$ and the next state $s'$
4. Update the weights: $w_i \leftarrow w_i + \alpha \left[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \right] f_i(s,a)$ for all $i$
5. Set $s \leftarrow s'$

Repeat 2-5 until $s$ is terminal,

where

- $\alpha$ is the learning rate,
- $\gamma$ is the discount factor, and
- $f_i(s,a) = \partial Q(s,a) / \partial w_i$ is the gradient of the linear approximation with respect to $w_i$.
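For concreteness, here is how the loop above might look in Python; the environment interface (`env.reset()`, `env.step(a)` returning `(s', r, done)`) and the `featurize` function are hypothetical placeholders, not part of the source:

```python
import numpy as np

def featurize(s, a):
    """User-supplied feature function f(s, a) -> feature vector (placeholder)."""
    raise NotImplementedError

def q_value(w, s, a):
    return w @ featurize(s, a)

def epsilon_greedy(w, s, actions, epsilon, rng):
    if rng.random() < epsilon:
        return actions[rng.integers(len(actions))]        # explore
    return max(actions, key=lambda a: q_value(w, s, a))   # exploit

def run_episode(env, w, actions, alpha=0.1, gamma=0.99, epsilon=0.1, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    s = env.reset()                                       # 1. initialize s
    done = False
    while not done:                                       # repeat 2-5 until s is terminal
        a = epsilon_greedy(w, s, actions, epsilon, rng)   # 2. choose a
        s_next, r, done = env.step(a)                     # 3. take a, observe r and s'
        target = r if done else r + gamma * max(q_value(w, s_next, a2) for a2 in actions)
        w += alpha * (target - q_value(w, s, a)) * featurize(s, a)  # 4. weight update
        s = s_next                                        # 5. s <- s'
    return w
```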
The weight update (step 4) can be read as follows: the term in brackets is the temporal-difference (TD) error, i.e. the difference between the target $r + \gamma \max_{a'} Q(s',a')$ and the current estimate $Q(s,a)$. Each weight $w_i$ is then moved a small step, scaled by the learning rate $\alpha$ and by its feature value $f_i(s,a)$, in the direction that reduces this error.
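Equivalently (expanding the standard semi-gradient argument): with the target $y = r + \gamma \max_{a'} Q(s',a')$ held fixed, step 4 is exactly one gradient-descent step on the squared error, because $\nabla_w Q(s,a) = f(s,a)$ for the linear form:

$$\begin{aligned}
L(w) &= \tfrac{1}{2}\bigl(y - Q(s,a)\bigr)^2,\\
\nabla_w L(w) &= -\bigl(y - Q(s,a)\bigr)\,f(s,a),\\
w &\leftarrow w - \alpha\,\nabla_w L(w) = w + \alpha\,\bigl(y - Q(s,a)\bigr)\,f(s,a).
\end{aligned}$$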
Main Source:
Chapter 8 (Value Approximation) of the (overall recommended) book Reinforcement Learning: An Introduction by Sutton and Barto (first edition). The general algorithm has been modified, as is commonly done, to calculate $Q(s,a)$ instead of $V(s)$. I have also dropped the eligibility traces $e$ to focus on gradient descent, hence using only one-step backups.