Your example is a little weaker than it could be because there don't really seem to be any actions; still, we can work with that. I'll preserve the $\max_a$ and $R(s,a)$ pieces of notation, although $a$ isn't operative in the example. I'm assuming that from state 1 there is a 50-50 chance of transitioning to either state 2 or state 3, and that the discount rate is 50%, as per your example. I also assume the rewards $R(s,a)$ for being in states 1, 2, and 3 are 0, 2, and 0 respectively.
Value iteration repeats the update until the values converge. At the start, as you have, $V_0(s) = 0 \;\forall s$.
At iteration 1:
$V_1(s=3) = \max_a\{R(3,a)\} = 0$, since state 3 is terminal and has no outgoing transitions.
$V_1(s=2) = \max_a\{R(2,a)\} = 2$, since state 2 is terminal and has no outgoing transitions.
$V_1(s=1) = \max_a\{R(1,a) + 0.5(0.5*V_0(s=2) + 0.5*V_0(s=3))\} = 0$
At iteration 2:
$V_2(s=3) = \max_a\{R(3,a)\} = 0$, since state 3 is terminal and has no outgoing transitions.
$V_2(s=2) = \max_a\{R(2,a)\} = 2$, since state 2 is terminal and has no outgoing transitions.
$V_2(s=1) = \max_a\{R(1,a) + 0.5(0.5*V_1(s=2) + 0.5*V_1(s=3))\} = 0.5$
At iteration 3:
$V_3(s=3) = \max_a\{R(3,a)\} = 0$, since state 3 is terminal and has no outgoing transitions.
$V_3(s=2) = \max_a\{R(2,a)\} = 2$, since state 2 is terminal and has no outgoing transitions.
$V_3(s=1) = \max_a\{R(1,a) + 0.5(0.5*V_2(s=2) + 0.5*V_2(s=3))\} = 0.5$
And we have convergence! All the $V_3(s) = V_2(s)$. In a more complex example with actions included, $V_3(s) = V_2(s)$ implies that the actions selected at iteration 3 and at iteration 2 are either identical or tied in value; if the algorithm additionally breaks ties between equally valued actions in a fixed way, then $V_3(s) = V_2(s)$ implies the selected actions are the same between iterations.
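For completeness, here is a minimal Python sketch of the same computation, under the assumptions above (50-50 transitions out of state 1, states 2 and 3 terminal, rewards 0, 2, 0, discount 0.5):

```python
gamma = 0.5
rewards = {1: 0.0, 2: 2.0, 3: 0.0}
transitions = {1: {2: 0.5, 3: 0.5}, 2: {}, 3: {}}  # terminal states have no successors

V = {s: 0.0 for s in rewards}  # V_0(s) = 0 for all s
while True:
    V_new = {s: rewards[s] + gamma * sum(p * V[s2] for s2, p in transitions[s].items())
             for s in rewards}
    # Exact comparison is fine for this toy example; in general use a tolerance.
    if V_new == V:  # converged: V_{k+1}(s) = V_k(s) for all s
        break
    V = V_new

print(V)  # {1: 0.5, 2: 2.0, 3: 0.0}
```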
If you haven't yet, check out this page which covers SARSA with LFA: http://artint.info/html/ArtInt_272.html
Sutton's book is really confusing in how it describes setting up your feature space F(s,a), but the web page above describes it in a simple example. Applying the architecture of theta and F(s,a) from that page to Sutton's algorithm works very well.
Suppose you have 4 possible actions in a state. Create a set of Q-value estimates (in this case a 4-value array), with one value for each possible action in the given state. Iterate over each action and, for that action, populate the feature vector based on what that action will do to/for the agent.
For example, if the agent is directly below a wall and the chosen action is 'up', the feature 'is the agent about to move into a wall' should be 1. Likewise, for action='right' with a wall to the right, the same feature would be 1, and so on for all other possibilities; a small sketch of this follows.
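Something like the following sketch illustrates the idea. The grid layout, the `is_wall` helper, and the action names are all hypothetical, just to show one binary feature computed per action:

```python
import numpy as np

ACTIONS = ['up', 'down', 'left', 'right']
DELTAS = {'up': (-1, 0), 'down': (1, 0), 'left': (0, -1), 'right': (0, 1)}

def is_wall(grid, row, col):
    """True if (row, col) is off the grid or a wall cell ('#')."""
    off_grid = not (0 <= row < len(grid) and 0 <= col < len(grid[0]))
    return off_grid or grid[row][col] == '#'

def features(grid, state, action):
    """Binary feature vector F(s, a): what would this action do to the agent?"""
    row, col = state
    drow, dcol = DELTAS[action]
    about_to_hit_wall = 1.0 if is_wall(grid, row + drow, col + dcol) else 0.0
    return np.array([1.0, about_to_hit_wall])  # [bias, wall feature]

def q_values(grid, state, theta):
    """One linear estimate per action: Q(s, a) = theta . F(s, a)."""
    return np.array([theta @ features(grid, state, a) for a in ACTIONS])

# Usage: agent at (1, 1) with a wall directly above; a theta that penalises the
# wall feature scores 'up' (and 'down', which would leave the grid) lower.
grid = ['.#.',
        '...']
theta = np.array([0.0, -1.0])
print(q_values(grid, (1, 1), theta))  # [-1. -1.  0.  0.]
```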
You've probably moved past this problem a while ago, but if not, hope this helped!
You are right. It means that the Q function is approximated linearly.
Let $S$ be a state space and $A$ an action space. For $s \in S$ and $a \in A$, $\textbf{x}(s,a) = (x_1(s,a),\ldots,x_n(s,a)) \in \mathbb{R}^n$ is a vector of features of the pair $(s,a) \in S \times A$.
Suppose that $Q(s,a)$ is the true Q-value function. We may try to approximate it with the following estimate:
$$\hat{Q}(s,a,\textbf{w}) = \textbf{w} \cdot \textbf{x}(s,a) = \sum_{i=1}^n w_i x_i(s,a)$$
So you may want to make features for state-action pairs, instead of making features for states only. To fine-tune the $\textbf{w}$ vector, you can use gradient descent methods. For more on this issue see Sutton & Barto, control with function approximation.
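As a concrete sketch (the feature vectors, step size, and discount below are made up for illustration), the linear estimate and the usual semi-gradient TD update on $\textbf{w}$ look like this:

```python
import numpy as np

def q_hat(w, x):
    """Linear estimate: Q_hat(s, a, w) = w . x(s, a)."""
    return w @ x

def update(w, x_sa, reward, x_next, gamma=0.9, alpha=0.1):
    """One semi-gradient update for a transition (s, a, r, s', a').

    Since grad_w Q_hat = x(s, a), the update is w <- w + alpha * delta * x(s, a).
    """
    delta = reward + gamma * q_hat(w, x_next) - q_hat(w, x_sa)  # TD error
    return w + alpha * delta * x_sa

# Usage with made-up 3-dimensional features:
w = np.zeros(3)
x_sa = np.array([1.0, 0.0, 1.0])    # x(s, a)
x_next = np.array([0.0, 1.0, 0.0])  # x(s', a')
w = update(w, x_sa, reward=1.0, x_next=x_next)
print(w)  # [0.1 0.  0.1]
```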