Your example is a little weaker than it could be, because there don't really seem to be any actions; however, we can work with that. I'll preserve the $\max_a$ and $R(s,a)$ notation, although $a$ isn't operative in the example. I'm assuming that from state 1 there is a 50-50 chance of transitioning to either state 2 or state 3, and that the discount factor is also 0.5, as per your example. I also assume the rewards $R(s,a)$ for being in states 1, 2, and 3 are 0, 2, and 0 respectively.
Value iteration iterates until the values converge. Initially, as you have, $V_0(s) = 0 \space \forall s$.
At iteration 1:
$V_1(s=3) = \max_a\{R(3,a)\} = 0$, since state 3 is a terminal state with no further transitions.
$V_1(s=2) = \max_a\{R(2,a)\} = 2$, since state 2 is a terminal state with no further transitions.
$V_1(s=1) = \max_a\{R(1,a) + 0.5(0.5*V_0(s=2) + 0.5*V_0(s=3))\} = 0$
At iteration 2:
$V_2(s=3) = \max_a\{R(3,a)\} = 0$, since state 3 is a terminal state with no further transitions.
$V_2(s=2) = \max_a\{R(2,a)\} = 2$, since state 2 is a terminal state with no further transitions.
$V_2(s=1) = \max_a\{R(1,a) + 0.5(0.5*V_1(s=2) + 0.5*V_1(s=3))\} = 0.5$
At iteration 3:
$V_3(s=3) = \max_a\{R(3,a)\} = 0$, since state 3 is a terminal state with no further transitions.
$V_3(s=2) = \max_a\{R(2,a)\} = 2$, since state 2 is a terminal state with no further transitions.
$V_3(s=1) = \max_a\{R(1,a) + 0.5(0.5*V_2(s=2) + 0.5*V_2(s=3))\} = 0.5$
And we have convergence: $V_3(s) = V_2(s)$ for all $s$. In a more complex example with actions included, $V_3(s) = V_2(s)$ implies that the actions selected at iterations 2 and 3 are either the same or equivalent in terms of value; if the algorithm also uses a fixed rule for breaking ties between equally valued actions, this strengthens to the actions being identical between iterations.
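If it helps to check the arithmetic, here is a minimal value-iteration sketch in Python for the example above. The transition probabilities, rewards, and discount factor are the assumptions I stated at the start; the dictionary layout and the convergence tolerance are my own choices for illustration.

```python
# Minimal value-iteration sketch for the 3-state example above.
# Assumptions: from state 1 we move to state 2 or 3 with probability 0.5 each,
# states 2 and 3 are terminal, rewards are R(1)=0, R(2)=2, R(3)=0, gamma=0.5.
# (There are no real actions here, so the max over actions is trivial.)

rewards = {1: 0.0, 2: 2.0, 3: 0.0}
transitions = {1: {2: 0.5, 3: 0.5}, 2: {}, 3: {}}  # terminal states: no successors
gamma = 0.5

V = {s: 0.0 for s in rewards}          # V_0(s) = 0 for all s
for iteration in range(1, 100):
    V_new = {}
    for s in rewards:
        expected_next = sum(p * V[s2] for s2, p in transitions[s].items())
        V_new[s] = rewards[s] + gamma * expected_next
    if all(abs(V_new[s] - V[s]) < 1e-9 for s in V):
        V = V_new
        break
    V = V_new

print(iteration, V)   # converges at iteration 3; V[1] is 0.5, as in the hand calculation
```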
There are a few subtleties with the PyBrain library and NFQ. I don't have a lot of experience with NFQ, but it's part of the course I tutor at my university; we use the PyBrain library because it's a good introduction to a lot of these things. Generally, there are two things that help:
Use exploration. Set learner.epsilon=x for some x in [0, 1], where 0 means rely only on the network's output and 1 means act completely randomly. A value of 0.05-0.2 can help learning enormously on most problems (see the epsilon-greedy sketch after this list).
Use more learning episodes and more hidden neurons. NFQ only fits to the experience you give it, with model capacity set by the number of hidden units, so running more independent episodes and/or longer episodes gives the network more experience to train on.
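To make the exploration point concrete, here is a rough sketch of what epsilon-greedy action selection does. This is plain Python rather than PyBrain's own API; the function name and arguments are mine, purely for illustration.

```python
import random

def epsilon_greedy(q_values, epsilon):
    """Pick an action index: random with probability epsilon, greedy otherwise.

    q_values -- list of estimated action values for the current state
    epsilon  -- exploration rate in [0, 1]; 0 = always greedy, 1 = always random
    """
    if random.random() < epsilon:
        return random.randrange(len(q_values))                        # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])       # exploit

# Example: with epsilon = 0.1 the agent mostly exploits, but still samples
# other actions often enough to discover better ones.
print(epsilon_greedy([0.2, 0.5, 0.1], epsilon=0.1))
```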
These approaches have substantially improved NFQ performance on tasks such as the 2048 game, so I imagine they should help in your case too. In general, though, for grid-world type problems I find table-based RL to be far superior. RBF neural nets might also be good (disclaimer: I haven't tried this).
Another thing to check: make sure you give your agent enough information that it could reasonably figure out which direction to go at each point. It has no memory, so if it can't "see" any landmarks to point it in the right direction, it will only learn random noise.
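For example (a hypothetical feature encoding of my own, not taken from your code): a state vector containing only the immediately surrounding walls looks identical in many different cells, whereas adding the agent's absolute position disambiguates them, so a memoryless network can in principle learn which way to go from each cell.

```python
# Two hypothetical state encodings for a grid-world agent (illustrative only).
# grid[y][x] is 1 for a wall cell and 0 for a free cell.

def local_walls_only(grid, x, y):
    # Ambiguous: many different cells share the same N/S/E/W wall pattern.
    return [grid[y - 1][x], grid[y + 1][x], grid[y][x + 1], grid[y][x - 1]]

def walls_plus_position(grid, x, y, width, height):
    # Same local information plus normalised absolute position,
    # which uniquely identifies where the agent is.
    return local_walls_only(grid, x, y) + [x / width, y / height]

# Example usage on a tiny 4x4 grid with an outer wall.
grid = [[1, 1, 1, 1],
        [1, 0, 0, 1],
        [1, 0, 0, 1],
        [1, 1, 1, 1]]
print(local_walls_only(grid, 1, 1))                        # [1, 0, 0, 1]
print(walls_plus_position(grid, 1, 1, width=4, height=4))  # [1, 0, 0, 1, 0.25, 0.25]
```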
There are a few requirements for Value Iteration to guarantee convergence:
State space and action space should be finite
Reward values should have an upper and lower bound
Environment should be episodic, or if continuing then the discount factor should be less than 1 (see the bound sketched after this list)
The value function should be represented as a table, with one entry per state. If you use function approximation over state vectors instead, then value iteration can be unstable, just like Q-learning.
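To spell out why the discount factor matters in the continuing case (a standard geometric-series bound, assuming rewards are bounded by some $R_{\max}$): with $0 \le \gamma < 1$ the return, and hence the value function, stays finite,

$$|V(s)| \;\le\; \sum_{t=0}^{\infty} \gamma^{t} R_{\max} \;=\; \frac{R_{\max}}{1-\gamma},$$

whereas with $\gamma = 1$ in a never-ending task this sum can diverge.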
That is not necessarily a strong enough guarantee. If loops between states are possible, where an agent could either get stuck accumulating negative reward or gain unbounded positive reward over time, then you may have a problem with episodes never ending.
For a strictly episodic problem (no opt-in loops with positive reward, no "trap" loops with negative reward), then no: discounting is not required in that case.
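As a toy illustration of that loop problem (the single-state MDP here is my own construction, not from the question): with a positive-reward self-loop and no discounting, the value estimate grows without bound, while any $\gamma < 1$ makes it settle at $R/(1-\gamma)$.

```python
# Toy illustration: one state whose only transition is a self-loop paying +1.
# With gamma = 1 the value estimate grows forever; with gamma < 1 it
# converges to reward / (1 - gamma).

def self_loop_value(gamma, sweeps):
    v = 0.0
    for _ in range(sweeps):
        v = 1.0 + gamma * v   # Bellman backup for the single looping state
    return v

print(self_loop_value(gamma=1.0, sweeps=1000))  # 1000.0 and still climbing
print(self_loop_value(gamma=0.9, sweeps=1000))  # ~10.0 = 1 / (1 - 0.9)
```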