Assumption $1$ should be
$$
P\big(q_i\,\big|\,q_1,q_2,\dots,q_{i-1},o_1,o_2,\dots,o_{i-1}\,\big)=P\big(q_i\,\big|\,q_{i-1}\,\big)\ .
$$
That is, the distribution of the hidden state at time $\ i\ $, given all the preceding hidden states and all the preceding observations, depends only on the immediately preceding state. Assumption $2$ should be
$$
P\big(o_i\,\big|\,q_1,q_2,\dots,q_T,o_1,\dots,o_{i-1},o_{i+1},\dots,o_T\,\big)=P\big(o_i\,\big|q_i\,\big)\ .
$$
That is, the distribution of the observation at time $\ i\ $ given all the hidden states and all the other observations depends only on the hidden state at time $\ i\ $. In my experience, assumption $2$ is more commonly stated in the form
$$
P\big(o_i\,\big|\,q_1,q_2,\dots,q_i,o_1,\dots,o_{i-1}\,\big)=P\big(o_i\,\big|\,q_i\,\big)\ .
$$
Although this may appear to be a weaker assumption at first sight, it's in fact equivalent, given assumption $1$.
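One way to see the equivalence is sketched below, using only the chain rule, assumption $1$, and assumption $2$ in the weaker form just stated. Here $\ P\big(q_1\,\big|\,q_0\,\big)\ $ is shorthand for $\ P(q_1)\ $, and $\ o_i'\ $ is a dummy summation variable; both are notational assumptions of this sketch. The joint distribution factorises as
$$
P\big(q_1,\dots,q_T,o_1,\dots,o_T\,\big)=\prod_{j=1}^T P\big(q_j\,\big|\,q_1,\dots,q_{j-1},o_1,\dots,o_{j-1}\,\big)\,P\big(o_j\,\big|\,q_1,\dots,q_j,o_1,\dots,o_{j-1}\,\big)=\prod_{j=1}^T P\big(q_j\,\big|\,q_{j-1}\,\big)\,P\big(o_j\,\big|\,q_j\,\big)\ .
$$
The only factor involving $\ o_i\ $ is $\ P\big(o_i\,\big|\,q_i\,\big)\ $, so dividing the joint distribution by its sum over $\ o_i\ $ cancels every other factor:
$$
P\big(o_i\,\big|\,q_1,\dots,q_T,o_1,\dots,o_{i-1},o_{i+1},\dots,o_T\,\big)=\frac{P\big(o_i\,\big|\,q_i\,\big)}{\sum_{o_i'}P\big(o_i'\,\big|\,q_i\,\big)}=P\big(o_i\,\big|\,q_i\,\big)\ .
$$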
You're quite correct that given $\ q_1,q_2,\dots,q_T\ $, the distribution of $\ o_{t+1},o_{t+2}, \dots, o_T\ $ depends only on $\ \ q_{t+1},q_{t+2}, \dots, q_T\ $. The key to understanding the dependence of
$$
\beta_t(i)=P\big(o_{t+1},o_{t+2},\dots,o_T\,\big|\,q_t=i\,\big)
$$
on $\ i\ $, however, is that the conditioning event does not give you any of $\ q_{t+1},q_{t+2}, \dots, q_T\ $. You're given only $\ q_t\ $, and the distribution of the subsequent hidden states $\ q_{t+1},q_{t+2}, \dots, q_T\ $ depends on the value of $\ q_t\ $. The distribution of $\ o_{t+1},o_{t+2}, \dots, o_T\ $ depends on $\ q_{t+1},q_{t+2}, \dots, q_T\ $, and these states pass their dependence on $\ q_t\ $ through to that distribution. That is where the dependence of $\ \beta_t(i)\ $ on $\ i\ $ comes from.
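A small numerical sketch may make this concrete. It computes $\ \beta_t(i)\ $ in two ways: by summing over every possible sequence of future hidden states, weighting each by its probability given $\ q_t=i\ $, and by the usual backward recursion. The two-state model, the matrices `A` and `B`, the observation sequence, and the function names are all made up for illustration; they are not taken from the question.

```python
import itertools
import numpy as np

# Hypothetical 2-state, 2-symbol HMM; every number here is illustrative only.
A = np.array([[0.9, 0.1],   # A[i, j] = P(q_{t+1} = j | q_t = i)
              [0.2, 0.8]])
B = np.array([[0.7, 0.3],   # B[j, k] = P(o = k | q = j)
              [0.1, 0.9]])
future_obs = [0, 0, 1]      # o_{t+1}, ..., o_T

def beta_brute_force(i, obs):
    """Marginalise over all future state paths, starting from q_t = i."""
    n_states = A.shape[0]
    total = 0.0
    for path in itertools.product(range(n_states), repeat=len(obs)):
        prob = 1.0
        prev = i
        for state, symbol in zip(path, obs):
            # P(next state | previous state) * P(observation | next state)
            prob *= A[prev, state] * B[state, symbol]
            prev = state
        total += prob
    return total

def beta_recursion(obs):
    """beta_T(i) = 1; beta_t(i) = sum_j A[i, j] * B[j, o_{t+1}] * beta_{t+1}(j)."""
    beta = np.ones(A.shape[0])
    for symbol in reversed(obs):
        beta = A @ (B[:, symbol] * beta)
    return beta

brute = np.array([beta_brute_force(i, future_obs) for i in range(A.shape[0])])
recursive = beta_recursion(future_obs)
print("brute force:", brute)
print("recursion:  ", recursive)
```

The two methods agree, and the value for $\ i=0\ $ differs from the value for $\ i=1\ $ because the two rows of `A` differ. If both rows of `A` were identical, $\ q_{t+1}\ $ would not depend on $\ q_t\ $ and the two values of $\ \beta_t(i)\ $ would coincide.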
Best Answer
Why not? Ties are essentially irrelevant, in the sense that any choice among the tied candidates will find the same global maximum. Hence picking the first one is as correct as picking any other.