This has nothing to do with frequentism. When the policy $\pi$ creates a distribution over actions, it is called a stochastic policy.
Originally, policies were not stochastic, since they were defined as mapping each state to the highest-value action. The policy actually followed in, say, an $\epsilon$-greedy approach is to disobey that deterministic policy and act randomly with probability $\epsilon$.
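As a minimal sketch of that idea (the `q_values` table, the function name, and the numbers below are assumptions made up for illustration, not anything from the question), an $\epsilon$-greedy rule wraps the deterministic greedy choice in occasional random actions:

```python
import numpy as np

def epsilon_greedy_action(q_values, state, epsilon, rng):
    """With probability epsilon act randomly, otherwise follow the
    deterministic greedy policy (illustrative sketch only)."""
    n_actions = q_values.shape[1]
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))   # explore: uniformly random action
    return int(np.argmax(q_values[state]))    # exploit: greedy, deterministic choice

# Made-up 3-state, 2-action value table.
rng = np.random.default_rng(0)
q_values = np.array([[0.1, 0.5], [0.9, 0.2], [0.3, 0.3]])
print(epsilon_greedy_action(q_values, state=1, epsilon=0.1, rng=rng))
```

The resulting behaviour is a stochastic policy even though the underlying greedy policy is deterministic.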
The intuition behind the argument that the optimal policy is independent of the initial state is the following:
The optimal policy is defined by a function that selects an action for every possible state, and the actions chosen in different states can be optimized independently of one another.
Formally speaking, for an unknown initial distribution, the value function to maximize would be the following (not conditioned on the initial state):
$v^\pi = E[ R(s_0,a_0) + \gamma R(s_1,a_1) + ... | \pi ] $
The optimal policy is therefore the one that maximizes $v^\pi = x_0^T V^\pi$, where $x_0$ is the vector defined by $x_0(s) = \Pr[s_0 = s]$ and $V^\pi$ is the vector with $V^\pi(s)$ defined as in your question. Thus the optimal policy $\pi^*$ is a solution of the following optimization:
$\max_{\pi} x_0^T V^\pi = \max_{\pi} \sum_{s} x_0(s) V^\pi(s) = \sum_{s} x_0(s) \max_{\pi} V^\pi(s) = \sum_{s} x_0(s) V^*(s)$
where the third equality is only valid because $x_0(s) \geq 0$ and because the maximization over policies decomposes into independent choices of an action $a \in A_s$ at each state $s$ (with $A_s$ the set of actions available at state $s$), so a single policy can achieve the per-state maxima simultaneously. By contrast, if there were constraints, for example on $x_t$ (e.g., if the optimization only searches among policies that guarantee $x_t \leq d$ for $t > 0$), then the argument is no longer valid and the optimal policy becomes a function of the initial distribution.
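As a hedged numerical sanity check of that argument (the two-state MDP, its rewards, and the candidate $x_0$ vectors below are all made up for illustration), one can enumerate the deterministic policies of a tiny MDP, compute $V^\pi = (I - \gamma P_\pi)^{-1} r_\pi$ for each, and verify that the maximizer of $x_0^T V^\pi$ does not depend on the (unconstrained) initial distribution:

```python
import itertools
import numpy as np

gamma = 0.9
# Toy 2-state, 2-action MDP (purely illustrative numbers).
# P[a][s][s'] = transition probability, R[s][a] = expected reward.
P = np.array([[[1.0, 0.0], [1.0, 0.0]],    # action 0: always move to state 0
              [[0.0, 1.0], [0.0, 1.0]]])   # action 1: always move to state 1
R = np.array([[1.0, 0.0],                  # rewards in state 0 for actions 0, 1
              [0.0, 2.0]])                 # rewards in state 1 for actions 0, 1

def value_of_policy(pi):
    """Solve V = r_pi + gamma * P_pi V for a deterministic policy pi."""
    P_pi = np.array([P[pi[s], s] for s in range(2)])
    r_pi = np.array([R[s, pi[s]] for s in range(2)])
    return np.linalg.solve(np.eye(2) - gamma * P_pi, r_pi)

policies = list(itertools.product([0, 1], repeat=2))   # all deterministic policies
for x0 in ([0.5, 0.5], [0.9, 0.1], [0.2, 0.8]):        # several initial distributions
    best = max(policies, key=lambda pi: np.dot(x0, value_of_policy(pi)))
    print(x0, "-> best policy", best)                   # the same policy every time
```

Every (strictly positive) $x_0$ selects the same policy, which is exactly what the separation of the maximum across states relies on.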
For more details, you can check the following arXiv report:
"Finite-Horizon Markov Decision Processes with State Constraints", by
Mahmoud El Chamie, Behcet Acikmese
Best Answer
The point of visiting a state in value iteration is to update its value, using the update:
$$v(s) \leftarrow \max_a \left[\sum_{r,s'} p(s', r \mid s, a)\bigl(r + \gamma v(s')\bigr)\right]$$
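A minimal sketch of one such sweep, assuming (purely for illustration) that the dynamics are given as a dictionary `p[(s, a)]` of `(next_state, reward, probability)` triples; none of these names come from the answer itself:

```python
GAMMA = 0.9

def value_iteration_sweep(states, actions, p, v):
    """One sweep: v(s) <- max_a sum_{s',r} p(s',r|s,a) * (r + gamma * v(s'))."""
    for s in states:
        v[s] = max(
            sum(prob * (r + GAMMA * v[s2]) for (s2, r, prob) in p[(s, a)])
            for a in actions
        )
    return v

# Tiny made-up example: two non-terminal states 'A', 'B' and a terminal state 'T'.
p = {
    ('A', 'go'):   [('B', 1.0, 1.0)],
    ('A', 'stay'): [('A', 0.0, 1.0)],
    ('B', 'go'):   [('T', 5.0, 1.0)],
    ('B', 'stay'): [('B', 0.0, 1.0)],
}
v = {'A': 0.0, 'B': 0.0, 'T': 0.0}   # the terminal state keeps value 0
for _ in range(50):
    value_iteration_sweep(['A', 'B'], ['go', 'stay'], p, v)
print(v)   # roughly {'A': 5.5, 'B': 5.0, 'T': 0.0}
```

Note that the sweep only visits the non-terminal states; the terminal state is simply left at $v(s^T) = 0$.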
The first thing to note is that the state value of the terminal state $s^T$ is $v(s^T) = 0$, always, since by definition there are no future rewards to accumulate. Any calculation that found a possible reward, or a different next state, after a terminal state and updated its value to be non-zero would not be valid.
You can define things so that it is valid to run the update. If you implement terminal states as "absorbing states", this means $p(s^T, 0 \mid s^T, *) = 1$ (where $*$ is any action) and the probability of any other (state, reward) pair is zero, so running the update above just replaces $0$ with $0$.
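As a small self-contained sketch of that point (the state names and $\gamma$ are made up, as before), running the update on an absorbing terminal state just rewrites zero as zero:

```python
GAMMA = 0.9

# Absorbing terminal state: from ('T', a) you return to 'T' with reward 0.
p_terminal = {('T', a): [('T', 0.0, 1.0)] for a in ('go', 'stay')}
v = {'T': 0.0}

# One value-iteration update applied to the terminal state itself.
v['T'] = max(
    sum(prob * (r + GAMMA * v[s2]) for (s2, r, prob) in p_terminal[('T', a)])
    for a in ('go', 'stay')
)
print(v['T'])   # 0.0 -- valid, but wasted work
```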
In general, there is no point in updating the value function of a terminal state. With correct definitions of the transition and reward functions there is no harm in doing so; it is just wasted computation.