The key point is to write $B_T^2$ in a clever way:
$$\begin{align*} B_T^2 &= \sum_{j=1}^n (B_{t_j}^2-B_{t_{j-1}}^2) \\ &= \sum_{j=1}^n \big( (B_{t_j}-B_{t_j^{\ast}})+B_{t_j^{\ast}} \big)^2 - \big( (B_{t_{j-1}}-B_{t_j^{\ast}})+B_{t_j^{\ast}} \big)^2 \\ &= 2 \sum_{j=1}^n B_{t_j^{\ast}} (B_{t_j}-B_{t_{j-1}}) + \sum_{j=1}^n (B_{t_j}-B_{t_j^{\ast}})^2- \sum_{j=1}^n (B_{t_{j-1}}-B_{t_j^{\ast}})^2 \tag{1} \end{align*}$$
where $t_j^{\ast}:= \frac{1}{2}(t_j+t_{j-1})$. Now, by definition, the first term on the right-hand side converges to the Stratonovich integral
$$\int_0^T B_t \circ dB_t.$$
Consequently, it remains to show that the remaining terms converge to $0$ as the mesh size $|\Pi|$ converges to $0$. To this end, we set
$$S_n := \sum_{j=1}^n (B_{t_j}-B_{t_j^{\ast}})^2$$
and note that, by the stationarity of the increments,
$$\mathbb{E}(S_n) = \mathbb{E} \left( \sum_{j=1}^n (B_{t_j}-B_{t_j^{\ast}})^2 \right) = \sum_{j=1}^n (t_j-t_j^{\ast}) = \frac{1}{2} \sum_{j=1}^n (t_j-t_{j-1}) = \frac{T}{2}.$$
Consequently, by the independence of the increments,
$$\begin{align*} \mathbb{E} \left[ \left( \sum_{j=1}^n (B_{t_j}-B_{t_j^{\ast}})^2 - \frac{T}{2} \right)^2 \right] &= \text{Var} \, (S_n) \\ &= \sum_{j=1}^n \text{Var} \, \left[ (B_{t_j}-B_{t_j^{\ast}})^2 - (t_j-t_j^{\ast}) \right]. \end{align*}$$
Using the scaling property, i.e. $B_t \sim \sqrt{t} B_1$ for any $t \geq 0$, it is therefore not difficult to see that this expression converges to $0$ as $|\Pi| \to 0$. In fact, we find that
$$\mathbb{E} \left[ \left( \sum_{j=1}^n (B_{t_j}-B_{t_j^{\ast}})^2 - \frac{T}{2} \right)^2 \right] \leq C|\Pi| T$$
for $C:= \mathbb{E}((B_1^2-1)^2)<\infty$. Hence,
$$\sum_{j=1}^n (B_{t_j}-B_{t_j^{\ast}})^2 \stackrel{L^2}{\to} \frac{T}{2}.$$
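This $L^2$ convergence is easy to check numerically. Below is a minimal Monte Carlo sketch (assuming a uniform partition; the seed and sample sizes are arbitrary choices): it samples the half-interval increments of the Brownian path directly and compares the empirical $L^2$ error of $S_n$ with the bound $C|\Pi|T$, where $C = \mathbb{E}[(B_1^2-1)^2] = 2$.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n_paths = 1.0, 20_000

for n in (8, 32, 128):
    dt = T / n                              # uniform partition, mesh |Pi| = dt
    # increments over the half-intervals [t_{j-1}, t_j*] and [t_j*, t_j]
    half = rng.normal(0.0, np.sqrt(dt / 2), size=(n_paths, 2 * n))
    # B_{t_j} - B_{t_j*} is the second half-increment of each subinterval
    S_n = (half[:, 1::2] ** 2).sum(axis=1)
    mse = ((S_n - T / 2) ** 2).mean()       # estimates E[(S_n - T/2)^2]
    print(n, mse, 2 * dt * T)               # empirical error vs. bound C|Pi|T, C = 2
```

The printed error shrinks linearly in $|\Pi| = T/n$ and stays below the bound.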
Exactly the same calculation shows that the third sum in $(1)$ also converges to $\frac{T}{2}$ in $L^2$. Finally, we conclude
$$B_T^2 = 2\int_0^T B_t \circ dB_t + \frac{T}{2} - \frac{T}{2} = 2\int_0^T B_t \circ dB_t.$$
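As a pathwise sanity check of this identity, one can simulate a Brownian path on a half-step grid (so that the midpoint values $B_{t_j^{\ast}}$ are available) and compare the midpoint Riemann sum with $B_T^2$; a minimal sketch, with arbitrary seed and grid size:

```python
import numpy as np

rng = np.random.default_rng(1)
T, n = 1.0, 1_000
dt = T / n

# Brownian path on the half-step grid, so midpoint values B_{t_j*} are available
grid = np.concatenate(([0.0], rng.normal(0.0, np.sqrt(dt / 2), 2 * n).cumsum()))
B_left, B_mid, B_right = grid[0:-1:2], grid[1::2], grid[2::2]

# midpoint (Stratonovich) Riemann sum: 2 * sum_j B_{t_j*} (B_{t_j} - B_{t_{j-1}})
stratonovich = 2 * (B_mid * (B_right - B_left)).sum()
print(stratonovich, grid[-1] ** 2)          # the two values should nearly agree
```

Replacing `B_mid` by `B_left` reproduces the Itô sum instead, which differs from $B_T^2$ by approximately $T$.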
If $X_t $ is of the form
$$
X_t = \int_0^t f(s,\omega)dB_s(\omega),\quad t\geq 0,
$$ then, by a localization argument, we may assume that $X_t$ is an $L^2$-bounded martingale. In that case, there is a sequence of elementary processes $\{f_n\}$ such that
$$
E[\int_0^T |f_n(s,\cdot)-f(s,\cdot)|^2ds]\to 0
$$ for all $T>0$. Let $X_{n,t} = \int_0^t f_n(s,\omega)\,dB_s(\omega)$. Then, by Doob's maximal inequality, we have
$$
E[\sup_{t\in [0,T]}|X_t -X_{n,t}|^2]\leq C E[\int_0^T |f_n(s,\cdot)-f(s,\cdot)|^2ds]\to 0.
$$ This implies that $X_{n,t} \to X_t$ in probability, uniformly on every compact interval $[0,T]$. By a further localization argument, we may also assume that $g\in C^2_0$. Then,
$$
g(X_{n,t}), g'(X_{n,t}), g''(X_{n,t})
$$ converge in probability to
$$
g(X_t), g'(X_t), g''(X_t)
$$ locally uniformly.
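The approximation step (not the localization itself) can be sketched numerically. Assuming the hypothetical integrand $f(s,\omega) = B_s(\omega)$ and taking as elementary processes $f_n$ the integrand frozen at the left endpoint of each coarse block, the simulation below estimates both sides of the maximal inequality (with the Doob constant $C = 4$ for the $L^2$ case); all sizes and the seed are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(2)
T, fine, paths = 1.0, 1024, 2_000
dt = T / fine
dB = rng.normal(0.0, np.sqrt(dt), size=(paths, fine))
B = np.concatenate([np.zeros((paths, 1)), dB.cumsum(axis=1)], axis=1)

f = B[:, :-1]                               # hypothetical integrand f(s) = B_s
for blocks in (4, 16, 64):
    step = fine // blocks
    # elementary process: freeze f at the left endpoint of each coarse block
    f_n = np.repeat(f[:, ::step], step, axis=1)
    err_int = ((f - f_n) ** 2).sum(axis=1) * dt         # int_0^T |f - f_n|^2 ds
    M = ((f - f_n) * dB).cumsum(axis=1)                 # X_t - X_{n,t} on the fine grid
    sup_err = (np.abs(M).max(axis=1)) ** 2
    print(blocks, sup_err.mean(), 4 * err_int.mean())   # Doob: LHS <= 4 * RHS
```

As the blocks are refined, both sides shrink together, which is exactly the mechanism behind the locally uniform convergence above.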
If $x \mapsto \Phi(x,t)$ is differentiable, it follows from Taylor's formula that
$$\Phi(x,t) = \Phi(y,t) + (x-y) \frac{\partial}{\partial x} \Phi(\zeta,t)$$
for some intermediate point $\zeta$ between $x$ and $y$ (i.e. we can find $\lambda \in (0,1)$ such that $\zeta = \lambda x+ (1-\lambda) y$). Using this identity for
$$x :=\frac{x(t_j)+x(t_{j+1})}{2} \qquad y := x(t_j) \qquad t = t_j$$
we find
$$\Phi \left( \frac{x(t_j)+x(t_{j+1})}{2}, t_j \right) = \Phi(x(t_j),t_j)+ \frac{x(t_{j+1})-x(t_j)}{2} \frac{\partial}{\partial x} \Phi(\zeta,t_j) \tag{1}$$
with
$$\zeta = \lambda x(t_j)+ (1-\lambda) \frac{x(t_j)+x(t_{j+1})}{2} \tag{2} $$
for some $\lambda \in (0,1)$. Note that $(2)$ is equivalent to
$$\begin{align*} \zeta &= x(t_j) \left[ \frac{2\lambda}{2} + \frac{(1-\lambda)}{2} \right] + \underbrace{\frac{1-\lambda}{2}}_{=:\theta} x(t_{j+1}) \\ &= x(t_j)(1-\theta) + \theta x(t_{j+1}) \end{align*}$$
for some $\theta \in (0,1/2)$. Hence, by $(1)$,
$$\Phi \left( \frac{x(t_j)+x(t_{j+1})}{2}, t_j \right) -\Phi(x(t_j),t_j)= \frac{x(t_{j+1})-x(t_j)}{2} \frac{\partial}{\partial x} \Phi ( x(t_j)(1-\theta) + \theta x(t_{j+1}), t_j).$$
Multiplying this expression by $x(t_{j+1})-x(t_j)$ and summing over $j=1,\ldots,N-1$ yields the identity you are looking for. (Mind that $\theta = \theta(j)$; we cannot expect to find one $\theta$ which works for all $j=1,\ldots,N-1$.)
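For a concrete illustration, take the hypothetical choice $\Phi(x,t) = x^3$ (no $t$-dependence). The sketch below solves $(1)$ for $\zeta$ by bisection and recovers $\theta$ from sample pairs $(x(t_j), x(t_{j+1}))$, confirming that $\theta$ lands in $(0,1/2)$ but differs from pair to pair; the sample pairs are made up, and $\Phi'$ is assumed strictly monotone between $x(t_j)$ and the midpoint:

```python
def theta_for(phi, dphi, a, b, iters=100):
    """Solve phi((a+b)/2) - phi(a) = (b-a)/2 * dphi(zeta) for zeta by bisection
    and return theta with zeta = (1-theta)*a + theta*b.  Assumes dphi is
    continuous and strictly monotone between a and (a+b)/2."""
    m = (a + b) / 2
    g = lambda z: (b - a) / 2 * dphi(z) - (phi(m) - phi(a))
    lo, hi = min(a, m), max(a, m)           # zeta lies here by the mean value theorem
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (g(mid) > 0) == (g(hi) > 0):     # root is in [lo, mid]
            hi = mid
        else:                               # root is in [mid, hi]
            lo = mid
    zeta = (lo + hi) / 2
    return (zeta - a) / (b - a)

phi, dphi = lambda x: x ** 3, lambda x: 3 * x ** 2   # made-up test function
for a, b in [(1.0, 2.0), (0.5, 0.7), (2.0, 1.5)]:    # made-up sample pairs
    print(a, b, theta_for(phi, dphi, a, b))          # each theta lies in (0, 1/2)
```

For the quadratic $\Phi(x,t) = x^2$ the same computation gives the constant value $\theta = 1/4$, which is why a nonlinear $\Phi$ is needed to see the $j$-dependence.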