Stochastic Calculus – Definitions of the Stratonovich Integral and Why the Average Definition is Correct

Tags: brownian-motion, probability, stochastic-calculus, stochastic-integrals, stochastic-processes

Notation: Herein:

  • $\mathcal{B} := \{B(t)\}_{t \ge 0}$ denotes a standard Brownian motion, with $B(0) = 0$.
  • $P := \{x_i\}_{i=0}^n$ denotes a partition $0 = x_0 < x_1 < \cdots < x_n = t$ of the interval $[0,t]$, with mesh (norm) $\|P\| := \max_{1 \le i \le n} (x_i - x_{i-1})$, as in the Riemann theory.
  • $\Delta B_i := B(x_i) - B(x_{i-1})$.
  • $\Delta x_i := x_i - x_{i-1}$.
  • $\mathcal{X} := \{X(t)\}_{t \ge 0}$ denotes a Stratonovich-integrable process, in whatever sense is needed at the time.
  • $\int_0^t X(s) \circ \mathrm{d} B(s)$ denotes the Stratonovich integral.

The Conflicting Definitions: There are two conflicting definitions of the Stratonovich integral which, to my understanding, are as stated below:

$$\begin{align*}
\int_0^t X(s) \circ \mathrm{d}B(s) &:= \lim_{\|P\| \to 0} \sum_{i=1}^n \frac{X(x_i) + X(x_{i-1})}{2} \Delta B_i \tag{1} \\
\int_0^t X(s) \circ \mathrm{d}B(s) &:= \lim_{\|P\| \to 0} \sum_{i=1}^n X \left( \frac{x_i + x_{i-1}}{2} \right) \Delta B_i \tag{2}
\end{align*}$$
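To make the comparison concrete, here is a small numerical sketch (my own, not taken from any of the references below) that evaluates both sums for the particular case $X(t) = B(t)$ on a uniform partition of $[0,T]$ with $T=1$; numerically the two values come out essentially equal.

```python
import numpy as np

rng = np.random.default_rng(0)

T, n = 1.0, 2_000                 # horizon and number of partition intervals
h = T / n
# Simulate B on a grid of step h/2 so that both endpoints AND midpoints are available.
B = np.concatenate(([0.0], np.cumsum(rng.normal(0.0, np.sqrt(h / 2), 2 * n))))

B_left  = B[0:-1:2]               # B(x_{i-1})
B_right = B[2::2]                 # B(x_i)
B_mid   = B[1::2]                 # B((x_{i-1} + x_i)/2)
dB      = B_right - B_left        # Delta B_i

sum_avg = np.sum(0.5 * (B_left + B_right) * dB)   # definition (1): averaged endpoints
sum_mid = np.sum(B_mid * dB)                      # definition (2): midpoint evaluation

print(sum_avg, sum_mid, 0.5 * B[-1] ** 2)
```

For this special case the averaged sum $(1)$ telescopes to exactly $\tfrac{1}{2}B(T)^2$, and the midpoint sum $(2)$ comes out very close to the same number.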


The First Definition: Definition $(1)$ seems to be motivated by averaging the values of $X$ at the endpoints of each interval induced by $P$. In fact, we could have a "more general" integral by considering, for $\lambda \in [0,1]$,

$$\lim_{\|P\| \to 0} \sum_{i=1}^n \Big( (1-\lambda) X(x_{i-1}) + \lambda X(x_i) \Big)\Delta B_i \tag{1'}$$

where Itô integration arises from $\lambda = 0$ (left-endpoint evaluation), and the Stratonovich integral in the sense of $(1)$ arises from $\lambda = 1/2$.
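For what it's worth, in the special case $X = B$ this family can be computed explicitly (a short calculation of my own, using the fact that $\sum_{i=1}^n (\Delta B_i)^2 \to t$ in $L^2$ as $\|P\| \to 0$):

$$\sum_{i=1}^n \Big( (1-\lambda) B(x_{i-1}) + \lambda B(x_i) \Big)\Delta B_i = \sum_{i=1}^n B(x_{i-1})\,\Delta B_i + \lambda \sum_{i=1}^n (\Delta B_i)^2 \;\longrightarrow\; \int_0^t B(s)\,\mathrm{d}B(s) + \lambda t,$$

so $\lambda = 0$ recovers the Itô integral, while $\lambda = 1/2$ contributes the extra $t/2$ that separates the averaged-endpoint Stratonovich integral from the Itô one.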

In my reading, I've seen this used by

  • The Wikipedia article on Stratonovich integrals (link)

  • Apparently this is used in Ioannis Karatzas & Steven Shreve's Brownian Motion and Stochastic Calculus (Amazon link)

  • The Encyclopedia of Math website (link)

  • An article by Jonathan Mattingly on The Probability Workbook (link)


The Second Definition: Definition $(2)$ seems to be inspired simply by the Riemann-Stieltjes formulation for deterministic functions:

$$\int_0^t f(x) \, \mathrm{d} \varphi(x) = \lim_{\|P\| \to 0} \sum_{i=1}^n f(\xi_i) \Delta \varphi_i \tag{2'}$$

(with $\Delta \varphi_i$ defined analogously to $\Delta B_i$, and $\xi_i \in [x_{i-1},x_i]$ arbitrary). Definition $(2)$ mimics this construction: take $\xi_i$ to be the midpoint of each interval, $\varphi$ to be the Brownian motion, and $f$ to come from the stochastic process.

In my reading, I've seen this definition used by:

  • Bernt Øksendal in Stochastic Differential Equations: An Introduction with Applications (Amazon link)

  • Dr. Peyam on YouTube (video link)

  • Apparently, this arises in Steven Shreve's Stochastic Calculus for Finance (Amazon link)

  • Lewis Smith on this webpage


My Question: It does not seem obvious to me that these two definitions are equivalent. Moreover, I've several times seen on Math Stack Exchange (e.g. here) the claim that $(1)$ is the "correct" definition, yet when $(2)$ is used elsewhere (e.g. in this Math Overflow post) no one (openly) objects to it.

Hence, I'm seeking a proper, definitive answer, because I am very confused:

  • Which is "correct" to call the Stratonovich integral? Is it simply a matter of preference?
  • Is there a particular reason to prefer one over the other if there is no definitive answer?
  • Do any results for one definition break under the other? (Such as: does the conversion to an Itô integral break? What about properties like the chain rule?)

…or am I just totally missing something here?

Best Answer

Indeed, both definitions are equivalent (see, for instance, Theorem V-5.30 of Protter's book).

For the sake of simplicity, I'll assume that $X(t)=W(t)$, writing $W$ for the Brownian motion. Let $0=t_0<t_1<\cdots <t_N=T$ be an arbitrary partition of the interval $[0,T]$ with mesh $\|\pi\|:=\max_{i} |t_{i+1}-t_i|$.

Define $t^*_i:=\frac{t_{i+1}+t_i}{2}$ and consider

\begin{align*}
&\sum_{i=0}^{N-1}W\left(t_i^*\right)[W(t_{i+1})-W(t_i)]\\
&=\sum_{i=0}^{N-1}[W(t_i^*)-W(t_i)][W(t_{i+1})-W(t_i)]+\sum_{i=0}^{N-1} W(t_i)[W(t_{i+1})-W(t_i)]=:\mathcal I_1+\mathcal I_2.
\end{align*}

We know that as $\|\pi\|\to 0$ the term $\mathcal I_2$ converges to $\int_0^T W(t)\,dW(t)$ in $L^2(\Omega)$. Since the averaged-endpoint sum of definition $(1)$ converges (in $L^2(\Omega)$) to $\int_0^T W(t)\,dW(t)+T/2$, in order to prove the desired result it suffices to show that $\mathcal I_1$ converges in $L^2(\Omega)$ to $T/2$.

We start by noticing that

\begin{align*}
\mathbb E\left[\mathcal I_1\right]=\sum_{i=0}^{N-1}\mathbb E\big([W(t_i^*)-W(t_i)][W(t_{i+1})-W(t_i)]\big)
&=\sum_{i=0}^{N-1}\big(t_i^*\wedge t_{i+1}-t_i\wedge t_{i+1}-t_i^*\wedge t_{i}+t_i\wedge t_i\big)\\
&=\sum_{i=0}^{N-1}(t_i^*-t_i)=\sum_{i=0}^{N-1}\frac{t_{i+1}-t_i}{2}=\frac{T}{2},
\end{align*}

using $\mathbb E[W(s)W(u)]=s\wedge u$. Then

\begin{align*}
\|\mathcal I_1-T/2\|_{L^2(\Omega)}^2= \mathbb V\left(\mathcal I_1\right)=\mathbb V\left(\sum_{i=0}^{N-1}[W(t_i^*)-W(t_i)][W(t_{i+1})-W(t_i)]\right).
\end{align*}

Since increments of $W$ over disjoint intervals are independent, the terms of this sum are independent, so the variance of the sum is the sum of the variances:

\begin{align*}
\|\mathcal I_1-T/2\|_{L^2(\Omega)}^2= \mathbb V\left(\mathcal I_1\right)&=\sum_{i=0}^{N-1}\mathbb V\big([W(t_i^*)-W(t_i)][W(t_{i+1})-W(t_i)]\big)\\
&=\sum_{i=0}^{N-1}\mathbb V\Big([W(t_i^*)-W(t_i)]\big[(W(t_{i+1})-W(t_i^*))+(W(t_i^*)-W(t_i))\big]\Big).
\end{align*}

Let $\Delta_*(i):=W(t^*_i)-W(t_i)$ and $\Delta^*(i):=W(t_{i+1})-W(t^*_i)$. Then

\begin{align*}
&\sum_{i=0}^{N-1}\mathbb V\big(\Delta_*(i)[\Delta^*(i)+\Delta_*(i)]\big)\\
&=\sum_{i=0}^{N-1}\mathbb E\big(\Delta_*(i)^2[\Delta^*(i)+\Delta_*(i)]^2\big)- (t_i^*-t_i)^2\\
&=\sum_{i=0}^{N-1}\mathbb E\big(\Delta_*(i)^2[\Delta^*(i)^2+2\Delta^*(i)\Delta_*(i)+\Delta_*(i)^2]\big)- (t_i^*-t_i)^2\\
&=\sum_{i=0}^{N-1}\mathbb E\big(\Delta_*(i)^2\Delta^*(i)^2\big)+2\,\mathbb E\big(\Delta^*(i)\Delta_*(i)^3\big)+\mathbb E\big(\Delta_*(i)^4\big)- (t_i^*-t_i)^2\\
&=\sum_{i=0}^{N-1}\mathbb E\big(\Delta_*(i)^2\Delta^*(i)^2\big)+\mathbb E\big(\Delta_*(i)^4\big)- (t_i^*-t_i)^2\\
&=\sum_{i=0}^{N-1} (t_i^*-t_i)(t_{i+1}-t_i^*)+3(t_i^*-t_i)^2- (t_i^*-t_i)^2\\
&=\sum_{i=0}^{N-1} (t_i^*-t_i)(t_{i+1}-t_i^*)+2(t_i^*-t_i)^2,
\end{align*}

where the first equality uses $\mathbb E\big(\Delta_*(i)[\Delta^*(i)+\Delta_*(i)]\big)=t_i^*-t_i$, the cross term vanishes because $\Delta^*(i)$ and $\Delta_*(i)$ are independent with $\mathbb E\big(\Delta^*(i)\big)=0$, and the penultimate equality uses $\mathbb E\big(\Delta_*(i)^2\Delta^*(i)^2\big)=(t_i^*-t_i)(t_{i+1}-t_i^*)$ and $\mathbb E\big(\Delta_*(i)^4\big)=3(t_i^*-t_i)^2$.

Now notice that \begin{align*} (t_{i+1}-t_i^*)(t_i^*-t_i)= \left(t_{i+1}-\frac{t_i+t_{i+1}}{2}\right)\left(\frac{t_i+t_{i+1}}{2}-t_i\right)=\frac{(t_{i+1}-t_i)^2}{4}, \end{align*} and \begin{align*} (t_i^*-t_i)^2=\frac{(t_{i+1}-t_i)^2}{4} \end{align*}

Thus the variance computed above equals

\begin{align*} \frac{3}{4}\sum_{i=0}^{N-1} (t_{i+1}-t_i)^2\leq \frac{3}{4}\|\pi\|\sum_{i=0}^{N-1} (t_{i+1}-t_i)=\frac{3}{4}\|\pi\|T, \end{align*} and the right-hand side vanishes as $\|\pi\|\to 0$, which gives the desired $L^2(\Omega)$ convergence of $\mathcal I_1$ to $T/2$.
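For readers who like a numerical sanity check, here is a quick Monte Carlo sketch (mine, not part of the argument above) of $\mathcal I_1$ on a uniform partition: its sample mean stays near $T/2$, and its sample variance matches $\frac{3}{4}\|\pi\|T$ and shrinks as the mesh is refined.

```python
import numpy as np

rng = np.random.default_rng(1)
T, n_paths = 1.0, 5_000

for N in (10, 100, 1_000):        # number of partition intervals
    h = T / N
    # Independent Brownian increments over [t_i, t_i^*] and [t_i^*, t_{i+1}].
    d_lo = rng.normal(0.0, np.sqrt(h / 2), (n_paths, N))   # Delta_*(i) = W(t_i^*) - W(t_i)
    d_hi = rng.normal(0.0, np.sqrt(h / 2), (n_paths, N))   # Delta^*(i) = W(t_{i+1}) - W(t_i^*)
    I1 = np.sum(d_lo * (d_lo + d_hi), axis=1)               # I_1, one value per simulated path
    print(N, I1.mean(), I1.var())  # mean ~ T/2; variance = (3/4)*h*T on a uniform partition
```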

An interesting property is that if we replace the standard product "$\times$" in $$\sum_{i=0}^{N-1}W\left(t_i^*\right)\times [W(t_{i+1})-W(t_i)],$$ with the so-called Wick product "$\diamond$", then the choice of the evaluation point becomes irrelevant; in fact, $$\sum_{i=0}^{N-1}W\left(t_i^{\alpha}\right)\diamond [W(t_{i+1})-W(t_i)]\to \int_0^T W(t)\,dW(t)$$ where $t_i^{\alpha}:=(1-\alpha)t_{i}+\alpha t_{i+1}$, for any choice of $\alpha\in [0,1]$.

This is due to the fact that the Wick product is, in a sense, already implicit in Itô integration via the formula

$$\int_0^T f(W(t))dW(t)=\int_0^T f(W(t))\diamond \dot W(t)dt$$ where $\dot W(t)$ denotes the distributional derivative of the Brownian motion (i.e. a white noise process).
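A quick way to see the $\alpha$-independence concretely (a back-of-the-envelope check of my own, taking for granted the standard identity $X\diamond Y = XY - \mathbb E[XY]$ for jointly centred Gaussian variables): since $\mathbb E\big(W(t_i^{\alpha})[W(t_{i+1})-W(t_i)]\big) = t_i^{\alpha}\wedge t_{i+1}-t_i^{\alpha}\wedge t_i = t_i^{\alpha}-t_i=\alpha(t_{i+1}-t_i)$, we have

$$\sum_{i=0}^{N-1}W\left(t_i^{\alpha}\right)\diamond [W(t_{i+1})-W(t_i)] = \sum_{i=0}^{N-1}W\left(t_i^{\alpha}\right)[W(t_{i+1})-W(t_i)] - \alpha\sum_{i=0}^{N-1}(t_{i+1}-t_i).$$

The ordinary sum converges to $\int_0^T W(t)\,dW(t)+\alpha T$ (by essentially the same argument as above, with $t_i^{\alpha}$ in place of the midpoint), and the subtracted term is exactly $\alpha T$, so the limit is $\int_0^T W(t)\,dW(t)$ for every $\alpha\in[0,1]$.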