Fokker-Planck equation for Ito processes: link with Ito’s lemma

Tags: intuition, reference-request, stochastic-calculus, stochastic-differential-equations, stochastic-processes

I was trying to find an "intuitive", or "natural", derivation of the Fokker-Planck (FP) equation: I am not seeking mathematical rigour, but rather some intuitive argument.

Given the Ito process with multiplicative noise ($W_t$ is the usual Wiener process, a.k.a. "Brownian motion")
$$
dX_t = \mu(X_t,t) \, dt + \sigma(X_t,t) \, dW_t,
$$

its associated FP equation for the PDF $p$ is
$$
\frac{\partial}{\partial t}p(x,t) = -\frac{\partial}{\partial x}[\mu(x,t)p(x,t)]+\frac{1}{2}\frac{\partial^2}{\partial x^2}[\sigma(x,t)^2 p(x,t)] .
$$
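To see that the 1/2 factor is not optional, one can simulate the SDE and compare with a case where the FP equation has a closed-form solution. The sketch below assumes the Ornstein-Uhlenbeck process $\mu(x,t)=-\theta x$ with constant $\sigma$ (the parameter values are hypothetical). Its FP solution started from $\delta(x-x_0)$ is Gaussian with mean $x_0 e^{-\theta t}$ and variance $\frac{\sigma^2}{2\theta}(1-e^{-2\theta t})$. Dropping the 1/2 would double that variance, which an Euler-Maruyama simulation rules out.

```python
import numpy as np

def simulate_ou(theta, sigma, x0, t_end, n_steps=500, n_paths=100_000, seed=0):
    """Euler-Maruyama paths of dX = -theta * X dt + sigma dW; returns X at t_end."""
    rng = np.random.default_rng(seed)
    dt = t_end / n_steps
    x = np.full(n_paths, float(x0))
    for _ in range(n_steps):
        dw = rng.normal(0.0, np.sqrt(dt), size=n_paths)  # Brownian increments
        x = x - theta * x * dt + sigma * dw
    return x

theta, sigma, x0, t_end = 1.0, 0.8, 2.0, 1.5
x = simulate_ou(theta, sigma, x0, t_end)

# Gaussian solution of the FP equation for mu(x) = -theta * x and constant sigma
mean_fp = x0 * np.exp(-theta * t_end)
var_fp = sigma**2 / (2 * theta) * (1 - np.exp(-2 * theta * t_end))
# Dropping the 1/2 in the diffusion term would double the predicted variance
var_no_half = 2 * var_fp

print(f"mean: simulated {x.mean():.4f}, FP {mean_fp:.4f}")
print(f"variance: simulated {x.var():.4f}, FP {var_fp:.4f}, FP without 1/2 {var_no_half:.4f}")
```

The simulated variance should match `var_fp` up to sampling and time-step error, and clearly differ from `var_no_half`.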

Question: How can one derive the FP equation above in a "quick-and-dirty" way? Is there some smart argument according to which we may say "yes, it is obvious that the FP equation must have that form, including the 1/2 factor in the diffusion term"?

My considerations: three things come to mind, based on the idea that we can think of $p$ as a sort of density of particles (after all, $p$ comes from an averaging procedure; see point 3):

1 – If $\sigma=0$ and $\mu$ is constant, then the PDF $p$ cannot vary along the deterministic trajectories that are solutions of $dX(t)/dt = \mu$, meaning that $p$ is just "advected":
$$
0=\frac{d}{dt} p(X(t),t) = \frac{\partial}{\partial t}p(X(t),t) + \frac{\partial}{\partial X}p(X(t),t) \frac{dX(t)}{dt} = \frac{\partial}{\partial t}p(X(t),t) + \mu\frac{\partial}{\partial X}p(X(t),t)
$$

This is because, for constant $\mu$, particles following the same path $X$ do not vary their speed, maintaining constant mutual distances (measured along the path), hence constant density along their trajectory. This is consistent with the FP.
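A minimal numerical check of point 1, assuming a Gaussian initial profile `p0` (a hypothetical choice; any smooth density works): the shifted profile $p(x,t)=p_0(x-\mu t)$ should make $\partial_t p + \mu\,\partial_x p$ vanish, which the sketch estimates with central differences.

```python
import math

def p0(x):
    """Initial density: standard Gaussian (an illustrative choice)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def p_advected(x, t, mu):
    """Candidate solution for sigma = 0 and constant mu: the initial profile shifted by mu * t."""
    return p0(x - mu * t)

def advection_residual(x, t, mu, h=1e-4):
    """Central-difference estimate of dp/dt + mu * dp/dx, which should vanish."""
    dp_dt = (p_advected(x, t + h, mu) - p_advected(x, t - h, mu)) / (2 * h)
    dp_dx = (p_advected(x + h, t, mu) - p_advected(x - h, t, mu)) / (2 * h)
    return dp_dt + mu * dp_dx

mu = 1.7
worst = max(abs(advection_residual(k / 10, t, mu))
            for k in range(-50, 51) for t in (0.0, 0.5, 2.0))
print(f"max |residual| = {worst:.2e}")
```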

2 – If $\sigma=0$ and $\mu$ is not constant, then we have the usual "continuity equation", which generalizes the "advection" equation above. Particles starting with the same initial condition at different times will follow the same path, but their mutual distances (measured along their trajectory) are no longer constant: there is compression and rarefaction along the trajectories, so $p$ is not simply advected:
$$
\frac{d}{dt} p(X_t,t) = \frac{\partial}{\partial t}p(X_t,t) + \mu(X_t,t) \frac{\partial}{\partial X_t}p(X_t,t) =\text{compression or rarefaction} = -p(X_t,t)\frac{\partial}{\partial X_t}\mu(X_t,t)
$$

The RHS term describes this "hydrodynamic" behaviour: if $\mu$ is increasing along the trajectory then we have a rarefaction ($p$ decreases, hence the minus sign in the RHS). In other words, $p$ satisfies the "continuity" equation, meaning that it is conserved (we are not destroying or adding particles).
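The rarefaction argument can be checked on a hypothetical linear drift $\mu(x)=Ax$ with no noise. Trajectories are $X(t)=X(0)e^{At}$, so a change of variables gives $p(x,t)=p_0(xe^{-At})\,e^{-At}$. The sketch verifies the continuity equation by central differences and confirms that, along a trajectory, $p$ decays like $e^{-At}$, i.e. $\frac{d}{dt}p = -p\,\partial_x\mu$.

```python
import math

A = 0.8  # hypothetical drift slope: mu(x) = A * x, no noise

def p0(x):
    """Initial density: standard Gaussian (an illustrative choice)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def p(x, t):
    """Density transported by dX/dt = A * X, i.e. X(t) = X(0) * exp(A t).
    Change of variables gives p(x, t) = p0(x * exp(-A t)) * exp(-A t)."""
    return p0(x * math.exp(-A * t)) * math.exp(-A * t)

def continuity_residual(x, t, h=1e-4):
    """Central-difference estimate of dp/dt + d(mu p)/dx, which should vanish."""
    dp_dt = (p(x, t + h) - p(x, t - h)) / (2 * h)
    dflux_dx = (A * (x + h) * p(x + h, t) - A * (x - h) * p(x - h, t)) / (2 * h)
    return dp_dt + dflux_dx

worst = max(abs(continuity_residual(k / 10, t))
            for k in range(-40, 41) for t in (0.0, 0.5, 1.0))
print(f"max |continuity residual| = {worst:.2e}")

# Along one trajectory, dp/dt = -p * dmu/dx = -A * p: p decays like exp(-A t) (rarefaction for A > 0)
x_start = 0.3
for t in (0.0, 0.5, 1.0):
    x_t = x_start * math.exp(A * t)
    print(t, p(x_t, t), p0(x_start) * math.exp(-A * t))
```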

3 – I am not 100% sure about the derivation that follows, but here is what I figured out: it seems to me that the FP equation is just a consequence of Ito's lemma. I have not been able to find this reasoning anywhere, so if you think it makes sense (or you know of a reference where an argument similar to the following one is used), please let me know.

First, the PDF is $p(y,t) = E[\delta(y-X_t)]$, where $E$ is the expectation over different realizations of the process $X_t$. I will treat the Dirac delta as a legitimate function $f_y(X_t)\approx \delta(X_t-y)$: think of $f_y$ as a skinny Gaussian peaked at $y$; we will take the limit to obtain the Dirac delta only at the end.

Ito's lemma tells us that ($\mu$ and $\sigma$ may not be constant, but I don't write their arguments to save space)
$$
df_y(X_t) = [\mu \partial_{X_t} f_y(X_t) +\frac{\sigma^2}{2} \partial^2_{X_t} f_y(X_t)]dt+[\sigma \partial_{X_t} f_y(X_t) ]dW_t \, .
$$

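Take the expectation $E$; the $dW_t$ term drops out because its integrand depends only on $X_t$, which is independent of the increment $dW_t$, and $E[dW_t]=0$: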
$$
E[df_y(X_t)] = dE[f_y(X_t)]= E[\mu \partial_{X_t} f_y(X_t)] dt +E\left[ \frac{\sigma^2}{2} \partial^2_{X_t} f_y(X_t) \right] dt\, .
$$
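The averaged Ito step above can be probed without any stochastic sampling. The sketch assumes a Gaussian law for $X_t$ and hypothetical coefficients `mu`, `sigma` with multiplicative noise; `EPS` and `Y` set the width and centre of the skinny Gaussian $f_y$. It takes one Euler-Maruyama step of length $h$ from every grid point, averages over $dW=\sqrt{h}Z$ with Gauss-Hermite quadrature, and compares $(E[f_y(X_{t+h})]-E[f_y(X_t)])/h$ with $E[\mu f_y' + \tfrac12\sigma^2 f_y'']$. Averaging over $Z$ removes the $\sigma f_y'\,dW$ term (odd moments vanish), and $E[Z^2]=1$ combined with the Taylor factor $1/2$ produces the $\tfrac12\sigma^2$.

```python
import numpy as np

# Hypothetical coefficients with multiplicative noise (illustrative choices only)
def mu(x):
    return -x + 0.5 * np.sin(x)

def sigma(x):
    return 0.6 + 0.3 * np.cos(x)

# The "skinny Gaussian" f_y of width EPS centred at Y, with its first two derivatives
EPS, Y = 0.2, 0.4

def f(x):
    return np.exp(-0.5 * ((x - Y) / EPS) ** 2) / (EPS * np.sqrt(2 * np.pi))

def f_x(x):
    return -(x - Y) / EPS**2 * f(x)

def f_xx(x):
    return ((x - Y) ** 2 / EPS**4 - 1 / EPS**2) * f(x)

# Assumed law of X_t: N(0.1, 0.9^2), tabulated on a fine grid
x = np.linspace(-8, 8, 32001)
dx = x[1] - x[0]
p_t = np.exp(-0.5 * ((x - 0.1) / 0.9) ** 2) / (0.9 * np.sqrt(2 * np.pi))

# Gauss-Hermite nodes/weights for Z ~ N(0, 1): E[g(Z)] is approximated by sum(w * g(z))
z, w = np.polynomial.hermite_e.hermegauss(40)
w = w / np.sqrt(2 * np.pi)

h = 1e-5  # small time step dt
# One Euler-Maruyama step from each x, averaged over the Brownian increment sqrt(h) * Z
x_next = x[:, None] + mu(x)[:, None] * h + sigma(x)[:, None] * np.sqrt(h) * z[None, :]
E_f_next = np.sum(p_t * (f(x_next) @ w)) * dx
E_f_now = np.sum(p_t * f(x)) * dx

lhs = (E_f_next - E_f_now) / h                                             # dE[f_y(X_t)]/dt
rhs = np.sum(p_t * (mu(x) * f_x(x) + 0.5 * sigma(x) ** 2 * f_xx(x))) * dx  # averaged Ito drift
rhs_no_half = np.sum(p_t * (mu(x) * f_x(x) + sigma(x) ** 2 * f_xx(x))) * dx
print(lhs, rhs, rhs_no_half)
```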

Since $df_y(X_t) = f_y(X_{t+dt})-f_y(X_{t})$, while the value $y$ is held constant, I would say that $dE[f_y]/dt \rightarrow \partial_t p(y,t)$. Moreover, $\partial_{X_t} f_y(X_t) = -\partial_{y} f_y(X_t)$, meaning that
$$
E[\partial_{X_t} f_y(X_t)] = E[-\partial_{y} f_y(X_t)] =-\partial_{y} E[f_y(X_t)] \rightarrow -\partial_{y} p(y,t)
$$

Similarly, $E[\partial_{X_t}^2 f_y(X_t)] = \partial_{y}^2 E[f_y(X_t)]$. We have (I use $\approx$ because on the LHS I write $p$ instead of $f_y$, just to keep in mind the "meaning" of the LHS):
$$
\begin{aligned}
\partial_t p(y,t) &\approx -E[\mu(X_t,t)\, \partial_{y} f_y(X_t)] + E\left[ \frac{\sigma^2(X_t,t)}{2}\, \partial^2_{y} f_y(X_t) \right] \\
&= -\partial_{y} E[\mu(X_t,t)\, f_y(X_t)] + \partial^2_{y} E\left[ \frac{\sigma^2(X_t,t)}{2}\, f_y(X_t) \right]
\end{aligned}
$$
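The right-hand side above still contains expectations such as $E[\mu(X_t,t) f_y(X_t)]$. Whether these approach $\mu(y,t)p(y,t)$ as $f_y$ narrows can be probed numerically. The sketch fixes $t$ and assumes a hypothetical smooth drift `mu` and a Gaussian density `p` for $X_t$; the expectation is then the integral $\int \mu(x) f_y(x) p(x)\,dx$, i.e. a Gaussian smoothing of $\mu p$ at $y$, whose error shrinks roughly like $\varepsilon^2$ for smooth $\mu p$. The same reasoning applies with $\sigma^2$ in place of $\mu$.

```python
import numpy as np

def mu(x):
    """Hypothetical smooth drift mu(x) at the fixed time t."""
    return -x + 0.5 * np.sin(x)

def p(x):
    """Hypothetical density of X_t: a Gaussian N(0.1, 0.9**2)."""
    return np.exp(-0.5 * ((x - 0.1) / 0.9) ** 2) / (0.9 * np.sqrt(2 * np.pi))

def smoothed_expectation(y, eps, n=200_001):
    """E[mu(X_t) f_y(X_t)] = integral of mu(x) f_y(x) p(x) dx, f_y a Gaussian of width eps at y."""
    x = np.linspace(-10.0, 10.0, n)
    dx = x[1] - x[0]
    f_y = np.exp(-0.5 * ((x - y) / eps) ** 2) / (eps * np.sqrt(2 * np.pi))
    return np.sum(mu(x) * f_y * p(x)) * dx

y = 0.4
target = mu(y) * p(y)  # the claimed limit mu(y, t) p(y, t)
errors = [abs(smoothed_expectation(y, eps) - target) for eps in (0.5, 0.1, 0.02)]
print(errors)
```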

Now, if the following limits hold (I am quite sure they do, but correct me if I am wrong)
$$
E[\mu(X_t,t) f_y(X_t)] \rightarrow \mu(y,t) p(y,t)
\\
E \left[ \sigma^2(X_t,t) f_y(X_t) \right]
\rightarrow \sigma^2(y,t) p(y,t)
$$

we obtain
$$
\partial_t p(y,t) =
-\partial_{y} [\mu(y,t) p(y,t)] +
\partial^2_{y}\left[ \frac{\sigma^2(y,t)}{2} p(y,t) \right]
$$

which is the Ito form of the FP equation. I know that several things may go wrong in this derivation, and that there are more rigorous methods. However, it seems to me that this line of reasoning really highlights the central role of Ito's lemma. I wonder whether, by using Stratonovich calculus and following the same steps, one obtains the Stratonovich representation of the FP equation.
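As a sanity check of the final equation (not a proof), one can take a case where the transition density is known in closed form. The sketch assumes the Ornstein-Uhlenbeck process $\mu(x,t)=-\theta x$ with constant $\sigma$ (parameter values are hypothetical), whose transition density from $X_0=x_0$ is Gaussian, and evaluates the FP residual by central finite differences, once with the 1/2 factor and once with the factor replaced by 1.

```python
import math

THETA, SIGMA, X0 = 1.0, 0.8, 2.0  # illustrative OU parameters: mu(x) = -THETA * x, sigma constant

def p(x, t):
    """Gaussian transition density of the OU process started at X0 at time 0."""
    m = X0 * math.exp(-THETA * t)
    v = SIGMA**2 / (2 * THETA) * (1 - math.exp(-2 * THETA * t))
    return math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)

def fp_residual(x, t, half=0.5, h=1e-3):
    """dp/dt + d(mu p)/dx - half * d^2(sigma^2 p)/dx^2, all by central differences."""
    def flux(s):
        return -THETA * s * p(s, t)
    dp_dt = (p(x, t + h) - p(x, t - h)) / (2 * h)
    dflux_dx = (flux(x + h) - flux(x - h)) / (2 * h)
    d2 = SIGMA**2 * (p(x + h, t) - 2 * p(x, t) + p(x - h, t)) / h**2
    return dp_dt + dflux_dx - half * d2

pts = [(k / 10, t) for k in range(-20, 41) for t in (0.3, 1.0, 2.0)]
worst_half = max(abs(fp_residual(x, t)) for x, t in pts)
worst_one = max(abs(fp_residual(x, t, half=1.0)) for x, t in pts)
print(f"with 1/2: {worst_half:.1e}   with 1: {worst_one:.1e}")
```

Only the version with the 1/2 factor gives a residual at the level of the finite-difference error.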

Best Answer

Here is a very quick and dirty derivation; all feedback is appreciated. From Ito's lemma we can derive the Kolmogorov backward equation (KBE)
$$
\frac{\partial F}{\partial t}(x,t)+\mu(x,t)\frac{\partial F}{\partial x}(x,t)+\frac{1}{2}\sigma^2(x,t)\frac{\partial^2 F}{\partial x^2}(x,t)=0
$$
for $t \in [0,T]$. From this we get
$$
\int_\mathbb{R} \frac{\partial F}{\partial T}(y,T)\,p(y,T;x,t)\,dy = -\int_\mathbb{R} \bigg(\mu(y,T)\frac{\partial F}{\partial y}(y,T)+\frac{1}{2}\sigma^2(y,T)\frac{\partial^2 F}{\partial y^2}(y,T)\bigg)p(y,T;x,t)\,dy .
$$
By assuming sufficiently strong decay conditions at $+\infty$ and $-\infty$, integration by parts (once and twice, respectively) yields
$$
\int_\mathbb{R} \frac{\partial F}{\partial y}(y,T)\,\mu(y,T)\,p(y,T;x,t)\,dy=-\int_\mathbb{R}F(y,T)\,\frac{\partial}{\partial y}\big(\mu(y,T)\,p(y,T;x,t)\big)\,dy ,
$$
$$
\int_\mathbb{R} \frac{\partial^2 F}{\partial y^2}(y,T)\,\sigma^2(y,T)\,p(y,T;x,t)\,dy=\int_\mathbb{R}F(y,T)\,\frac{\partial^2}{\partial y^2}\big(\sigma^2(y,T)\,p(y,T;x,t)\big)\,dy .
$$
Differentiating $E[F(X_T,T)\mid X_t=x]=\int_\mathbb{R} F(y,T)\,p(y,T;x,t)\,dy$ with respect to $T$ and using the above, we end up with
$$
\frac{\partial }{\partial T}E[F(X_T,T)\mid X_t=x]=\int_{\mathbb{R}}F(y,T)\bigg(\frac{\partial p}{\partial T}+\frac{\partial}{\partial y}(\mu p)-\frac{1}{2}\frac{\partial^2}{\partial y^2}(\sigma^2p)\bigg)dy .
$$
If $(F(X_t,t))_{t \in [0,T]}$ is a martingale, we want this derivative with respect to $T$ to be $0$. Since this should hold for every such $F$, we set the expression in parentheses to $0$:
$$
\frac{\partial p}{\partial T}=-\frac{\partial}{\partial y}(\mu p)+\frac{1}{2}\frac{\partial^2}{\partial y^2}(\sigma^2p)
$$
Indeed, this is the Fokker-Planck equation for $T >t$.
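The integration-by-parts step amounts to saying that the FP operator $L^*p = -\partial_y(\mu p) + \tfrac12\partial_y^2(\sigma^2 p)$ is the adjoint of the backward generator $LF = \mu\,\partial_y F + \tfrac12\sigma^2\partial_y^2 F$, i.e. $\int (LF)\,p\,dy = \int F\,(L^*p)\,dy$ when $p$ decays fast. A minimal grid check, assuming hypothetical smooth choices for `mu`, `sig2`, `F` and a rapidly decaying `p`, with derivatives from `np.gradient`:

```python
import numpy as np

y = np.linspace(-10, 10, 20001)
dy = y[1] - y[0]

# Hypothetical smooth test functions; p decays fast, so boundary terms vanish
mu = -y + 0.5 * np.sin(y)
sig2 = (0.6 + 0.3 * np.cos(y)) ** 2
F = np.cos(y) + 0.1 * y**2
p = np.exp(-0.5 * ((y - 0.3) / 0.8) ** 2)

def d(g):
    """Second-order finite-difference derivative on the uniform grid."""
    return np.gradient(g, dy)

LF = mu * d(F) + 0.5 * sig2 * d(d(F))        # backward (Kolmogorov) generator applied to F
Lstar_p = -d(mu * p) + 0.5 * d(d(sig2 * p))  # forward (Fokker-Planck) operator applied to p

lhs = np.sum(LF * p) * dy
rhs = np.sum(F * Lstar_p) * dy
print(lhs, rhs)
```

The two integrals agree up to discretization error, which is exactly the cancellation used above to move the derivatives from $F$ onto $p$.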


To answer your question about point (3): if we assume that we are allowed to choose $F(x,t)=\delta(x-y)$, then
$$
F(X_t,t)=\delta(X_t-y)\implies E[F(X_t,t)\mid X_0=x_0]=p(y,t;x_0,0)
$$
and
$$
dF(X_t,t)=\bigg(\delta_x(X_t-y)\mu(X_t,t)+\frac{1}{2}\delta_{xx}(X_t-y)\sigma^2(X_t,t)\bigg)dt +\delta_x(X_t-y)\sigma(X_t,t)\,dW_t .
$$
The problem here is that $dF$ would be shorthand for the integral equation
$$
\delta(X_t-y)-\delta(x_0-y)=\int_{[0,t]}\bigg(\delta_x(X_s-y)\mu(X_s,s)+\frac{1}{2}\delta_{xx}(X_s-y)\sigma^2(X_s,s)\bigg)ds +\int_{[0,t]}\delta_x(X_s-y)\sigma(X_s,s)\,dW_s ,
$$
and I tried, but struggled to make sense of this and obtain something like the Fokker-Planck equation without bending the rules too much. So, as an initial assessment, I think that using the Dirac delta is too far-fetched to be an acceptable 'dirty' argument.
