I recently studied similar questions. In what follows, I try to give a summery of the information I gathered. I hope this is somewhat helpful.
2.) I try to answer your second question first. I also find it hard to find Information about this topic and the reason for this is in my opinion, that there are no deep theorems to prove about Duhamel's Principle. To the best of my knowledge, Duhamel's Principle is more an idea and it has to be applied to every case specifically using other already well-known tools. I hope the following makes it more clear.
1.) The thing to keep in mind concerning your first question is that Duhamel's Principle is used to construct solutions for non-homogeneous linear evolution equations. In your case the term $f(u)$ makes the equation in general non-linear, thus, you can't use Duhamel's principle to construct a solution. However, assume you already have a solution $u$ satisfying your equation then $u$ satisfies the representation formula that you describe in 3. In the following I will derive this representation formula. First, we have to know what is meant by $u(\cdot,0) = u_0(\cdot)$ in the case where $u_0$ is a measure. $u_0$ is a measure, so we can't understand this equation in a point-wise sense. This is why one usually interprets this equation in a distribution sense, i.e., we require that for all $\phi \in C_0^\infty(\mathbb{R})$, i.e. for all $\phi$ that are smooth and vanish at infinity, the following $$\lim_{t\rightarrow 0} \int u(x,t) \phi(x) \, \mathrm{d}x = \int \phi(x) \, \mathrm{d}u_0(x). $$
Note that in particular this is true for $\phi(x) = K(x,s)$ for any $s>0$. Now assume $u$ is a $C^{2,1} \cap L^\infty$ solution for your problem and it satisfies the boundary condition in a distributional sense. First, we look at the following:
$$\mathrm{d}/\mathrm{d}s \int K(x-y,t-s) u(y,s) \, \mathrm{d}y = \int K(x-y,t-s)(u_t- u'')(y,s) \, \mathrm{d}y,$$
where we interchanged the limit and derivative (by dominated convergence), applied integration by parts, and used that $K_t - K'' = 0$ for $t>0$. Next, we multiply $u_t -u'' = f(u)$ with $K(x,t-s)$ and we integrate. We get the following
\begin{align}
&\int_\epsilon^{t-\epsilon} \int_{\mathbb{R}} K(x-y,t-s) (u_t(y,s)-u''(y,s)) \, \mathrm{d}y\mathrm{d}s \\ = &\int_\epsilon^{t-\epsilon}\int_{\mathbb{R}} K(x-y,t-s) f(u(y,s)) \, \mathrm{d}y\mathrm{d}s \\ = &\int_{\mathbb{R}} K(x-y,\epsilon)u(y,t -\epsilon) \, \mathrm{d}y - \int_{\mathbb{R}} K(x-y,t-\epsilon)u(y,\epsilon) \, \mathrm{d}y,
\end{align}
where we used the equation derived above and the fundamental theorem of calculus. Letting $\epsilon \rightarrow 0$ in the last equation, yields $$ u(x,t) = \int_\mathbb{R} K(x-y,t) \, \mathrm{d}u_0(y) + \int_0^t \int_\mathbb{R} K(x-y,t-s) f(u(y,s)) \, \mathrm{d}y \mathrm{d}s,$$ where we used the initial value condition in a distributional sense and that $\lim_{t\rightarrow 0} \int K(x-y,t) g(y) \, \mathrm{d}y = g(x)$ for a continuous bounded function $g$.
3.) Let's illustrate this point with an example. Assume you have a non-negative measurable function $U$ satisfying the integral equation above, i.e.,
$$ U(x,t) = \int_\mathbb{R} K(x-y,t) \, \mathrm{d}u_0(y) + \int_0^t \int_\mathbb{R} K(x-y,t-s) f(U(y,s)) \, \mathrm{d}y \mathrm{d}s,$$
but we don't suppose any regularity on $U$ in the sense of differentiability. If you now can additionally show that $U \in L^\infty_{loc}(0,T;L^\infty(\mathbb{R}))$, then you can for instance deduce that $U$ is hölder continuous and therefore, using parabolic Schauder estimates you get that $U\in C^{2,1}$ is actually a regular solution. So basically if you can find a positive measurable function $U$ that satisfies the integral equation and you can show that the integral converges, then you can bootstrap to get more regularity, thus a classical solution.
Best Answer
I am not sure if this is your point, but the path integral you mentioned is, at least to me, as if it refers to stochastic differential equations (SDE) and the Fokker-Planck equation.
To help explain the mechanism behind the Fokker-Planck equation, let us start from a naïve example. For the heat equation in 1-D Euclidean space $\mathbb{R}$, consider the following Brownian motion $$ {\rm d}X_t=\sigma\,{\rm d}W_t, $$ where the constant $\sigma$ denotes the thermal diffusivity, the stochastic process $W_t$ means the Wiener process. Here $X_t$ is also a stochastic process, tracing the position of a Brownian particle at time $t$. Further, denote by $u(t,x)$ the probability density function (PDF) of $X_t$, i.e., $$ \int_{-\infty}^xu(t,y)\,{\rm d}y=\mathbb{P}(X_t\le x). $$ We hope to find a differential equation that governs $u=u(t,x)$.
Define $$ f(x)=e^{-2\pi i\xi x}. $$ Then $$ \mathbb{E}f(X_t)=\int_{\mathbb{R}}e^{-2\pi i\xi x}u(t,x)\,{\rm d}x=\hat{u}(t,\xi) $$ is the Fourier transform of the PDF. This is known as the characteristic function in probability. To find a governing equation for $u$, it suffices to find that for $\hat{u}$.
By Ito's formula, we have \begin{align} {\rm d}f(X_t)&=f'(X_t)\,{\rm d}X_t+\frac{1}{2}f''(X_t)\,{\rm d}\left<X\right>_t\\ &=-2\pi^2\xi^2\sigma^2f(X_t)\,{\rm d}t-2\pi i\xi\sigma f(X_t)\,{\rm d}W_t, \end{align} where $\left<X\right>_t$ denotes the quadratic variation process of $X_t$, which, for the SDE stated above, reads $$ {\rm d}\left<X\right>_t=\sigma^2\,{\rm d}t. $$ From this result, we obtain $$ f(X_t)=f(X_0)-2\pi^2\xi^2\sigma^2\int_0^tf(X_s)\,{\rm d}s-2\pi i\xi\sigma\int_0^tf(X_s)\,{\rm d}W_s. $$ Note that the last integral is a martingle (see here as well), for which $$ \mathbb{E}\left(\int_0^tf(X_s)\,{\rm d}W_s\right)=0. $$ Thanks to this result, taking the expectation on both sides yields \begin{align} \mathbb{E}f(X_t)&=\mathbb{E}f(X_0)-2\pi^2\xi^2\sigma^2\mathbb{E}\left(\int_0^tf(X_s)\,{\rm d}s\right)\\ &=\mathbb{E}f(X_0)-2\pi^2\xi^2\sigma^2\int_0^t\mathbb{E}f(X_s)\,{\rm d}s, \end{align} or using $\mathbb{E}f(X_t)=\hat{u}(t,\xi)$, $$ \hat{u}(t,\xi)=\hat{u}(0,\xi)-2\pi^2\xi^2\sigma^2\int_0^t\hat{u}(s,\xi)\,{\rm d}s, $$ which is equivalent to the differential form $$ \frac{\rm d}{{\rm d}t}\hat{u}(t,\xi)=-2\pi^2\xi^2\sigma^2\hat{u}(t,\xi), $$ whose inverse Fourier transform gives $$ \frac{\partial}{\partial t}u(t,x)=\frac{\sigma^2}{2}\frac{\partial^2}{\partial x^2}u(t,x). $$
To sum up, the solution of a heat equation can be interpreted as the probability density of some stochastic process.
This concept can be generalized to the general Ito process, a special type of stochastic processes governed by $$ {\rm d}X_t=\mu(t,X_t)\,{\rm d}t+\sigma(t,X_t)\,{\rm d}W_t, $$ where $\mu$ and $\sigma$ are both preloaded functions. Repeat the above derivation with further techniques such as $$ \mathbb{E}\left(f(X_t)\,\mu(t,X_t)\right)=\int_{\mathbb{R}}e^{-2\pi i\xi x}\mu(t,x)u(t,x)\,{\rm d}x=\widehat{\left(\mu\,u\right)}(t,\xi), $$ one may end up with the general 1-D Fokker-Planck equation $$ \frac{\partial}{\partial t}u(t,x)=-\frac{\partial}{\partial x}\left(\mu(t,x)\,u(t,x)\right)+\frac{1}{2}\,\frac{\partial^2}{\partial x^2}\left(\sigma^2(t,x)\,u(t,x)\right). $$ One may also generalize this 1-D equation to $n$-D cases by employing an $n$-dimensional SDE system, as is stated in this page. As far as I see, one may also generalize this Euclidean-space result to differentiable manifolds, but I am afraid it goes beyond my knowledge.
Hope this explanation could be somewhat helpful for your.