Unfortunately, I have been unable to come up with the correct umbral calculus proof of the identity
$$
\frac{\mathrm{d}^{n}\left(fg\right)}{\mathrm{d}x^{n}}\left(x\right) = \sum_{k=0}^{n}{\binom{n}{k} f^{\left(k\right)}\left(x\right) g^{\left(n-k\right)}\left(x\right)}.
$$
I tried to write $\frac{\mathrm{d}}{\mathrm{d}x}$ as an element of a ring in which $f$ and $g$ might be idempotents, but the problem is that we are multiplying and not adding $f$ and $g$. I then tried and failed to find the proof on the internet. I'd be enormously grateful for any courteous hints.
The umbral calculus proof of the higher order product rule
Tags: calculus, combinatorics, derivatives, umbral-calculus
Related Solutions
Umbral relations shadow those of the basic binomial transform, revealing underlying connections between diverse areas of math (as Leibniz himself predicted; see H. Davis, "Theory of Linear Operators"):
[Edit June 17, 2021: A slightly different introductory presentation is at MO here.]
I) Umbral notation is brief and suggestive (courtesy of Blissard and contemporaries):
$ \displaystyle (a.)^n= a_n \; \; \; $ (umbral variable and lowering of superscript).
Expressing binomial convolution simply:
$$ \displaystyle (a. + b.)^{n} = \sum_{k=0}^{n} \binom{n}{k} a_{k} b_{n-k} \; \;$$
(be careful to evaluate $(a.+b.)^0=a_0b_0$ and $(a.+b.)^1=a_0b_1+a_1b_0$),
$$ \displaystyle e^{a. \; x}= \sum_{n \ge 0} a_n \frac{x^n}{n!} \; \; ,$$
$$ \displaystyle e^{a.\;x}\; e^{b.\; x} = e^{(a. + b.)x}\; \; .$$
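The binomial convolution and the e.g.f. product rule above can be checked numerically. The following is a minimal sketch in plain Python; the function names and the two sample sequences are illustrative choices, not part of the original notation.

```python
from fractions import Fraction
from math import comb, factorial

def umbral_binomial(a, b, n):
    """Evaluate <(a. + b.)^n> = sum_k C(n,k) a_k b_{n-k}."""
    return sum(comb(n, k) * a[k] * b[n - k] for k in range(n + 1))

def egf_product_coeff(a, b, n):
    """n! times the coefficient of x^n in the product e^{a.x} e^{b.x}."""
    coeff = sum(
        Fraction(a[k], factorial(k)) * Fraction(b[n - k], factorial(n - k))
        for k in range(n + 1)
    )
    return coeff * factorial(n)

# Arbitrary sample sequences (assumed for illustration only).
a = [1, 1, 2, 5, 14, 42]   # Catalan numbers
b = [1, 2, 4, 8, 16, 32]   # powers of 2

for n in range(6):
    # e^{a.x} e^{b.x} = e^{(a.+b.)x}, compared coefficient by coefficient.
    assert umbral_binomial(a, b, n) == egf_product_coeff(a, b, n)
```

Note that `umbral_binomial(a, b, 0)` returns `a[0] * b[0]`, in line with the caution about evaluating $(a.+b.)^0$.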
A more precise notation is to use $\langle a.^n \rangle = a_n$ to clearly specify when the lowering op, or evaluation of an umbral quantity, is to be done. E.g.,
$$\langle a.^n a.^m\rangle=\langle a.^{n+m}\rangle= a_{n+m} \ne a_n a_m= \langle a.^n\rangle\langle a.^m\rangle$$
and $$\langle\exp[\ln(1+a.x)]\rangle=\langle(1+a.x)\rangle=1+a_1x$$
$$\ne \exp[\langle\ln(1+a.x)\rangle]=\exp \left[ \sum_{n \ge 1} \langle\frac{a.^nx^n}{n}\rangle\right]=\exp\left[\sum_{n \ge 1} \frac{a_nx^n}{n}\right]\; .$$
II) Same for umbralized ops, allowing succinct specification and derivation of many relations, especially among special functions. A good deal of umbral calculus is about defining these ops for special sequences, such as the falling $(x)_{n}=x!/(x-n)!$ and rising factorials $(x)_{\bar{n}}=(x+n-1)!/(x-1)!$ and Bell polynomials $\phi_n(x)$.
Examples:
$ (:AB:)^n = A^n B^n$ (definition of order-preserving exponentiation for any operators $A$ and $B$)
$$ (xD)^n = (\phi.(:xD:))^n = \phi_n(:xD:) \; \; ,$$
$$ e^{txD} = e^{t \phi.(:xD:)} \; \; .$$
From $xD \; x^{n} = n \; x^{n}$, it's easy to derive
$$e^{t\phi.(x)} = e^{x (e^t-1)} \; \; .$$
(See this MO-Q for the o.g.f.)
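One way to fill in the step behind the e.g.f. (a sketch using only the relations above, not necessarily the route the author had in mind) is to let both sides of $e^{txD} = e^{t\phi.(:xD:)}$ act on $e^x$, treating everything as formal power series in $t$ and $x$:

$$\begin{align*} e^{txD}\,x^m &= e^{tm}x^m = (e^tx)^m \quad\Longrightarrow\quad e^{txD}\,e^x = e^{e^tx},\\ :xD:^k\, e^x &= x^kD^k e^x = x^k e^x \quad\Longrightarrow\quad e^{t\phi.(:xD:)}\,e^x = e^{t\phi.(x)}\,e^x. \end{align*}$$

Equating the two right-hand sides and dividing by $e^x$ gives $e^{t\phi.(x)} = e^{x(e^t-1)}$.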
III) Umbral compositional inverse pairs allow for easy derivations of combinatorial identities and reveal associations among different reps of operator calculi.
Look at how this connects the distributive operator exponentiation $:xD:^n=x^nD^n$ to umbral lowering of superscripts. The falling factorials and Bell polynomials are an umbral inverse pair, i.e., $\phi_n((x).)=x^n=(\phi.(x))_n$. This is reflected in the functions $\log(1+t)$ and $e^t-1$, which define their e.g.f.s $e^{x\log(1+t)}$ and $e^{x(e^t-1)}$, being regular compositional inverses, and in the lower triangular matrices containing the coefficients of the polynomials (the Stirling numbers of the first and second kinds) being multiplicative inverses. We can therefore move among many reps to find and relate many formulas. For the derivative op rep,
$$((xD).)^n=(xD)_n=x^nD^n=:xD:^n=(\phi.(:xD:)).^n=(\phi.(:xD:))_n,$$
so we have a connection to the umbral lowering of indices
$$:xD:^n=((xD).)^n=(xD)_n=x^nD^n.$$
IV) The generalized Taylor series or shift operator is at the heart of umbral calculus:
$$ e^{p.(x)D_y}f(y) = f(p.(x)+y) \; , $$
(e.g., this entry on A class of differential operators and another on the Bernoulli polynomials) with special cases
$$ e^{:p.(x) D_x:} f(x) = f(p.(x) + x) \; , $$ and
$$ e^{-(1-q.(x))D_y}y^{s-1} \; |_{y=1} = (1-(1-q.(x)))^{s-1} \; ,$$ giving a Gauss-Newton interpolation of $q_n(x)$ (shadows of the binomial relations).
It can often be used to easily reveal interesting combinatorial relations among operators. A simple example:
$$ e^{txD} f(x) = e^{t\phi.(:xD:)} f(x) = e^{(e^t-1):xD:} f(x) = f(e^{t}x) \; .$$ You could even umbralize $t$ to obtain the Faà di Bruno formula. Try discovering some op relations with the Laguerre polynomials (hint: look at $:Dx:^n= D^nx^n$; cf. Diff ops and confluent hypergeometric fcts.).
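The last equality in the chain rests on $e^{a:xD:}f(x) = f((1+a)x)$, specialized to $a = e^t-1$. A small exact check on polynomials, as a sketch: the helper names and the sample values of `a` and `f` are assumptions made for illustration.

```python
from fractions import Fraction
from math import comb

def normal_ordered_exp(a, coeffs):
    """Apply e^{a :xD:} = sum_k a^k x^k D^k / k! to sum_m coeffs[m] x^m."""
    out = []
    for m, c in enumerate(coeffs):
        # x^k D^k x^m / k! = C(m, k) x^m, so the sum over k stops at k = m.
        out.append(c * sum(a ** k * comb(m, k) for k in range(m + 1)))
    return out

def dilate(s, coeffs):
    """Coefficients of f(s x) for f(x) = sum_m coeffs[m] x^m."""
    return [c * s ** m for m, c in enumerate(coeffs)]

a = Fraction(3, 4)            # stands in for e^t - 1
f = [2, -1, 0, 5, 7]          # f(x) = 2 - x + 5x^3 + 7x^4

# e^{a :xD:} f(x) = f((1 + a) x)
assert normal_ordered_exp(a, f) == dilate(1 + a, f)
```

Exact rational arithmetic is used so that the comparison is an identity of coefficients rather than a floating-point approximation.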
As another example (added May 2015) of the interplay between differential operators, umbral calculus, and finite differences, note the relations for the Bell polynomials
$$\phi_{n}(:xD_x:)= \sum_{k=0}^n S(n,k)x^kD_x^k = (xD_x)^n=\sum_{j=0}^\infty j^n \frac{x^jD^j_{x=0}}{j!}=\sum_{j=0}^\infty (-1)^j \left[\sum_{k=0}^j(-1)^k \binom{j}{k}k^n\right] \frac{x^jD_x^j}{j!} \;$$
and apply these operators on $x^m$, $e^x$, and $x^s$. (The $S(n,k)$ are the Stirling numbers of the second kind.)
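Applying the operators to $x^m$ reduces the chain of equalities to the scalar identity $m^n = \sum_k S(n,k)\,(m)_k$, since $(xD_x)^n x^m = m^n x^m$ and $x^kD_x^k x^m = (m)_k x^m$. The sketch below checks this, computing $S(n,k)$ from the finite-difference expression in the last sum; the function names are illustrative.

```python
from math import comb, factorial

def stirling2(n, k):
    """S(n,k) = (1/k!) sum_i (-1)^(k-i) C(k,i) i^n (finite-difference form)."""
    total = sum((-1) ** (k - i) * comb(k, i) * i ** n for i in range(k + 1))
    return total // factorial(k)

def falling(m, k):
    """Falling factorial (m)_k = m (m-1) ... (m-k+1)."""
    out = 1
    for i in range(k):
        out *= m - i
    return out

def euler_power_eigenvalue(n, m):
    """Eigenvalue of sum_k S(n,k) x^k D^k on x^m, i.e. sum_k S(n,k) (m)_k."""
    return sum(stirling2(n, k) * falling(m, k) for k in range(n + 1))

for n in range(7):
    for m in range(7):
        # (xD)^n x^m = m^n x^m must agree with the normal-ordered expansion.
        assert euler_power_eigenvalue(n, m) == m ** n
```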
I've used the power monomials $x^n$ and their associated raising and lowering ops, $x$ and $D_x$, but these relations are shadowed by the raising and lowering ops of all umbral sequences $p_n(x)$ such that $R \; p_n(x) = p_{n+1}(x)$ and $L \; p_n(x) = n \; p_{n-1}(x)$. (Shadows of Lie and quantum mechanics here also.)
V) (Added Sept. 2020): The generalized Chu-Vandermonde identity for the discrete convolution of binomial coefficients--integral to understanding properties of confluent hypergeometric functions and their diff op reps--is easily derived from the umbral Sheffer calculus.
Binomial Sheffer sequences of polynomials (BSPs) have e.g.f.s of the form
$$e^{x \; h(t)} = e^{t \; B.(x)},$$
where $h(t)$ is invertible and vanishes at the origin. This implies
$$e^{(x+y)h(t)} = e^{t \; B.(x+y)} = e^{xh(t)}e^{yh(t)} = e^{t \; B.(x)}e^{t B.(y)} = e^{t(B.(x)+B.(y))},$$
so the accumulation property
$$(B.(x)+B.(y))^n = B_n(x+y)$$
follows. The Stirling polynomials of the first kind, $ST1_n(x) = (x)_n$, are a BSP with $h(t)=\ln(1+t)$ and
$$\binom{x}{k} = \frac{ST1_k(x)}{k!},$$ so $$(ST1.(x)+ST1.(y))^n = ST1_n(x+y)$$
implies, after dividing both sides by $n!$, the Chu-Vandermonde identity
$$\binom{x+y}{n} = \sum_{k=0}^n \binom{x}{k} \binom{y}{n-k}.$$
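Both the accumulation property for the falling factorials and the resulting Chu-Vandermonde identity hold for non-integer arguments, which can be checked exactly with rational numbers. This is a minimal sketch; the function names and the sample values of `x` and `y` are arbitrary illustrative choices.

```python
from fractions import Fraction
from math import comb, factorial

def falling(x, n):
    """Falling factorial (x)_n = x (x-1) ... (x-n+1), i.e. ST1_n(x)."""
    out = Fraction(1)
    for i in range(n):
        out *= x - i
    return out

def binom(x, k):
    """Generalized binomial coefficient C(x, k) = (x)_k / k!."""
    return falling(x, k) / factorial(k)

def accumulation_lhs(x, y, n):
    """<(ST1.(x) + ST1.(y))^n> = sum_k C(n,k) (x)_k (y)_{n-k}."""
    return sum(comb(n, k) * falling(x, k) * falling(y, n - k) for k in range(n + 1))

def vandermonde_rhs(x, y, n):
    """sum_k C(x,k) C(y,n-k)."""
    return sum(binom(x, k) * binom(y, n - k) for k in range(n + 1))

x, y = Fraction(7, 2), Fraction(-5, 3)
for n in range(8):
    assert accumulation_lhs(x, y, n) == falling(x + y, n)
    assert vandermonde_rhs(x, y, n) == binom(x + y, n)
```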
It’s good practice working with summations.
Let $F_1(x)=\sum_{n\ge 0}a_nx^n$, $F_2(x)=\sum_{n\ge 0}b_nx^n$, and $F_3(x)=F_1(x)F_2(x)=\sum_{n\ge 0}c_nx^n$, where we know that
$$c_n=\sum_{k=0}^na_kb_{n-k}\;.$$
Now
$$\begin{align*} F_1(x)F_2'(x)&=\sum_{n\ge 0}a_nx^n\sum_{n\ge 0}nb_nx^{n-1}\\ &=\sum_{n\ge 0}a_nx^n\sum_{n\ge 0}(n+1)b_{n+1}x^n\\ &=\sum_{n\ge 0}\sum_{k=0}^na_k(n-k+1)b_{n-k+1}x^n \end{align*}$$
and
$$\begin{align*} F_1'(x)F_2(x)&=\sum_{n\ge 0}na_nx^{n-1}\sum_{n\ge 0}b_nx^n\\ &=\sum_{n\ge 0}(n+1)a_{n+1}x^n\sum_{n\ge 0}b_nx^n\\ &=\sum_{n\ge 0}\sum_{k=0}^n(k+1)a_{k+1}b_{n-k}x^n\;, \end{align*}$$
so the coefficient of $x^n$ in $F_1(x)F_2'(x)+F_1'(x)F_2(x)$ is
$$\begin{align*} &\sum_{k=0}^n(n-k+1)a_kb_{n-k+1}+\sum_{k=0}^n(k+1)a_{k+1}b_{n-k}\\\\ &\quad=(n+1)a_0b_{n+1}+\sum_{k=1}^n(n-k+1)a_kb_{n-k+1}+\sum_{k=0}^{n-1}(k+1)a_{k+1}b_{n-k}+(n+1)a_{n+1}b_0\\\\ &\quad=(n+1)a_0b_{n+1}+\sum_{k=1}^n(n-k+1)a_kb_{n-k+1}+\sum_{k=1}^nka_kb_{n-k+1}+(n+1)a_{n+1}b_0\\\\ &\quad=(n+1)a_0b_{n+1}+\sum_{k=1}^n(n+1)a_kb_{n-k+1}+(n+1)a_{n+1}b_0\\\\ &\quad=(n+1)\sum_{k=0}^{n+1}a_kb_{n+1-k}\\\\ &\quad=(n+1)c_{n+1}\;, \end{align*}$$
which is of course the coefficient of $x^n$ in $$F_3'(x)=\sum_{n\ge 0}(n+1)c_{n+1}x^n\;.$$
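The coefficient computation above can be replayed on truncated power series. The sketch below represents a series by its list of leading coefficients; the helper names and the sample coefficients are assumptions for illustration.

```python
def cauchy(a, b):
    """Leading coefficients of the product of two truncated power series."""
    n = min(len(a), len(b))
    return [sum(a[k] * b[m - k] for k in range(m + 1)) for m in range(n)]

def derivative(a):
    """Coefficients of the termwise derivative: (m+1) a_{m+1} x^m."""
    return [(m + 1) * a[m + 1] for m in range(len(a) - 1)]

# Arbitrary sample coefficients a_0..a_6 and b_0..b_6.
a = [3, -1, 4, 1, -5, 9, 2]
b = [2, 7, -1, 8, 2, -8, 1]

c = cauchy(a, b)                       # c_n = sum_k a_k b_{n-k}
lhs = derivative(c)                    # coefficients of F_3'
f1_f2p = cauchy(a, derivative(b))      # coefficients of F_1 F_2'
f1p_f2 = cauchy(derivative(a), b)      # coefficients of F_1' F_2
rhs = [u + v for u, v in zip(f1_f2p, f1p_f2)]

# (n+1) c_{n+1} agrees with the coefficient of x^n in F_1 F_2' + F_1' F_2.
assert lhs == rhs
```

Each list holds only coefficients that are fully determined by the given inputs, so the comparison is exact for the first six coefficients.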
Best Answer
The trick is to make a little detour by bivariate calculus.
Let $F(x, y)$ be a bivariate function, and recall the chain rule
$$ D_t F(x(t),y(t)) = \dfrac{\text{d} F}{\text{d} a_1} (x,y) D_t x(t) + \dfrac{\text{d}F}{\text{d} a_2}(x, y) D_t y(t) $$
where $\dfrac{\text{d} F}{\text{d} a_i}(x, y)$ is the derivative of $F$ with respect to its $i$-th variable, evaluated at $x$ and $y$. (I want to avoid any confusion regarding the variables).
This implies that
$$ D_x F(x, x) = \dfrac{\text{d} F}{\text{d} a_1} (x,x)+ \dfrac{\text{d}F}{\text{d} a_2}(x, x). \tag{1} $$
If I define a linear operator by $\mathcal E_{y\to x} F(x, y) = F(x, x)$, then in operator fashion, (1) becomes
$$ D_x \mathcal E_{y\to x} = \mathcal E_{y\to x} (D_x + D_y), \tag{2} $$
which is true for all (totally) differentiable functions. By applying $D_x$ on the left and using (2), we get
$$ D_x^2 \mathcal E_{y\to x} = D_x \mathcal E_{y\to x} (D_x + D_y) = \mathcal E_{y\to x} (D_x + D_y)^2, $$
which can be generalized by induction for integers $n$ to
$$ D_x^n \mathcal E_{y\to x} = \mathcal E_{y\to x} (D_x + D_y)^n. \tag{3} $$
Finally we can apply this equality (3) to $f(x)g(y)$ to get on the left hand side
$$ D_x^n \mathcal E_{y\to x} f(x) g(y) = D_x^n f(x)g(x), $$
and on the right hand side
$$ \begin{align} \mathcal E_{y\to x} (D_x + D_y)^n f(x)g(y) &= \mathcal E_{y\to x} \sum_{k=0}^n \binom{n}{k} D_x^k D_y^{n-k} f(x) g(y) \\ &= \mathcal E_{y\to x} \sum_{k=0}^n \binom{n}{k} f^{(k)}(x) g^{(n-k)}(y) \\ &= \sum_{k=0}^n \binom{n}{k} f^{(k)}(x) g^{(n-k)}(x), \end{align} $$
where we have used the binomial theorem, which applies because $D_x$ and $D_y$ commute on every product of the form $l(x)m(y)$ and hence, by the linearity of the derivative, on the space spanned by such products. Hence the equality
$$ D_x^n f(x)g(x) = \sum_{k=0}^n \binom{n}{k} f^{(k)}(x) g^{(n-k)}(x). $$
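Identity (3) and the resulting product rule can be tested on polynomials, where every operator acts on finitely many coefficients. In the sketch below a bivariate polynomial is a dict mapping exponent pairs `(i, j)` to coefficients; this representation and all helper names are assumptions made for illustration.

```python
from collections import defaultdict
from math import comb

def dx(F):
    """Partial derivative in x of {(i, j): coeff}."""
    return {(i - 1, j): i * c for (i, j), c in F.items() if i > 0}

def dy(F):
    """Partial derivative in y of {(i, j): coeff}."""
    return {(i, j - 1): j * c for (i, j), c in F.items() if j > 0}

def add(F, G):
    """Sum of two polynomials, dropping zero coefficients."""
    out = defaultdict(int)
    for key, c in list(F.items()) + list(G.items()):
        out[key] += c
    return {k: c for k, c in out.items() if c != 0}

def dxy(F):
    """The operator D_x + D_y."""
    return add(dx(F), dy(F))

def diag(F):
    """E_{y->x}: set y = x, giving a univariate polynomial {degree: coeff}."""
    out = defaultdict(int)
    for (i, j), c in F.items():
        out[i + j] += c
    return {k: c for k, c in out.items() if c != 0}

def d(p):
    """Ordinary derivative of a univariate polynomial {degree: coeff}."""
    return {k - 1: k * c for k, c in p.items() if k > 0}

def power(op, n, arg):
    """Apply the operator op to arg n times."""
    for _ in range(n):
        arg = op(arg)
    return arg

def product(f, g):
    """f(x) g(y) as a bivariate polynomial; f and g are coefficient lists."""
    return {(i, j): a * b for i, a in enumerate(f) for j, b in enumerate(g) if a * b != 0}

def leibniz_rhs(f, g, n):
    """E_{y->x} sum_k C(n,k) D_x^k D_y^(n-k) f(x) g(y)."""
    total = {}
    for k in range(n + 1):
        term = power(dy, n - k, power(dx, k, product(f, g)))
        total = add(total, {key: comb(n, k) * c for key, c in term.items()})
    return diag(total)

# Identity (3) on a polynomial that is not a product l(x) m(y).
F = {(2, 1): 3, (0, 3): -1, (1, 1): 5}
for n in range(5):
    assert power(d, n, diag(F)) == diag(power(dxy, n, F))

# Higher-order product rule for f(x) = 1 - 2x + 3x^3, g(x) = 4 + 5x^2 + x^3 - x^4.
f, g = [1, -2, 0, 3], [4, 0, 5, 1, -1]
for n in range(6):
    assert power(d, n, diag(product(f, g))) == leibniz_rhs(f, g, n)
```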
Note: I'd like to mention that this proof is not an "Umbral Calculus" proof, but more of an "Operational Calculus" one, as Umbral Calculus is the study of Sheffer sequences. But both are strongly connected.