The problem with your proof is the assertion
Now $dF$ is just the small change in $F$ and $f(x)dx$ represents the
infinitesimal area bounded by the curve and the $x$ axis.
That is indeed intuitively clear, and is the essence of the idea behind the fundamental theorem of calculus. It's pretty much what Leibniz said. It may be obvious in retrospect, but it took Leibniz and Newton to realize it (though it was in the mathematical air at the time).
The problem calling that a "proof" is the use of the word "infinitesimal". Just what is an infinitesimal number? Without a formal definition, your proof isn't one.
It took mathematicians several centuries to straighten this out. One way to do that is the long proof with limits of Riemann sums you refer to. Another newer way is to make the idea of an infinitesimal number rigorous enough to justify your argument. That can be done, but it's not easy.
Edit in response to this new part of the question:
Plus we do this sort of thing in physics all the time.
Of course. We do it in mathematics too, because it can be turned into a rigorous argument if necessary. Knowing that, we don't have to write that argument every time, and can rely on our trained intuition. In fact you can safely use that intuition even if you don't personally know or understand how to formalize it.
Variations on your question come up a lot on this site. Here are some related questions and answers.
Begin with the study of polynomials in $X$ and
some linear operators. Define the shift linear operator
$$ E_h[f(X)] \!:=\! f(X\!+\!h), \tag{1} $$
and the difference linear operator
$$ \Delta_h[f(X)] := f(X\!+\!h)\!-\!f(X), \tag{2} $$
and the derivative linear operator
$$ D[f(X)] := \frac{d}{dX} f(X), \tag{3} $$
where $\,f()\,$ is any polynomial.
Define the falling factorial linear operator
for monomials
$$ L[X^n] := X(X-1)\dots(X-n+1). \tag{4} $$
Without loss of generality, define the $\,S\,$
linear operator by
$$ S[f(X)] := (X\Delta_{-1}+c\,\Delta_{1})[f(X)]. \tag{5}$$
Applying this to falling factorials gives
$$ S[(X)_n] = -n ( (X)_n - c\, (X)_{n-1}). \tag{6} $$
Rewriting this using the $\,L\,$ operator gives
$$ S[L[X^n]] = -n L[ X^{n-1} (X-c)]. \tag{7} $$
Rewrite this using the derivative operator gives
$$ S[L[f(X)]] = -L[ D[f(X)](X-c)]. \tag{8} $$
Apply this to the case $\,f(X)=E_h[X^n]\,$ to get
$$ S[L[E_h[X^n]]] = -L[D[E_h[X^n]](X-c)]. \tag{9} $$
Apply this to the case $\,h=-c\,$ to get
$$ S[L[(X-c)^n]] = -L[D[(X-c)^n](X-c)]. \tag{10} $$
But we know the derivative of $\,(X-c)^n\,$ and so get
$$ S[L[(X-c)^n]] = -n\,L[(X-c)^n]. \tag{11} $$
Define the eigenvector function
$$ \psi_n(X) \!:=\! L[(X\!-\!c)^n] \!=\! \sum_{m=0}^n
{n \choose m} (-c)^m(X)_{n-m}. \tag{12} $$
with eigenvalue $\,-n\,$ and is expanded into
a finite sum using the binomial theorem.
Best Answer
Umbral relations shadow those of the basic binomial transform, revealing underlying connections between diverse areas of math (as Leibnitz himself predicted--see H. Davis "Theory of Linear Operators"):
[Edit June 17, 2021: A slightly different introductory presentatation is at MO here.]
I) Umbral notation is brief and suggestive (courtesy of Blissard and contemporaries):
$ \displaystyle (a.)^n= a_n \; \; \; $ (umbral variable and lowering of superscript).
Expressing binomial convolution simply:
$$ \displaystyle (a. + b.)^{n} = \sum_{k=0}^{n} \binom{n}{k} a_{k} b_{n-k} \; \;$$ (be careful to evaluate $(a.+b.)^0=a_0b_0$ and $(a.+b.)^1=a_0b_1+a_1b_0$), $$ \displaystyle e^{a. \; x}= \sum_{n \ge 0} a_n \frac{x^n}{n!} \; \; ,$$
$$ \displaystyle e^{a.\;x}\; e^{b.\; x} = e^{(a. + b.)x}\; \; .$$
A more precise notation is to use $\langle a.^n \rangle = a_n$ to clearly specify when the lowering op, or evaluation of an umbral quantity, is to be done. E.g.,
$$\langle a.^n a.^m\rangle=\langle a.^{n+m}\rangle= a_{n+m} \ne a_n a_m= \langle a.^n\rangle\langle a.^m\rangle$$
and $$\langle\exp[\ln(1+a.x)]\rangle=\langle(1+a.x)\rangle=1+a_1x$$
$$\ne \exp[\langle\ln(1+a.x)\rangle]=\exp \left[ \sum_{n \ge 1} \langle\frac{a.^nx^n}{n}\rangle\right]=\exp\left[\sum_{n \ge 1} \frac{a_nx^n}{n}\right]\; .$$
II) Same for umbralized ops, allowing succint specification and derivation of many relations, especially among special functions. A good deal of umbral calculus is about defining these ops for special sequences, such as the falling $(x)_{n}=x!/(x-n)!$ and rising factorials $(x)_{\bar{n}}=(x+n-1)!/(x-1)!$ and Bell polynomials $\phi_n(x)$.
Examples:
$ (:AB:)^n = A^n B ^n$ (defn. for order preserving exponentiation for any operators )
$$ (xD)^n = (\phi.(:xD:))^n = \phi_n(:xD:) \; \; ,$$
$$ e^{txD} = e^{t \phi.(:xD:)} \; \; .$$
From $xD \; x^{n} = n \; x^{n}$, it's easy to derive
$$e^{t\phi.(x)} = e^{x (e^t-1)} \; \; .$$
(See this MO-Q for the o.g.f.)
III) Umbral compositional inverse pairs allow for easy derivations of combinatorial identities and reveal associations among different reps of operator calculi.
Look at how this connects the distributive operator exponentiation $:xD:^n=x^nD^n$ to umbral lowering of superscripts. The falling factorials and Bell polynomials are an umbral inverse pair, i.e., $\phi_n((x).)=x^n=(\phi.(x))_n$. This is reflected in the functions $\log(1+t)$ and $e^t-1$, defining their e.g.f.s $e^{x\log(1+t)}$ and $e^{x(e^t-1)}$, being regular compositional inverses and to the lower triangular matrices containing the coefficients for the polynomials (the Stirling numbers of the first and second kinds) being multiplicative inverses, so we can move among many reps to find and relate many formulas. For the derivative op rep,
$$((xD).)^n=(xD)_n=x^nD^n=:xD:^n=(\phi.(:xD:)).^n=(\phi.(:xD:))_n,$$
so we have a connection to the umbral lowering of indices
$$:xD:^n=((xD).)^n=(xD)_n=x^nD^n.$$
IV) The generalized Taylor series or shift operator is at the heart of umbral calculus:
$$ e^{p.(x)D_y}f(y) = f(p.(x)+y) \; , $$
(e.g., this entry on A class of differential operators and another on the Bernoulli polynomials) with special cases
$$ e^{:p.(x) D_x:} f(x) = f(p.(x) + x) \; , $$ and
$$ e^{-(1-q.(x))D_y}y^{s-1} \; |_{y=1} = (1-(1-q.(x)))^{s-1} \; ,$$ giving a Gauss-Newton interpolation of $q_n(x)$ (shadows of the binomial relations).
It can often be used to easily reveal interesting combinatorial relations among operators. A simple example:
$$ e^{txD} f(x) = e^{t\phi.(:xD:)} f(x) = e^{(e^t-1):xD:} f(x) = f(e^{t}x) \; .$$ You could even umbralize $t$ to obtain the Faa di Bruno formula. Try discovering some op relations with the Laguerre polynomials (hint--look at $:Dx:^n= D^nx^n$. Cf. Diff ops and confluent hypergeometric fcts.).
As another example (added May 2015) of the interplay between differential operators, umbral calculus, and finite differences, note the relations for the Bell polynomials
$$\phi_{n}(:xD_x:)= \sum_{k=0}^n S(n,k)x^kD_x^k = (xD_x)^n=\sum_{j=0}^\infty j^n \frac{x^jD^j_{x=0}}{j!}=\sum_{j=0}^\infty (-1)^j \left[\sum_{k=0}^j(-1)^k \binom{j}{k}k^n\right] \frac{x^jD_x^j}{j!} \;$$
and apply these operators on $x^m$, $e^x$, and $x^s$. (The $S(n,k)$ are the Stirling numbers of the second kind.)
I've used the power monomials $x^n$ and their associated raising and lowering ops, $x$ and $D_x$, but these relations are shadowed by the raising and lowering ops of all umbral sequences $p_n(x)$ such that $R \; p_n(x) = p_{n+1}(x)$ and $L \; p_n(x) = n \; p_{n-1}(x)$. (Shadows of Lie and quantum mechanics here also.)
V) (Added Sept. 2020): The generalized Chu-Vandermonde identitiy for the discrete convolution of binomial coefficients--integral to understanding properties of confluent hypergeomeric functions and their diff op reps--is easily derived from the umbral Sheffer calculus.
Binomial Sheffer sequence of polynomials (BSP) have e.g.f.s of the form
$$e^{x \; h(t)} = e^{t \; B.(x)},$$
where $h(t)$ and is invertible and vanishes at the origin. This implies
$$e^{(x+y)h(t)} = e^{t \; B.(x+y)} = e^{xh(t)}e^{yh(t)} = e^{t \; B.(x)}e^{t B.(y)} = e^{t(B.(x)+B.(y))},$$ so follows the accumulation property $$(B.(x)+B.(y))^n = B_n(x+y).$$ The Stirling polynomials of the first kind, $ST1_n(x) = (x)_n$, are a BSP with $h(t)=\ln(1+t)$ and
$$\binom{x}{k} = \frac{ST1_k(x)}{k!},$$ so $$(ST1.(x)+ST1.(y))^n = ST1_n(x+y)$$
implies directly the Chu-Vandermonde identity
$$\binom{x+y}{n} = \sum_{k=0}^n \binom{x}{k} \binom{y}{n-k}.$$