It is not necessary to reprove the binomial formula if we are willing to wade through some abstract nonsense.
Denote by $X$ the space of sufficiently differentiable functions $x\mapsto f(x)$ defined in some neighborhood $U$ of $a\in{\Bbb R}$. The maps
$$p(f,g):=f\cdot g,\quad D_l(f,g):=(f',g),\quad D_r(f,g):=(f,g')$$
are bilinear on $X\times X$ and can therefore be lifted to linear maps on $Y:=X\otimes X$ such that
$$p(f\otimes g)=f\cdot g,\quad D_l(f\otimes g)=f'\otimes g,\quad D_r(f\otimes g)=f\otimes g'\ .$$
This means, e.g., that $$p\left(\sum_k \lambda_k (f_k\otimes g_k)\right)=\sum_k \lambda_k\> f_k\cdot g_k\ ,$$
and that $D_l$ and $D_r$ are now maps $Y\to Y$.
The product rule $(f\cdot g)'=f'\cdot g+f\cdot g'$ can be written as
$${d\over dx}\bigl( p(f,g)\bigr)=p\bigl(D_l(f,g)\bigr)+p\bigl(D_r(f,g)\bigr)\ ,$$
which lifts to $${d\over dx}\circ p\>(f\otimes g)=p\circ(D_l+D_r)(f\otimes g)\ .$$
As $D_l\circ D_r=D_r\circ D_l$ it then follows by induction that
$$\left({d\over dx}\right)^n\circ p=p\circ(D_l+D_r)^n=p\circ\sum_{k=0}^n{n\choose k} D_l^{n-k}\>D_r^k\ .$$
When applied to a single $f\otimes g$ this is Leibniz' formula.
There is a cute argument for $b=c+1$ though I don't know how to pull it through in the general case.
Let $U$ be the set of ordered $c+1$-tuples of sets of cardinality $c$ in which no set repeats. Then, obviously, $|U|=(c+1)!{{n\choose c}\choose c+1}$. Let $V$ be the set of ordered $c$-tuples of sets of cardinality $c+1$ in which no set repeats. Then $|V|=c!{{n\choose c+1}\choose c}$. Now take any element of $V$, i.e., an ordered family of sets $A_1,\dots,A_{c}$. Take any $a_1\in A_1$ ($c+1$ choices). Suppose that $a_1\in A_1,\dots,a_k\in A_k$ are already chosen. Then we want to choose $a_{k+1}\in A_{k+1}$ so that $a_{k+1}\ne a_j$ and $A_{k+1}\setminus \{a_{k+1}\}\ne A_j\setminus\{a_j\}$ for all $j=1,\dots,k$. Suppose that we have $\ell$ prohibitions of the first type. Then, if we honor them, the prohibitions of the second type for the corresponding sets will be honored automatically (if $a_j\in A_{k+1}$ and we don't remove it, then surely $A_{k+1}\setminus \{a_{k+1}\}\ne A_j\setminus\{a_j\}$. The remaining $k-\ell$ sets introduce at most one prohibition of the second type each, so we have $\le k$ prohibitions total. Thus we have $\ge c+1-k$ choices for $a_{k+1}$. Once we run the whole procedure, the reduced sets $A_j$ listed in the same order and the set $\{a_1,\dots,a_c\}$ listed last will constitute an element of $U$ and each element of $V$ will generate $\ge(c+1)!$ different elements of $U$ (by different choices of $a_1,\dots,a_c$. On the other hand, each element of $U$ can be obtained from at most $c!$ elements of $V$ (that is the number of ways to distribute the elements of the last $c+1$-st set among the first $c$. This immediately gives a non-strict inequality you want. To make it strict, you need to assume $c>1$ and $n\ge c+1$ (otherwise you have equality), in which case the ordered sequence of sets $A_j=\{1,2,\dots,c+1\}\setminus\{j\}$ has fewer than $c!$ distribution options (none at all, really) resulting in a legitimate element of $V$.
Edit (the general case)
In this case it will be convenient to define $U$ as the set of all ordered $b$-tuples of sets of cardinality $c$ such that the sets are pairwise different (as sets), the first $c$ sets are unordered, and the last $b-c$ sets are ordered. Then $|U|=b!(c!)^{b-c}{{n\choose c}\choose b}$. Let $V$ be the same as before (ordered $c$-tuples of unordered subsets of cardinality $b$ in which the sets are pairwise different), so $|V|=c!{{n\choose b}\choose c}$.
Now let $\langle A_1,\dots, A_c\rangle$ be an element of $V$. We want to remove $b-c$ elements $a_{j,1},a_{j,2},\dots,a_{j,b-c}$ from $A_j$ and form $b-c$ new ordered sets $B_k=\langle a_{1,k},a_{2,k},\dots, a_{c,k}\rangle$ ($k=1,\dots,b-c$) to get an element of $U$.
To this end start with choosing $a_{1,1},\dots,a_{1,b-c}\in A_1$ in an arbitrary way, which gives $b(b-1)\dots(b-c+1)$ choices. Now, when choosing $a_{k+1,1}\in A_{k+1}$ for $k\ge 1$, add to the standard prohibitions $a_{k+1,1}\ne a_{j,1}$, $A_{k+1}\setminus\{a_{k+1,1}\}\ne A_j\setminus\{a_{j,1}\}$ ($j\le k$), which exclude at most $k$ elements just as before, the prohibitions $a_{k+1,1}\ne a_{1,m}$ ($m>1$), which exclude additional $b-c-1$ elements at most, leaving $c+1-k$ choices as before. These additional prohibitions guarantee that the set $B_1=\{a_{1,1},a_{2,1},\dots, a_{c,1}\}$ does not contain any of the elements $a_{1,m}$ ($m>1$) and, therefore, will be different from every set $B_m$ ($m>1$) as an unordered set and we still have $c!$ choices.
Having constructed $B_1$, we construct $B_2$ with the standard restrictions $a_{k+1,2}\ne a_{j,2}$, $A_{k+1}\setminus\{a_{k+1,1},a_{k+1,2}\}\ne A_j\setminus\{a_{j,1},a_{j,2}\}$ ($j\le k$) and additional restrictions $a_{k+1,2}\ne a_{1,m}$ but now with $m>2$. This gives $c!$ choices again, and so on. Thus, from each element of $V$, we get $b(b-1)\dots(b-c+1) (c!)^{b-c}$ distinct elements of $U$. The recovery of an element of $V$ from an element of $U$ is now possible in just one way, if at all, which, again, gives a non-strict inequality. I leave it to you to figure out when and how it becomes strict in this case.
Best Answer
Here's one combinatorial way to look at both formulas.
For the first one, let $M$ be the operator which eats a polynomial $f(x,y)$ and returns the polynomial $(x+y)f(x,y)$. Note $M$ is linear, since multiplication is distributive. We want to start with $1$, apply $M$ $n$ times, and see what we get. The point is that $M$ sends any monomial $x^r y^s$ to a sum of two related ones: $$M : x^r y^s \to x^{r+1}y^s + x^r y^{s+1}.$$
Therefore, $(x+y)^n$ enumerates paths from the top of the following diagram to the bottom row; the coefficient of $x^k y^{n-k}$ is the number of paths from $1$ to $x^k y^{n-k}$, but that's $\binom nk$ since any path is a sequence of $n$ left/right choices, and you have to go left $k$ times and right $n-k$ times.
$$\matrix{ &&&&&&1\\ &&&&&\swarrow&&\searrow\\ &&&&x&&&&y\\ &&&\swarrow&&\searrow&&\swarrow&&\searrow\\ &&x^2&&&&xy&&&&y^2\\ &&&&\dots&&\dots&&\dots\\ &\swarrow&&\searrow&&\swarrow&&\searrow&&\swarrow&&\searrow\\ x^n&&&&x^{n-1}y&&\dots&&xy^{n-1}&&&&y^n }$$
For the second formula, $D$ is the derivative operator; we would like to apply it $n$ times to the product $f g$. Notice that $D$ acts nicely on $f^{(r)}g^{(s)}$: $$D : f^{(r)}g^{(s)}\to f^{(r+1)}g^{(s)}+f^{(r)}g^{(s+1)}.$$
So $(fg)^{(n)}$ enumerates paths from the top of the following diagram to the bottom row. $$\matrix{ &&&&&&f^{(0)}g^{(0)}\\ &&&&&\swarrow&&\searrow\\ &&&&f^{(1)}g^{(0)}&&&&f^{(0)}g^{(1)}\\ &&&\swarrow&&\searrow&&\swarrow&&\searrow\\ &&f^{(2)}g^{(0)}&&&&f^{(1)}g^{(1)}&&&&f^{(0)}g^{(2)}\\ &&&&\dots&&\dots&&\dots\\ &\swarrow&&\searrow&&\swarrow&&\searrow&&\swarrow&&\searrow\\ f^{(n)}g^{(0)}&&&&f^{(n-1)}g^{(1)}&&\dots&&f^{(1)}g^{(n-1)}&&&&f^{(0)}g^{(n)} }$$
Same graph, same numbers of paths, same coefficients.