I want to start with a simple analogy to the (ordinary) derivative. So suppose that
$\omega$ is a $k$-form, and $X_1, \ldots, X_k$ are vector fields. And for the moment, I want you to imagine that the $X_i$ fields are all "constant" near some point $p$. Now that doesn't really make sense (unless you're in $\Bbb R^n$), but bear with me. If $p$ is the north pole, and $X_1(p)$ is a vector that points towards, say, London, then it makes sense to define $X_1$ near $p$ to also point towards London, and those vectors will (in 3-space) all be pretty close to $X_1(p)$.
Then we can define a function
$$
f(q) = \omega(q)[X_1(q), \ldots, X_k(q)]
$$
defined for $q$ near $p$.
How does $f(q)$ vary as $q$ moves away from $p$? Well, it depends on the direction that $q$ moves. So we can ask: What is
$$
f(p + tv) - f(p)?
$$
Or better still, what is
$$
\frac{f(p + tv) - f(p)}{t},
$$
especially as $t$ gets close to zero?
That "derivative" is almost the definition of
$$
d\omega(p)[X_1(p), \ldots, X_k(p), v].
$$
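To see why this is only "almost" the definition, here is a small sympy sketch (the sample 1-form $y^2\,dx$, the base point, and the constant extensions are my own choices, not from the text). The difference quotient computes the directional derivative of $\omega(X_1)$ along $v$, which is one term of the antisymmetrized expression that actually defines $d\omega(p)[X_1, v]$:

```python
import sympy as sp

x, y, t = sp.symbols('x y t')

# Sample 1-form omega = a dx + b dy on R^2 (an arbitrary choice)
a, b = y**2, sp.Integer(0)

# Constant extension X_1 = e_1, direction v = e_2, base point p = (0, 1).
# f(q) = omega(q)[X_1(q)] = a(q), since X_1 is the constant field e_1.
f = a

# Difference quotient of f at p in the direction v
quotient = sp.limit((f.subs({x: 0, y: 1 + t}) - f.subs({x: 0, y: 1})) / t, t, 0)

# The honest exterior derivative: d(a dx + b dy) = (b_x - a_y) dx ^ dy,
# so d omega(p)[e_1, e_2] = (b_x - a_y)(p)
domega = (sp.diff(b, x) - sp.diff(a, y)).subs({x: 0, y: 1})

print(quotient, domega)  # 2 -2: the quotient is one term of d omega, up to sign
```

The missing term, the directional derivative of $\omega(v)$ along $X_1$ (zero in this example), and the sign are supplied by antisymmetrizing over all the arguments.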
There are a couple of problems with that "definition" as it stands:
1. What if there are multiple ways to extend $X_i(p)$, i.e., what if "constant" doesn't really make sense? Will the answer be the same regardless of the values of $X_i$ near $p$ (as opposed to at $p$)?
2. How do we know that $d\omega$ has all those nice properties like being antisymmetric, etc.?
3. How does this fit in with div, grad, curl, and all that?
Problems 1 and 2 are why we have fancy definitions of $d$ that make theorems easy to prove, but hide the insight. Let me just briefly attack item 3.
For a 0-form $g$, the informal definition I gave above is exactly the definition of the gradient. You have to do some work with mixed partials (I think) to verify that $dg(p)[v]$, as a function of the vector $v$, is actually linear in $v$, and therefore can be written $dg(p)[v] = w \cdot v$ for some vector $w$, which we call the "gradient of $g$ at $p$."
So that case is pretty nice.
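A quick sympy check of the 0-form case (the function $g$, the point, and the direction below are arbitrary choices of mine): the difference quotient along $v$ agrees with $\nabla g(p)\cdot v$.

```python
import sympy as sp

x, y, z, t = sp.symbols('x y z t')

# An arbitrary smooth 0-form
g = x**2 * y + sp.sin(z)

p = {x: 1, y: 2, z: 0}
v = (3, -1, 2)

# Difference quotient of g at p along v
g_shift = g.subs({x: 1 + 3*t, y: 2 - t, z: 2*t})
quotient = sp.limit((g_shift - g.subs(p)) / t, t, 0)

# Gradient of g at p, dotted with v
grad = [sp.diff(g, s).subs(p) for s in (x, y, z)]
dot = sum(gi * vi for gi, vi in zip(grad, v))

print(quotient, dot)  # 13 13
```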
What about the curl? That one's messier, and it involves the identification of every alternating 2-form with a 1-form (because $2 + 1 = 3$), so I'm going to skip it.
What about div? For the most basic kind of 2-form, something like
$$
\omega(p) = h(x, y, z) dx \wedge dy
$$
and the point $p = (0,0,0)$ and the vector $v = (0,0,1)$, and the two "vector fields" $X_1(x,y,z) = (1,0,0)$ and $X_2(x, y, z) = (0, 1, 0)$, we end up looking at
\begin{align}
f(p + tv) &= h(0, 0, t) dx \wedge dy[ (1,0,0), (0, 1, 0)]\\
&= h(0,0, t)
\end{align}
and the difference quotient ends up being just
$$
\frac{\partial h}{\partial z}(0,0,0)
$$
That number tells you how $\omega$'s "response" to area in the $xy$-plane changes as you move in the $z$ direction.
What does that have to do with the divergence of a vector field? Well, that vector field is really a 2-form field, with duality applied once again. But in coordinates, it looks like $(0,0,h)$, and its divergence is exactly the $z$-derivative of $h$. So the two notions match up again in this case.
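Here is that computation in sympy (the particular $h$ below is my own choice, picked so that the $z$-derivative at the origin is nonzero): the difference quotient for $\omega = h\,dx\wedge dy$ matches the divergence of the dual field $(0,0,h)$.

```python
import sympy as sp

x, y, z, t = sp.symbols('x y z t')

# An arbitrary coefficient for omega = h dx ^ dy
h = x*y + 2*z + z**3

# With p = (0,0,0), v = (0,0,1), X1 = (1,0,0), X2 = (0,1,0):
# dx ^ dy [(1,0,0), (0,1,0)] = 1, so f(p + t v) = h(0, 0, t)
f_shift = h.subs({x: 0, y: 0, z: t})
quotient = sp.limit((f_shift - h.subs({x: 0, y: 0, z: 0})) / t, t, 0)

# Divergence of the dual vector field (0, 0, h) at the origin
div = sp.diff(h, z).subs({x: 0, y: 0, z: 0})

print(quotient, div)  # 2 2
```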
I apologize for not drawing out every detail; I think that the main insight comes from recognizing the idea that the exterior derivative is really just a directional derivative with respect to its last argument...and then doing the algebra to see that it's also a directional derivative with respect to the OTHER arguments as well, which is pretty cool and leads to cool things like Stokes' theorem.
The usual way to arrive at this is by removing an $\varepsilon$-disk centered at the origin and taking the limit, but let me give a sketch of how you might do this with the Dirac delta function approach. I'm going to treat $\delta_0$ as the distribution with the property that for any smooth $f\colon\Bbb R^2\to\Bbb R$ with compact support, we have
$$\delta_0(f) = \int_{\Bbb R^2}\delta_0(x,y) f(x,y)\,dx\wedge dy = 2\pi f(0).$$
We start with $\omega = \dfrac{-y\,dx+x\,dy}{x^2+y^2} = d\theta$ and want to see that $d\omega = \delta_0 \,dx\wedge dy$ as a current. In particular, this means we want to see that for any smooth $f$ with compact support, we have
$$\int_{\Bbb R^2} f\,d\omega = 2\pi f(0).$$
As usual, we start with the equation [think "integration by parts"]
$$d(f\omega) = f\,d\omega + df\wedge\omega$$
and integrate over a large closed ball $B(0,R)$ with the property that $f=0$ on $\partial B(0,R)$. It will be convenient to use polar coordinates, of course, and then we see that $df\wedge\omega = \left(\dfrac{\partial f}{\partial r}dr + \dfrac{\partial f}{\partial\theta}d\theta\right)\wedge d\theta = \dfrac{\partial f}{\partial r}dr\wedge d\theta$. So
\begin{align*}
\int_{\Bbb R^2} f\,d\omega &= \int_{\Bbb R^2} d(f\omega) - \int_{\Bbb R^2} df\wedge\omega \\
&= \int_{B(0,R)} d(f\omega) - \int_{B(0,R)} df\wedge\omega \\
&= \int_{\partial B(0,R)} f\omega - \int_0^{2\pi}\int_0^R \dfrac{\partial f}{\partial r}dr\,d\theta \\
&= 0 - \int_0^{2\pi} \big(f(R,\theta)-f(0,\theta)\big)d\theta = 2\pi f(0),
\end{align*}
as needed.
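The $\varepsilon$-disk route mentioned above can also be checked numerically. On the circle of radius $\varepsilon$ we have $\omega = d\theta$, so $\int_{\partial B(0,\varepsilon)} f\,\omega = \int_0^{2\pi} f(\varepsilon\cos\theta, \varepsilon\sin\theta)\,d\theta$, which should tend to $2\pi f(0)$. A sketch (the test function $f$ is my own choice):

```python
import math

def circle_integral(f, eps, n=2000):
    # Riemann sum of f over the circle of radius eps, against d(theta)
    total = 0.0
    for k in range(n):
        th = 2 * math.pi * k / n
        total += f(eps * math.cos(th), eps * math.sin(th))
    return total * 2 * math.pi / n

# A sample smooth test function; only the value f(0, 0) = 1 matters in the limit
f = lambda x, y: math.exp(-(x**2 + y**2)) * (1 + x)

for eps in (1.0, 0.1, 0.001):
    print(eps, circle_integral(f, eps))
# As eps -> 0 the values approach 2*pi*f(0, 0) = 2*pi
```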
Best Answer
I hope you're fine with accepting that we should impose some sort of product rule in order to define $d$ (clearly the product rule holds for functions, so we would like something similar to hold for higher-order forms as well).
So, suppose the exterior derivative is such that for each integer $k,l\geq 0$, there exist $a_{kl},b_{kl}\in\Bbb{R}$ such that for all $k$-forms $\omega$ and $l$-forms $\eta$,
\begin{align} d(\omega\wedge \eta)=a_{kl}\,d\omega\wedge\eta+ b_{kl}\,\omega\wedge d\eta.\tag{$*$} \end{align}
Of course, if $k=l=0$ then $a_{00}=b_{00}=1$.

On the other hand, let us invoke the alternating nature of wedge products:
\begin{align} d(\omega\wedge \eta)&=d\left[(-1)^{kl}\eta\wedge \omega\right]\\ &=(-1)^{kl}[a_{lk}\,d\eta\wedge \omega + b_{lk}\,\eta\wedge d\omega]\\ &=(-1)^{lk}[a_{lk}(-1)^{(l+1)k}\omega\wedge d\eta + b_{lk}(-1)^{l(k+1)}d\omega\wedge \eta]\\ &= (-1)^lb_{lk}\,d\omega\wedge \eta+ (-1)^{k}a_{lk}\,\omega\wedge d\eta.\tag{$**$} \end{align}

So, by comparing the coefficients in $(*)$ and $(**)$, we see that for all $k,l\geq 0$, we must have
\begin{align} a_{kl}&= (-1)^lb_{lk}\quad \text{and}\quad b_{kl}=(-1)^ka_{lk}. \end{align}
By simple index manipulation you can see that these two conditions are equivalent. Therefore, the condition on the coefficients is
$$b_{kl}=(-1)^k a_{lk}\quad\text{for all integers } k,l\geq 0.$$
Notice that thus far we have only derived the natural consistency condition arising purely from how the wedge product behaves; that's where the factor of $(-1)^k$ creeps in. The usual exterior derivative is obtained by setting $a_{kl}=1$ for all integers $k,l\geq 0$, so that $b_{kl}=(-1)^k$. In particular, $b_{00}=1$, which is consistent with the product rule for functions.
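As a sanity check on the sign rule, here is a small sympy sketch (the dict representation, the helper names, and the sample forms are all my own devising, not a standard API): forms on $\Bbb R^3$ are stored as dicts from sorted index tuples to coefficients, and the graded Leibniz rule with $a_{kl}=1$, $b_{kl}=(-1)^k$ is verified for two sample 1-forms.

```python
import sympy as sp

xs = sp.symbols('x0 x1 x2')

def perm_sign(seq):
    # Sign of the permutation sorting seq, via inversion count
    s = 1
    for i in range(len(seq)):
        for j in range(i + 1, len(seq)):
            if seq[i] > seq[j]:
                s = -s
    return s

def wedge(w, e):
    # Forms are dicts {sorted index tuple: coefficient}; () is the 0-form slot
    out = {}
    for I, a in w.items():
        for J, b in e.items():
            if set(I) & set(J):
                continue  # a repeated dx_i wedges to zero
            K = tuple(sorted(I + J))
            out[K] = out.get(K, 0) + perm_sign(I + J) * a * b
    return out

def ext_d(w):
    # d(a dx_I) = sum_i (da/dx_i) dx_i ^ dx_I
    out = {}
    for I, a in w.items():
        for i, xi in enumerate(xs):
            if i in I:
                continue
            K = tuple(sorted((i,) + I))
            out[K] = out.get(K, 0) + perm_sign((i,) + I) * sp.diff(a, xi)
    return out

def same(w, e):
    return all(sp.simplify(w.get(K, 0) - e.get(K, 0)) == 0
               for K in set(w) | set(e))

# Two sample 1-forms (k = l = 1): w = x1 dx0, e = x0*x2 dx1
w = {(0,): xs[1]}
e = {(1,): xs[0] * xs[2]}

# Graded Leibniz rule: d(w ^ e) = dw ^ e + (-1)**k * w ^ de, with k = 1 here
t1, t2 = wedge(ext_d(w), e), wedge(w, ext_d(e))
rhs = {K: t1.get(K, 0) + (-1) ** 1 * t2.get(K, 0) for K in set(t1) | set(t2)}

print(same(ext_d(wedge(w, e)), rhs))  # True
```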