A differential k-form on an n-manifold can be visualised as a "density" of (n - k) submanifolds. In $\mathbb{R}^3$ a 3-form is a point density. The form $dx \wedge dy \wedge dz$ is a uniform point density such that there is 1 full point in a unit cube. The integral $\int_S dx \wedge dy \wedge dz$ measures the number of points inside $S$, which is equal to the volume of $S$. The form $f dx \wedge dy \wedge dz$ is a point density with density $f(x,y,z)$ at that point.
The form $dx$ is a density of planes of constant $x$ (i.e. $yz$-planes) such that there is 1 full plane in a unit of $x$. In other words, the line segment $(t,0,0)$ for $t=a$ to $t=b$ crosses $b - a$ planes (note that the orientation matters). As with a point density, it's not that we cross a plane discretely at $x=0, x=1, x=2$; rather, the $yz$-planes are distributed uniformly along the $x$-axis. For a curve $\gamma : [0,1] \rightarrow \mathbb{R}^3$ we can count the net number of $dx$ planes it crosses, where crossings in the direction of decreasing $x$ count negatively. This is denoted $\int_\gamma dx = x(\gamma(1)) - x(\gamma(0))$. A general form $\alpha = A dx + B dy + C dz$ can be visualised as a density of surfaces with general orientation. The integral $\int_\gamma \alpha$ counts the number of times the curve $\gamma$ pierces a surface in the surface density. In other words, it counts the intersections of $\gamma$ with the surface density.
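To make the "counting planes" picture concrete, here is a minimal numerical sketch. The helper `integrate_one_form` and the test curve `helix` are names made up for this illustration; the integral is approximated by feeding small chords of the curve to the form.

```python
import math

def integrate_one_form(alpha, gamma, steps=20000):
    """Approximate the integral of a 1-form along gamma: [0, 1] -> R^3.

    alpha(p) returns the coefficients (A, B, C) of A dx + B dy + C dz at p.
    Each small chord of the curve is fed to the form at the chord's midpoint.
    """
    total = 0.0
    prev = gamma(0.0)
    for i in range(1, steps + 1):
        cur = gamma(i / steps)
        mid = tuple((p + q) / 2 for p, q in zip(prev, cur))
        chord = tuple(q - p for p, q in zip(prev, cur))
        total += sum(a * d for a, d in zip(alpha(mid), chord))
        prev = cur
    return total

def helix(t):
    # Hypothetical test curve: one turn of a helix from (1, 0, 0) to (1, 0, 3).
    return (math.cos(2 * math.pi * t), math.sin(2 * math.pi * t), 3 * t)

def dx(p):
    return (1.0, 0.0, 0.0)

def dz(p):
    return (0.0, 0.0, 1.0)

crossed_dx = integrate_one_form(dx, helix)  # net planes of constant x crossed
crossed_dz = integrate_one_form(dz, helix)  # net planes of constant z crossed
```

The helix crosses many planes of constant $x$, but every forward crossing is cancelled by a backward one, so the net count for $dx$ is $x(\gamma(1)) - x(\gamma(0)) = 0$, while for $dz$ it is $3$.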
For a function $f : \mathbb{R}^3 \rightarrow \mathbb{R}$ the form $df$ represents surfaces of constant $f$. Viewing $x$ as a function that gives the $x$ coordinate for a point, you can see that $dx$ corresponds to planes of constant $x$. If $r = \sqrt{x^2 + y^2 + z^2}$ then $dr$ are the surfaces of constant $r$, i.e. spheres centered at the origin.
A 2-form $dx \wedge dy$ represents lines in the $z$-direction. Lines in the $z$-direction are formed by the intersection of a plane of constant $x$ and a plane of constant $y$, i.e. lines in the $z$-direction are lines of constant $x$ and $y$. For functions $f$ and $g$, the form $df \wedge dg$ represents curves of constant $f$ and $g$. For two general 1-forms $\alpha$ and $\beta$, which represent densities of surfaces, the form $\alpha \wedge \beta$ represents the density of curves formed by the intersection of those surfaces. The form $dx \wedge dy \wedge dz$ is the point density of intersecting the lines $dx \wedge dy$ in the $z$-direction with the $xy$-planes $dz$.
Given a parameterised surface $A : \mathbb{R}^2 \rightarrow \mathbb{R}^3$, the integral $\int_A \alpha$ of a 2-form $\alpha$ is the number of times the lines of $\alpha$ intersect the surface $A$.
The operation $d$ forms the boundary of the density of curves/surfaces/volumes. For example, if $\alpha$ is a 2-form representing a collection of curves, $d\alpha$ represents the collection of endpoints of those curves. We can understand the formula $d(df) = 0$: the level sets of constant $f$ represented by $df$ have no boundary. The boundary of a density of volumes is a density of surfaces, the boundary of a density of surfaces is a density of curves, the boundary of a density of curves is a density of points, the boundary of a density of points is zero. Let's understand the form $x dy \wedge dz$. Visualize this as small lines of constant $y,z$, i.e. lines in the $x$ direction. The lines have density $x$ near a point $(x,y,z)$. This can only be accomplished if the line density has a collection of boundary points. As we move further along $x$, the density of lines in the $x$ direction gets higher, and those lines have to start somewhere. Indeed, we see that $d(x dy \wedge dz) = dx \wedge dy \wedge dz$. The collection of lines $x dy \wedge dz$ has a uniform density of start points. On the other hand, $y dy \wedge dz$ has no net start points. This is just a collection of lines in the $x$ direction that gets denser as we move in the $y$ direction, but those lines still go on from $x = -\infty$ to $x = \infty$. Indeed, we see that $d(y dy \wedge dz) = 0$.
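For reference, both computations follow from the usual rule $d(g\,dy \wedge dz) = dg \wedge dy \wedge dz$ together with $dy \wedge dy = 0$. Spelled out for a general coefficient function $g$ (a symbol introduced here only for illustration):

$$\begin{aligned}
d(g\,dy \wedge dz) &= \left(\partial_x g\,dx + \partial_y g\,dy + \partial_z g\,dz\right) \wedge dy \wedge dz = \partial_x g\;dx \wedge dy \wedge dz,\\
g = x:&\quad d(x\,dy \wedge dz) = dx \wedge dy \wedge dz,\\
g = y:&\quad d(y\,dy \wedge dz) = dy \wedge dy \wedge dz = 0.
\end{aligned}$$

So only the change of the line density along the direction of the lines themselves ($\partial_x g$) creates endpoints, which matches the picture.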
In summary, on an $n$-manifold
- A $k$-form represents a density of $(n - k)$-dimensional submanifolds.
- The wedge product represents intersection of densities.
- The operator $d$ computes the boundary.
- The integral $\int_M \alpha$ of a $k$-form along a $k$-submanifold computes the number of intersection points of the $k$-submanifold with the density of $(n - k)$ submanifolds represented by the form.
We can also intuitively understand the general Stokes theorem $\int_M d\alpha = \int_{\partial M} \alpha$. Let's consider a function $f$ on $\mathbb{R}^2$. The form $df$ represents the contours of constant $f$ (think of a contour plot), with a density such that there are net $b - a$ contours between the contour $f(x,y) = a$ and the contour $f(x,y) = b$. Now consider a curve $\gamma$ with endpoints. The integral $\int_\gamma df$ calculates the net number of contours crossed by $\gamma$. The integral $\int_{\partial \gamma} f$ is just $f(\gamma(1)) - f(\gamma(0))$, i.e. $f$ evaluated on the boundary of $\gamma$ (with the appropriate orientation). We can see that these two integrals are equal: the net number of contours crossed by $\gamma$ is precisely the height difference between the start and endpoint.
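The contour-counting picture can be tried out literally. The sketch below replaces the continuous density by discrete contours $f = k \cdot \text{spacing}$ and counts signed crossings; the function `f`, the curve `gamma` and the helper name are hypothetical choices for this illustration.

```python
import math

def signed_contour_crossings(f, gamma, spacing, steps=10000):
    """Count contours f = k * spacing crossed by gamma, with sign.

    Crossing towards larger f counts +1, towards smaller f counts -1.
    """
    count = 0
    prev_level = math.floor(f(*gamma(0.0)) / spacing)
    for i in range(1, steps + 1):
        level = math.floor(f(*gamma(i / steps)) / spacing)
        count += level - prev_level  # contours passed on this small step
        prev_level = level
    return count

def f(x, y):
    # Hypothetical height function.
    return x * x + 3.0 * y

def gamma(t):
    # Hypothetical wiggly curve from (0, 0) to (2, 1).
    return (2.0 * t, t + 0.3 * math.sin(6 * math.pi * t))

spacing = 0.001
net_contours = signed_contour_crossings(f, gamma, spacing) * spacing
height_difference = f(*gamma(1.0)) - f(*gamma(0.0))
```

However much the curve wiggles back and forth across contours, the signed count times the spacing agrees with the height difference up to roughly one contour spacing, and the agreement tightens as the contours are drawn more densely.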
Now let $V$ be a volume in $\mathbb{R}^3$ with boundary $\partial V$, and let $\alpha$ be a 2-form representing a density of curves. The integral $\int_V d\alpha$ counts the number of endpoints sitting inside $V$. The integral $\int_{\partial V} \alpha$ counts the number of curves piercing through the boundary $\partial V$. You can intuitively understand that these are equal: the curve emanating from an endpoint must either have its other endpoint inside $V$, in which case this part of the curve density does not contribute as one endpoint is positive and the other negative, or the curve has its other endpoint outside $V$, in which case it must pierce its boundary.
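As a sanity check of "endpoints inside equals curves piercing the boundary", here is a rough numerical sketch on the unit cube. The 2-form (given by its coefficients `coeffs`) is an arbitrary example chosen for this illustration, and `d_coeff` is its exterior derivative worked out by hand.

```python
def midpoints(n):
    """Midpoints of n equal subintervals of [0, 1]."""
    return [(i + 0.5) / n for i in range(n)]

def coeffs(x, y, z):
    # Hypothetical 2-form P dy^dz + Q dz^dx + R dx^dy, returned as (P, Q, R).
    return (x * y, y * z * z, x + z)

def d_coeff(x, y, z):
    # Coefficient of dx^dy^dz in d(alpha): dP/dx + dQ/dy + dR/dz, by hand.
    return y + z * z + 1.0

def pierce_count(n=60):
    """Integral of alpha over the outward-oriented boundary of the unit cube."""
    total = 0.0
    area = 1.0 / (n * n)
    for s in midpoints(n):
        for t in midpoints(n):
            total += (coeffs(1.0, s, t)[0] - coeffs(0.0, s, t)[0]) * area  # x faces
            total += (coeffs(t, 1.0, s)[1] - coeffs(t, 0.0, s)[1]) * area  # y faces
            total += (coeffs(s, t, 1.0)[2] - coeffs(s, t, 0.0)[2]) * area  # z faces
    return total

def endpoint_count(n=60):
    """Integral of d(alpha) over the unit cube (midpoint rule)."""
    pts = midpoints(n)
    return sum(d_coeff(x, y, z) for x in pts for y in pts for z in pts) / n**3

pierced = pierce_count()
endpoints = endpoint_count()
```

Both numbers approximate $11/6$ for this particular choice: the net number of curves leaving the cube equals the net number of start points inside it.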
The Stokes theorem for a surface in $\mathbb{R}^3$ is a bit tricky to describe with words, but drawing a picture will convince you.
You can also use this picture to understand the pullback, Poincaré lemma, the formula for $d(\alpha \wedge \beta)$, the degree formula, Poincaré duality, and so on.
One last point: why is there this weird inversion of dimension that a $k$-form represents a density of $(n - k)$ submanifolds? Why not use $r$-vectors to represent densities of $r$ submanifolds rather than $(n - r)$-covectors (which is what differential forms are)? In particular, why not use a normal vector field to represent a density of curves? The reason is that $r$-vectors do not have the best transformation properties. A point density on the plane $\mathbb{R}^2$ is something that assigns a real number to an area. The infinitesimal version of this is that for a 2-form $\alpha$ the quantity $\alpha(x,y)(u,v)$ counts the number of points in a small parallelogram spanned by the vectors $u,v$ at the point $(x,y)$. If we deform the plane then the parallelogram of the vectors will deform with it, and $\alpha(x,y)(u,v)$ stays constant. If we used multivectors rather than covectors, we would only be invariant under isometries rather than general diffeomorphisms. Suppose we have a vector field on the plane and a curve $\gamma$ in the plane. There is no basis-independent way to say how many times the curve intersects with the vector field. This question only makes sense up to some scale factor. Forms have a natural scale attached to them, because a form $df$ naturally eats a tangent vector $\gamma'(t)$ as $df(\gamma(t))(\gamma'(t))$. To do this with the vector field you'd have to choose a basis/coordinate system.
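The transformation argument can be checked on the simplest case, a linear deformation of the plane. In this sketch the matrix `A` and all the component values are made-up examples: a covector paired with a vector gives the same number before and after the deformation, while the dot product of two vectors does not.

```python
def mat_vec(m, v):
    # 2x2 matrix times column vector.
    return (m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1])

def covec_mat(w, m):
    # Row vector times 2x2 matrix.
    return (w[0] * m[0][0] + w[1] * m[1][0], w[0] * m[0][1] + w[1] * m[1][1])

def inverse(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return ((m[1][1] / det, -m[0][1] / det), (-m[1][0] / det, m[0][0] / det))

def pair(w, v):
    return w[0] * v[0] + w[1] * v[1]

A = ((2.0, 1.0), (0.0, 0.5))  # hypothetical linear deformation of the plane
v = (1.0, 3.0)    # a tangent vector
w = (4.0, -1.0)   # a covector (the value of a 1-form at a point)
u = (4.0, -1.0)   # the same components, but treated as a vector

v_new = mat_vec(A, v)              # vectors are pushed forward by A
u_new = mat_vec(A, u)
w_new = covec_mat(w, inverse(A))   # covectors transform with the inverse of A

form_before, form_after = pair(w, v), pair(w_new, v_new)
dot_before, dot_after = pair(u, v), pair(u_new, v_new)
```

Here `form_before` and `form_after` agree, whereas `dot_before` and `dot_after` differ because `A` is not an isometry.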
I hope that helps.
In my opinion, a lot of these relationships are suggested by abusive notation, abuses that hide what's really going on.
Don't get me wrong: some abuses of notation are harmless, or at the least, they help people get going on doing calculations. But they should still be understood to the fullest degree for those who wish to go beyond merely doing calculations.
I'll give an example: consider the relationship,
$$\frac{dx}{dy} = \frac{1}{\frac{dy}{dx}}$$
You probably know that differentials shouldn't really be divided, that this notation is really only suggestive, and while what it says is true by the inverse function theorem, it does so in a voodoo-like way that doesn't stand up to closer inspection, raising more questions than answers.
Of course, there's a totally reasonable way to phrase this notion: as I said, it's the inverse function theorem. Given a function $f$ on a vector $x$, we have the Jacobian $J_f$, and we know that
$$J_{f^{-1},f(x)} = J_{f,x}^{-1}$$
Which is a totally rigorous, though perhaps less obviously useful, statement.
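To see the Jacobian statement in action, here is a small check using polar coordinates as a hypothetical example of $f$; the Jacobians are written out by hand and multiplied at one sample point.

```python
import math

def jac_f(r, theta):
    # Jacobian of f(r, theta) = (r cos(theta), r sin(theta)).
    return ((math.cos(theta), -r * math.sin(theta)),
            (math.sin(theta), r * math.cos(theta)))

def jac_f_inv(x, y):
    # Jacobian of f^{-1}(x, y) = (sqrt(x^2 + y^2), atan2(y, x)).
    r2 = x * x + y * y
    r = math.sqrt(r2)
    return ((x / r, y / r),
            (-y / r2, x / r2))

def mat_mul(a, b):
    # Product of two 2x2 matrices.
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

r, theta = 2.0, 0.7                      # sample point in the domain of f
x, y = r * math.cos(theta), r * math.sin(theta)
product = mat_mul(jac_f_inv(x, y), jac_f(r, theta))
```

The product comes out as the identity matrix up to rounding, which is the matrix version of $\frac{dx}{dy} \cdot \frac{dy}{dx} = 1$.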
(You might be thinking that nonstandard analysis could be useful here. Perhaps it would be, but my point is a bit larger: to understand and feel comfortable with the statement, you need to either take for granted that it stands in for something else, or accept that you need more math to understand it the way it's written.)
So, how does this relate to differentials and differential forms?
Well, mostly through the use of $d$ to denote the exterior derivative. Changing this symbol reveals how manifestly nonsensical some apparent relationships are.
For the purposes of this answer, I'll denote the exterior derivative by $\nabla$. This is reasonably familiar to students of vector calculus in 3d, and most of the results can be used directly from there.
Let's address your point (1), the total differential. It would be written as,
$$\nabla f = (\partial_i f) \nabla x^i$$
Again, recognizing the connection between the exterior derivative and the gradient from vector calculus, you should realize that the $\nabla x^i$ are nothing more than a set of basis vectors (more exactly, basis covectors), and all this does is decompose the gradient of $f$ into some coordinate directions. There is no explicit connection here between the gradient and differentials.
Let's talk about point (2), integrals around curves.
This is a common misconception from people who work with differential forms. I'll point out that the quantity $r'(t) = (x', y', z')(t)$ is manifestly a tangent vector. It literally points tangent to the curve that is the domain of integration, and fundamentally, it obeys quite different transformation laws than any form.
Moreover, if $F$ is a one-form, then it should be written
$$F = F_x \nabla x + F_y \nabla y + F_z \nabla z$$
If all the supposed $dx$'s are coming from the form, then what's coming from the $dl$? As argued above, what comes with $dl$ is not a set of basis forms but a vector, the tangent vector to the curve. Writing this vector $l'(t) = x' \partial_x r + y' \partial_y r + z' \partial_z r$ (where $r$ is a vector), we get for the dot product,
$$\int F \cdot dl = \int (F_x \circ l)(t) x'(t) \nabla x \cdot \partial_x r + \ldots \, dt$$
Of course, $\nabla x \cdot \partial_x r = 1$ by definition--otherwise, the basis forms would not be dual to the basis vectors. What would happen if we wrote the basis forms with the usual $dx$ notation?
$$\int F \cdot dl = \int (F_x \circ l)(t) x'(t) dx(\partial_x r) + \ldots \, dt$$
On its face, this looks like gobbledy-gook. Even if you had the presence of mind to distinguish between a basis form $dx$ and a differential denoting the variable of integration $dt$, it would be challenging to reconcile how these two notions should coexist in the same integral. I know I've met one person on this very site who suggested that no one should ever work with $dx$ and the like because you're just going to pull back anyway, so only $dt$ should be viewed as a differential form on this curve. That's...certainly one way of looking at things. To me, that comes at a high price of not being able to look at things geometrically. Let me explain:
What are you doing when you pull back a form in an integral like this? You're making it so the tangent vector in the target space has constant direction and magnitude (since you're pulling back to a 1d vector space, the image of the tangent vector is just the trivial unit vector). This is what's commonly done for form integrals, because then all your complexity is in the form, and in the Jacobian transforming that form, rather than in considering the components of the tangent vector. For this reason, the tangent vector is sometimes forgotten or neglected, since once you've pulled back, it's some trivial constant vector that will just be eaten by the form anyway. All that remains to be done is to set some convention for what direction it should be: positive or negative.
Anyway, you could call a basis form on that space by name, and perhaps some people would call it $dt$. If that abstract way of thinking works for you, do what you feel is best.
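A small worked example of the pullback may help here. Take, as an assumption for illustration only, the one-form $F = -y\,\nabla x + x\,\nabla y$ and the unit circle $l(t) = (\cos t, \sin t, 0)$ for $t \in [0, 2\pi]$. Feeding the tangent vector $l'(t) = (-\sin t, \cos t, 0)$ to $F$ gives

$$\int F \cdot dl = \int_0^{2\pi} \left[ (F_x \circ l)(t)\, x'(t) + (F_y \circ l)(t)\, y'(t) \right] dt = \int_0^{2\pi} \left( \sin^2 t + \cos^2 t \right) dt = 2\pi$$

After pulling back, everything is a function of $t$ multiplying the single basis form on the parameter interval, and the tangent vector there is the trivial unit vector.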
Finally, let's talk about point (3): this is more of a geometric interpretation question, and it's not unique to differential forms. Should a vector field be viewed as small, directed lines at every point? This is certainly behind the notion of field lines, which are commonly used for electric fields. I'm not sure I could say one (vectors) is more differential than the other (forms). Both involve orientations and magnitudes. In the end, I have to offer the same perspective as I would for vectors: does it make sense to think of a vector as a small piece of a line? If so, how would you decide that differentials are associated with forms instead of vectors? If not, how is this different from what you've done with forms?
Let me not digress for too long. There's a reason the notation for differential forms has stuck around as long as it has: it's enormously suggestive, and for dealing with unfamiliar concepts, suggestive notation is powerful. But like with the inverse function theorem, I submit that that notation is merely suggestive, full of shortcuts and sleight of hand. I do not think differential forms turn infinitesimals rigorous--far from it, I think that a far stronger relationship between forms and these differentials in integrals is suggested by the notation in ways that it shouldn't be.
Best Answer
A 2-form is a function that eats a parallelogram (technically it eats 2 vectors, which you should think of as spanning a parallelogram) and spits out a number proportional to its area. A 3-form eats a parallelepiped (the 3-dimensional analog of a parallelogram) and spits out a number proportional to its volume. A 4-form eats a 4-dimensional parallelotope and spits out a number proportional to its hypervolume. A 1-form eats a line segment (which you can think of as a 1-dimensional parallelogram) and spits out a number proportional to its length. A 0-form eats a single point (which you can think of as a 0-dimensional parallelogram) and spits out a number, though there's nothing for it to be proportional to since a point has no extension in space. I think you get the picture. In general an n-form eats n vectors, which you should think of as spanning an n-dimensional parallelotope, and spits out a number proportional to its hypervolume.
Usually books that teach differential forms obscure this. They will define an n-form as a "real-valued multilinear, skew-symmetric function of n vectors". But it means the same thing. Multilinearity and skew-symmetry = output is proportional to length/area/volume/hypervolume. The determinant, which is used to compute the volume of a parallelepiped (and its higher and lower dimensional analogs), has the same two properties.
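Here is a minimal sketch of the simplest such function, $dx \wedge dy$ acting on two vectors in the plane, with the defining properties checked on made-up sample vectors.

```python
def dx_wedge_dy(u, v):
    """Signed area of the parallelogram spanned by u and v in the plane."""
    return u[0] * v[1] - u[1] * v[0]

# Hypothetical sample vectors.
u, v, w = (3.0, 1.0), (1.0, 2.0), (-2.0, 5.0)

area = dx_wedge_dy(u, v)                                # signed area of (u, v)
swapped = dx_wedge_dy(v, u)                             # skew-symmetry: flips sign
scaled = dx_wedge_dy((2 * u[0], 2 * u[1]), v)           # doubling a side doubles the area
summed = dx_wedge_dy((u[0] + w[0], u[1] + w[1]), v)     # additive in each slot
degenerate = dx_wedge_dy(u, u)                          # flat parallelogram has zero area
```

Multilinearity is what makes the output scale with the size of the parallelogram, and skew-symmetry is what makes a degenerate (flat) parallelogram give zero.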
So why do we require forms to have this property? Well it's just because it's needed for integration. Imagine a curve you want to integrate over. The first step is to approximate it with line segments. Then you apply some function to each line segment in order to get a number. You need that number to shrink as the size of the line segment shrinks otherwise the sum won't converge. Think about it, if the output of the function was independent of the length of the input, then as more segments were added to the approximation the sum would just shoot up to infinity. Now think of a surface you want to integrate over. You can approximate it with parallelograms, imagine the scales of an armadillo. Then for each parallelogram you apply some function that spits out a number. We need the numbers to shrink as the scales do so the sum actually converges. If you want to integrate over some 3-dimensional volume, approximate it with parallelepipeds and again evaluate a function for each parallelepiped. The output of this function needs to shrink with its input for the sum to converge. These functions that we integrate over curves/surfaces/volumes/hypervolumes are forms.
Now let me explain why you write forms as linear combinations of elementary forms. It has to do with the generalized Pythagorean theorem, which I'll just call the GPT. In the same way that the squared length of a line segment is equal to the sum of the squared lengths of its projections onto the various coordinate axes, the squared area of an arbitrary parallelogram is equal to the sum of the squared areas of its projections onto the various coordinate planes. And the squared volume of a parallelepiped is equal to the sum of the squared volumes of its projections onto the various 3-dimensional coordinate subspaces. And so on. So the Pythagorean theorem applies to more than just line segments.
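The GPT for parallelograms is easy to test numerically: the squared area equals the sum of the squared areas of the projections. In this sketch the vectors `u` and `v` are arbitrary examples, and the area is computed independently from lengths and the dot product.

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

# Hypothetical parallelogram in 3-dimensional space, spanned by u and v.
u = (1.0, 2.0, 3.0)
v = (4.0, 0.0, -1.0)

# Signed areas of the projections onto the xy, xz and yz planes.
area_xy = u[0] * v[1] - u[1] * v[0]
area_xz = u[0] * v[2] - u[2] * v[0]
area_yz = u[1] * v[2] - u[2] * v[1]
sum_of_squares = area_xy**2 + area_xz**2 + area_yz**2

# Squared area computed without projections: |u|^2 |v|^2 - (u . v)^2.
area_squared = dot(u, u) * dot(v, v) - dot(u, v) ** 2
```

Both `sum_of_squares` and `area_squared` come out the same, so the three projected areas really do behave like the components of a vector whose length is the area.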
So let's look at the example of a 1-form that eats line segments embedded in 3-dimensional space. In general it's gonna look like $adx + bdy + cdz$ (if you forgot, $dx$, $dy$, and $dz$ are just functions that eat a line segment and spit out its projections on the x axis, y axis, and z axis respectively). All that's happening is you're taking the dot product of a vector $(a,b,c)$ with another vector $(dx,dy,dz)$ which equals the projection of $(a,b,c)$ onto $(dx,dy,dz)$ times the length of $(dx,dy,dz)$ (the length of $(dx,dy,dz)$ is $\sqrt{dx^2 + dy^2 + dz^2}$ ie the length of the line segment by the GPT). In other words $adx + bdy + cdz$ is literally just another way of writing: (projection of $(a,b,c)$ onto $(dx,dy,dz)$) times (length of the line segment). Since the length of the line segment is a factor in this product, the function is obviously proportional to the length of the line segment. Any 1-form can be written like this.
Another example: A 2-form that eats parallelograms embedded in 3-dimensional space is gonna have the form $a(dx \wedge dy) + b(dx \wedge dz) + c(dy \wedge dz)$ (if you forgot, $dx \wedge dy$, $dx \wedge dz$, and $dy \wedge dz$ are just functions that eat parallelograms and spit out the areas of their projections on the xy, xz, and yz planes respectively). So this is just another way of writing the dot product of $(a,b,c)$ and $(dx \wedge dy, dx \wedge dz, dy \wedge dz)$ which is just the projection of $(a,b,c)$ onto $(dx \wedge dy, dx \wedge dz, dy \wedge dz)$ times the length of $(dx \wedge dy, dx \wedge dz, dy \wedge dz)$ (which is $\sqrt{(dx \wedge dy)^2 + (dx \wedge dz)^2 + (dy \wedge dz)^2}$ ie the area of the parallelogram by the GPT). In other words the linear combination is just equal to: (projection of $(a,b,c)$ onto $(dx \wedge dy, dx \wedge dz, dy \wedge dz)$) times (area of the parallelogram). Which is clearly a function proportional to the area of the parallelogram.
Another example: A 2-form that eats parallelograms in the plane. It has the general form $a(dx \wedge dy)$. You only need one term because $dx \wedge dy$ already gives you the area of the parallelogram. In the same way $dx$ gives you the length of your line segment if you're only in 1 dimension. It's only when you're in a dimension higher than the dimension of the line segment/parallelogram/parallelepiped/parallelotope that you're gonna have to invoke the GPT ie have a linear combination of multiple elementary forms.
So hopefully you see that differential forms are actually very simple objects. They're merely generalized integrands. Other things in exterior calculus, like the exterior derivative, the generalized Stokes theorem, etc., are similarly very simple when explained properly.
edit: a slightly cleaned up version of this post with some pictures can be found here: https://simplermath.wordpress.com/2020/02/13/understanding-differential-forms/