A 2-form is a function that eats a parallelogram (technically it eats 2 vectors, which you should think of as spanning a parallelogram) and spits out a number proportional to its area. A 3-form eats a parallelepiped (the 3-dimensional analog of a parallelogram) and spits out a number proportional to its volume. A 4-form eats a 4-dimensional parallelotope and spits out a number proportional to its hypervolume. A 1-form eats a line segment (which you can think of as a 1-dimensional parallelogram) and spits out a number proportional to its length. A 0-form eats a single point (which you can think of as a 0-dimensional parallelogram) and spits out a number, though there's nothing for it to be proportional to since a point has no extension in space. I think you get the picture. In general an n-form eats n vectors, which you should think of as spanning an n-dimensional parallelotope, and spits out a number proportional to its hypervolume.
Usually books that teach differential forms obscure this. They will define an n-form as a "real-valued multilinear, skew-symmetric function of n vectors". But it means the same thing. Multilinearity and skew-symmetry = output is proportional to length/area/volume/hypervolume. The determinant, which is used to compute the volume of a parallelepiped (and its higher and lower dimensional analogs), has the same two properties.
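If you want to poke at those two properties yourself, here is a minimal sketch in Python. It assumes the simplest possible 2-form, $dx \wedge dy$ in the plane, written as a 2x2 determinant; the names `two_form`, `add`, and `scale` are just made up for illustration.

```python
def two_form(u, v):
    """dx ^ dy in the plane: the 2x2 determinant of the two input vectors."""
    return u[0] * v[1] - u[1] * v[0]

def add(u, v):
    """Componentwise sum of two plane vectors."""
    return (u[0] + v[0], u[1] + v[1])

def scale(c, u):
    """Scalar multiple of a plane vector."""
    return (c * u[0], c * u[1])

u, v, w = (2.0, 1.0), (1.0, 3.0), (-1.0, 4.0)

# Linearity in the first slot: f(3u + w, v) = 3 f(u, v) + f(w, v)
lhs = two_form(add(scale(3.0, u), w), v)
rhs = 3.0 * two_form(u, v) + two_form(w, v)
print(lhs, rhs)                      # 8.0 8.0

# Skew-symmetry: swapping the inputs flips the sign
print(two_form(u, v), two_form(v, u))  # 5.0 -5.0

# A squashed-flat parallelogram (both edges parallel) has zero area
print(two_form(u, scale(2.0, u)))    # 0.0
```

Linearity in each slot is what makes the output scale with the edges of the parallelogram, and skew-symmetry is what makes degenerate parallelograms come out as zero.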
So why do we require forms to have this property? Well, it's just because it's needed for integration. Imagine a curve you want to integrate over. The first step is to approximate it with line segments. Then you apply some function to each line segment in order to get a number. You need that number to shrink as the line segment shrinks, otherwise the sum won't converge. Think about it: if the output of the function were independent of the length of the input, then as more segments were added to the approximation the sum would just shoot up to infinity. Now think of a surface you want to integrate over. You can approximate it with parallelograms; imagine the scales of an armadillo. Then for each parallelogram you apply some function that spits out a number. We need the numbers to shrink as the scales do so the sum actually converges. If you want to integrate over some 3-dimensional volume, approximate it with parallelepipeds and again evaluate a function for each parallelepiped. The output of this function needs to shrink with its input for the sum to converge. These functions that we integrate over curves/surfaces/volumes/hypervolumes are forms.
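Here is a rough numerical sketch of the curve case. Everything specific in it is my own choice for illustration: the curve is the unit circle, the 1-form is $-y\,dx + x\,dy$ (whose integral around the circle is $2\pi$), and `length_blind` is a made-up function that ignores the size of the segment it is fed.

```python
import math

def circle_points(n):
    """n + 1 points around the unit circle, so consecutive pairs give n segments."""
    return [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
            for k in range(n + 1)]

def one_form(p, seg):
    """-y dx + x dy, evaluated at the point p on the segment vector seg."""
    x, y = p
    dx, dy = seg
    return -y * dx + x * dy

def length_blind(p, seg):
    """A function that ignores the segment's length: not a form."""
    return 1.0

def sum_over_segments(f, n):
    """Approximate the circle with n line segments and add up f on each one."""
    pts = circle_points(n)
    total = 0.0
    for p, q in zip(pts, pts[1:]):
        seg = (q[0] - p[0], q[1] - p[1])
        total += f(p, seg)
    return total

for n in (10, 100, 1000):
    print(n, sum_over_segments(one_form, n), sum_over_segments(length_blind, n))
```

The first column of sums settles down near $2\pi \approx 6.2832$ as the segments shrink, while the second is just the number of segments and grows without bound.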
Now let me explain why you write forms as linear combinations of elementary forms. It has to do with the generalized Pythagorean theorem, which I'll just call the GPT. In the same way that the squared length of a line segment is equal to the sum of the squared lengths of its projections onto the various coordinate axes, the squared area of an arbitrary parallelogram is equal to the sum of the squared areas of its projections onto the various coordinate planes. And the squared volume of a parallelepiped is equal to the sum of the squared volumes of its projections onto the various 3-dimensional coordinate subspaces. And so on. So the Pythagorean theorem applies to more than just line segments.
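The parallelogram case of the GPT is easy to check numerically. The sketch below is only an illustration with hypothetical helper names: `squared_area` computes the squared area from the Gram determinant $|u|^2|v|^2 - (u \cdot v)^2$, and `projected_areas` computes the signed area of the projection onto each coordinate plane as a 2x2 determinant.

```python
from itertools import combinations

def dot(u, v):
    """Ordinary dot product in any dimension."""
    return sum(a * b for a, b in zip(u, v))

def squared_area(u, v):
    """Squared area of the parallelogram spanned by u and v (Gram determinant)."""
    return dot(u, u) * dot(v, v) - dot(u, v) ** 2

def projected_areas(u, v):
    """Signed areas of the projections onto every coordinate plane."""
    n = len(u)
    return [u[i] * v[j] - u[j] * v[i] for i, j in combinations(range(n), 2)]

# A parallelogram sitting in 3-dimensional space
u, v = (1, 2, 3), (4, 5, 6)
print(projected_areas(u, v))                      # [-3, -6, -3]
print(sum(a * a for a in projected_areas(u, v)))  # 54
print(squared_area(u, v))                         # 54

# The same check in 4 dimensions, where there are six coordinate planes
p, q = (1, 0, 2, 1), (0, 1, 1, 3)
print(sum(a * a for a in projected_areas(p, q)), squared_area(p, q))  # 41 41
```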
So let's look at the example of a 1-form that eats line segments embedded in 3-dimensional space. In general it's gonna look like $adx + bdy + cdz$ (if you forgot, $dx$, $dy$, and $dz$ are just functions that eat a line segment and spit out its projections on the x axis, y axis, and z axis respectively). All that's happening is you're taking the dot product of a vector $(a,b,c)$ with another vector $(dx,dy,dz)$ which equals the projection of $(a,b,c)$ onto $(dx,dy,dz)$ times the length of $(dx,dy,dz)$ (the length of $(dx,dy,dz)$ is $\sqrt{dx^2 + dy^2 + dz^2}$ ie the length of the line segment by the GPT). In other words $adx + bdy + cdz$ is literally just another way of writing: (projection of $(a,b,c)$ onto $(dx,dy,dz)$) times (length of the line segment). Since the length of the line segment is a factor in this product, the function is obviously proportional to the length of the line segment. Any 1-form can be written like this.
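To see the factorization with actual numbers, here is a small sketch. The coefficients $(2,-1,3)$ and the segment $(1,2,2)$ are arbitrary values I picked so that the length comes out to a whole number, and the function names are hypothetical.

```python
import math

def one_form(coeffs, seg):
    """a dx + b dy + c dz applied to a line segment given as a vector."""
    return sum(c * d for c, d in zip(coeffs, seg))

def length(seg):
    """Length of the segment via the ordinary Pythagorean theorem."""
    return math.sqrt(sum(d * d for d in seg))

def scalar_projection(coeffs, seg):
    """Component of (a, b, c) along the unit vector pointing down the segment."""
    unit = [d / length(seg) for d in seg]
    return sum(c * e for c, e in zip(coeffs, unit))

coeffs = (2.0, -1.0, 3.0)   # (a, b, c)
seg = (1.0, 2.0, 2.0)       # (dx, dy, dz) of one segment

print(one_form(coeffs, seg))                        # 6.0
print(length(seg))                                  # 3.0
print(scalar_projection(coeffs, seg) * length(seg)) # approximately 6.0

# Doubling the segment without changing its direction doubles the output
doubled = tuple(2 * d for d in seg)
print(one_form(coeffs, doubled))                    # 12.0
```

The projection factor only depends on the direction of the segment, so for a fixed direction the output really is a constant times the length.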
Another example: A 2-form that eats parallelograms embedded in 3-dimensional space is gonna have the form $a(dx \wedge dy) + b(dx \wedge dz) + c(dy \wedge dz)$ (if you forgot, $dx \wedge dy$, $dx \wedge dz$, and $dy \wedge dz$ are just functions that eat parallelograms and spit out the areas of their projections on the xy, xz, and yz planes respectively). So this is just another way of writing the dot product of $(a,b,c)$ and $(dx \wedge dy, dx \wedge dz, dy \wedge dz)$ which is just the projection of $(a,b,c)$ onto $(dx \wedge dy, dx \wedge dz, dy \wedge dz)$ times the length of $(dx \wedge dy, dx \wedge dz, dy \wedge dz)$ (which is $\sqrt{(dx \wedge dy)^2 + (dx \wedge dz)^2 + (dy \wedge dz)^2}$ ie the area of the parallelogram by the GPT). In other words the linear combination is just equal to: (projection of $(a,b,c)$ onto $(dx \wedge dy, dx \wedge dz, dy \wedge dz)$) times (area of the parallelogram). Which is clearly a function proportional to the area of the parallelogram.
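The same kind of check works for the 2-form. In this sketch the coefficients $(2,1,7)$ and the edge vectors are arbitrary choices of mine, and `wedge_components` is a hypothetical helper returning the three signed projected areas in the order $(dx \wedge dy, dx \wedge dz, dy \wedge dz)$.

```python
import math

def wedge_components(u, v):
    """(dx^dy, dx^dz, dy^dz) applied to the parallelogram with edges u and v."""
    return (u[0] * v[1] - u[1] * v[0],
            u[0] * v[2] - u[2] * v[0],
            u[1] * v[2] - u[2] * v[1])

def two_form(coeffs, u, v):
    """a(dx^dy) + b(dx^dz) + c(dy^dz) applied to the parallelogram (u, v)."""
    return sum(c * w for c, w in zip(coeffs, wedge_components(u, v)))

def area(u, v):
    """Area computed independently, from |u|^2 |v|^2 - (u . v)^2."""
    uu = sum(a * a for a in u)
    vv = sum(b * b for b in v)
    uv = sum(a * b for a, b in zip(u, v))
    return math.sqrt(uu * vv - uv * uv)

coeffs = (2.0, 1.0, 7.0)
u, v = (1.0, 0.0, 0.0), (0.0, 3.0, 4.0)

comps = wedge_components(u, v)
print(comps)                                   # (3.0, 4.0, 0.0)
print(math.sqrt(sum(c * c for c in comps)))    # 5.0, the GPT side
print(area(u, v))                              # 5.0, the direct side
print(two_form(coeffs, u, v))                  # 10.0

# Stretching one edge by 3 triples the area and triples the output
u3 = tuple(3 * a for a in u)
print(two_form(coeffs, u3, v))                 # 30.0
```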
Another example: A 2-form that eats parallelograms in the plane. It has the general form $a(dx \wedge dy)$. You only need one term because $dx \wedge dy$ already gives you the area of the parallelogram. In the same way $dx$ gives you the length of your line segment if you're only in 1 dimension. It's only when you're in a dimension higher than the dimension of the line segment/parallelogram/parallelepiped/parallelotope that you're gonna have to invoke the GPT ie have a linear combination of multiple elementary forms.
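You can count how many elementary forms you need by listing every way of choosing $k$ coordinates out of the $n$ available, which gives $\binom{n}{k}$ of them. A throwaway sketch (the function name and the string output format are just for illustration):

```python
from itertools import combinations

def elementary_forms(coords, k):
    """Names of the elementary k-forms built from the given coordinate letters."""
    return [" ^ ".join("d" + c for c in combo)
            for combo in combinations(coords, k)]

print(elementary_forms("xy", 2))    # ['dx ^ dy']
print(elementary_forms("xyz", 2))   # ['dx ^ dy', 'dx ^ dz', 'dy ^ dz']
print(elementary_forms("xyz", 1))   # ['dx', 'dy', 'dz']
print(elementary_forms("x", 1))     # ['dx']
```

When $k = n$ there is exactly one term, which matches the single $dx \wedge dy$ in the plane and the single $dx$ on a line.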
So hopefully you see that differential forms are actually very simple objects. They're merely generalized integrands. Other things in exterior calculus like the exterior derivative, the generalized Stokes' theorem, etc. are similarly very simple when explained properly.
edit: a slightly cleaned up version of this post with some pictures can be found here: https://simplermath.wordpress.com/2020/02/13/understanding-differential-forms/
Best Answer
I like this question a lot and I think that it's an important one. So here goes a (necessarily incomplete) attempt at answering such a broad and personal question.
First, "motivation" and "understanding for the essence" can mean very different things. There is of course physical motivation and intuition, and that probably applies most immediately to the Calculus III course that you are talking about. E.g. for the concept of derivatives of vector-valued functions, you can think of the vector-valued function of time that gives the position of an object as a vector. Of course, its derivative with respect to time will be the velocity (also a vector, since it describes the speed and the direction of the movement) and the second derivative will be the acceleration. A good course in such an applicable subject will not just ask questions like "compute the derivative of such and such a function", but will actually confront the student with real-life examples.
But there is also intuition for less physical and more platonic concepts, such as that of a group, or of a prime number. Again, examples help. Also, you should always try to ask yourself the question "Could I have invented this?". If you see a new definition, ask yourself "What concrete problem might have prompted someone to define such a thing?". If you see a new result, ask yourself "Why was this to be expected, why would it be at least a reasonable conjecture?". Then try to convert your intuition into a proof. When you see a proof, ask yourself "Why is this a natural approach to try? Could I have proven this?". I agree with you that knowing the historical development can be very helpful in this and you should invest time in researching it.
I would like to contradict you in your assertion that intuition, motivation and historical context are black magic secrets that mathematicians acquire and then keep to themselves. It is true of some books and some teachers. So, you just have to find the right books. For that, you could ask for a specific recommendation here, including the area you want to learn and the books you have looked at, together with the reason you found them deficient. Of course, you can also ask specific "intuition" type questions.
To learn to appreciate mathematics, it is important to think about mathematics in your "spare time". Go out into nature and think about what your lecturer just told you in the last lecture. Or just think about whatever you find interesting. Then come back home with specific questions and look them up or ask them here.
Finally, something that I preach to my students all the time is that they should develop a critical approach to what they are taught: if I give them a definition, they should try to come up with as many examples as possible. If I state a theorem of the type "A implies B", they should go home and find an example showing that "B does not necessarily imply A". If they do find such an example, they should ask themselves what additional hypotheses they need to impose to get the converse. If they don't, they should come back to me and ask me "but you haven't told us the whole story. What about the converse?".
In short, don't expect your lecturers to tell you everything you need to know. You should expect to have to think, to investigate yourself, to ask questions, and, above all, to think about mathematics because you can't help it, rather than because you are told to. This is not something most people are born with; it's something that you have to cultivate.