The area under the curve of a continuous function $y = f(x)$ between $x$ and $x + h$ can be computed by finding the area between $0$ and $x + h$ and then subtracting the area between $0$ and $x$. If $A(x)$ denotes the area from $0$ to $x$, the area of the strip is $A(x+h) - A(x)$.
Now $f(x) \cdot h$ is a linear approximation to the area of the strip, and the approximation improves as $h$ becomes smaller: $A(x+h) - A(x) \approx f(x) \cdot h$. The approximation becomes an equality in the limit as $h$ approaches zero.
Dividing both sides by $h$ gives $f(x) \approx (A(x+h) - A(x))/h$. As $h$ approaches zero, the RHS becomes the derivative of the area function $A(x)$.
Thus, what the fundamental theorem of calculus says is this: "The derivative of the area under the curve $f(x)$ is the curve $f(x)$". Hence differentiation and integration are inverse operations. This is an informal (non-rigorous) proof. But I hope it gives you an intuitive understanding of the theorem.
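The informal argument above can be checked numerically. Here is a minimal sketch: for the illustrative choice $f(x) = x^2$ (not from the text), the area function is $A(x) = x^3/3$, and the difference quotient of $A$ should approach $f$ as $h$ shrinks.

```python
# Sanity check of the argument above: (A(x+h) - A(x)) / h -> f(x) as h -> 0.
# f(x) = x^2 and the test point x = 1.5 are illustrative choices.

def f(x):
    return x ** 2

def A(x):
    # exact area under f between 0 and x
    return x ** 3 / 3

x = 1.5
for h in [0.1, 0.01, 0.001]:
    quotient = (A(x + h) - A(x)) / h
    print(h, quotient, f(x))  # quotient approaches f(x) = 2.25
```

The printed quotients visibly converge to $f(1.5) = 2.25$ as $h$ decreases.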
The inverse relationship between integration and differentiation can be further elaborated (intuitively) as follows:
What are we essentially doing when we differentiate? We are taking the change in $f(x)$ over an interval of length $h$ and looking at how this change is distributed over the interval (i.e. we are calculating the change per interval of unit length). If $f(x)$ were a linear function, we would just divide this change by the length of the interval: $\Delta f(x)/h = m$, where $m$ is the slope of $f(x)$.
If $f(x)$ is a non-linear function, then $f'(x) = \lim_{h\to0} \Delta f(x)/h$, where $\Delta f(x) = f(x+h) - f(x)$.
When we integrate a function $f(x)$ over an interval of length $h$, we are accumulating the function $f(x)$ over the length of the interval. If $f(x)$ were a linear function, we would just multiply the change per unit length by the length of the interval and add it to the value of $f(x)$ at the starting point of the interval: $f(x_0 + h) = f(x_0) + \Delta f(x)$, where
$\Delta f(x) = m \cdot h$ (where $m$ is the slope of $f(x)$).
If $f(x)$ is a non-linear function, then $f(x_0 + h) = f(x_0) + \int_{x_0}^{x_0 + h} f'(x)\,dx$. (Note that here we are integrating, i.e. accumulating, the change per unit length.)
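The accumulation view can be sketched numerically: recover $f(x_0 + h)$ from $f(x_0)$ by accumulating the rate of change $f'$ over the interval. Here $f(x) = x^2$ (so $f'(x) = 2x$) and the interval $[1, 3]$ are illustrative choices, not from the text.

```python
# Reconstruct f(3) from f(1) by accumulating f'(x) = 2x over [1, 3]
# with a left Riemann sum. f(x) = x^2 is an illustrative choice.

def fprime(x):
    return 2 * x

x0, h, n = 1.0, 2.0, 100_000
dx = h / n
accumulated = sum(fprime(x0 + k * dx) * dx for k in range(n))
reconstructed = x0 ** 2 + accumulated  # f(x0) + integral of f'
print(reconstructed)                   # close to f(3) = 9
```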
Now, division (repeated subtraction) and multiplication (repeated addition) are inverse operations, and provided the error between the linear approximation and the actual area or rate of change is an infinitesimal (i.e. it can be made as small as we please), this inverse relationship holds even in the non-linear case.
The primary reason for the confusion regarding the inverse relation between differentiation and integration is the interpretation of integration as measuring an area. One has to understand that this area is only a measure. If $f(x)$ represented an area and $x$ a length, then the 'area' under $f(x)$ would actually represent a volume. The area is thus a measure of the volume.
It would be better to think of the integral as the accumulation of a quantity (in our discussion above, this quantity is the change per interval of unit length). This definition is consistent with the interpretation of the integral as an area. Let's consider a rectangle of length $4$ units and breadth $3$ units. If $x$ represents length, then $f(x) = 3$. What are we doing when we calculate its area? We are accumulating this breadth over an interval of length $4$ units: $A = 3 + 3 + 3 + 3 = 3 \cdot 4 = \int_0^4 3\,dx = [3x]_0^4 = 12$ square units.
Now let's differentiate this function. To differentiate, we have to take its accumulated change over any interval and calculate its distribution. But this function doesn't change: $f(x)$ is a constant, $3$. So there's no change, and hence the derivative is zero: $f'(x) = 0$.
It is intuitively obvious, then, that if we first accumulate a quantity over an interval and then distribute this accumulation over the same interval, we should end up with our original quantity. Let's turn to the rectangle example again. To calculate the area we accumulate $f(x) = 3$ over an interval of length $4$, which gives an area of $12$ square units. Now if we distribute this area over the interval of $4$ units, what do we get? $12/4 = 3$ units, i.e. $f(x) = 3$, which is the distribution of this accumulation (area) over the interval of $4$ units. (The derivative of $3x$ is $3$.) This is the inverse-relationship aspect of the FTC.
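The accumulate-then-distribute round trip for the rectangle can be written out directly:

```python
# Rectangle round trip: accumulate f(x) = 3 over [0, 4] as a Riemann sum
# (the area), then distribute that accumulation back over the interval.

breadth, length = 3.0, 4.0
n = 4000
dx = length / n
area = sum(breadth * dx for _ in range(n))  # accumulate: ~12.0
print(area)
print(area / length)                        # distribute: ~3.0, the original f(x)
```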
There may not be any visible connection between calculating an area and calculating a rate of change. But there is an inverse relationship between accumulation and distribution.
The substitution rule/change of variables theorem says the following:
Suppose $f:[a,b]\to\Bbb{R}$ is continuous, and $u:[\alpha,\beta]\to [a,b]$ is differentiable with Riemann-integrable derivative (or at this point if you don't like remembering various hypotheses, just assume everything is smooth). Then,
\begin{align}
\int_{\alpha}^{\beta}f(u(x))\cdot u'(x)\,dx &= \int_{u(\alpha)}^{u(\beta)}f(t)\,dt
\end{align}
"substitute $t=u(x)$"
If you state the theorem like this, there is no need at all for any injectivity assumptions on $u$; this equality follows immediately from the fundamental theorem of calculus and chain rule (if $F$ is a primitive of $f$, then the LHS and RHS are equal to $F(u(\beta))-F(u(\alpha))$). The problem is that people often don't carefully specify the two functions $f$ and $u$; what ends up happening is they misapply the theorem and then impose extra conditions like injectivity (which of course doesn't hurt, but it doesn't really address the issue).
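The claim that no injectivity is needed can be checked numerically; a minimal sketch with $f(t) = t^2$, $u(x) = \sin x$, and the illustrative interval $[0, 3\pi/4]$ (my choice, not from the text), on which $u$ is not injective:

```python
# Verify the identity LHS = F(u(beta)) - F(u(alpha)) for a NON-injective u.
# f(t) = t^2, u(x) = sin(x), F(t) = t^3/3; interval [0, 3*pi/4] is illustrative.

import math

alpha, beta, n = 0.0, 3 * math.pi / 4, 100_000
dx = (beta - alpha) / n
# midpoint Riemann sum of f(u(x)) * u'(x) = sin(x)^2 * cos(x) over [alpha, beta]
lhs = sum(
    math.sin(alpha + (k + 0.5) * dx) ** 2 * math.cos(alpha + (k + 0.5) * dx) * dx
    for k in range(n)
)
F = lambda t: t ** 3 / 3
rhs = F(math.sin(beta)) - F(math.sin(alpha))
print(lhs, rhs)  # the two sides agree
```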
In your example, let $f(t)=t^2$ and $u(x)=\sin x$ (we can define these functions on all of $\Bbb{R}$, so there's no domain issues here at all, and all the compositions make sense etc). Then,
\begin{align}
\int_0^{2\pi}\sin^2x\cdot \cos x\,dx &=\int_0^{2\pi}f(u(x))\cdot u'(x)\,dx\\
&=\int_{u(0)}^{u(2\pi)}f(t)\,dt\\
&=\int_0^0t^2\,dt\\
&= 0.
\end{align}
This really is a direct application of the theorem; I'm not sure why you say it is erroneous.
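For what it's worth, the value $0$ also comes straight out of a numerical quadrature of the original integrand:

```python
# Midpoint Riemann sum of sin(x)^2 * cos(x) over [0, 2*pi]; the result
# should be (very close to) 0, matching the substitution computation above.

import math

n = 10_000
a, b = 0.0, 2 * math.pi
dx = (b - a) / n
total = sum(
    math.sin(a + (k + 0.5) * dx) ** 2 * math.cos(a + (k + 0.5) * dx) * dx
    for k in range(n)
)
print(total)  # very close to 0
```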
Of course, a corollary of the theorem I wrote above is the following:
Suppose $g:[\alpha,\beta]\to\Bbb{R}$ is continuous and $v:[\alpha,\beta]\to [a,b]$ is $C^1$ with $C^1$ inverse. Then,
\begin{align}
\int_{\alpha}^{\beta}g(x)\,dx &= \int_{v(\alpha)}^{v(\beta)}g(v^{-1}(t))\cdot (v^{-1})'(t)\,dt\\
&=\int_{v(\alpha)}^{v(\beta)}g(v^{-1}(t))\cdot \frac{1}{v'(v^{-1}(t))}\,dt
\end{align}
"substitute $x=v^{-1}(t)$"
The "advantage" of this formula is that on the LHS there is only $g$, i.e. without any variable changes, and we move all the stuff involving the change of variables to the other side of the equation (so all instances of $v$ appear only on the RHS). Compare this to my first formula, where we didn't make any injectivity assumptions, and as a result $u$ appears on both the LHS and the RHS of the equation. The added hypothesis of injectivity is the price we pay if we want to isolate everything to one side.
Sometimes in computations, this second form of the theorem (which is really a special case of the one above) is more useful, which is why people may sometimes insist that injectivity is a must.
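A quick numerical illustration of the corollary, with illustrative choices (mine, not from the text) $g(x) = x^2$ on $[0, 1]$ and the injective change of variables $v(x) = e^x$, so $v^{-1}(t) = \ln t$ and $1/v'(v^{-1}(t)) = 1/t$; both sides should equal $1/3$:

```python
# Both sides of the corollary for g(x) = x^2, v(x) = e^x on [0, 1]:
# LHS = integral of x^2 over [0, 1], RHS = integral of (ln t)^2 / t over [1, e].

import math

n = 100_000

def midpoint_sum(fn, a, b):
    # midpoint Riemann sum of fn over [a, b]
    dx = (b - a) / n
    return sum(fn(a + (k + 0.5) * dx) * dx for k in range(n))

lhs = midpoint_sum(lambda x: x ** 2, 0.0, 1.0)
rhs = midpoint_sum(lambda t: math.log(t) ** 2 / t, 1.0, math.e)
print(lhs, rhs)  # both close to 1/3
```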
The word "integral" is used in two completely different senses. The first, called definite integral, has a simple geometric (or physical) interpretation, the second, called indefinite integral, is accessible only to people having the notion of "derivative of a function of one variable" in their repertoire. It is true that in the one-dimensional case there is a connection between the two notions. This connection is called the fundamental theorem of calculus.
(a) The definite integral: Given some sort of "intensity" $f(x)$ at each point $x$ of some domain $B$ (an interval, a sphere, a cube in ${\mathbb R}^n$, etc.), where $f(x)$ varies with $x$, one can ask for the "total effect" an agent of this intensity could have. This total effect is the integral of $f$ over $B$ and is denoted by $$\int_B f(x){\rm d}(x)$$ (or similar). From the geometric intuition behind it this quantity is a limit of Riemann sums, viz. $$\int_B f(x){\rm d}(x)\ =\ \lim_{\ldots} \sum_k f(\xi_k)\ \mu(B_k)\ ,$$ where the $B_k$ form a disjoint partition of $B$ into very small subdomains and $\mu$ denotes the natural geometric measure (length, surface area, $n$-dimensional volume) in the situation at hand.
(b) The indefinite integral: Given a function $t\mapsto f(t)$ on some interval $I\subset{\mathbb R}$ one may ask: Is this function the derivative of some other function $F(\cdot)$? The answer is yes, and in fact there is an infinite set of such functions $F(\cdot)$, whereby any two of them differ by a constant on $I$. This set of functions is called the indefinite integral of $f$ on $I$ and is denoted by $$\int f(t)\ dt\ .$$
(c) The fundamental theorem of calculus: Given two points $a$, $b\in I$ the difference $F(b)-F(a)$ has the same value for all functions $F\in\int f(t)\ dt$ and may as well be denoted by $$\int_a^b f(t)\ dt\ .$$ Now comes the theorem (and this is the real wonder, not the fact that taking the derivative of the antiderivative of $f$ gives back $f$): When $a<b$ then $$\int_{[a,b]} f(t)\ {\rm d}t = \int_a^b f(t)\ dt\ .$$ Here on the left side we have a limit of Riemann sums, and on the right side a difference of $F$-values.
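The "real wonder" in (c) can be seen numerically: a limit of Riemann sums on the left, a difference of $F$-values on the right. A sketch with the illustrative choices $f(t) = \cos t$, $F(t) = \sin t$ on $[0, 1]$:

```python
# Left Riemann sum of cos(t) over [0, 1] vs. the F-difference sin(1) - sin(0).

import math

a, b, n = 0.0, 1.0, 100_000
dt = (b - a) / n
riemann = sum(math.cos(a + k * dt) * dt for k in range(n))  # limit of Riemann sums
fundamental = math.sin(b) - math.sin(a)                     # F(b) - F(a)
print(riemann, fundamental)  # both close to sin(1)
```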