First, some differential topology of surfaces (such as $\mathbb R^2$) that shows how to handle changes of coordinates and how the components of a vector transform.
If you have two parametrizations
$$\Phi:\Omega\hookrightarrow\Sigma
\qquad{\rm and}\qquad
\Psi:\Gamma\hookrightarrow\Sigma$$
of a surface $\Sigma$, their Jacobians $J\Phi$ and $J\Psi$ assign tangent frames at $p\in\Sigma$ via
$$J\Phi(a):\mathbb R^2\to T_p\Sigma,
$$
where $a\in\Omega$, $\Phi(a)=p$ and $T_p\Sigma$ is the tangent space at $p$, so $$\partial_0=J\Phi(a)e_0\quad{\rm and}\quad \partial_1=J\Phi(a)e_1.$$
Also
$$\tilde \partial_0=J\Psi(b)e_0\quad{\rm and}\quad
\tilde \partial_1=J\Psi(b)e_1,$$
with the other parametrization such that $\Psi(b)=p$.
Now, one can get a map $\lambda:\Omega\to\Gamma$ satisfying
$$
\Phi=\Psi\circ\lambda\quad{\rm and}\quad \lambda(a)=b
$$
So, by the chain rule, $J\Phi=J\Psi\cdot J\lambda$ and
$$J\Phi(a)=J\Psi(b)\cdot J\lambda(a),$$
and
$$J\Phi(a)e_0=J\Psi(b)\cdot J\lambda(a)e_0
\quad{\rm and}\quad
J\Phi(a)e_1=J\Psi(b)\cdot J\lambda(a)e_1.$$
But, if
$$
J\lambda(a)=\left(\begin{array}{cc}
\lambda^0{}_0&\lambda^0{}_1\\
\lambda^1{}_0&\lambda^1{}_1\end{array}\right),
$$
then
\begin{eqnarray*}
\partial_0&=&J\Psi(b)(\lambda^0{}_0e_0+\lambda^1{}_0e_1)\\
&=&\lambda^0{}_0J\Psi(b)e_0+\lambda^1{}_0J\Psi(b)e_1\\
&=&\lambda^0{}_0\ \tilde\partial_0+\lambda^1{}_0\ \tilde\partial_1
\end{eqnarray*}
and similarly
$$\partial_1=\lambda^0{}_1\ \tilde\partial_0+
\lambda^1{}_1\ \tilde\partial_1.$$
For an arbitrary tangent vector $\vec v=a^0\partial_0+a^1\partial_1$ in the first coordinates, the new components will be
$$\vec v=(a^0\lambda^0{}_0+a^1\lambda^0{}_1)\tilde\partial_0+
(a^0\lambda^1{}_0+a^1\lambda^1{}_1)\tilde\partial_1,
$$
which corresponds to the operation
$$
\left(\begin{array}{c}a^0\\a^1\end{array}\right)
\to
\left(\begin{array}{cc}
\lambda^0{}_0&\lambda^0{}_1\\
\lambda^1{}_0&\lambda^1{}_1
\end{array}\right)
\left(\begin{array}{c}a^0\\a^1\end{array}\right).
$$
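The chain-rule argument above can be checked numerically. In this sketch, random matrices stand in for $J\Psi(b)$ and $J\lambda(a)$ (any invertible choice works, the specific values are made up); the point is that transforming the components with $J\lambda$ and rebuilding the vector in the $\Psi$ frame recovers the same tangent vector:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sample data: J_psi stands in for J\Psi(b), J_lam for J\lambda(a)
J_psi = rng.standard_normal((2, 2))   # columns are tilde-partial_0, tilde-partial_1
J_lam = rng.standard_normal((2, 2))
J_phi = J_psi @ J_lam                 # chain rule: columns are partial_0, partial_1

a = np.array([1.5, -0.5])             # components (a^0, a^1) in the Phi frame

v_from_phi = J_phi @ a                # v = a^0 partial_0 + a^1 partial_1
a_tilde = J_lam @ a                   # transformed components
v_from_psi = J_psi @ a_tilde          # same vector, rebuilt in the Psi frame

print(np.allclose(v_from_phi, v_from_psi))  # True
```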
Now for your case the surface is $\mathbb R^2$, $\Phi=1\!\!1$ is the identity map, and
$\Psi$ is
$$
\left(\begin{array}{c}
r\\\theta\end{array}\right)\mapsto
\left(\begin{array}{c}r\cos\theta\\r\sin\theta\end{array}\right).
$$
Hence for $\lambda$ we have
$$
r=\sqrt{x^2+y^2}\quad {\rm and}\quad \theta=\arctan\frac{y}{x},
$$
and its derivative is
$$
\left(\begin{array}{cc}\dfrac{x}{\sqrt{x^2+y^2}}
&\dfrac{y}{\sqrt{x^2+y^2}}
\\
-\dfrac{y}{\sqrt{x^2+y^2}}&
\dfrac{x}{\sqrt{x^2+y^2}}\end{array}\right),
$$
which happens to be the inverse of the derivative $J\Psi$, but expressed with respect to the orthonormal polar frame.
Then the basis change with respect to this frame is
$$e_0=\frac{x}{\sqrt{x^2+y^2}}\ e'_0-\frac{y}{\sqrt{x^2+y^2}}\ e'_1
\quad {\rm and}\quad
e_1=\dfrac{y}{\sqrt{x^2+y^2}}\ e'_0+\dfrac{x}{\sqrt{x^2+y^2}}\ e'_1,
$$
or
$$e_0=\cos\theta\ e'_0-\sin\theta\ e'_1
\quad {\rm and}\quad
e_1=\sin\theta\ e'_0+\cos\theta\ e'_1,
$$
in polar terms.
Now insert those into your linear combination $\vec v=a^0e_0+a^1e_1$ to get
$$\vec v=(a^0\cos\theta+a^1\sin\theta)e'_0
+(-a^0\sin\theta+a^1\cos\theta)e'_1.
$$
You can see how the multiplication
$$\left(\begin{array}{cc}
\cos\theta&\sin\theta\\
-\sin\theta&\cos\theta\end{array}\right)
\left(\begin{array}{c}a^0\\a^1\end{array}\right),
$$
matches how the new components of $\vec v$ are obtained.
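A quick numerical check of this matching (the angle $\theta=0.7$ and components $(3,5)$ are arbitrary sample values):

```python
import numpy as np

theta = 0.7                        # sample angle
a = np.array([3.0, 5.0])           # Cartesian components (a^0, a^1)

# Change-of-components matrix from the text (Cartesian -> orthonormal polar)
M = np.array([[ np.cos(theta), np.sin(theta)],
              [-np.sin(theta), np.cos(theta)]])
a_polar = M @ a

# Cross-check: rebuild the vector from the orthonormal polar frame
e_r  = np.array([ np.cos(theta), np.sin(theta)])   # e'_0, radial unit vector
e_th = np.array([-np.sin(theta), np.cos(theta)])   # e'_1, angular unit vector
v = a_polar[0] * e_r + a_polar[1] * e_th

print(np.allclose(v, a))  # True: same vector, components in different frames
```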
When To Sum Indices
In tensor calculus, the change in a vector's components under a coordinate transformation is computed as a linear combination involving the derivatives of each new coordinate $x'^i$ with respect to each old coordinate $x^j$. This stems from the fact that coordinate conversion equations are usually functions of several components of the old coordinate system. Under the Einstein summation convention, an index in a differential geometry/linear algebra equation is summed over every dimensional coordinate if and only if it is repeated within a term (usually appearing once as an upper index and once as a lower index). Such indices are referred to as dummy indices. This summation convention exists for the sake of compactness since, unsurprisingly, nearly all tensor formulae are expressed as combinations of coordinate components.
Example: line element expansion using a diagonal metric tensor $g_{\mu\nu}$: $$ds^2=g_{\mu\nu}dx^\mu dx^\nu=g_{tt}c^2dt^2+g_{rr}dr^2+g_{\theta\theta}d\theta^2+g_{\phi\phi}d\phi^2$$ The Einstein summation convention is implied for $\mu$ and $\nu$, rendering the $\sum$ sign redundant.
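The double contraction over $\mu$ and $\nu$ can be sketched with `np.einsum`; the diagonal metric and differentials below are made-up sample data (Minkowski-like, $c=1$), not tied to any particular spacetime:

```python
import numpy as np

# Hypothetical diagonal metric components g_{mu nu}, order (t, r, theta, phi)
g = np.diag([-1.0, 1.0, 1.0, 1.0])
dx = np.array([0.1, 0.2, 0.05, 0.3])      # sample coordinate differentials dx^mu

# ds^2 = g_{mu nu} dx^mu dx^nu: mu and nu are both repeated, so both are summed
ds2 = np.einsum('mn,m,n->', g, dx, dx)

# Explicit expansion for a diagonal metric: only the mu = nu terms survive
ds2_expanded = sum(g[i, i] * dx[i] ** 2 for i in range(4))
print(np.isclose(ds2, ds2_expanded))  # True
```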
In the case of the following transformation equation:
$$dx'^i=\sum_{j=1}^{n}\frac{\partial x'^i}{\partial x^j}dx^j$$
whose $\sum_j$ can be dropped...
$$dx'^i=\frac{\partial x'^i}{\partial x^j}dx^j\qquad(2)$$
it is implied that $dx^j=\{dx^1, dx^2, ..., dx^n\}$ is summed over, $j$ being a dummy index. On the other hand, the free index $i$ is not summed over since it does not repeat within a term. Therefore, $i$ can represent any coordinate component we choose. Our result is the following expansion:
$$dx'^i=\frac{\partial x'^i}{\partial x^1}dx^1+\frac{\partial x'^i}{\partial x^2}dx^2+...+\frac{\partial x'^i}{\partial x^n}dx^n$$
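The free-versus-dummy distinction is exactly what this loop sketch expresses: the inner sum runs over the dummy index $j$, while one equation is produced per value of the free index $i$. The matrix `J` is random sample data standing in for $\partial x'^i/\partial x^j$ at a point:

```python
import numpy as np

n = 3
rng = np.random.default_rng(1)
J = rng.standard_normal((n, n))   # stand-in for partial x'^i / partial x^j
dx = rng.standard_normal(n)       # old differentials dx^j

# Dummy index j is summed; free index i survives, one component per value of i
dxp = np.array([sum(J[i, j] * dx[j] for j in range(n)) for i in range(n)])

print(np.allclose(dxp, J @ dx))   # True: the convention is matrix-vector multiply
```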
Note: Greek indices such as $\mu$ or $\nu$ imply the inclusion of a time component in a summation, whereas Latin indices such as $i$ or $j$ imply a summation over spatial components only.
Example in Practice
Consider the transformation of a vector $\overrightarrow{A}$ from Cartesian to spherical coordinates, whose components (as a linear combination) are:
$$\overrightarrow{A}=3\overrightarrow{e_x}+5\overrightarrow{e_y}+4\overrightarrow{e_z}$$
The conversion relations from $(x,y,z)\rightarrow(r,\theta,\phi)$ read:
$$r=\sqrt{x^2+y^2+z^2},\qquad\theta=\arctan\bigg(\frac{y}{x}\bigg),\qquad\phi=\arctan\bigg(\frac{\sqrt{x^2+y^2}}{z}\bigg)$$
Using eq. (2), we can plug each conversion relation into each partial derivative term.
$$dr=\frac{\partial r}{\partial x}dx+\frac{\partial r}{\partial y}dy+\frac{\partial r}{\partial z}dz$$
$$=\frac{\partial}{\partial x}\bigg(\sqrt{x^2+y^2+z^2}\bigg)dx+\frac{\partial}{\partial y}\bigg(\sqrt{x^2+y^2+z^2}\bigg)dy+\frac{\partial}{\partial z}\bigg(\sqrt{x^2+y^2+z^2}\bigg)dz$$
$$=\frac{x}{\sqrt{x^2+y^2+z^2}}dx+\frac{y}{\sqrt{x^2+y^2+z^2}}dy+\frac{z}{\sqrt{x^2+y^2+z^2}}dz$$
Doing the same with $d\theta$ and $d\phi$, we end up with:
$$d\theta=-\frac{y}{x^2+y^2}dx+\frac{x}{x^2+y^2}dy$$
$$d\phi=\frac{xz}{\sqrt{x^2+y^2}(x^2+y^2+z^2)}dx+\frac{yz}{\sqrt{x^2+y^2}(x^2+y^2+z^2)}dy-\frac{\sqrt{x^2+y^2}}{x^2+y^2+z^2}dz$$
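These partial derivatives can be verified symbolically, for instance with SymPy. This is a sketch, not part of the original derivation; it uses `atan2` in place of $\arctan$ of a quotient to avoid quadrant issues, and spot-checks two entries against the expressions above:

```python
import sympy as sp

x, y, z = sp.symbols('x y z', positive=True)

r     = sp.sqrt(x**2 + y**2 + z**2)
theta = sp.atan2(y, x)
phi   = sp.atan2(sp.sqrt(x**2 + y**2), z)

# Each row of the Jacobian gives the coefficients of dx, dy, dz
J = sp.Matrix([[sp.diff(f, v) for v in (x, y, z)] for f in (r, theta, phi)])

# Spot-check two entries against the text
print(sp.simplify(J[1, 0] + y / (x**2 + y**2)) == 0)            # dtheta/dx
print(sp.simplify(J[2, 2] + sp.sqrt(x**2 + y**2) / r**2) == 0)  # dphi/dz
```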
For the sake of accessibility, let's arrange each $\frac{\partial x'^i}{\partial x^j}$ term in a Jacobian matrix:
$$\pmb{J}=\begin{bmatrix} \frac{\partial r}{\partial x} & \frac{\partial r}{\partial y} & \frac{\partial r}{\partial z} \\ \frac{\partial \theta}{\partial x}&\frac{\partial \theta}{\partial y}&\frac{\partial \theta}{\partial z} \\ \frac{\partial \phi}{\partial x}&\frac{\partial \phi}{\partial y}&\frac{\partial \phi}{\partial z} \end{bmatrix}=\begin{bmatrix} \frac{x}{\sqrt{x^2+y^2+z^2}} & \frac{y}{\sqrt{x^2+y^2+z^2}} & \frac{z}{\sqrt{x^2+y^2+z^2}} \\ -\frac{y}{x^2+y^2}&\frac{x}{x^2+y^2}& 0 \\ \frac{xz}{\sqrt{x^2+y^2}(x^2+y^2+z^2)}&\frac{yz}{\sqrt{x^2+y^2}(x^2+y^2+z^2)}&-\frac{\sqrt{x^2+y^2}}{x^2+y^2+z^2} \end{bmatrix}$$
The last step from here is simply to substitute each entry of the Jacobian matrix into the vector transformation eq. (1), with $(x,y,z)\rightarrow(3,5,4)$ as given by the vector above.
$$A'^i=\frac{\partial x'^i}{\partial x^j}A^j\qquad(1)$$
$$A^r=\frac{\partial r}{\partial x}A^x+\frac{\partial r}{\partial y}A^y+\frac{\partial r}{\partial z}A^z$$
$$=\frac{x}{\sqrt{x^2+y^2+z^2}}A^x+\frac{y}{\sqrt{x^2+y^2+z^2}}A^y+\frac{z}{\sqrt{x^2+y^2+z^2}}A^z$$
$$=\frac{3}{\sqrt{3^2+5^2+4^2}}(3)+\frac{5}{\sqrt{3^2+5^2+4^2}}(5)+\frac{4}{\sqrt{3^2+5^2+4^2}}(4)=5\sqrt{2}$$
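The arithmetic for $A^r$ can be checked numerically (note that in this example the evaluation point $(x,y,z)=(3,5,4)$ happens to coincide with the components of $\overrightarrow{A}$):

```python
import numpy as np

A = np.array([3.0, 5.0, 4.0])            # (A^x, A^y, A^z)
x, y, z = 3.0, 5.0, 4.0                  # evaluation point, same numbers here
r = np.sqrt(x**2 + y**2 + z**2)          # sqrt(50)

# First row of the Jacobian contracted with the components: A^r = (x A^x + y A^y + z A^z)/r
A_r = (x * A[0] + y * A[1] + z * A[2]) / r
print(np.isclose(A_r, 5 * np.sqrt(2)))   # True: 50/sqrt(50) = 5*sqrt(2)
```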
I'll leave $A^\theta$ and $A^\phi$ as exercises, but they both follow the same procedure as $A^r$.
Why Use This Method Instead of Matrix Rotations?
Methods like eq. (2) generalize coordinate transformations as much as possible for any circumstance, extending beyond merely rotating vectors or coordinates. Coordinate transformations are used all the time in astrophysics and general relativity where physicists create arbitrary new coordinate systems to write tensors (usually spacetime metrics) in a more compact and physically intuitive manner.
Additionally, keep in mind that the variables used in tensor components are not always restricted to dimensional coordinate components. For example, the Kerr-Newman metric tensor (general relativity) describes the curvature of spacetime around a spherically symmetric, charged, rotating mass. In such circumstances, dimensional coordinates alone are simply not a sufficient description since the tensor must also introduce angular momentum and charge as influential geometric factors.
A doubly covariant tensor takes as input two (conventional) vectors, and spits out a scalar. The archetypal example of this is an inner product. If we want to put numbers on how this tensor behaves, the conventional thing is to fix a basis, and feed basis vectors in all possible combinations to the tensor and record the results.
Usually, you would use the same basis for both the input vectors, because it's nicer that way. But there is no formal reason you can't use two bases. It just gets messier than it has to.
Basically the same reasoning applies to mixed covariant-contravariant tensors (i.e. conventional square matrices from linear algebra) and to doubly contravariant tensors: it is much more convenient to use the same basis / corresponding dual basis than to use separate, independent bases for the two slots of the tensor, but it can certainly be done with two independent bases if you're careful.
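The "feed basis vectors in all combinations and record the results" recipe can be sketched as follows; the inner product $g$ here is a made-up symmetric bilinear form on $\mathbb R^2$, chosen only for illustration:

```python
import numpy as np

# A doubly covariant tensor on R^2: a hypothetical inner product g(u, v)
def g(u, v):
    return 2.0 * u[0] * v[0] + u[0] * v[1] + u[1] * v[0] + 3.0 * u[1] * v[1]

# Fix a basis and feed the tensor basis vectors in all possible combinations
basis = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
G = np.array([[g(ei, ej) for ej in basis] for ei in basis])

# The recorded results form the component matrix g_{ij} = g(e_i, e_j)
print(np.allclose(G, [[2.0, 1.0], [1.0, 3.0]]))  # True
```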