Okay, there are two facts to be taken into account here:
Vectors are elements of a vector space. (Let's say a real, $d$-dimensional vector space $V$ for concreteness.) If you use a basis $ \lbrace e_i \rbrace \subseteq V $, you can express those vectors as linear combinations of the basis elements, i.e., for any $v \in V$:
$$v = v^{i} e_{i} $$
where $v^{i}$ are real numbers, called the components of $v$ with respect to $\lbrace e_{i} \rbrace$.
Covectors are elements of the dual space $V^{*}$ of $V$. You can also choose a basis $\lbrace \epsilon^{i} \rbrace \subseteq V^{*}$ to express these objects as linear combinations in a similar fashion, i.e., for any $\omega \in V^{*}$:
$$\omega = \omega_{i}\epsilon^{i}$$
where $\omega_{i}$ are also real numbers, called the components of $\omega$ with respect to $\lbrace \epsilon^{i} \rbrace$.
The first choice:
In principle, the bases $\lbrace e_{i} \rbrace$ and $\lbrace \epsilon^{i} \rbrace$ are not related in any way. However, in order to simplify calculations, we often choose a very special basis for the dual space:
The dual basis is the unique basis in the dual space such that:
$$\epsilon^{i}(e_{j}) = \delta^{i}_{j}$$
(Note the two different uses of the word "dual": "dual space" means the space $V^{*} := \mathrm{Hom}(V,\mathbb{R})$, while "dual basis" refers to the uniquely defined basis on $V^{*}$ such that $\epsilon^{i}(e_{j}) = \delta^{i}_{j}$.)
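Concretely, if we collect the basis vectors $e_j$ as the columns of a matrix, the dual basis covectors $\epsilon^i$ are the rows of its inverse, since row $i$ times column $j$ gives exactly $\delta^i_j$. Here is a minimal Python sketch (the 2-D basis matrix below is an arbitrary choice for illustration):

```python
# Dual basis as the rows of the inverse matrix: a 2-D sketch.

def inv2(A):
    """Inverse of a 2x2 matrix given as nested lists."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[ A[1][1] / det, -A[0][1] / det],
            [-A[1][0] / det,  A[0][0] / det]]

# Columns of E are the basis vectors e_1, e_2 (arbitrary example).
E = [[1.0, 1.0],
     [2.0, 3.0]]
E_inv = inv2(E)  # rows of E_inv are the dual covectors eps^1, eps^2

# Check the defining relation eps^i(e_j) = delta^i_j.
for i in range(2):
    for j in range(2):
        pairing = sum(E_inv[i][k] * E[k][j] for k in range(2))
        assert abs(pairing - (1.0 if i == j else 0.0)) < 1e-12
print("dual basis pairing checks out")
```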
Now suppose we want to make a change of basis (an invertible linear transformation) in $V$. Let's say our vector $v = v^{i}e_{i}$ can be written in terms of the new basis $\lbrace a_{i} \rbrace$ as:
$$v = v^{i} e_{i} = w^{j}a_{j}$$
While the basis transforms with a certain matrix: $e_{i} = \Lambda^{j}_{i} a_{j} $, the components with respect to that basis transform with the inverse of that matrix: $v^{i} = w^{j} (\Lambda^{-1})^{i}_{j} $
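This invariance can be checked numerically. In the Python sketch below (the matrix $\Lambda$ and the components $v^i$ are arbitrary choices), the new basis vectors $a_j$, written in standard coordinates, are the columns of $\Lambda^{-1}$ (so that $e_i = \Lambda^j_i a_j$ holds), the new components are $w^j = \Lambda^j_i v^i$, and recombining $w^j a_j$ recovers the same vector:

```python
# Components transform inversely to the basis: a 2-D check.

def matvec(A, v):
    """2x2 matrix times column vector."""
    return [sum(A[i][k] * v[k] for k in range(2)) for i in range(2)]

def inv2(A):
    """Inverse of a 2x2 matrix given as nested lists."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[ A[1][1] / det, -A[0][1] / det],
            [-A[1][0] / det,  A[0][0] / det]]

# Change of basis e_i = Lambda^j_i a_j, with {e_i} the standard basis.
Lam = [[1.0, 1.0],
       [2.0, 3.0]]
Lam_inv = inv2(Lam)

# The new basis vectors a_j, in standard coordinates, are the columns
# of Lambda^{-1} (since the matrix of columns A satisfies A Lambda = I).
a1 = [Lam_inv[0][0], Lam_inv[1][0]]
a2 = [Lam_inv[0][1], Lam_inv[1][1]]

v = [5.0, 7.0]          # components v^i w.r.t. {e_i} (arbitrary)
w = matvec(Lam, v)      # new components: w^j = Lambda^j_i v^i

# Invariance of the vector itself: sum_j w^j a_j == sum_i v^i e_i.
recovered = [w[0] * a1[k] + w[1] * a2[k] for k in range(2)]
print(recovered)  # same numbers as v
```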
Now, when we pair up the elements of the basis $\lbrace e_{i}\rbrace$ and the elements of the basis $\lbrace \epsilon^{i}\rbrace$ we would like our dual basis convention to remain true, so any change in basis $\Lambda$ on $V$ will induce a change of basis on $V^{*}$.
$$\delta^{i}_{j} = \epsilon^{i}(e_{j}) = \epsilon^{i}(\Lambda^{k}_{j}a_{k}) = \Lambda^{k}_{j}\epsilon^{i}(a_{k}) = ...$$
The matrix that relates $\lbrace \epsilon^{i} \rbrace$ with the new basis (let's call it $\lbrace \alpha^{i} \rbrace$) on the dual space needs to be the inverse $\Lambda^{-1}$ of $\Lambda$ in order to satisfy the relation $\alpha^{i}(a_{j}) = \delta^{i}_{j}$.
$$...=\Lambda^{k}_{j}\epsilon^{i}(a_{k}) = \Lambda^{k}_{j}(\Lambda^{-1})^{i}_{l}\alpha^{l}(a_{k}) = \Lambda^{k}_{j}(\Lambda^{-1})^{i}_{l} \delta^{l}_{k}$$
$$=\Lambda^{k}_{j}(\Lambda^{-1})^{i}_{k} = \delta^{i}_{j}$$
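The same check works on the dual side. In the sketch below (again with an arbitrary 2-D matrix $\Lambda$), the new basis vectors $a_j$ are the columns of $\Lambda^{-1}$ and the new dual covectors $\alpha^i$ are the rows of $\Lambda$, so the pairing $\alpha^i(a_j)$ comes out as $\delta^i_j$, exactly as the derivation requires:

```python
# The dual basis transforms with the inverse matrix: a 2-D check.

def inv2(A):
    """Inverse of a 2x2 matrix given as nested lists."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[ A[1][1] / det, -A[0][1] / det],
            [-A[1][0] / det,  A[0][0] / det]]

# Arbitrary (assumed) change-of-basis matrix.
Lam = [[1.0, 2.0],
       [1.0, 3.0]]
Lam_inv = inv2(Lam)

# New basis vectors a_j = columns of Lambda^{-1} (standard coordinates);
# new dual covectors alpha^i = rows of Lambda.  Pair them up:
for i in range(2):
    for j in range(2):
        pairing = sum(Lam[i][k] * Lam_inv[k][j] for k in range(2))
        assert abs(pairing - (1.0 if i == j else 0.0)) < 1e-12
print("alpha^i(a_j) = delta^i_j holds")
```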
The second choice:
We use the word "covariant" to describe the way the basis of $V$ transforms.
From this, we start calling "contravariant" the way the basis of $V^{*}$ transforms, because it needs to use the inverse transformation in order to keep the duality convention.
We call the components of the elements of $V$ "contravariant" because, as we saw before, they need to transform inversely to the basis of $V$ in order to keep invariance.
Finally, we call the components of the elements of $V^{*}$ "covariant" because they need to transform inversely to the basis of $V^{*}$, and since that basis transforms contravariantly, they end up transforming with the same matrix as the basis of $V$.
So, in summary:
- The basis of $V$ transforms covariantly.
- The basis of $V^{*}$ transforms contravariantly.
- The components of elements of $V$ transform contravariantly.
- The components of elements of $V^{*}$ transform covariantly.
What if our convention had been the other way around? Would the way these objects transform change?
No. Only the names we give to the transformation behaviours would change. What matters is:
- The components of the elements of a vector space need to transform in the opposite way as the basis of that vector space does.
- The basis of $V$ transforms in the opposite way as the basis of $V^{*}$ does, in order to keep the duality convention.
Then what sense does it make to distinguish vectors like these? A vector is simply an element of the vector space. Doesn't how it transforms depend on the basis you choose for that space, rather than on the nature of the vector?
No. The components of a vector will always transform with the opposite transformation to the one that transformed the basis, regardless of which specific basis that is.
Then what does it mean that the gradient is a covariant vector? Saying that it is because it transforms in a certain way no longer makes sense.
The gradient of a function has covariant components because it naturally is a map $TM \rightarrow \mathbb{R}$. It takes a vector and gives you the directional derivative of the function in that direction. So it is an element of $T^{*}M$ (the dual space to $TM$), whose basis has a transformation behaviour (contravariant) opposite to that of the basis of $TM$ (covariant).
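This covariant behaviour of gradient components can be seen numerically through the chain rule. The sketch below uses a hypothetical scalar function and an arbitrary linear coordinate change: the gradient components in the new coordinates equal $J^{T}$ times the old ones, where $J$ is the Jacobian of the coordinate change (the same matrix by which the coordinate basis transforms):

```python
# Gradient components transform covariantly: a finite-difference check.

def f(x, y):
    # Hypothetical scalar function, chosen only for illustration.
    return x * x + 3.0 * y

# Arbitrary linear coordinate change x = p + q, y = 2p + 3q;
# its Jacobian d(x,y)/d(p,q):
J = [[1.0, 1.0],
     [2.0, 3.0]]

def f_new(p, q):
    return f(p + q, 2.0 * p + 3.0 * q)

h = 1e-6
p0, q0 = 0.4, -0.2
x0, y0 = p0 + q0, 2.0 * p0 + 3.0 * q0

# Gradient components via central finite differences.
fx = (f(x0 + h, y0) - f(x0 - h, y0)) / (2 * h)
fy = (f(x0, y0 + h) - f(x0, y0 - h)) / (2 * h)
fp = (f_new(p0 + h, q0) - f_new(p0 - h, q0)) / (2 * h)
fq = (f_new(p0, q0 + h) - f_new(p0, q0 - h)) / (2 * h)

# Covariant transformation: (fp, fq) = J^T (fx, fy), i.e. the gradient
# components transform with the same matrix as the coordinate basis.
assert abs(fp - (J[0][0] * fx + J[1][0] * fy)) < 1e-4
assert abs(fq - (J[0][1] * fx + J[1][1] * fy)) < 1e-4
print("gradient components transform covariantly")
```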
To understand this, let's take (without loss of generality) a two-dimensional basis change given by
$$b_1={B^1}_1e_1+{B^2}_1e_2$$
$$b_2={B^1}_2e_1+{B^2}_2e_2$$
where $\{e_1,e_2\}$ is an old basis and $\{b_1,b_2\}$ is the new.
This relation can be succinctly expressed as $b_i={B^s}_ie_s$ (here we see how the bases transform covariantly).
The matrix of this data is
$$[B]=\begin{bmatrix}
{B^1}_1 & {B^1}_2\\
{B^2}_1 & {B^2}_2\\
\end{bmatrix}$$
Then, to get the new components of a vector
$v=v^1e_1+v^2e_2$,
you will see
that $$v_b=[B]^{-1}v_e$$
(here we see how the components transform contravariantly),
where $v_e$ is the column vector of the old components and $v_b$ holds the new components of the very same vector $v$.
Unfolded, this reads
$$
\begin{bmatrix}
w^1\\
w^2\\
\end{bmatrix}
\ =\
\begin{bmatrix}
{B^1}_1 & {B^1}_2\\
{B^2}_1 & {B^2}_2\\
\end{bmatrix}^{-1}
\begin{bmatrix}
v^1\\
v^2\\
\end{bmatrix}$$
such that $v=w^1b_1+w^2b_2$ in the new basis.
An explicit example illuminates this even more:
Let
$$b_1=e_1+2e_2,$$
$$b_2=e_1+3e_2,$$
be a basis change. Its change-of-basis matrix is
$[B]=
\begin{bmatrix}
1& 1\\
2&3\\
\end{bmatrix}
$.
Now solving for $e_i$ we get
$$e_1=3b_1-2b_2,$$
$$e_2=-b_1+b_2.$$
Substituting into $v$ gives:
$$v=v^1(3b_1-2b_2)+v^2(-b_1+b_2).$$
This simplifies to
$$v=(3v^1-v^2)b_1+(-2v^1+v^2)b_2.$$
Now compare this with the $[B]^{-1}v_e$ product:
$$
\begin{bmatrix}
3&-1\\
-2&1\\
\end{bmatrix}
\begin{bmatrix}
v^1\\
v^2\\
\end{bmatrix}
=
\begin{bmatrix}
3v^1-v^2\\
-2v^1+v^2\\
\end{bmatrix}.
$$
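For the sceptical reader, the whole worked example can be checked in a few lines of Python (the old components $v^1=5$, $v^2=7$ are an arbitrary choice):

```python
# Checking the worked example: b_1 = e_1 + 2 e_2, b_2 = e_1 + 3 e_2.
B = [[1.0, 1.0],
     [2.0, 3.0]]
B_inv = [[ 3.0, -1.0],
         [-2.0,  1.0]]   # det B = 1, so the inverse has integer entries

v = [5.0, 7.0]  # components in the old basis {e_1, e_2} (arbitrary)

# New components w = B^{-1} v.
w = [B_inv[0][0] * v[0] + B_inv[0][1] * v[1],
     B_inv[1][0] * v[0] + B_inv[1][1] * v[1]]
print(w)  # [8.0, -3.0]

# Sanity check: w^1 b_1 + w^2 b_2 reproduces v in the old basis.
back = [w[0] * B[0][0] + w[1] * B[0][1],
        w[0] * B[1][0] + w[1] * B[1][1]]
print(back)  # [5.0, 7.0]
```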
Best Answer
Let us use upper indices to index row(-vector)s and lower indices for column(-vector)s.
Let $x$ be an arbitrary vector, and $[x]$ its coordinates in the standard basis.
We will also consider two other bases $\{v_i\}$, $\{w_i\}$ and later the corresponding dual bases for the dual space of linear functionals (or 1-forms, or covectors).
Let $V=\left([v_1]\mid [v_2]\mid\ldots [v_n]\right)$ be the matrix made of coordinate columns-vectors of the first basis, and similarly $W$ for the second basis $\{w_i\}$.
If the coordinates of a vector in the two bases are related by $$ [x]_w=T[x]_v\tag{1}, $$ then, identifying the left-most and the right-most side in $V[x]_v=[x]=W[x]_w=W(T[x]_v)=(WT)[x]_v$, we have
$$V=WT\tag{2}$$
Let $V^{\prime}=\left(\begin{smallmatrix} [v^1]\\ [v^2] \\ \vdots\\ [v^n]\end{smallmatrix}\right)$ be the matrix of stacked row-vectors of the first dual basis, and similarly $W^{\prime}$ for the second dual basis $\{w^i\}$.
Using the dual basis we can write $[x]_v=\left(\begin{smallmatrix} v^1(x)\\ v^2(x) \\ \vdots\\ v^n(x)\end{smallmatrix}\right)=V^{\prime}[x]$ and similarly for $[x]_w$.
From $TV^{\prime}[x]=T[x]_v=[x]_w=W^{\prime}[x]$ we have $$W^{\prime}=TV^{\prime}\tag{3}$$
Finally, for any dual vector (co-vector) $\alpha$ we can write $[\alpha]=[\alpha]_{w^{\prime}}W^{\prime}=[\alpha]_{w^{\prime}}TV^{\prime}$ and therefore, $$[\alpha]_{v^{\prime}}=[\alpha]_{w^{\prime}}T\tag{4}$$
Relations (1)-(4) tell us in which direction, and from which side, to apply the transformation matrix to the coordinates of vectors, covectors, and their bases.
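As a sanity check, relations (1)-(4) can be verified numerically in a 2-D sketch (the bases $V$, $W$, the vector $x$, and the covector components below are all arbitrary choices):

```python
# Verifying relations (1)-(4) in 2-D with two assumed bases V and W.

def inv2(A):
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[ A[1][1] / det, -A[0][1] / det],
            [-A[1][0] / det,  A[0][0] / det]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def matvec(A, v):
    return [sum(A[i][k] * v[k] for k in range(2)) for i in range(2)]

V = [[1.0, 1.0], [2.0, 3.0]]   # columns: basis {v_i} in standard coords
W = [[2.0, 1.0], [1.0, 1.0]]   # columns: basis {w_i}
T = matmul(inv2(W), V)          # from eq. (2): V = W T

x = [4.0, -1.0]                 # standard coordinates [x] (arbitrary)
x_v = matvec(inv2(V), x)
x_w = matvec(inv2(W), x)
Tx_v = matvec(T, x_v)
# eq. (1): [x]_w = T [x]_v
assert all(abs(x_w[i] - Tx_v[i]) < 1e-12 for i in range(2))

Vp = inv2(V)                    # rows: dual basis {v^i}
Wp = inv2(W)                    # rows: dual basis {w^i}
TVp = matmul(T, Vp)
# eq. (3): W' = T V'
assert all(abs(Wp[i][j] - TVp[i][j]) < 1e-12
           for i in range(2) for j in range(2))

alpha_w = [1.0, 2.0]            # row of components [alpha]_{w'}
# eq. (4): [alpha]_{v'} = [alpha]_{w'} T  (row vector times matrix)
alpha_v = [sum(alpha_w[k] * T[k][j] for k in range(2)) for j in range(2)]
# Both component rows describe the same covector in standard coordinates.
a_std_w = [sum(alpha_w[k] * Wp[k][j] for k in range(2)) for j in range(2)]
a_std_v = [sum(alpha_v[k] * Vp[k][j] for k in range(2)) for j in range(2)]
assert all(abs(a_std_w[j] - a_std_v[j]) < 1e-12 for j in range(2))
print("relations (1)-(4) check out")
```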