I don't think that "contravariant transformation" is established terminology in physics.
The problem with "covariant" is that in physics it has a wide range of meanings, ranging from "involving no unnatural choices" to the definition one sees in differential geometry motivated by general relativity, which is:
For a smooth real Riemannian manifold $M$, a tensor $T$ of type $(n, m)$ is a multilinear function which takes $n$ 1-forms and $m$ tangent vectors as input. When you choose a coordinate chart and dual bases $d x_{\alpha}$ of the cotangent space and $\partial_{\alpha}$ of the tangent space with respect to this chart, then the tensor has coordinate functions of the form
$$
T^{\alpha, \beta, ...}_{\gamma, \delta,...} = T(d x_{\alpha}, d x_{\beta}..., \partial_{\gamma}, \partial_{\delta}...)
$$
With respect to these bases, a downstairs index is called covariant and an upstairs index is called contravariant. Now, a "covariant equation" or "covariant operation" is one that does not change its form under a coordinate change: if you change coordinates and apply the coordinate change to all covariant and contravariant indices of every tensor in your equation, then you get the same equation, but with "indices with respect to the new coordinates".
A simple example would be:
$$
T^{\alpha}_{\alpha} = 0
$$
with the Einstein summation convention: When the same index is used for a covariant and a contravariant index, it is understood that one should sum over all indices of a pair of dual bases.
Physicists would say that this equation is "covariant" because it has the same form in every coordinate chart, i.e. when I apply a diffeomorphism I get
$$
T^{\alpha'}_{\alpha'} = 0
$$
with respect to the new coordinates. Note that since we are talking about general relativity, the kind of transformation is implicitly fixed to be a change of charts on a smooth real manifold. As I said before, when physicists talk about different theories, they may implicitly talk about other kinds of transformations. (Maybe you ran into some physicists who said "covariant transformation" when they meant "coordinate change", but I personally have not encountered this use of language.)
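To make the trace example concrete, here is a minimal numerical sketch at a single point. The component matrix `T` and the Jacobian `J` are made-up values chosen for illustration, and the rule "upper index transforms with `J`, lower index with its inverse" is the usual convention for a type $(1,1)$ tensor, assumed here rather than derived.

```python
from fractions import Fraction

def matmul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def inverse_2x2(M):
    """Exact inverse of an invertible 2x2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def trace(M):
    """Sum of the diagonal entries, i.e. T^alpha_alpha."""
    return sum(M[i][i] for i in range(len(M)))

# Hypothetical components T^alpha_beta at one point (row = upper index).
T = [[Fraction(1), Fraction(2)],
     [Fraction(3), Fraction(-1)]]

# Hypothetical Jacobian of the coordinate change at that point.
J = [[Fraction(2), Fraction(1)],
     [Fraction(1), Fraction(1)]]
J_inv = inverse_2x2(J)

# Upper index transforms with J, lower index with J^{-1}.
T_new = matmul(matmul(J, T), J_inv)

trace_old = trace(T)
trace_new = trace(T_new)
print(trace_old, trace_new)
```

The individual components of `T_new` differ from those of `T`, but the contracted quantity is zero in both charts, which is what "the equation has the same form" means here.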
The "covariant" in "covariant derivative" should really be "invariant". It is a misnomer, but we are stuck with it. It is not the same "covariant" as that of a "covariant vector", and therefore, there is no "contravariant derivative". Armed with this, Wikipedia should fill in the rest for you :)
Best Answer
The notions "covariant" and "contravariant" are rather old. They are tied to the coordinate representation of vectors with respect to a basis of the underlying vector space.
Let $\mathcal B = (b_1,\ldots, b_n)$ be an ordered basis of $V$. The dual basis $\mathcal B^* = (b_1^*,\ldots, b_n^*)$ is given by the linear maps $b_i^* : V \to \mathbb R, b_i^*(b_j) = \delta_{ij}$. Then any linear $T : V \to \mathbb R$ can be written uniquely as $T = \sum_i T_i b_i^*$ where $T_i = T(b_i)$. In your question you write $b_i^* = \theta^i$. For the sake of transparency let us write $T_i = T_i(\mathcal B)$ and $T(\mathcal B) = (T_1(\mathcal B),\ldots,T_n(\mathcal B)) \in \mathbb R^n$. The latter is the coordinate representation of $T$ with respect to the basis $\mathcal B^*$ of $V^*$.
If $\mathcal C = (c_1,\ldots,c_n)$ is another ordered basis of $V$, then there exists a unique (invertible) matrix $A = (a_{ij})$ such that $$c_i = \sum_j a_{ij}b_j .$$ $A$ is the transformation matrix of the change of basis $\mathcal B \mapsto \mathcal C$. Note that if $A^{-1} = (a'_{ij})$, then $$b_i = \sum_j a'_{ij}c_j .$$ That is, $A^{-1}$ is the transformation matrix of the inverse change of basis $\mathcal C \mapsto \mathcal B$.
With respect to the new basis $\mathcal C$ we have $T = \sum_i T_i(\mathcal C) c_i^*$. What is the relation between $T(\mathcal C) = (T_1(\mathcal C),\ldots,T_n(\mathcal C))$ and $T(\mathcal B) = (T_1(\mathcal B),\ldots,T_n(\mathcal B))$? We have $$T_i(\mathcal C) = T(c_i) = T\Big(\sum_j a_{ij}b_j\Big) = \sum_j a_{ij}T(b_j) = \sum_j a_{ij}T_j(\mathcal B) .$$ That is, the transformation formula for a change of basis $\mathcal B \mapsto \mathcal C$ of $V$ and the induced transformation formula for $T(\mathcal B) \mapsto T(\mathcal C)$ are the "same", i.e. have the same transformation matrix $A$. This means that the coordinate representation $T(\mathcal B)$ covaries with $\mathcal B$, and it is the reason why $T$ is called covariant.
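The covariant rule can be checked numerically. The sketch below assumes, purely for illustration, that $V = \mathbb R^2$ with $\mathcal B$ the standard basis; the matrix `A` and the components `T_B` are made-up values.

```python
# Assumption for illustration: V = R^2 and B is the standard basis,
# so a vector is just its pair of B-coordinates.
A = [[1, 2],
     [0, 1]]           # change of basis: c_i = sum_j A[i][j] * b_j

T_B = [3, 5]           # hypothetical components T_i(B) = T(b_i)

def T(v):
    """Evaluate the linear functional T on a vector given in B-coordinates."""
    return sum(t * x for t, x in zip(T_B, v))

# Since b_j is the j-th standard basis vector, row i of A is c_i in B-coordinates.
C = [row[:] for row in A]

# Direct evaluation: T_i(C) = T(c_i).
T_C_direct = [T(c) for c in C]

# Transformation formula: T_i(C) = sum_j A[i][j] * T_j(B).
T_C_formula = [sum(A[i][j] * T_B[j] for j in range(2)) for i in range(2)]

print(T_C_direct, T_C_formula)
```

Both lists agree, so the components of $T$ are transformed by the same matrix $A$ that transforms the basis vectors.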
What about a tensor $\tilde T : V^* \to \mathbb R$? As Michael Seifert says in his answer, we have $\tilde T \in V^{**}$ and $V^{**}$ can be identified naturally with $V$. Let us nevertheless do it a bit more formally. Let $\mathcal B^{**}$ be the dual basis for $\mathcal B^*$. Then $\tilde T = \sum_i \tilde T^i b_i^{**}$ where $\tilde T^i = \tilde T(b_i^*)$. Let us write $\tilde T^i = \tilde T^i(\mathcal B)$ and $\tilde T(\mathcal B)= (\tilde T^1(\mathcal B),\ldots,\tilde T^n(\mathcal B))$. A change of basis $\mathcal B \mapsto \mathcal C$ of $V$ induces a change of basis $\mathcal C^* \mapsto \mathcal B^*$ as follows: $$b_i^* = \sum_j a_{ji}c_j^*$$ because $$\sum_j a_{ji}c_j^*(b_k) = \sum_j a_{ji} c_j^*\Big(\sum_l a'_{kl}c_l\Big) = \sum_{j,l} a_{ji} a'_{kl} c_j^*(c_l) = \sum_j a_{ji} a'_{kj} = \sum_j a'_{kj} a_{ji} = \delta_{ki} = \delta_{ik}.$$ That is, the transformation matrix of $\mathcal C^* \mapsto \mathcal B^*$ is the transposed matrix $A^t$. Hence the transformation matrix of $\mathcal B^* \mapsto \mathcal C^*$ is $\tilde A= (A^t)^{-1} = (A^{-1})^t$. By the above considerations, applied to $V^*$ in place of $V$, we see that the transformation matrix of $\tilde T(\mathcal B) \mapsto \tilde T(\mathcal C)$ is also $\tilde A$. This means that the coordinate representation $\tilde T(\mathcal B)$ contravaries with $\mathcal B$, and it is the reason why $\tilde T$ is called contravariant.
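As a small worked instance of this step (the particular matrix is an illustrative choice, not taken from the question), take $n = 2$ and $$A = \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix}, \qquad c_1 = b_1 + 2 b_2, \qquad c_2 = b_2 .$$ The formula $b_i^* = \sum_j a_{ji} c_j^*$ gives $$b_1^* = c_1^*, \qquad b_2^* = 2 c_1^* + c_2^* ,$$ which one can check directly on $\mathcal C$: $b_1^*(c_1) = 1$, $b_1^*(c_2) = 0$, $b_2^*(c_1) = 2$, $b_2^*(c_2) = 1$. Solving for the new dual basis yields $$c_1^* = b_1^*, \qquad c_2^* = -2 b_1^* + b_2^* ,$$ whose coefficient matrix is $$(A^{-1})^t = \begin{pmatrix} 1 & 0 \\ -2 & 1 \end{pmatrix} .$$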
Due to the natural identification $V^{**} \approx V$ the behavior of $\tilde T(\mathcal B)$ is the same as that of the coordinate representation of vectors of $V$. In fact, for $x \in V$ write $x = \sum_i x_i(\mathcal B) b_i$. Then $x(\mathcal B) = (x_1(\mathcal B),\ldots,x_n(\mathcal B))$ is the coordinate representation of $x$ with respect to $\mathcal B$. We get $$x = \sum_i x_i(\mathcal B) b_i = \sum_i x_i(\mathcal B) \sum_j a'_{ij} c_j = \sum_{i,j} x_i(\mathcal B) a'_{ij} c_j = \sum_j \left(\sum_i a'_{ij}x_i(\mathcal B) \right)c_j \\= \sum_i \left(\sum_j a'_{ji}x_j(\mathcal B) \right)c_i = \sum_i x_i(\mathcal C) c_i $$ and therefore $$x_i(\mathcal C) = \sum_j a'_{ji}x_j(\mathcal B) .$$ That is, the transformation matrix of $x(\mathcal B) \mapsto x(\mathcal C)$ is $\tilde A$, i.e. the coordinate representation $x(\mathcal B)$ contravaries with $\mathcal B$.
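Here is a minimal numerical sketch of the contravariant rule, again assuming for illustration that $V = \mathbb R^2$ with $\mathcal B$ the standard basis. The values of `A`, `x_B` and `T_B` are made up. It also checks that the number $T(x)$ does not depend on the basis, which is the point of having one kind of index transform with $A$ and the other with $(A^{-1})^t$.

```python
from fractions import Fraction

def transpose(M):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*M)]

def inverse_2x2(M):
    """Exact inverse of an invertible 2x2 matrix."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

def apply(M, v):
    """Matrix-vector product: result[i] = sum_j M[i][j] * v[j]."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

A = [[1, 2],
     [0, 1]]                             # c_i = sum_j A[i][j] * b_j
A_tilde = transpose(inverse_2x2(A))      # (A^{-1})^t

x_B = [4, 7]                             # hypothetical coordinates x_i(B)
x_C = apply(A_tilde, x_B)                # x_i(C) = sum_j a'_{ji} x_j(B)

# Rebuild x from its C-coordinates: x = sum_i x_i(C) c_i,
# where row i of A is c_i written in B-coordinates.
x_rebuilt = [sum(x_C[i] * A[i][k] for i in range(2)) for k in range(2)]

T_B = [3, 5]                             # hypothetical covariant components T_i(B)
T_C = apply(A, T_B)                      # covariant rule: same matrix A

pairing_B = sum(t * x for t, x in zip(T_B, x_B))
pairing_C = sum(t * x for t, x in zip(T_C, x_C))
print(x_C, x_rebuilt, pairing_B, pairing_C)
```

Rebuilding `x` from `x_C` returns the original vector, and the pairing is the same in both bases because $A^t (A^{-1})^t$ is the identity.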