Question 1: The problem is that if you change the basis, then the dual basis also changes, but in a different way. More precisely, suppose we decide to use the basis $f_i = M e_i$ instead of the basis $e_i$. Then the corresponding dual basis $f_i^{\ast}$ (thinking of $e_i^{\ast}$ as a linear functional $V \to k$) is given by $f_i^{\ast} = e_i^{\ast} M^{-1}$. Indeed the defining property of the dual basis, namely that
$$e_i^{\ast}(e_j) = \delta_{ij}$$
(where $\delta_{ij} = 1$ if $i = j$ and is $0$ otherwise) is satisfied here, since
$$f_i^{\ast}(f_j) = e_i^{\ast} M^{-1} M e_j = e_i^{\ast} e_j = \delta_{ij}.$$
Note that $M^{-1}$ acts on the right instead of on the left, so if one insists on writing linear transformations as matrices acting on the left, then we need in addition to take the transpose.
This is a reflection of the fact that taking dual spaces is a contravariant functor rather than a covariant one.
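As a quick sanity check of the formula $f_i^{\ast} = e_i^{\ast} M^{-1}$, here is a minimal sketch in plain Python. The matrix `M` and the helper names `matmul` and `inverse_2x2` are illustrative choices, not part of the question. The columns of `M` are the new basis vectors $f_j$ in the old coordinates, and the rows of `M_inv` are the new dual functionals written as row vectors.

```python
from fractions import Fraction

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def inverse_2x2(M):
    """Exact inverse of an invertible 2x2 matrix."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

# Hypothetical change of basis: column j of M is f_j = M e_j in old coordinates.
M = [[2, 1],
     [1, 1]]
M_inv = inverse_2x2(M)

# Row i of M_inv is the functional f_i^* = e_i^* M^{-1} as a row vector,
# so entry (i, j) of M_inv * M is the pairing f_i^*(f_j).
pairing = matmul(M_inv, M)

# Written as column vectors instead, the dual basis is the columns of the
# transpose of M^{-1}: this is the extra transpose mentioned above.
dual_as_columns = [list(col) for col in zip(*M_inv)]
```

Here `pairing` is the identity matrix, which is exactly the condition $f_i^{\ast}(f_j) = \delta_{ij}$.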
Question 2: For $V$ finite-dimensional, specifying an isomorphism $V \to V^{\ast}$ is equivalent to specifying a nondegenerate bilinear form $V \times V \to k$. This form need not be symmetric or an inner product in general. (In fact the notion makes sense over an arbitrary field, and an arbitrary field doesn't have a notion of positivity.)
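To spell out the correspondence (a sketch; the names $B$, $\phi_B$ and $G$ are introduced here for illustration): a bilinear form $B$ gives a linear map $\phi_B \colon V \to V^{\ast}$, and conversely a linear map $\phi$ gives a form, via
$$\phi_B(v) = B(v, \cdot), \qquad B_\phi(v, w) = \phi(v)(w).$$
In a basis $e_i$ with Gram matrix $G_{ij} = B(e_i, e_j)$ this reads
$$\phi_B(e_i) = \sum_j G_{ij}\, e_j^{\ast},$$
so $\phi_B$ is an isomorphism exactly when $G$ is invertible, which is what nondegeneracy means. For instance, on $k^2$ the matrix
$$G = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$$
has determinant $1$ and gives the isomorphism $e_1 \mapsto e_2^{\ast}$, $e_2 \mapsto -e_1^{\ast}$, although the form is antisymmetric and satisfies $B(v, v) = 0$ for every $v$, so it is certainly not an inner product.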
I think the following article:
Gregory H. Moore. The axiomatization of linear algebra: 1875-1940. Historia Mathematica, Volume 22, Issue 3, 1995, Pages 262–303
(Available here from Elsevier)
may shed some light on your question, although you may not have enough mathematical experience to understand the entire article. Here is my understanding having browsed the article, but I must stress that I am not a mathematical historian, so please don't quote me!
The idea of an abstract space where an addition is defined between elements and there is a field action (rather than a particular realization as, for instance, $\mathbb{R}^n$ or $C([0,1])$) seems to be due to Peano in 1888, who called them linear systems. The definition of an abstract vector space didn't catch on until the 1920s in the work of Banach, Hahn, and Wiener, each working separately. Hahn defined linear spaces in order to unify the theory of singular integrals and Schur's linear transformations of series (both employing infinite-dimensional spaces). Wiener introduced vector systems, which seem to be roughly equivalent to Banach's definition. Banach's definition was motivated by finding a common framework for understanding integral operators, which were defined on champs (domains); his 1922 paper "Sur les opérations dans les ensembles abstraits et leur application aux équations intégrales" is available online and is quite readable.
I understand the modern name vector space is popular because of a widely circulated 1941 textbook by Birkhoff and MacLane, A Survey of Modern Algebra, where the term is used.
As Asaf and Hans have indicated in their comments, the motivation for calling such spaces vector spaces is that, intuitively, they generalize our understanding of "vectors" (differences between points) in a finite-dimensional Euclidean space. The motivation for calling such spaces linear spaces is that our ability to add together different elements is the crucial feature which lets us apply the general theory to solve specific problems which are not obviously (to the 1920s eye) about vectors (in particular, in PDE and mathematical physics).
In your course, it is unlikely you will cover material that requires this abstraction, but it is a good habit for later mathematics to work in generality while you maintain your intuition in concrete examples.
Best Answer
3) Of course you can define natural isomorphisms without using the language of category theory, but category theory was (in part) invented in order to efficiently express this notion, so there is little reason to (apart from trying to improve understanding).
2a) Philosophically, what makes an isomorphism between two objects natural is that constructing the isomorphism does not require more information than constructing the objects.
For example, in order to construct $V^*$ or $V^{**}$, it is enough to know that $V$ is a vector space. What I mean is that in order to build the two sets $V^*=\{f\colon V\to\mathbb F\mid f\text{ is linear}\}$, $V^{**}=\{f\colon V^*\to\mathbb F\mid f\text{ is linear}\}$, and to give them the structure of vector spaces, you need know nothing more than what a vector space is (hence what "linear" is), and that $V$ is a vector space.
Most likely in your book, to construct an injective linear map $\phi\colon V\to V^*$, you used additional information about the vector space $V$: probably a choice of basis. For $\phi$ to also be surjective, and hence an isomorphism, you need even more information: that the basis was finite*.
On the other hand, in order to construct the linear map $\psi_V\colon V\to V^{**}$, you don't need any extra information other than that $V$ is a vector space. You simply define $\psi_V(v)\in V^{**}$ by specifying how $\psi_V(v)$ acts on elements $f\in V^*$: you declare $\psi_V(v)(f)=f(v)$. (To know that it is an isomorphism, you still need the extra information that $V$ is finite-dimensional; otherwise $\psi_V$ is only injective.)
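The definition $\psi_V(v)(f)=f(v)$ can be mimicked directly in code, which makes it visible that no basis is ever chosen. This is only a sketch: modelling vectors in $\mathbb R^2$ as tuples and functionals as Python functions is an assumption made for illustration, and the names `psi`, `f`, `g` and `add` are hypothetical.

```python
def psi(v):
    """Evaluation map: send a vector v to the functional f |-> f(v) on the dual."""
    return lambda f: f(v)

def add(f, g):
    """Pointwise sum of two functionals."""
    return lambda v: f(v) + g(v)

# Two sample linear functionals on R^2 (illustrative choices).
def f(v):
    return 3 * v[0] - v[1]

def g(v):
    return v[0] + 2 * v[1]

v = (2, 5)
double_dual_v = psi(v)          # the element of V** attached to v

value_on_f = double_dual_v(f)   # f(v)
value_on_g = double_dual_v(g)   # g(v)
value_on_sum = double_dual_v(add(f, g))
```

Note that `psi` only ever applies a functional to a vector. The last line illustrates that `psi(v)` is additive in its argument, as an element of $V^{**}$ should be.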
2b) Mathematically, the fact that an isomorphism is natural, i.e. does not depend on structural information beyond that contained in the objects, is captured by defining an isomorphism to be natural if its construction is preserved by structure-preserving (e.g. linear) maps. This is most easily expressed using the language of category theory (since the language of category theory was invented to express this).
Concretely, you can see that the isomorphism between $V$ and $V^*$ is not natural since the isomorphism based on $\{e_1,\dots,e_n\}$ likely defines the dual basis $e_i^*(\sum a_je_j)=a_i$. But all this is doing is defining an inner product (non-degenerate bilinear form if the base field $\mathbb F\neq\mathbb R$) $\left<\cdot,\cdot\right>$ on $V$ such that $\left<e_i,e_j\right>=\delta_{ij}=\begin{cases}0&i\neq j\\1&i=j\end{cases}$. The basis (when $\mathbb F=\mathbb R$) allows us to identify $V$ with our usual $n$-dimensional space with coordinates, the inner product becomes the dot product, and the isomorphism to $V^*$ sends a vector $\vec v$ to the functional $\vec w\mapsto\left<\vec v,\vec w\right>$, i.e. taking the dot product with $\vec v$. Then the only linear transformations of $V$ to itself that preserve the isomorphism to $V^*$ are the orthogonal transformations, the ones that preserve the lengths of vectors and the (unsigned) angles between them. All other transformations of $V$ to itself break the isomorphism, and hence we can conclude the isomorphism is not natural.
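In coordinates with respect to $\{e_1,\dots,e_n\}$, a transformation with matrix $A$ preserves the identification exactly when $\left<A\vec v,A\vec w\right>=\left<\vec v,\vec w\right>$ for all $\vec v,\vec w$, that is, when $A^{T}A=I$. Here is a small sketch that tests this condition; the two sample matrices and the helper names are illustrative assumptions.

```python
def transpose(A):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def preserves_identification(A):
    """True when A^T A = I, i.e. <Av, Aw> = <v, w> for all v, w."""
    n = len(A)
    identity = [[int(i == j) for j in range(n)] for i in range(n)]
    return matmul(transpose(A), A) == identity

rotation = [[0, -1],
            [1,  0]]   # rotation by 90 degrees: orthogonal
shear = [[1, 1],
         [0, 1]]       # invertible, but not orthogonal

rotation_ok = preserves_identification(rotation)
shear_ok = preserves_identification(shear)
```

The rotation passes and the shear fails, even though the shear is a perfectly good isomorphism of $V$: the identification with $V^*$ remembers the chosen basis, not just the vector space structure.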
1) The use of natural transformations is that they ensure that the maps you are constructing reflect genuine properties of the mathematical object, rather than being an artifact of the specific way in which you present the object, and hence a consequence of properties that the object "in-and-of-itself" does not have.
*More generally, the extra information for $\phi$ that you need is that of a bilinear form, which the basis allows you to define, and for $\phi^{-1}$ you need the bilinear form to be non-degenerate, which is what finiteness of the basis allows you to build. However, there are other ways to build non-degenerate bilinear forms, e.g. $L^2$ inner products.
EDIT: in response to comments, note that it makes no sense to talk about natural isomorphisms between "unnatural constructions". I said that philosophically, a natural construction is one that depends on no extra information. Mathematically, it is a functor: in addition to telling us how to construct a new object $F(V)$ out of an old object $V$, a natural construction also tells us how the construction behaves if we are given a structure-preserving map $f\colon V\to W$, by giving a structure-preserving map $F(f)$ between $F(V)$ and $F(W)$.
There are two types of natural constructions: covariant and contravariant, depending on whether $F(f\circ g)=F(f)\circ F(g)$ or $F(f\circ g)=F(g)\circ F(f)$. In particular, one shows the construction of $V^*$ is (contravariantly) natural by defining/constructing, for any $f\colon V\to W$, an $f^*\colon W^*\to V^*$ given by $f^*\colon g\mapsto g\circ f$ for any $g\colon W\to\mathbb F$, and showing that $(f_1\circ f_2)^*(g)=g\circ f_1\circ f_2=(f_2^*\circ f_1^*)(g)$.
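The reversal of order in $(f_1\circ f_2)^*=f_2^*\circ f_1^*$ can be checked mechanically. The following is a sketch with linear maps modelled as Python functions on tuples; the particular maps `f1`, `f2` and the functional `g` are made up for illustration.

```python
def compose(f, g):
    """Ordinary composition: (f o g)(x) = f(g(x))."""
    return lambda x: f(g(x))

def pullback(f):
    """The dual map f^*: it sends a functional g to g o f."""
    return lambda g: compose(g, f)

# Illustrative linear maps R^2 -> R^2 and a functional R^2 -> R.
def f1(v):
    return (v[0] + v[1], 2 * v[1])

def f2(v):
    return (3 * v[0], v[0] - v[1])

def g(v):
    return 5 * v[0] - v[1]

lhs = pullback(compose(f1, f2))(g)        # (f1 o f2)^*(g)
rhs = pullback(f2)(pullback(f1)(g))       # (f2^* o f1^*)(g)

samples = [(1, 0), (0, 1), (2, -3)]
agree = all(lhs(v) == rhs(v) for v in samples)
```

Both sides unwind to $g\circ f_1\circ f_2$, so they agree on every input; the sample points merely spot-check that.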
Now, a natural isomorphism between $I$ (the identity construction, with $I(V)=V$ and $I(f)=f$) and $F$ is a family of isomorphisms $\phi_V\colon I(V)\to F(V)$, one for each $V$, such that the diagram
$$\require{AMScd} \begin{CD} I(V) @>{\phi_V}>> F(V)\\ @VfVV @VVF(f)V \\ I(W) @>{\phi_W}>> F(W) \end{CD}$$
commutes for every $f\colon V\to W$ when $F$ is covariant, or such that
$$\begin{CD} I(V) @>{\phi_V}>> F(V)\\ @VfVV @AAF(f)A \\ I(W) @>{\phi_W}>> F(W) \end{CD}$$
commutes when $F$ is contravariant. Now it is easy to see that there is no natural isomorphism between $V$ and $V^*$. For suppose that there were, so that
$$\begin{CD} V @>{\phi_V}>> V^*\\ @VfVV @AAf^*A \\ W @>{\phi_W}>> W^* \end{CD}$$
commutes for every $f\colon V\to W$. Then $\phi_V^{-1}\circ f^*\circ\phi_W\circ f$ would be the identity on $V$, so every $f$ would have a left inverse $\phi_V^{-1}\circ f^*\circ\phi_W\colon W\to V$ and in particular would be injective. But not every linear map is injective (the zero map $f(v)=0$ is not injective whenever $V\neq 0$), a contradiction.
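The smallest case already shows the failure. This is a sketch in which the constants $c$ and $a$ are introduced purely for illustration. Take $V=W=\mathbb F$. Any isomorphism $\phi\colon\mathbb F\to\mathbb F^*$ has the form $\phi(x)=(y\mapsto cxy)$ for some $c\neq 0$, and any linear map $f\colon\mathbb F\to\mathbb F$ is $f(x)=ax$. Going around the contravariant square gives
$$(f^*\circ\phi\circ f)(x) = \phi(ax)\circ f = \bigl(y\mapsto c\,(ax)(ay)\bigr) = \bigl(y\mapsto a^2\,cxy\bigr),$$
whereas commutativity demands that this equal $\phi(x)=(y\mapsto cxy)$. So the square commutes only when $a^2=1$; it fails for $a=0$, and more generally for every $a$ with $a^2\neq 1$.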