Given $f: M \to N$, we use charts $(U, \phi)$ and $(V, \psi)$ for $M, N$ respectively such that $f(U) \cap V \neq \emptyset$. If you wish to do calculus in the traditional sense, then we use $\tilde{f}: \phi(U \cap f^{-1}(V)) \subset \mathbb{R}^n \to \psi(V) \subset \mathbb{R}^m$ defined by,
$$ \tilde{f} = \psi \circ f \circ \phi^{-1}$$
If you like, you could just write that $\tilde{f}$ is just $f$. How? Well, $\phi(\textbf{x})$ is just the coordinate representation of $\textbf{x}$ in $U$, and if we let $\psi = (y^1,...,y^m)$ then $\psi(f(\textbf{x})) = (y^1(f(\textbf{x})),...,y^m(f(\textbf{x})))$ is just the coordinate rep. of $f(\textbf{x})$ in $V$. Hence, if the use of a chart is understood then we just write,
$$f(x^1,...,x^n) = (y^1,...,y^m)$$
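To make the composition $\psi \circ f \circ \phi^{-1}$ concrete, here is a small sketch in Python. Everything in it is an assumption chosen for illustration: the manifold is the unit circle (drawn inside $\mathbb{R}^2$ only so that points can be typed in), both charts are the angle chart on the circle minus $(-1, 0)$, and $f$ is the squaring map $z \mapsto z^2$. The names `phi_inv`, `psi`, `f` and `f_tilde` are made up for this example.

```python
import math

def phi_inv(theta):
    """Inverse of the angle chart: theta in (-pi, pi) -> point on the unit circle."""
    return (math.cos(theta), math.sin(theta))

def psi(point):
    """Angle chart on the target circle, defined away from the point (-1, 0)."""
    x, y = point
    return math.atan2(y, x)

def f(point):
    """Squaring map z -> z^2 on the circle, written in ambient coordinates."""
    x, y = point
    return (x * x - y * y, 2 * x * y)

def f_tilde(theta):
    """Coordinate representation psi o f o phi^{-1}: an ordinary real function."""
    return psi(f(phi_inv(theta)))

theta = 0.3
print(f_tilde(theta))  # close to 2 * theta, as long as 2 * theta stays in (-pi, pi)
```

In these coordinates $\tilde{f}(\theta) = 2\theta$ wherever the charts apply, which is an ordinary function you can differentiate in the traditional sense.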
The moral is, in order to make sense of smoothness, we use charts. The idea behind smooth manifold theory is that if you always assume that your manifold is embedded in some ambient space, then you're probably deducing things about your space as a result of it being in another, i.e., you've lost focus of the primary object, your manifold.
To counteract this though, you consider your manifold as an abstract space with just a differentiable structure and a few other things, but you never mention a metric or anything of that kind. You now run into a problem if you want to do traditional calculus, but if you make the right identifications there may be a chance. The above is an illustration of some of these identifications.
This is a good question, and an easy misconception!
The differential maps $T_p \mathbb{R} \to T_{f(p)} \mathbb{R}$ for each fixed point $p$. But the tangent space of $\mathbb{R}$ at any point is just $\mathbb{R}$, and this makes it easy to get confused by what is what. It's also confusing that differentials are functions of functions! Remember
$$df_{-} : \mathbb{R} \to \big ( T \mathbb{R} \to T \mathbb{R} \big )$$
That is, for every point $p \in \mathbb{R}$, we have a (distinct!) function
$df_p : T_p \mathbb{R} \to T_{f(p)} \mathbb{R}$. The outer function (the dependence on $p$) is allowed to be any smooth function; in particular, it can be highly nonlinear! It is only the inner function (the dependence on the tangent vector) that must be linear.
In particular, say $f(p) = 3p^3$. Then you're exactly right, $df = 9p^2 dx$. But what does this mean?
It means for any individual point $p$ we get a map $df_p : T_p \mathbb{R} \to T_{3p^3} \mathbb{R}$. And what is that map?
$$df_p(v) = (9p^2) v$$
This is just multiplication by a scalar (which is linear)! What's confusing is that the choice of scalar depends (nonlinearly) on $p$.
So, as an example:
- $df_1(v) = 9v$
- $df_2(v) = 36v$
- etc.
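The "function that returns a function" structure can be sketched in code. This is only an illustration, assuming a central finite difference as a stand-in for the true derivative; the helper name `differential` and the step size `h` are made up for this example.

```python
def differential(f, h=1e-6):
    """Return the map p -> df_p, approximating f'(p) by a central difference."""
    def at_point(p):
        # Once p is fixed, the slope is just a number.
        slope = (f(p + h) - f(p - h)) / (2 * h)

        def df_p(v):
            # The inner map is multiplication by that number, hence linear in v.
            return slope * v

        return df_p
    return at_point

def f(p):
    return 3 * p ** 3

df = differential(f)
print(df(1.0)(1.0))  # approximately 9
print(df(2.0)(1.0))  # approximately 36
```

Here `df` plays the role of the outer function and `df(p)` the inner, linear one.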
In general, if you have a smooth function $f : \mathbb{R} \to \mathbb{R}$, then
$df_p$ is the linear function which scales by $f'(p)$ (which is just a number).
In even more generality, if you have a smooth function $f : \mathbb{R}^n \to \mathbb{R}^m$, then you may remember we have a Jacobian matrix $J$ which has functions as its entries.
Then $df = J$ is a matrix of functions, but when we fix a point $p$ we get
$df_p = \left . J \right |_p$, a matrix with regular old numerical entries. And this $\left . J \right |_p$ is a linear map from $T_p \mathbb{R}^n \to T_{f(p)} \mathbb{R}^m$ (of course, this happens to be the same thing as $\mathbb{R}^n \to \mathbb{R}^m$, but that isn't true for arbitrary manifolds, so it's useful to keep the distinction between $\mathbb{R}^n$ and $T_p \mathbb{R}^n$ in your mind, even though they happen to be the same in this simple case).
Edit:
Let's take a highly nonlinear function like $\sin(x) : \mathbb{R} \to \mathbb{R}$. Afterwards let's take a nonlinear function from $\mathbb{R}^2 \to \mathbb{R}$ so that we can see a matrix as well.
Then $d\sin(x)_p = \cos(p)dx$. So for any fixed point $p$, say $p = \pi$, we get a linear map
$$d\sin(x)_\pi = v \mapsto \cos(\pi) v$$
that is
$$d\sin(x)_\pi = v \mapsto -v$$
which is linear.
Indeed, for any point $p$ you'll get a linear map which comes from scaling $v$ by $\cos(p)$ (which, for fixed $p$, is just a number).
So $$d\sin(x)_1 \approx v \mapsto 0.54 v$$
(which is linear).
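If you want to convince yourself numerically, here is a quick check (just a sketch; the helper name `dsin` and the test values are arbitrary choices for this example). It tests additivity and homogeneity in $v$ at a fixed point, and then shows that the dependence on the point $p$ is not additive.

```python
import math

def dsin(p):
    """Differential of sin at the point p, as a map on tangent vectors v."""
    scale = math.cos(p)  # for fixed p this is just a number
    return lambda v: scale * v

L = dsin(math.pi)  # cos(pi) = -1, so this map is v -> -v
v, w, c = 2.0, -0.5, 3.0

print(L(v))
print(math.isclose(L(v + w), L(v) + L(w)))  # additivity in v
print(math.isclose(L(c * v), c * L(v)))     # homogeneity in v

# The dependence on the base point is a different story:
print(math.isclose(dsin(1.0 + 2.0)(1.0), dsin(1.0)(1.0) + dsin(2.0)(1.0)))
```

The first two checks come out true and the last one false: linear in $v$, nonlinear in $p$.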
What about in higher dimensions? Let's look at
$$f(x,y) = x^2y$$
Then $df_{p} = df_{(x,y)}$ is the Jacobian:
$$
df_{(x,y)}
= \left [ \frac{\partial}{\partial x} f \quad \frac{\partial}{\partial y} f \right ]
= \left [ 2xy \quad x^2 \right ]
$$
Notice the entries of this matrix are nonlinear in the choice of point
$p = (x,y)$. However, once we fix a point, say $p = (x,y) = (2,3)$:
$$
df_{(2,3)} = [12 \quad 4]
$$
which is a linear map from $T_{(2,3)}\mathbb{R}^2 \to T_{f(2,3)}\mathbb{R}$.
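The same computation can be sketched in a few lines of Python. The partial derivatives are typed in by hand from the formula above; the helper names `jacobian_at` and `apply_row` and the direction `v` are made up for this example, and the finite difference at the end is only a sanity check.

```python
def f(x, y):
    return x * x * y

def jacobian_at(x, y):
    """1x2 Jacobian of f at (x, y), using the partials 2xy and x^2."""
    return [2 * x * y, x * x]

def apply_row(J, v):
    """Apply the row matrix J to a tangent vector v = (v1, v2)."""
    return J[0] * v[0] + J[1] * v[1]

J = jacobian_at(2, 3)
print(J)  # [12, 4]

# Compare with a central finite difference of f along the direction v.
h = 1e-6
v = (1.0, -2.0)
fd = (f(2 + h * v[0], 3 + h * v[1]) - f(2 - h * v[0], 3 - h * v[1])) / (2 * h)
print(apply_row(J, v), fd)  # the two numbers should nearly agree
```

Once the point $(2,3)$ is fixed, `J` is a plain list of numbers and `apply_row(J, v)` is linear in `v`.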
I hope this helps ^_^
The other answer covers it, but I think it is worthwhile to remark that the very definition of differential encodes how $(F_*)_p$ transforms tangent basis vectors, and so how it acts on $T_pN:$
if $p\in U\subseteq N$ and $(U,\phi)$ is a chart, then $\phi_*:T_pU\cong T_pN\to T_{\phi(p)} \mathbb R^n$ is an isomorphism (because $\phi$ is a diffeomorphism onto its image) and so it makes sense to $\textit{define}$, for each $1\le i\le n$, $\frac{\partial}{\partial x^i}:=\phi^{-1}_*(\frac{\partial}{\partial r^i})$, where $r^i$ are the standard coordinates on $\mathbb R^n$. The same analysis applied to $F(p)\in V\subseteq M$ using the chart $(V,\psi)$ gives tangent vectors $\frac{\partial}{\partial y^j}=\psi^{-1}_*(\frac{\partial}{\partial r^j})$ for $1\le j\le m$.
Then, a direct calculation shows that $(F_*)_p\left (\frac{\partial}{\partial x^i}\right )=\sum ^m_{j=1}\frac{\partial (\psi^j\circ F\circ \phi^{-1})}{\partial r^i}(\phi(p))\cdot \frac{\partial}{\partial y^j}$, where $\psi^j$ denotes the $j$-th component function of $\psi$.
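For readers who want that calculation spelled out, here is one way it can go. This is a sketch that assumes tangent vectors are treated as derivations and writes $y^j = r^j\circ\psi$ for the coordinate functions of $\psi$ (so $y^j = \psi^j$). Any $w \in T_{F(p)}M$ expands as $w = \sum_j w(y^j)\,\frac{\partial}{\partial y^j}$, so it is enough to apply $(F_*)_p\left(\frac{\partial}{\partial x^i}\right)$ to each $y^j$:

$$
(F_*)_p\left(\frac{\partial}{\partial x^i}\right)(y^j)
= \left.\frac{\partial}{\partial x^i}\right|_p (y^j\circ F)
= \left.\frac{\partial}{\partial r^i}\right|_{\phi(p)} (y^j\circ F\circ \phi^{-1})
= \frac{\partial (\psi^j\circ F\circ \phi^{-1})}{\partial r^i}(\phi(p))
$$

The first equality is the definition of the pushforward, the second is the definition of $\frac{\partial}{\partial x^i}$ as $\phi^{-1}_*\left(\frac{\partial}{\partial r^i}\right)$, and the third just rewrites $y^j$ as $\psi^j$.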
The upshot of this is that the matrix of $F_*$ as a map from $T_pN$ to $T_{F(p)}M$ is precisely the Jacobian matrix of the function $\psi\circ F\circ \phi^{-1}$, which is a map between $\textit{Euclidean}$ spaces. And this result was ensured by the definitions.