By differentiating implicitly with respect to $x$ both sides of the implicit equation
$$
\begin{equation*}
2x^{2}+y^{2}=33,\tag{1}
\end{equation*}
$$
and using the fact that the derivatives of both sides must be equal, we get successively:
$$
\begin{eqnarray*}
&&\frac{d}{dx}\left( 2x^{2}+y^{2}\right) =\frac{d}{dx}\left( 33\right) \\
&\Rightarrow &\frac{d}{dx}\left( 2x^{2}+y^{2}\right) =0 \\
&\Leftrightarrow &\frac{d}{dx}\left( 2x^{2}\right) +\frac{d}{dx}\left(
y^{2}\right) =0 \\
&\Leftrightarrow &4x+2y\frac{dy}{dx}=0,\qquad \frac{d}{dx}\left(
y^{2}\right) =2y\frac{dy}{dx}\text{ by the chain rule} \\
&\Leftrightarrow &\frac{dy}{dx}=-\frac{4x}{2y}=-\frac{2x}{y}\tag{2} \\
&\Rightarrow &\left. \frac{dy}{dx}\right\vert _{x=2,y=5}=-\frac{4}{5}.\tag{3}
\end{eqnarray*}
$$
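As a sanity check of $(2)$ and $(3)$, here is a minimal Python sketch (not part of the original answer). It compares the implicit formula with a numerical derivative of the explicit upper branch $y=\sqrt{33-2x^{2}}$, which passes through $(2,5)$. The helper names and the step size `h` are illustrative choices.

```python
import math

def slope_implicit(x, y):
    # Formula (2): dy/dx = -2x/y, valid wherever y != 0
    return -2 * x / y

def upper_branch(x):
    # Explicit upper branch y = sqrt(33 - 2x^2) of the curve (1); it passes through (2, 5)
    return math.sqrt(33 - 2 * x**2)

def central_difference(f, x, h=1e-6):
    # Symmetric difference quotient approximating f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

exact = slope_implicit(2, 5)                  # -4/5, as in (3)
approx = central_difference(upper_branch, 2)  # numerical slope of the explicit branch
print(exact, approx)                          # the two values should agree closely
```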
The equation of the tangent line at $(2,5)$ is
$$
\begin{equation*}
y-5=-\frac{4}{5}(x-2),\tag{4}
\end{equation*}
$$
while the equation of the normal line to the curve $2x^{2}+y^{2}=33$ at $(2,5)$ is
$$
\begin{equation*}
y-5=\frac{5}{4}(x-2)\Leftrightarrow y=\frac{5}{4}x+\frac{5}{2},\tag{5}
\end{equation*}
$$
because the slope $m$ of the tangent line and the slope $m^{\prime }$ of the normal line are related by $mm^{\prime }=-1$.
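The same computation can be done with exact rational arithmetic. This is a minimal sketch using Python's standard `fractions` module; the helper `line_through` is a hypothetical name for the point-slope formula, returning the slope and intercept of $y=mx+b$.

```python
from fractions import Fraction

def line_through(point, slope):
    # Point-slope form y - y0 = m(x - x0), rewritten as y = m*x + b; returns (m, b)
    x0, y0 = point
    return slope, y0 - slope * x0

point = (Fraction(2), Fraction(5))
m = Fraction(-4, 5)     # tangent slope from (3)
m_normal = -1 / m       # perpendicular slopes satisfy m * m' = -1

tangent = line_through(point, m)
normal = line_through(point, m_normal)
print("tangent: y =", tangent[0], "x +", tangent[1])  # tangent: y = -4/5 x + 33/5
print("normal:  y =", normal[0], "x +", normal[1])    # normal:  y = 5/4 x + 5/2
```

The tangent line $y=-\frac{4}{5}x+\frac{33}{5}$ is just $(4)$ solved for $y$, and the normal line matches $(5)$.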
ADDED. More generally, consider an implicit equation $F(x,y)=0$ with $F$ differentiable, and let $y=f(x)$ denote a differentiable function such that $F(x,f(x))\equiv 0$ ($f$ does not need to be known explicitly). Differentiating both sides of $F(x,y)=0$ with respect to $x$ and applying the chain rule, we get the following total derivative with respect to $x$:
$$\frac{dF}{dx}=\frac{\partial F}{\partial x}+\frac{
\partial F}{\partial y}\frac{dy}{dx}\equiv 0.\tag{A}$$
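Solving $(\mathrm{A})$ for $\frac{dy}{dx}$, which is possible wherever $\frac{\partial F}{\partial y}\neq 0$, gives the formula
$$\frac{dy}{dx}=-\frac{\partial F/\partial x}{\partial F/\partial y}.\tag{B}$$
For $F(x,y)=2x^{2}+y^{2}-33$ we have $\frac{\partial F}{\partial x}=4x$ and $\frac{\partial F}{\partial y}=2y$, so $(\mathrm{B})$ reproduces $(2)$.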
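Formula $(\mathrm{B})$ can be turned into a small numerical routine. This is a minimal sketch, assuming $F$ is given as an ordinary Python function of two floats; the partial derivatives are approximated by central differences, and the name `implicit_slope` and the step `h` are our own choices, not from the text.

```python
def implicit_slope(F, x, y, h=1e-6):
    # Central-difference approximations of the partial derivatives of F at (x, y)
    Fx = (F(x + h, y) - F(x - h, y)) / (2 * h)
    Fy = (F(x, y + h) - F(x, y - h)) / (2 * h)
    if abs(Fy) < 1e-12:
        raise ValueError("dF/dy vanishes here; formula (B) does not apply")
    # Formula (B): dy/dx = -(dF/dx) / (dF/dy)
    return -Fx / Fy

# The curve (1) written in the form F(x, y) = 0
F = lambda x, y: 2 * x**2 + y**2 - 33
print(implicit_slope(F, 2, 5))  # close to -4/5, as in (3)
```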
In two and three dimensions we can observe from examples that the derivative satisfies what is required of a tangent vector. If you understand why this works in the case $f:\mathbb{R}\rightarrow\mathbb{R}$, you will see why it works in two and three dimensions. In higher dimensions the same definition simply generalizes, even though we cannot visualize 4D space or higher (I can hardly visualize 3D).
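As one such example, here is a minimal Python sketch reusing the ellipse $2x^{2}+y^{2}=33$ from the first part. The parametrization $r(t)=(a\cos t,\,b\sin t)$ with $a=\sqrt{33/2}$ and $b=\sqrt{33}$ is our own assumption. The numerically computed derivative $r'(t)$ at the point $(2,5)$ has slope $-4/5$ and is orthogonal to the gradient $(4x,2y)=(8,10)$ there, which is what a tangent vector to the curve must satisfy.

```python
import math

# Assumed parametrization of the ellipse 2x^2 + y^2 = 33 (our choice, for illustration)
a = math.sqrt(33 / 2)  # semi-axis along x
b = math.sqrt(33)      # semi-axis along y

def r(t):
    return (a * math.cos(t), b * math.sin(t))

def velocity(t, h=1e-6):
    # Derivative r'(t) by central differences, component by component
    (x1, y1), (x0, y0) = r(t + h), r(t - h)
    return ((x1 - x0) / (2 * h), (y1 - y0) / (2 * h))

# Parameter value with cos t = 2/a and sin t = 5/b, i.e. r(t0) = (2, 5)
t0 = math.atan2(5 / b, 2 / a)
vx, vy = velocity(t0)
grad = (4 * 2, 2 * 5)         # gradient of 2x^2 + y^2 at (2, 5)

print(r(t0))                            # approximately (2, 5)
print(vy / vx)                          # approximately -4/5, the tangent slope
print(vx * grad[0] + vy * grad[1])      # approximately 0: r'(t0) is orthogonal to the gradient
```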
The derivative does not have to be a linear map; we want it to be one.
By definition, the derivative is a linear map. So the question is really "Why is the notion of linear approximation so interesting that it deserves such a central place?". The answer is that linear maps are fairly simple to understand while still being fairly general.
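To make "linear approximation" concrete, here is a minimal Python sketch with an arbitrarily chosen map $f(x,y)=(x^{2}y,\;x+y^{3})$ (an assumption for illustration, not from the answer). Its derivative at $p$ is the Jacobian matrix $Df(p)$, and the remainder $f(p+h)-f(p)-Df(p)\,h$ shrinks faster than $|h|$, which is exactly the requirement in the definition of the derivative as a linear map.

```python
import math

def f(x, y):
    # Example nonlinear map R^2 -> R^2, chosen only for illustration
    return (x**2 * y, x + y**3)

def jacobian(x, y):
    # Matrix of partial derivatives of f: the linear map Df(x, y)
    return ((2 * x * y, x**2),
            (1.0, 3 * y**2))

def remainder_ratio(p, h):
    # |f(p + h) - f(p) - Df(p) h| / |h|, which tends to 0 as h -> 0
    (x, y), (h1, h2) = p, h
    fx, fy = f(x + h1, y + h2)
    gx, gy = f(x, y)
    J = jacobian(x, y)
    lin = (J[0][0] * h1 + J[0][1] * h2, J[1][0] * h1 + J[1][1] * h2)
    err = math.hypot(fx - gx - lin[0], fy - gy - lin[1])
    return err / math.hypot(h1, h2)

for eps in (1e-1, 1e-2, 1e-3):
    # The ratio shrinks roughly in proportion to eps
    print(eps, remainder_ratio((1.0, 2.0), (eps, -eps)))
```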
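If you choose simpler approximations, e.g. you only allow maps of the form $x\mapsto \lambda x$ for some scalar $\lambda$ as "derivatives", many functions would no longer be "differentiable". Even the linear map $(x,y)\mapsto(x,2y)$, which should surely be its own derivative, is not of this form.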
If you choose more complicated maps, e.g. you allow maps like $x\mapsto Ax + B|x|$ with a componentwise absolute value (so that the "derivative" would be a pair $(A,B)$), more functions become "differentiable", but it is far from clear how this notion would be of any help.
So linear maps seem to strike a perfect balance between simplicity and generality. You can see this in action, e.g., in Newton's method in higher dimensions, or when analyzing nonlinear systems of differential equations by means of their local linearizations.
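Newton's method in higher dimensions is a good illustration: each step replaces the nonlinear system by its linearization (the Jacobian) and solves that linear system. This is a minimal two-variable sketch; the system $2x^{2}+y^{2}=33,\;xy=10$ is a hypothetical example chosen so that $(2,5)$ is a solution, and the names `newton_2d`, `system` and `system_jacobian` are our own.

```python
def newton_2d(F, J, x, y, tol=1e-12, max_iter=50):
    for _ in range(max_iter):
        f1, f2 = F(x, y)
        (a, b), (c, d) = J(x, y)
        det = a * d - b * c
        if det == 0:
            raise ZeroDivisionError("singular Jacobian")
        # Solve the linearized system J * (dx, dy) = -F by Cramer's rule
        dx = (-f1 * d + f2 * b) / det
        dy = (-f2 * a + f1 * c) / det
        x, y = x + dx, y + dy
        if abs(dx) + abs(dy) < tol:
            break
    return x, y

def system(x, y):
    # Hypothetical system: 2x^2 + y^2 = 33 together with xy = 10; (2, 5) is one solution
    return (2 * x**2 + y**2 - 33, x * y - 10)

def system_jacobian(x, y):
    # Derivative of the system: the local linearization used at each Newton step
    return ((4 * x, 2 * y), (y, x))

root = newton_2d(system, system_jacobian, 1.5, 4.0)
print(root)  # approximately (2.0, 5.0)
```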
(Another aspect: for functions of a complex variable there are two notions of differentiability. You can require real linearity, which gives differentiability in the sense of mappings from two-dimensional real space to itself. The other possibility is to require complex linearity, and this leads to holomorphic functions. This gives a lot of extra rigidity and leads to a more restrictive but also more powerful notion of derivative.)
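The difference between the two notions can be checked numerically. In this minimal sketch (helper names are our own), a function $\mathbb{C}\to\mathbb{C}$ is viewed as a map $\mathbb{R}^{2}\to\mathbb{R}^{2}$, its real Jacobian is approximated by central differences, and we test whether that matrix has the form $\begin{pmatrix}a&-b\\ b&a\end{pmatrix}$, i.e. is multiplication by a complex number (the Cauchy-Riemann equations). For $z\mapsto z^{2}$ it is; for complex conjugation, which is real-differentiable, it is not.

```python
def real_jacobian(f, z, h=1e-6):
    # View f: C -> C as a map R^2 -> R^2 and differentiate numerically
    dfdx = (f(z + h) - f(z - h)) / (2 * h)            # derivative along the real axis
    dfdy = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)  # derivative along the imaginary axis
    # Columns of the real Jacobian are the (Re, Im) parts of these directional derivatives
    return ((dfdx.real, dfdy.real), (dfdx.imag, dfdy.imag))

def is_complex_linear(J, tol=1e-6):
    # A real 2x2 matrix is multiplication by a complex number iff it has the form [[a, -b], [b, a]]
    (a, b), (c, d) = J
    return abs(a - d) < tol and abs(b + c) < tol

z0 = 1 + 2j
print(is_complex_linear(real_jacobian(lambda z: z * z, z0)))          # True: z^2 is holomorphic
print(is_complex_linear(real_jacobian(lambda z: z.conjugate(), z0)))  # False: only real-differentiable
```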