Consider $\displaystyle{f'(a) = \lim_{h \to 0}\frac{f(a + h) - f(a)}{h}}$.
Applying the epsilon-delta definition twice, we have $\displaystyle{\forall \epsilon > 0, \exists \delta>0 \ni \forall h \in \mathbb R, 0 < |h| < \delta \implies \left|\frac{f(a + h) - f(a)}{h} - f'(a)\right| < \epsilon}$ and, writing $\varepsilon(h) = f(a + h) - f(a) - f'(a)h$ for the error, $$\displaystyle{\lim_{h \to 0}\frac{f(a + h) - f(a) - f'(a)h}{h} = \lim_{h \to 0}\frac{\varepsilon(h)}{h} = 0}$$
My book claims the tangent line passing through $(a, f(a))$ with slope $f'(a)$ is the best affine approximation to $f$ at $a$ "in the sense that the error goes to $0$ faster than $h$ as $h \to 0$".
My questions:

Do we require $\varepsilon(h)$ to approach $0$ faster than $h$ as $h \to 0$ so that we avoid dividing by $0$?

How do we know $\varepsilon(h)$ approaches $0$ faster than $h$ as $h \to 0$?

Looking at the image below, the placement of the tangent line looks good enough to me. Do we want $\varepsilon(h)$ maximally close to $0$ so that the given tangent line coincides with more points on the curve? Is that the reason it is called the best approximation?
Best Answer
Let me address your second question first. Saying that "$\varepsilon(h)$ approaches $0$ faster than $h$ does" is just an informal way of interpreting the statement $$ \lim_{h\to0}\frac{\varepsilon(h)}{h}=0 \tag{*}\label{*} \, . $$ But let me explain why this informal statement is reasonable.
By definition, $\eqref{*}$ means that given any $r>0$, there is a $\delta>0$ such that if $0<|h|<\delta$ then $\left|\frac{\varepsilon(h)}{h}\right|<r$. The statement $\left|\frac{\varepsilon(h)}{h}\right|<r$ is equivalent to $|\varepsilon(h)|<r|h|$. Thus, for instance, there is an interval containing $0$ such that $|\varepsilon(h)|<0.00000001|h|$ for all $h\neq0$ in that interval. As you can see, $\varepsilon(h)$ becomes orders of magnitude smaller than $h$.
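To see this numerically, here is a minimal sketch. The choices $f(x)=x^2$, $a=1$, the competing slope $2.5$, and all the names in the code are my own illustrative assumptions, not anything from your book. It compares $|\varepsilon(h)/h|$ for the tangent line (slope $f'(1)=2$) with the same ratio for a line through the same point with a different slope.

```python
def error(f, a, slope, h):
    """Error of the affine approximation f(a) + slope*h to f(a + h)."""
    return f(a + h) - f(a) - slope * h


def f(x):
    # Illustrative choice of function (an assumption for this sketch).
    return x * x


a = 1.0
tangent_slope = 2.0   # f'(1) = 2 for f(x) = x^2
other_slope = 2.5     # any slope different from f'(1)

# Step sizes h = 10^-1, 10^-2, ..., 10^-6
hs = [10.0 ** -k for k in range(1, 7)]

# |error / h| for the tangent line and for the competing line
tangent_ratios = [abs(error(f, a, tangent_slope, h) / h) for h in hs]
other_ratios = [abs(error(f, a, other_slope, h) / h) for h in hs]

for h, t, o in zip(hs, tangent_ratios, other_ratios):
    print(f"h = {h:.0e}   tangent: {t:.3e}   other line: {o:.3e}")
```

For the tangent line the ratio shrinks along with $h$ (here $\varepsilon(h)=h^2$, so the ratio is about $h$). For the other line the error still goes to $0$, but the ratio settles near $|f'(1)-2.5|=0.5$, so that error is only "as small as $h$", not smaller.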
To answer your first question, when computing the limit of a function at $0$, the actual value of the function at $0$ is irrelevant. This should be evident from the definition of the limit. Thus, there is no division by zero going on here.
To answer your third question, the tangent line is the line that best approximates the function near $a$ in the sense described above, but there is no guarantee that it will approximate the curve well at any other points. For instance, if $f(x)=x^2$ for $-0.00000001 \le x \le 0.00000001$ and $f(x)=1000000000000000$ otherwise, then the tangent line at $0$ is a terrible approximation of the function overall. That being said, there are powerful theorems such as the mean value theorem (MVT), which can give us "global" information about a function in terms of its derivatives. For the MVT to apply, we need stronger hypotheses than differentiability at just one point.
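The piecewise example can be checked directly with a small sketch. The function and variable names and the two sample points $10^{-9}$ and $1$ are my own choices for illustration. Since $f(0)=0$ and $f'(0)=0$, the tangent line at $0$ is simply $y=0$.

```python
def f(x):
    # x^2 on a tiny interval around 0, a huge constant everywhere else
    return x * x if abs(x) <= 1e-8 else 1e15


def tangent_at_zero(x):
    # f(0) = 0 and f'(0) = 0, so the tangent line at 0 is y = 0
    return 0.0


# Error of the tangent line very close to 0 and far away from 0
near_error = abs(f(1e-9) - tangent_at_zero(1e-9))
far_error = abs(f(1.0) - tangent_at_zero(1.0))

print(f"error at x = 1e-9: {near_error:.3e}")
print(f"error at x = 1:    {far_error:.3e}")
```

Close to $0$ the error is tiny compared with $|x|$, while at $x=1$ the tangent line misses the function by the full constant.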