Motivation for requiring a higher-order infinitesimal when defining the derivative

Tags: calculus, derivatives, multivariable-calculus

I am trying to derive the concepts of derivative and differential from limits and linear approximation as a way of reviewing the subject. Partway through, I can't figure out the motivation for requiring the error to be a higher-order infinitesimal when defining the derivative.


Here is what I did

[Step 1]: I start by supposing that the only thing I know is the concept of a limit, and that I must develop the notion of the derivative from the idea of linear approximation.

[Step 2]: Take a single-variable real function $f(x)$ as an example, and suppose $f(x)$ is defined on the interval $[a-r, a+r]$ where $r$ is positive. What I want to do is estimate the value of $f(x)$ anywhere in this interval using a linear approximation:
$$f(x)=A(x-a)+f(a)+E$$
where $E$ is the error of approximation
$$E=[f(x)-f(a)]-A(x-a)$$

[Step 3]: Since the constant $A$ can be chosen arbitrarily, if I want to bring in the notion of the derivative, I need some motivation that forces me to pick the specific $A$ satisfying
$$\lim\limits_{x\to a}\frac{E}{(x-a)}=0$$
that is to say, the requirement on $A$ is that it must make $E$ an infinitesimal of higher order than $(x-a)$.
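To see the requirement in action, here is a small numeric sketch (my own example, not from the question): for $f(x)=x^2$ at $a=1$, only $A=2$, the tangent slope $f'(1)$, drives $E/(x-a)$ to zero, while any other slope leaves a nonzero residue.

```python
# Numeric sketch: f(x) = x**2 at a = 1.
# E = [f(x) - f(a)] - A*(x - a); only A = 2 makes E/(x - a) -> 0.
def error_ratio(A, x, a=1.0):
    """Return E / (x - a), the error measured against the first-order term."""
    f = lambda t: t * t
    E = (f(x) - f(a)) - A * (x - a)
    return E / (x - a)

for A in (2.0, 3.0):
    print(A, [round(error_ratio(A, 1.0 + h), 6) for h in (0.1, 0.01, 0.001)])
# A = 2 gives ratios 0.1, 0.01, 0.001 (shrinking to 0);
# A = 3 gives ratios near -1, which do not vanish.
```

For $A=2$ the error is exactly $(x-1)^2$, so the ratio is $(x-1)$ and vanishes; for $A=3$ the ratio tends to $-1$.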


Question:

[Q]: So what is the actual motivation behind this requirement?

[Q1.1]: I understand that geometrically it singles out the tangent line, but then what makes the tangent line so special that it motivates using its slope as my linear-approximation constant $A$?

[Q1.2]: Moreover, what is the algebraic motivation behind this requirement, leaving the geometric interpretation aside?

Best Answer

I will try to answer this question myself. (I don't know if this is the "right" answer, but I will throw it out here, as it's better than nothing.)

What I want is to derive the concepts of derivative and differential using only the concepts of limit and linear approximation. As I mentioned in my [Step 3], if I just want to approximate the value of $f(x)$ by the linear equation $$f(x)=f(a)+A(x-a)+E=f(a)+A\Delta x+E\ \ \ \ \ \ (1)$$ then there are infinitely many choices of $A$ to pick from.


So the question becomes: what kind of $A$ do I actually want? Or: what kind of $A$ is "nice" enough?

Now there are two goals of linear approximation (which I want $A$ to satisfy):

  1. I want the error $E$ to be "small" enough that my calculation stays accurate even if I ignore $E$ in the equation.
  2. For the sake of predictability and convenience, I want a single operation that makes my approximation more accurate, so I can tell a person or a computer how to improve the accuracy during a calculation (or when the approximation is not accurate enough).

[Consider the first goal above.] "Small" with respect to what? There are three terms on the right side of equation (1); since $f(a)$ is a constant, the only two terms affecting the accuracy of my approximation are $A\Delta x$ and $E$. That is to say, I want $E$ to be "small" with respect to $A\Delta x$, which means the value of the fraction $$\frac{E}{A\Delta x}\ \text{is very small}.$$

[Consider the second goal above.] I realize that $\frac{E}{A\Delta x}$ cannot always be very small. What I am looking for is an operation that makes it smaller (or larger), so I can tell a person or a computer what to do (or not do) to improve the accuracy.

Now, since the value of $E$ depends on $\Delta x$ for different choices of $x$, the only two options I have are to let $\Delta x \to 0$ or $\Delta x \to \infty$. Any other option would likely involve making $\Delta x$ some complicated function of $x$, which not only defeats the purpose of linear approximation (for example, if $\Delta x$ were a parabolic function of $x$, why bother with a linear approximation in the first place? I should just do a parabolic approximation!) but would also likely require more than one operation during the approximation (which is bad if a person or a computer has to carry out too many operations).

So now I need to evaluate the two operations above:

  1. It is obvious that I cannot guarantee $\frac{E}{A\Delta x}$ keeps getting smaller as $\Delta x \to \infty$.
  2. It does seem possible to find an $A$ such that the value of $\frac{E}{A\Delta x}$ keeps decreasing as $\Delta x \to 0$, which means I should probably look at the limit $$\lim\limits_{\Delta x \to 0}\frac{E}{A\Delta x}$$
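The contrast between the two operations can be sketched numerically (again my own example, assuming $f(x)=x^2$, $a=1$, and the tangent-slope choice $A=2$):

```python
# Sketch: for f(x) = x**2, a = 1, A = 2, the relative error E/(A*dx)
# works out to dx/2 -- it grows without bound as dx -> infinity
# and shrinks as dx -> 0, so only the second operation is useful.
def rel_error(dx, a=1.0, A=2.0):
    f = lambda t: t * t
    E = (f(a + dx) - f(a)) - A * dx
    return E / (A * dx)

print([rel_error(dx) for dx in (100.0, 1.0, 0.01)])
# -> roughly [50.0, 0.5, 0.005]
```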

With the above two considerations, I can now develop the requirement on $A$:

Assume $\lim\limits_{\Delta x \to 0}\frac{E}{A\Delta x}$ exists. What I want is for $\frac{E}{A\Delta x}$ to become smaller as $\Delta x \to 0$, so the best I can hope for is that this limit equals zero (and the "nicest $A$" should at least achieve this), that is: $$\lim\limits_{\Delta x \to 0}\frac{E}{A\Delta x}=\frac{1}{A}\lim\limits_{\Delta x \to 0}\frac{E}{\Delta x}=0$$ $$\lim\limits_{\Delta x \to 0}\frac{E}{\Delta x}=0$$

Now I can say $E$ is an infinitesimal of higher order than $\Delta x$, and I can express $E$ as $$E=\epsilon\Delta x,\ \text{where}\ \lim\limits_{\Delta x \to 0}\epsilon=0$$

Substituting $E=\epsilon\Delta x$ back into equation (1) above, I have $$f(x)=f(a)+A\Delta x+\epsilon\Delta x$$ $$A+\epsilon=\frac{f(x)-f(a)}{\Delta x}$$

It is then not hard to see that $$\lim\limits_{\Delta x \to 0}(A+\epsilon)=\lim\limits_{\Delta x \to 0}\frac{f(x)-f(a)}{\Delta x}$$ $$\lim\limits_{\Delta x \to 0}A + \lim\limits_{\Delta x \to 0}\epsilon=\lim\limits_{\Delta x \to 0}\frac{f(x)-f(a)}{\Delta x}$$ $$A=\lim\limits_{\Delta x \to 0}\frac{f(x)-f(a)}{\Delta x}$$

Now I can define $A$ to be the derivative, and $A\Delta x$ and $\Delta x$ to be the differentials. I can also claim that at each point $a$ of the interval such an $A$ is unique, by the uniqueness of the limit $\lim\limits_{\Delta x \to 0}\frac{f(x)-f(a)}{\Delta x}$.
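The final identity $A+\epsilon=\frac{f(x)-f(a)}{\Delta x}$ can be checked numerically; here is a sketch with my own choice of $f=\sin$ and $a=0.3$, where the limit should pick out $A=\cos(0.3)$:

```python
import math

# Checking A + eps = (f(x) - f(a)) / dx from the derivation,
# with f = sin and a = 0.3 (my own choice); here A = cos(0.3).
a = 0.3
A = math.cos(a)

for dx in (0.1, 0.001):
    quotient = (math.sin(a + dx) - math.sin(a)) / dx
    eps = quotient - A
    print(dx, eps)  # eps shrinks with dx, so the quotient tends to A
```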

(I can also now claim that this unique $A$ is the "nicest $A$", because it is the only one satisfying the minimal requirement for the "nicest $A$".)

Thus I have successfully brought in the concepts of derivative and differential using only the concepts of limit and linear approximation. The function is then said to be differentiable when there exists an $A$ for which $$\lim\limits_{\Delta x \to 0}\frac{E}{\Delta x}=0.$$
