The biggest drawback (and it's a big one) is that the ring of dual numbers is not a field. It has plenty of zero divisors. So, Newton, or any of the mathematicians of the early days of calculus, certainly did not work directly in the ring of dual numbers. They of course did not consider the ring to exist (as rings did not exist at all yet), but from their writing it is clear they envisaged a field of real numbers with, somehow, some notion of infinitesimals. Their work is of course very vague, but correct. Much more on that can be found in math history books. Many interesting discussions can be found in the recent book "Adventures in Formalism", also related to the early days of calculus and how things developed.
Some (rather unsatisfactory) portions of analysis can be developed in the ring of dual numbers, but it does not go too far. The idea, as you say, is very simple, perhaps too simple. One immediately gets into trouble when trying to define the derivative as the quotient of the infinitesimal $f(x+h)-f(x)$ divided by $h$, where $h$ is infinitesimal. The difficulty is that the non-zero infinitesimals in the ring of dual numbers are not invertible. So, it's the end of the party. (As you say though, some aspects of the party remain with automatic differentiation). In some sense, the dual numbers form a first order approximation to actual infinitesimals: The square of an infinitesimal is of an order of magnitude smaller than the infinitesimal you started with, but in the ring of dual numbers, the square of an 'infinitesimal' is precisely $0$. So, in a nonstandard model of the reals you have whole layers of infinitesimals. In the dual numbers there is only one layer, nothing in it is invertible, and they all square to $0$.
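Those surviving aspects of the party are easy to make concrete. Here is a minimal sketch of forward-mode automatic differentiation over the dual numbers $a + b\varepsilon$ with $\varepsilon^2 = 0$; the `Dual` class and `derivative` helper are illustrative names of my own, not any particular library's API.

```python
class Dual:
    """A dual number a + b*eps, where eps**2 == 0."""

    def __init__(self, a, b=0.0):
        self.a, self.b = a, b  # real part, eps-coefficient

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.a + other.a, self.b + other.b)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # (a + b*eps)(c + d*eps) = ac + (ad + bc)*eps, since eps**2 = 0
        return Dual(self.a * other.a, self.a * other.b + self.b * other.a)

    __rmul__ = __mul__


def derivative(f, x):
    """Evaluate f at x + eps; the eps-coefficient is f'(x)."""
    return f(Dual(x, 1.0)).b


# f(x) = 3x^2 + 2x has f'(5) = 6*5 + 2 = 32
print(derivative(lambda x: 3 * x * x + 2 * x, 5.0))  # -> 32.0
```

Note that `Dual(0, 1)` is a zero divisor: multiplying it by itself gives `Dual(0, 0)`, which is exactly the "one layer of infinitesimals, all squaring to $0$" phenomenon, and exactly why no division by these infinitesimals is available.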
The book "Models for Smooth Infinitesimal Analysis" explores many different models for analysis with infinitesimals. None of them is particularly simple.
1. Whether the classical and non-standard definitions of continuity and uniform continuity are the same, rigorously speaking, or just extremely similar?
The classical and nonstandard definitions of continuity are the same, for real functions, in the sense that a function $f: \mathbb{R} \rightarrow \mathbb{R}$ satisfies the usual standard continuity condition precisely if its $\:^\star$-extension $\:^\star\! f : \:^\star\mathbb{R} \rightarrow \:^\star\mathbb{R}$ satisfies the nonstandard continuity condition presented in the linked article.
Similarly, $f: \mathbb{R} \rightarrow \mathbb{R}$ satisfies the usual standard uniform continuity condition precisely if its $\:^\star$-extension $\:^\star\! f : \:^\star\mathbb{R} \rightarrow \:^\star\mathbb{R}$ satisfies the nonstandard uniform continuity condition presented in the linked article.
However, the usual definitions of continuity and microcontinuity do not coincide for arbitrary hyperreal functions $g : \:^\star\mathbb{R} \rightarrow \:^\star\mathbb{R}$. This happens because most such functions are not $\:^\star$-extensions of any real function. Assuming that your usual standard definition of continuity and your definition of microcontinuity are both phrased in such a way that they make sense for hyperreal functions, you get examples of hyperreal functions that are continuous in the usual standard sense but not in the nonstandard sense (the indicator function of the non-real hyperreal numbers) and vice versa (the indicator function of the rationals multiplied by an infinitesimal).
I cannot imagine a counterexample of a real function which is classically continuous but not microcontinuous!
This one's easy to settle. Consider the function $f: \mathbb{R} \rightarrow \mathbb{R}$ defined by the equation $f(x) = x^2$. Take your favorite infinite hypernatural $\omega$. This hypernatural is positive, and larger than any natural number. Since $0 < x \leq y$ implies $\frac{1}{y} \leq \frac{1}{x}$, we get that $\frac{1}{\omega}$ is a positive hyperreal smaller than $\frac{1}{n}$ for every positive natural number $n$, and is therefore infinitesimal.
So we have that $\omega$ is infinitesimally close to $\omega + \frac{1}{\omega}$. Is $f(\omega)$ infinitesimally close to $f\left(\omega + \frac{1}{\omega}\right)$? Well, $\left(\omega + \frac{1}{\omega}\right)^2 - \omega^2 = \omega^2 + 2\omega\frac{1}{\omega} + \frac{1}{\omega^2} - \omega^2 = 2 + \frac{1}{\omega^2} > 2$, which is therefore not infinitesimal. So $f(\omega)$ and $f\left(\omega + \frac{1}{\omega}\right)$ are not infinitesimally close, and consequently $f$ is not microcontinuous, since if you take $x=\omega$ and $x' = \omega + \frac{1}{\omega}$, then the premise of the implication is satisfied, but the conclusion is not. And this same argument shows that the real function $f$ is not uniformly continuous on the real line.
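A finite-scale analogue of this computation can be checked numerically: replacing $\omega$ by a large real $N$, the gap $(N + 1/N)^2 - N^2 = 2 + 1/N^2$ stays above $2$ no matter how large $N$ gets, even though the two inputs become arbitrarily close. This is only an illustration of the algebra above, not a substitute for the hyperreal argument.

```python
# The gap (N + 1/N)**2 - N**2 equals 2 + 1/N**2 exactly:
# it never drops below 2, even as the inputs N and N + 1/N
# become arbitrarily close together.
for N in [10.0, 1e3, 1e6]:
    gap = (N + 1 / N) ** 2 - N ** 2
    print(N, gap)  # gap stays just above 2 as N grows
```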
Which brings us to uniform continuity. You write:
This seems to align itself fairly well with the standard uniform continuity definition: $\forall x,y\in I,\,\forall\epsilon\gt0,\,\exists\delta\gt0:\|x-y\|\lt\delta\implies\|f(x)-f(y)\|\lt\epsilon$
This is not the definition of uniform continuity at all. You need $$\forall \varepsilon > 0. \exists \delta > 0. \forall x, y \in I. |x - y|<\delta \rightarrow |f(x)-f(y)| <\varepsilon$$
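The difference in quantifier order is easy to see computationally for $f(x) = x^2$: once $\varepsilon$ is fixed, the largest workable $\delta$ at a point $x$ shrinks as $|x|$ grows, so no single $\delta$ serves every point at once. A rough sketch (the bound $\delta = \varepsilon/(2|x|+1)$ is one standard estimate, chosen here purely for illustration):

```python
# For f(x) = x**2 and |x - y| < delta <= 1, we have
# |x**2 - y**2| = |x - y| * |x + y| < delta * (2*abs(x) + 1),
# so delta = eps / (2*abs(x) + 1) works at x -- but this delta
# depends on x and shrinks toward 0 as x grows.
eps = 0.1
for x in [1.0, 10.0, 100.0, 1000.0]:
    delta = eps / (2 * abs(x) + 1)
    print(x, delta)  # workable delta shrinks: no uniform choice exists
```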
instead. Your reasoning, that "the counterexample of a continuous-but-not-microcontinuous function is a function that is not uniformly continuous anywhere" does not work. In fact, uniform continuity is not a local property: a continuous real function is uniformly continuous on every closed and bounded real interval, so it doesn't make much sense to talk about a continuous function that is not uniformly continuous anywhere. This has nothing to do with nonstandard analysis: it should be familiar from introductory courses on standard real analysis.
2. How can a function ever be continuous everywhere but not microcontinuous anywhere?
There is no such real function. If a real function is continuous, then it is microcontinuous around every real point. But possibly not around non-real hyperreal points (see the squaring function above).
There are such hyperreal functions, though (provided that you define microcontinuity appropriately for such functions). Consider e.g. the function $f : \:^\star\mathbb{R} \rightarrow \:^\star\mathbb{R}$ that takes the value $0$ on every real number, but takes the value $1$ on every non-real hyperreal number. For any $x \in \mathbb{R}, \varepsilon \in \mathbb{R}^+$, there is clearly a $\delta \in \mathbb{R}^+$ so that for all $y \in \mathbb{R}$, if $|x - y| < \delta$ then $|f(x) - f(y)| = |0 - 0| = 0 < \varepsilon$. So $f$ is continuous. But if you take $x = 0$ and any non-zero infinitesimal $x' \approx 0$, then $1 = f(x') \not\approx f(0) = 0$. So $f$ is not microcontinuous.
Whether or not there exist analogous definitions of most standard analysis ideas of convergence, continuity and limits (with all the typical extra statements of uniform, weak, strong, absolute, ...) in the nonstandard world?
There are analogous definitions of all these. You can find them in any nonstandard analysis textbook, e.g. Goldblatt's Lectures on the Hyperreals.
Best Answer
On the contrary, there is no actual difference between the definition of differentiability using hyperreal numbers, and the IST definition of S-differentiability.
Before I explain why, let me restate the NSA definition of differentiability in a more precise manner: the version you state employs some abuses of notation which make the correspondence harder to understand. I will call it H-differentiability, to distinguish it from the classical (limit-based) definition of differentiability.
Definition. Consider a real function $f: \mathbb{R} \rightarrow \mathbb{R}$, and a real number $a \in \mathbb{R}$. We say that $f$ is H-differentiable at the point $a$ if we can find a real number $m \in \mathbb{R}$ so that for the unique hyperreal-valued $\star$-extension $~^\star\! f: ~^\star\!\mathbb{R} \rightarrow ~^\star\!\mathbb{R}$ and for any non-zero hyperreal infinitesimal $\Delta x \in ~^\star\!\mathbb{R}$,
$$\mathrm{st}\left(\frac{~^\star\! f(a + \Delta x) - f(a)}{\Delta x}\right) \text{ is defined and equals } m.$$
Notice that the existence condition is required, otherwise the $\mathrm{st}(-)$ notation is not even defined. Moreover, you actually have to talk about the extension $~^\star\!f$, since $f$ is a function of a real variable, so the notation $f(x + \Delta x)$ is undefined as well.
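As a quick sanity check of the definition (a worked example of my own, using the squaring function from earlier): for $f(x) = x^2$, any real $a$, and any non-zero infinitesimal $\Delta x$,
$$\mathrm{st}\left(\frac{~^\star\! f(a + \Delta x) - f(a)}{\Delta x}\right) = \mathrm{st}\left(\frac{(a + \Delta x)^2 - a^2}{\Delta x}\right) = \mathrm{st}(2a + \Delta x) = 2a,$$
so $f$ is H-differentiable at every real $a$, with witness $m = 2a$.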
Moreover, in NSA, the standard part of a hyperreal $a \in ~^\star\! \mathbb{R}$ is simply the unique real number $a' \in \mathbb{R}$ so that $a - a'$ is infinitesimal, provided that such an $a'$ exists. In IST, we don't deal with hyperreals; however, we define the standard part of a real number $a \in \mathbb{R}$ as the unique standard real number $a' \in \mathbb{R}$ such that $a - a'$ is infinitesimal, provided that such an $a'$ exists. In both settings, we have to prove that as long as our numbers are not "too large", they have standard parts.
Theorem (NSA). Consider a hyperreal $a \in ~^\star\! \mathbb{R}$ so that $|a| \leq b$ for some real number $b \in \mathbb{R}$. Then $a$ has a standard part.
Theorem (IST). Consider a real number $a \in \mathbb{R}$ so that $|a| \leq b$ for some standard real number $b \in \mathbb{R}$. Then $a$ has a standard part, i.e. a unique standard real $a'$ so that $a - a'$ is infinitesimal.
This allows us to phrase S-differentiability in an identical way to H-differentiability:
Definition. Consider a standard real function $f: \mathbb{R} \rightarrow \mathbb{R}$, and a standard real number $a \in \mathbb{R}$. We say that $f$ is S-differentiable at the point $a$ if we can find a standard real number $m \in \mathbb{R}$ so that for any non-zero real infinitesimal $\Delta x \in \mathbb{R}$,
$$\mathrm{st}\left(\frac{f(a + \Delta x) - f(a)}{\Delta x}\right) \text{ is defined and equals } m.$$
Notice how the hyperreals of H-differentiability become ordinary reals in S-differentiability, while the reals and (non-hyper)real-valued functions of H-differentiability become standard reals and standard real-valued functions. The $\star$-extensions, which we suppressed in countless places in the NSA definitions (e.g. strictly speaking where I write $-$, I should be writing the $\star$-extension of real-valued subtraction to the hyperreals), disappear altogether.
In fact, from the outside view, a "standard function $f: \mathbb{R} \rightarrow \mathbb{R}$" in IST is the exact same thing as an "arbitrary function $f: \mathbb{R} \rightarrow \mathbb{R}$" in NSA. The behavior of an "arbitrary function $f: \mathbb{R} \rightarrow \mathbb{R}$" in IST is similar to (although not identical to) that of a "hyperreal function $f: ~^\star\!\mathbb{R} \rightarrow ~^\star\!\mathbb{R}$" in NSA. With this in mind, the relationships between H-differentiability, S-differentiability and classical differentiability should become clearer:
The IST definition of S-differentiability is equivalent to the classical definition of differentiability only if $f$ is a standard real-valued function and $a$ is a standard real number.
The NSA definition of H-differentiability is equivalent to the classical definition of differentiability only if $f$ is a real-valued function, and $a$ is a real number.
Moreover, neither of the definitions has more "operational" content than the other: NSA just seemed more operational due to cleverly chosen notation.
In IST, you can characterize classical differentiability for arbitrary real-valued functions, by using S-differentiability and the Standardization operation. In NSA, to characterize classical differentiability for real-valued functions, you would use H-differentiability: but that gets you only as far as the standard real-valued functions of IST, and it wouldn't help you talk about differentiability for hyperreal-valued functions at all.
Now, I can answer your question.
In the exact way that Alain Robert's book suggests. We say that $f: \mathbb{R} \rightarrow \mathbb{R}$ is differentiable at $a$ if $(f,a)$ belongs to the set
$${}^{S}\!\left\{(f,a) \;\middle|\; \exists^{\mathrm{st}} m \in \mathbb{R}.\ \forall \delta \neq 0.\ \delta \approx 0 \rightarrow \frac{f(a+\delta) - f(a)}{\delta} \approx m\right\}.$$
Is this definition more "artificial" or "away from our intuition" than the definition using hyperreals?
It makes no sense to criticize this definition as any more "artificial" or "away from our intuition" than the definition using hyperreals. In fact, the hyperreal definition looks nigh-identical, and says that $f: \mathbb{R} \rightarrow \mathbb{R}$ is differentiable at $a$ if $(f,a)$ belongs to the set
$$\left\{(f,a) \;\middle|\; \exists m \in \mathbb{R}.\ \forall \delta \neq 0.\ \delta \approx 0 \rightarrow \frac{~^\star\! f(a+\delta) - f(a)}{\delta} \approx m\right\}.$$
Keep in mind that if you write $\delta \approx 0 \rightarrow \frac{f(a + \delta) - f(a)}{\delta} \approx m$ on a blackboard, most mathematicians who have never learned any form of nonstandard analysis will still know that you're talking about differentiability of $f$ at $a$. Hardly something one could call away from our intuition!
Won't I need to fall back to the classical definitions to prove stuff?
You never ever need to make use of (fall back to) or even state the classical definitions. Once you define S-continuity, you can simply define continuity using Standardization, and never give any equivalent classical $\varepsilon$-$\delta$ characterization. The same goes for differentiability: you can explain what S-differentiability means, and then define differentiability as the standardized notion.
Say you want to prove something about the relationship between these two. E.g. you want to show that every differentiable function is continuous. You can argue as follows.
By Transfer, it suffices to prove that every standard differentiable function is continuous. But a standard function $f$ is continuous precisely if it satisfies S-continuity: remember, you defined the set of continuous functions by Standardization, as the unique standard set of functions whose standard elements are S-continuous. The same goes for differentiability: a standard $f$ is differentiable precisely if $f$ satisfies S-differentiability. So it suffices to prove that if a standard $f$ is S-differentiable, then it is S-continuous.
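And that last implication is a one-line infinitesimal computation (my own sketch; here $\delta$ ranges over non-zero infinitesimals and $m$ is the witness from S-differentiability):
$$f(a+\delta) - f(a) = \delta \cdot \frac{f(a+\delta) - f(a)}{\delta} \approx \delta \cdot m \approx 0,$$
so $f(a+\delta) \approx f(a)$, which is exactly S-continuity at $a$.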
Following this template, you will only ever use the S-definitions, and never have to make use of classical characterizations.
Isn't the set formed by Standardization "away from our intuition"?
You may regard Standardization as somehow more mysterious than the other two principles of IST. If so, keep in mind that the $\star$-extension of NSA is equally mysterious. Both of them require new intuitions. There are infinitely many ways to extend a function $f: \mathbb{R} \rightarrow \mathbb{R}$ to the hyperreals. Can you explain, succinctly, what makes the $\star$-extension special among these? In fact, Standardization and $\star$-extension are very closely related metamathematically, and it's hard to capture their meaning without explaining a construction of some particular model of IST/NSA. Personally, I think that IST Standardization fares better than its NSA alternative in this regard: most people who study IST eventually develop a good intuition for it, without having to ever go through a model construction.