The way I see it, when you differentiate an inverse trigonometric function, you don't get another inverse trigonometric function. Instead you get "simpler" functions like $1/(1 + x^2)$ or $1/\sqrt{1-x^2}$. This does not typically happen with the antiderivative of such functions.
Similarly, when you differentiate a logarithmic function, the logarithm disappears.
So, when using integration by parts $\int u \, dv = uv - \int v \, du$, it makes sense to choose the inverse trigonometric or logarithmic function as the $u$ term.
For algebraic, trigonometric and exponential functions, neither integration nor differentiation changes the nature of the function, so they come later in the ILATE order.
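As a concrete illustration (a worked example of my own choosing, not part of the rule itself), take $\int x \ln x \, dx$. ILATE suggests $u = \ln x$ and $dv = x \, dx$, so that $du = dx/x$ and $v = x^2/2$:
\begin{align*}
\int x \ln x \, dx &= \frac{x^2}{2} \ln x - \int \frac{x^2}{2} \cdot \frac{1}{x} \, dx \\
&= \frac{x^2}{2} \ln x - \frac{x^2}{4} + C.
\end{align*}
The logarithm has disappeared from the remaining integral. Choosing $u = x$ instead would require an antiderivative of $\ln x$, which is itself usually found by parts.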
Of course, this is just intuition, and there are examples where you can violate this so-called rule and still integrate by parts without any problems.
Here's what's really going on with the dual problem. (This is my attempt to answer my own question, over a year after originally asking it.)
(A very nice presentation of this material is given in Ekeland and Temam. These ideas are also in Rockafellar.)
Let $V$ be a finite dimensional normed vector space over $\mathbb R$. (Working in an inner product space or just in $\mathbb R^N$ risks concealing the fundamental role that the dual space plays in duality in convex optimization.)
The basic idea behind duality in convex analysis is to think of a convex set in terms of its supporting hyperplanes. (A closed
convex set $\Omega$ can be "recovered" from its supporting hyperplanes
by taking the intersection of all closed half spaces containing $\Omega$.
The set of all supporting hyperplanes to $\Omega$ is sort of a
"dual representation" of $\Omega$.)
For a convex function $f$ (whose epigraph is a convex set), this strategy leads
us to think about $f$ in terms of affine functions $\langle m^*, x \rangle - \alpha$
which are majorized by $f$. (Here $m^* \in V^*$ and we are using the notation $\langle m^*, x \rangle = m^*(x)$.)
For a given slope $m^* \in V^*$, we only need to consider the "best" choice of $\alpha$ -- the other affine minorants with slope $m^*$ can be disregarded.
\begin{align*}
& f(x) \geq \langle m^*,x \rangle - \alpha \quad \forall x \in V \\
\iff & \alpha \geq \langle m^*, x \rangle - f(x) \quad \forall x \in V \\
\iff & \alpha \geq \sup_{x \in V} \, \bigl( \langle m^*, x \rangle - f(x) \bigr)
\end{align*}
so the best choice of $\alpha$ is
\begin{equation}
f^*(m^*) = \sup_{x \in V} \, \bigl( \langle m^*, x \rangle - f(x) \bigr).
\end{equation}
If this supremum is finite, then
$\langle m^*,x \rangle - f^*(m^*)$ is the best affine minorant of
$f$ with slope $m^*$.
If $f^*(m^*) = \infty$, then there is no affine minorant of $f$ with slope $m^*$.
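For a concrete case (an illustrative example, assuming $V = \mathbb R$ with $V^*$ identified with $\mathbb R$), take $f(x) = |x|$. Then
\begin{equation*}
f^*(m) = \sup_{x \in \mathbb R} \, \bigl( m x - |x| \bigr) =
\begin{cases}
0 & \text{if } |m| \leq 1, \\
\infty & \text{if } |m| > 1.
\end{cases}
\end{equation*}
When $|m| \leq 1$ we have $mx \leq |x|$ for all $x$, so the supremum is $0$ (attained at $x = 0$) and the best affine minorant with slope $m$ is the line $x \mapsto mx$. When $|m| > 1$, taking $x = t \operatorname{sign}(m)$ gives $mx - |x| = (|m| - 1)t \to \infty$ as $t \to \infty$, so no line with slope $m$ stays below $|x|$.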
The function $f^*$ is called the "conjugate" of $f$. The definition and basic facts about $f^*$ are all highly intuitive. For example, if $f$ is a proper closed convex function then $f$ can be recovered from $f^*$, because any closed convex set (in this case the epigraph of $f$) is the intersection of all the closed half spaces containing it. (I still think the fact that the "inversion formula" $f = f^{**}$ is so simple is a surprising and mathematically beautiful fact, but not hard to derive or prove with this intuition.)
Because $f^*$ is defined on the dual space, we see already the fundamental role played by the dual space in duality in convex optimization.
Given an optimization problem, we don't obtain a dual problem until we specify how to perturb the optimization problem. This is why equivalent formulations of an optimization problem can lead to different dual problems. By reformulating it we have in fact specified a different way to perturb it.
As is typical in math, the ideas become clear when we work at an appropriate level of generality. Assume that our optimization problem is
\begin{equation*}
\operatorname*{minimize}_{x} \quad \phi(x,0).
\end{equation*}
Here $X$ and $Y$ are finite dimensional normed vector spaces over $\mathbb R$, and $\phi:X \times Y \to \bar{\mathbb R}$ is convex. Standard convex optimization problems can be written in this form with an appropriate choice of $\phi$. The perturbed problems are
\begin{equation*}
\operatorname*{minimize}_{x} \quad \phi(x,y)
\end{equation*}
for nonzero values of $y \in Y$.
Let $h(y) = \inf_x \phi(x,y)$. Our optimization problem is simply to evaluate $h(0)$.
From our knowledge of conjugate functions, we know that
\begin{equation*}
h(0) \geq h^{**}(0)
\end{equation*}
and that typically we have equality. For example, if $h$ is subdifferentiable at $0$ (which is typical for a convex function) then $h(0) = h^{**}(0)$.
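The inequality can be checked directly from the definition of the conjugate (a short sketch in the notation above). Taking $y = 0$ in the supremum that defines $h^*$ gives
\begin{equation*}
h^*(y^*) = \sup_{y \in Y} \, \bigl( \langle y^*, y \rangle - h(y) \bigr) \geq \langle y^*, 0 \rangle - h(0) = -h(0) \quad \forall y^* \in Y^*,
\end{equation*}
and therefore
\begin{equation*}
h^{**}(0) = \sup_{y^* \in Y^*} \, \bigl( \langle y^*, 0 \rangle - h^*(y^*) \bigr) \leq h(0).
\end{equation*}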
The dual problem is simply to evaluate $h^{**}(0)$.
In other words, the dual problem is:
\begin{equation*}
\operatorname*{maximize}_{y^* \in Y^*} \quad - h^*(y^*).
\end{equation*}
We see again the fundamental role that the dual space plays here.
It is enlightening to express the dual problem in terms of $\phi$. It's easy to show that the dual problem is
\begin{equation*}
\operatorname*{maximize}_{y^* \in Y^*} \quad - \phi^*(0,y^*).
\end{equation*}
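The calculation can be sketched as follows (assuming the pairing on $X \times Y$ is the usual one, $\langle (x^*,y^*), (x,y) \rangle = \langle x^*, x \rangle + \langle y^*, y \rangle$):
\begin{align*}
h^*(y^*) &= \sup_{y} \, \Bigl( \langle y^*, y \rangle - \inf_x \phi(x,y) \Bigr) \\
&= \sup_{x,\,y} \, \bigl( \langle 0, x \rangle + \langle y^*, y \rangle - \phi(x,y) \bigr) \\
&= \phi^*(0,y^*).
\end{align*}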
So the primal problem is
\begin{equation*}
\operatorname*{minimize}_{x \in X} \quad \phi(x,0)
\end{equation*}
and the dual problem (slightly restated) is
\begin{equation*}
\operatorname*{minimize}_{y^* \in Y^*} \quad \phi^*(0,y^*).
\end{equation*}
The similarity between these two problems is mathematically beautiful, and we can see that if we perturb the dual problem in the obvious way, then the dual of the dual problem will be the primal problem (assuming $\phi = \phi^{**}$). The natural isomorphism between $V$ and $V^{**}$ is of fundamental importance here.
The key facts about the dual problem -- strong duality, the optimality conditions, and the sensitivity interpretation of the optimal dual variables -- all become intuitively clear and even "obvious" from this viewpoint.
An optimization problem in the form
\begin{align*}
\operatorname*{minimize}_x & \quad f(x) \\
\text{subject to} & \quad g(x) \leq 0,
\end{align*}
can be perturbed as follows:
\begin{align*}
\operatorname*{minimize}_x & \quad f(x) \\
\text{subject to} & \quad g(x) + y \leq 0.
\end{align*}
This perturbed problem has the form given above with
\begin{equation*}
\phi(x,y) =
\begin{cases}
f(x) & \text{if } g(x) + y \leq 0, \\
\infty & \text{otherwise}.
\end{cases}
\end{equation*}
To find the dual problem, we need to evaluate $-\phi^*(0,y^*)$, which is a relatively straightforward calculation.
\begin{align*}
-\phi^*(0,y^*) &= -\sup_{\substack{ x,\,y \\ g(x) + y \leq 0 }} \, \bigl( \langle y^*,y \rangle - f(x) \bigr) \\
&= -\sup_{\substack{ x \\ q \geq 0 }} \, \bigl( \langle y^*, -g(x) - q \rangle - f(x) \bigr) \\
&= \inf_{\substack{ x \\ q \geq 0 }} \, \bigl( f(x) + \langle y^*, g(x) \rangle + \langle y^*, q \rangle \bigr).
\end{align*}
We can minimize first with respect to $q$, and we will get $-\infty$ unless $\langle y^*, q \rangle \geq 0$ for all $q \geq 0$. In other words, we will get $-\infty$ unless $y^* \geq 0$. When $y^* \geq 0$, the minimum over $q$ is attained at $q = 0$ and the last term vanishes.
The dual function is
\begin{equation*}
-\phi^*(0,y^*) =
\begin{cases}
\inf_x \, \bigl( f(x) + \langle y^*, g(x) \rangle \bigr) & \text{if } y^* \geq 0, \\
-\infty & \text{otherwise}.
\end{cases}
\end{equation*}
This is the expected result: the standard Lagrange dual function, with $y^*$ playing the role of the Lagrange multiplier.
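As a quick check on a familiar case (an illustrative example, assuming $X = \mathbb R^n$ and $Y = \mathbb R^m$ with the standard inner products, and taking $f(x) = c^T x$ and $g(x) = Ax - b$ for some hypothetical matrix $A$ and vectors $b$, $c$), the dual function for $y^* \geq 0$ works out to
\begin{equation*}
\inf_x \, \bigl( c^T x + (y^*)^T (Ax - b) \bigr) =
\begin{cases}
-b^T y^* & \text{if } A^T y^* + c = 0, \\
-\infty & \text{otherwise},
\end{cases}
\end{equation*}
because a linear function of $x$ is unbounded below unless its coefficient vector $c + A^T y^*$ vanishes. The dual problem is then
\begin{align*}
\operatorname*{maximize}_{y^*} & \quad -b^T y^* \\
\text{subject to} & \quad A^T y^* + c = 0, \quad y^* \geq 0,
\end{align*}
which is the usual linear programming dual.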
The intuition is that although both numerator and denominator tend to zero or infinity, what eventually matters is their respective rates of change. They generally do not approach zero or infinity at the same rate, so the limit of the ratio is governed by the ratio of those rates: the one that changes faster dominates the other.
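For the $0/0$ case this can be made concrete (a sketch under simplifying assumptions, namely $f(a) = g(a) = 0$, both functions differentiable at $a$, and $g'(a) \neq 0$; it is not a proof of the general rule). Since both functions vanish at $a$,
\begin{equation*}
\frac{f(x)}{g(x)} = \frac{f(x) - f(a)}{g(x) - g(a)} = \frac{\bigl( f(x) - f(a) \bigr)/(x - a)}{\bigl( g(x) - g(a) \bigr)/(x - a)} \longrightarrow \frac{f'(a)}{g'(a)} \quad \text{as } x \to a.
\end{equation*}
For instance, with $f(x) = \sin x$, $g(x) = x$ and $a = 0$, the ratio tends to $\cos(0)/1 = 1$.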