[Math] the rigorous definition of $dy$ and $dx$

ordinary differential equationsreal-analysisreference-request

Some background: I am a third-year undergrad. I have completed two courses on Real Analysis (I have studied $\epsilon$-$\delta$ definitions of limits, continuity, differentiability, Riemann integration and the basic topology of $\mathbb{R}$). This semester I have enrolled in my first differential equations course.

This is not the first time I am studying DEs. In high school, we studied first-order ODEs: separation of variables, homogeneous DEs, linear first-order DEs and Bernoulli's differential equation. There was a common theme in the solutions of most of these DEs: the author would somehow manipulate the DE to arrive at an equation of the form $f(x)dx = g(y)dy$, and then integrate both sides to obtain the solution. I never understood what the equation $f(x)dx = g(y)dy$ meant, since the only places where we were introduced to the symbols $dx$ and $dy$ were the definition of the derivative, $\dfrac{dy}{dx}$, and of the integral, $\int f(x)dx$. Note that in these definitions, neither $dx$ nor $dy$ occurs on its own: they occur either with each other (in the case of the derivative) or with $\int f$ (in the case of the integral). As a high school student, I was never satisfied with this way of solving DEs, but I soon convinced myself that it was correct because it seemed impossible to solve certain DEs without abusing this notation. And, very often, the DEs would already be given in a form containing $dx$ and $dy$ separately.
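
For concreteness, here is the kind of manipulation I mean (my own example, not taken from either textbook): to solve $\dfrac{dy}{dx} = xy$, one "multiplies by $dx$", divides by $y$, and integrates both sides, $$\frac{dy}{y} = x\,dx \quad\Longrightarrow\quad \int \frac{dy}{y} = \int x\,dx \quad\Longrightarrow\quad \ln|y| = \frac{x^2}{2} + C,$$ and every step treats $dx$ and $dy$ as if they were standalone objects.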

Now, in the current course, I was expecting that we would be taught a mathematically rigorous way of solving DEs, but I was disappointed after reading the first few sections of the recommended textbooks. The books are Simmons' Differential Equations with Applications and Historical Notes and Ross' Differential Equations.

In section 7, Simmons writes:

The simplest of the standard types is that in which the variables are separable:
$$\dfrac{dy}{dx} = g(x)h(y)$$ As we know, to solve this we have only to write it in the separated form $dy/h(y) = g(x)dx$ and integrate …

Similarly, Ross also uses $dy$ and $dx$ freely without defining what they mean. In section 2.1, Ross writes:

The first-order differential equations to be studied in this chapter may be expressed in either the derivative form $$\dfrac{dy}{dx} = f(x, y)$$ or the differential form $$M(x, y)dx + N(x, y)dy = 0$$

Thus, I would like to ask the following questions:

  1. What is the rigorous definition of $dx$ and $dy$?

  2. How do we conclude from $\dfrac{dy}{dx} = k$ that $dy = k\, dx$?

  3. Since we integrate both sides after arriving at the form $f(x)dx = g(y)dy$, does it mean that $f(x)dx$ and $g(y)dy$ are (possibly Riemann) integrable functions? If so, how do we show that?

  4. What does Ross mean by differential form? Wikipedia gives me something related to differential geometry and multivariable calculus, what has that got to do with ODEs?

  5. Why do authors not bother to define $dx$? I have seen undergrad texts on number theory which begin by defining divisibility, and grad texts on Fourier analysis which define what Riemann integration means. And yet we have introductory texts on ODEs which do not have enough content on something which, to me, appears to be one of the most common methods for solving first-order ODEs. Why is this so? Do authors assume that the reader has already studied this in an earlier course? Should I have studied this in the Real Analysis course?

  6. Finally, is there any textbook which takes an axiomatic approach to solving DEs? I understand that some authors want to spend more time discussing the motivation behind DEs: their origins in physics, their applications in economics, etc. But I believe I already have enough motivation to study DEs: in the first year I had to take classical mechanics, in which I studied many types of DEs, including second-order DEs for the harmonic oscillator, and I also had to take quantum chemistry, in which I studied PDEs like the classical wave equation and the Schrödinger equation. In Terry Tao's language, I believe I am past the pre-rigorous phase of DEs, and what I expect from an undergrad course is an axiomatic treatment of the subject. What is a good textbook which serves this purpose?

EDIT: I did go through a similar question. But I am not looking for a way to completely do away with the so-called differentials $dx$ and $dy$, because that would make solving certain DEs very difficult. Rather, I am looking for a rigorous theory which formalizes operations like taking $dx$ to the other side and integrating both sides.

Best Answer

I'm going to give an excursus that is much more complicated than what you actually need for your case, where the dimension is basically $1$. However, I think you need the following to better understand what is going on behind the "mumbo jumbo" formalism of $\operatorname{d}x$, $\operatorname{d}y$ and so on.

Take a linear space $V$ of dimension $n\in\mathbb{N}$ and a basis $\{e_1,...,e_n\}$. You know from your linear algebra course that there exists a unique (dual) basis $\{\varphi_1,...,\varphi_n\}$ of the dual space $V'$ such that: $$\forall i,j\in\{1,...,n\}, \varphi_i(e_j)=\delta_{i,j}.$$ Now go back to $\mathbb{R}^n$ and let $\{e_1,...,e_n\}$ be its standard basis. Then you define $\{\operatorname{d}x_1,...,\operatorname{d}x_n\}$ as the dual basis of $\{e_1,...,e_n\}$.
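
Concretely (this small example is my own, just to unpack the definition): any $v\in\mathbb{R}^n$ can be written as $v = v_1 e_1 + ... + v_n e_n$, so linearity and the defining relation $\operatorname{d}x_i(e_j)=\delta_{i,j}$ give $$\operatorname{d}x_i(v) = v_i,$$ i.e. $\operatorname{d}x_i$ is nothing but the linear functional that reads off the $i$-th coordinate of a vector. In dimension $1$, $\operatorname{d}x$ is just the identity map $h\mapsto h$ on $\mathbb{R}$.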

Then you need the concept of the differential of a function: if $\Omega$ is an open subset of $\mathbb{R}^n$, $f :\Omega\rightarrow\mathbb{R}$ and $x\in\Omega$, you say that $f$ is differentiable at $x$ if there exists a linear map $L:\mathbb{R}^n\rightarrow \mathbb{R}$ such that $$f(y)=f(x)+L(y-x)+o(\|y-x\|_2) $$ as $y\rightarrow x$, where $\|\cdot\|_2$ is the Euclidean norm on $\mathbb{R}^n$. You say that $f$ is differentiable if it is differentiable at $x$ for each $x\in\Omega$.
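
To keep a concrete case in mind (my example): for $n=1$ take $f(x)=x^2$. Writing $y=x+h$, $$f(x+h) = x^2 + 2xh + h^2 = f(x) + L(h) + o(|h|), \qquad L(h) = 2x\,h,$$ so $f$ is differentiable at every $x$, and the linear map $L$ is just multiplication by the ordinary derivative $f'(x)=2x$.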

You can prove that if $f$ is differentiable, then for each $x\in\Omega$ the linear map $L$ is unique, in the sense that if $M$ is another linear map that does the same job, then $M=L$. So you are in a position to define the differential of $f$ at $x$ as the linear map $L$. In general, when you change the point $x$, the differential of $f$ at $x$ also changes, so you define a map: $$\operatorname{d}f: \Omega\rightarrow (\mathbb{R}^n)'$$ that associates to each $x\in\Omega$ the differential of $f$ at $x$. This map is called the differential of $f$.

Now, fix a differentiable $f :\Omega \rightarrow \mathbb{R}$. Then $\forall x\in\Omega, \operatorname{d}f(x)\in (\mathbb{R}^n)'$ and so, since $\{\operatorname{d}x_1,...,\operatorname{d}x_n\}$ is a basis of $(\mathbb{R}^n)'$, there exist $a_1:\Omega\rightarrow\mathbb{R},..., a_n:\Omega\rightarrow\mathbb{R}$ such that: $$\forall x \in \Omega, \operatorname{d}f(x)=a_1(x)\operatorname{d}x_1+...+a_n(x)\operatorname{d}x_n.$$ You can prove that $$\frac{\partial{f}}{\partial{x_1}}=a_1,...,\frac{\partial{f}}{\partial{x_n}}=a_n$$ where $\frac{\partial{f}}{\partial{x_1}},...,\frac{\partial{f}}{\partial{x_n}}$ are the partial derivatives of $f$. So, you have: $$\forall x \in \Omega, \operatorname{d}f(x)= \frac{\partial{f}}{\partial{x_1}}(x)\operatorname{d}x_1+...+\frac{\partial{f}}{\partial{x_n}}(x)\operatorname{d}x_n.$$
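
As a quick illustration (again my own example): take $n=2$ and $f(x_1,x_2)=x_1^2 x_2$ on $\Omega=\mathbb{R}^2$. Then $$\operatorname{d}f(x) = 2x_1 x_2\operatorname{d}x_1 + x_1^2\operatorname{d}x_2,$$ and evaluating this linear functional on a vector $v=(v_1,v_2)$ returns $2x_1x_2\,v_1 + x_1^2\,v_2$, the directional derivative of $f$ at $x$ along $v$.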

Now, you define a differential form to be any function: $$F :\Omega \rightarrow (\mathbb{R}^n)'$$ so, in particular, the differential of a differentiable map is a differential form.

You will learn during the course that you can integrate continuous differential forms along $C^1$ curves. Precisely, if $\gamma :[a,b] \rightarrow \Omega$ is a $C^1$ function and $F :\Omega \rightarrow(\mathbb{R}^n)'$ is a continuous differential form, then you define: $$\int_\gamma F := \int_a ^ b F(\gamma(t))(\gamma'(t))\operatorname{d}t,$$ where the right-hand side is a Riemann integral (remember that $F(\gamma(t))\in(\mathbb{R}^n)'$ and that $\gamma'(t)\in\mathbb{R}^n$, so $F(\gamma(t))(\gamma'(t))\in\mathbb{R}$).
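
For instance (my example, with $n=2$): take $F = x_2\operatorname{d}x_1 + x_1\operatorname{d}x_2$ on $\Omega=\mathbb{R}^2$ and the curve $\gamma(t)=(t,t^2)$, $t\in[0,1]$. Then $\gamma'(t)=(1,2t)$, $F(\gamma(t)) = t^2\operatorname{d}x_1 + t\operatorname{d}x_2$, and $$\int_\gamma F = \int_0^1 \big(t^2\cdot 1 + t\cdot 2t\big)\operatorname{d}t = \int_0^1 3t^2\operatorname{d}t = 1.$$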

Now, it can be proved that if $f$ is a differentiable function whose differential is continuous, then: $$\int_\gamma\operatorname{d}f = f(\gamma(b))-f(\gamma(a)).$$
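
In the example above this is easy to check (still my example): $F = x_2\operatorname{d}x_1 + x_1\operatorname{d}x_2$ is exactly $\operatorname{d}f$ for $f(x_1,x_2)=x_1 x_2$, and indeed $$f(\gamma(1))-f(\gamma(0)) = f(1,1)-f(0,0) = 1 - 0 = 1 = \int_\gamma F.$$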

Finally, we come back to earth. In your case, you have $n=1$. So let's interpret the equation $$\frac{\operatorname{d}y}{\operatorname{d}x} = f(x,y)$$ in the context of the differential formalism developed above:

  1. $\{\operatorname{d}x\}$ is the dual basis in $(\mathbb{R})'$ of the basis $\{1\}$ in $\mathbb{R}$;
  2. $y$ is a function, say defined on an open interval $I\subset\mathbb{R}$, i.e. $y:I\rightarrow\mathbb{R}$;
  3. $\operatorname{d}y$ is the differential of the function $y$, and then $\operatorname{d}y : I \rightarrow (\mathbb{R})'$;
  4. Then, as we stated before (see the section about partial derivatives), it holds that the derivative of $y$, i.e. $y'$, satisfies $\forall x\in I, \operatorname{d}y(x) = y'(x)\operatorname{d}x$. Here, the expression $\frac{\operatorname{d}y}{\operatorname{d}x}$ is just a name for $y'$, so, keeping that in mind, $\forall x\in I, \operatorname{d}y(x) = \frac{\operatorname{d}y}{\operatorname{d}x}(x)\operatorname{d}x$;
  5. $f : I\times \mathbb{R}\rightarrow \mathbb{R}$ is a function, and we want that $\forall x \in I, \frac{\operatorname{d}y}{\operatorname{d}x}(x) \doteq y'(x) = f(x,y(x))$;
  6. So you want that $\forall x \in I, \operatorname{d}y(x) \overset{(4)}{=} \frac{\operatorname{d}y}{\operatorname{d}x}(x)\operatorname{d}x \overset{(5)}{=} f(x,y(x)) \operatorname{d}x$ (notice that this is an equation in $(\mathbb{R})'$);
  7. Now, take an interval $[a,b]\subset I$ and integrate the differential form along the curve $\gamma :[a,b]\rightarrow I, t\mapsto t$. On one hand you get: $$\int_\gamma \operatorname{d}y = \int_a ^b \operatorname{d}y(\gamma(t))(\gamma'(t))\operatorname{d}t = \int_a ^b y'(t)\operatorname{d}t = y(b)-y(a),$$ and on the other hand: $$\int_\gamma \operatorname{d}y = \int_\gamma (x\mapsto f(x,y(x)))\operatorname{d}x = \int_a ^b f\left(\gamma(t),y(\gamma(t))\right)\operatorname{d}x(\gamma' (t))\operatorname{d}t = \int_a ^b f(t,y(t))\operatorname{d}t,$$ and so: $$y(b)-y(a) = \int_a ^b f(t,y(t))\operatorname{d}t.$$ A separable example worked out in the same spirit is sketched below.
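
To connect this back to the separation-of-variables recipe from the question, here is a sketch in the same formalism (my own addition, under the extra assumption that $h(y(x))\neq 0$ on $I$): suppose $y' = g(x)h(y)$ on $I$ and let $H$ be an antiderivative of $1/h$ on the range of $y$. Then, by the chain rule, $$\operatorname{d}(H\circ y)(x) = H'(y(x))\,y'(x)\operatorname{d}x = \frac{y'(x)}{h(y(x))}\operatorname{d}x = g(x)\operatorname{d}x,$$ so integrating both sides along $\gamma(t)=t$, $t\in[a,b]$, gives $$H(y(b)) - H(y(a)) = \int_a^b g(t)\operatorname{d}t.$$ This is precisely the rigorous content of the formal manipulation "$\operatorname{d}y/h(y) = g(x)\operatorname{d}x$, now integrate both sides".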