My favorite books for these three topics are
- Hairer, Nørsett, Wanner, Solving Ordinary Differential Equations I - Nonstiff Problems
- Salsa, Partial Differential Equations in Action
- Quarteroni, Sacco, Saleri, Numerical Mathematics
In particular, the first reference is a classic on the numerical integration of ODEs, but it also contains a detailed introductory chapter on the basic theoretical aspects of ODEs. The second is a very well-written introductory book on PDE theory; while its first chapters provide practical methods for solving one-dimensional PDEs in simple cases (separation of variables, etc.), the author does not neglect the interesting theoretical aspects of PDEs. The third is a classic introduction to numerical analysis and treats, among other topics, linear systems (direct and iterative methods), interpolation, numerical integration, and the basics of integrating ODEs. All three books contain several examples and interesting exercises.
I'm not aware of the existence of a single book treating all of the above topics.
PS: I'm not paid by Springer for advertising their books :)
(added 3/9/2021) There is no difference between the approaches via Taylor expansion and via Butcher trees; the latter is just a systematic way of doing the former. The Taylor coefficients will always be polynomial expressions in the method parameters/coefficients. With the manipulation of B-series one obtains these expressions and the resulting order relations in a fully algebraic way.
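To illustrate the Taylor-expansion route for the smallest nontrivial case, here is a sketch (my own notation, not from any of the books above) that treats $f$ and its partial derivatives at the current point as independent symbols $F, F_x, F_y$ and matches the expansion of a generic 2-stage explicit method against the exact Taylor step:

```python
from fractions import Fraction as Fr

def order2_conditions(b1, b2, c2, a21):
    """Residuals of the order-2 conditions for a generic 2-stage explicit
    RK method k1 = f(x,y), k2 = f(x + c2 h, y + a21 h k1),
    y_next = y + h (b1 k1 + b2 k2).

    Expanding k2 to first order in h gives
        k2 = F + c2 h F_x + a21 h F F_y + O(h^2),
    and comparing with the exact step
        y(x+h) - y = h F + h^2/2 (F_x + F F_y) + O(h^3)
    yields one condition per elementary differential."""
    return [b1 + b2 - 1,          # coefficient of h   * F
            b2*c2 - Fr(1, 2),     # coefficient of h^2 * F_x
            b2*a21 - Fr(1, 2)]    # coefficient of h^2 * F * F_y

# Heun's method, b = (1/2, 1/2), c2 = a21 = 1, satisfies all three:
residuals = order2_conditions(Fr(1, 2), Fr(1, 2), Fr(1), Fr(1))
```

The same bookkeeping, done systematically for all elementary differentials, is exactly what B-series/Butcher trees organize for higher orders.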
At the end of the 19th century the idea arose of using several close-by evaluations of the ODE function to obtain a more accurate approximation of the next point. Kutta's idea, going beyond the work of Runge and Heun,*1 was to disregard any special structure and use all of the previously computed slopes in the computation of the next one, via a general linear combination with undetermined coefficients, until a sufficient number of slopes for the integration step was obtained. In formulas, $\newcommand{\D}{\mathit{\Delta}}$
\begin{align}
\D_1y&=f(x,y)\D x\\
\D_2y&=f(x+k_2\D x,\,y+q_{21}\D_1y)\D x\\
\D_3y&=f(x+k_3\D x,\,y+q_{31}\D_1y+q_{32}\D_2y)\D x\\
&\vdots\\
\D_sy&=f(x+k_s\D x,\,y+q_{s1}\D_1y+...+q_{s,s-1}\D_{s-1}y)\D x\\
y_{\rm next}&=y+a_1\D_1y+...+a_s\D_sy
\end{align}
Then the order conditions were derived via Taylor expansion and solved for $s=2,3,4,5,6$ and examples given.
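The displayed scheme translates directly into code. The following is a minimal sketch (function and variable names are mine) of one step of a general explicit Runge-Kutta method in exactly the notation above, with nodes $k_i$, strictly lower-triangular stage coefficients $q_{ij}$, and weights $a_i$:

```python
def rk_step(f, x, y, dx, k, q, a):
    """One step of a general explicit RK method:
        Delta_i y = f(x + k_i dx, y + sum_j q_ij Delta_j y) dx,
        y_next    = y + sum_i a_i Delta_i y.
    k: list of nodes, q: list of rows of stage coefficients, a: weights."""
    s = len(a)
    D = []                                        # the increments Delta_i y
    for i in range(s):
        yi = y + sum(q[i][j] * D[j] for j in range(i))
        D.append(f(x + k[i] * dx, yi) * dx)
    return y + sum(a[i] * D[i] for i in range(s))

# Classical RK4 in this notation: nodes (0, 1/2, 1/2, 1), sub-diagonal q
k = [0.0, 0.5, 0.5, 1.0]
q = [[], [0.5], [0.0, 0.5], [0.0, 0.0, 1.0]]
a = [1/6, 1/3, 1/3, 1/6]
ynext = rk_step(lambda x, y: y, 0.0, 1.0, 0.1, k, q, a)   # y' = y, close to e^0.1
```

Note that since $\D_j y$ already contains the factor $\D x$, the $q_{ij}$ multiply the increments directly, matching the formulas above.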
*1 What's the motivation for Runge-Kutta methods?
(original) Before the first step, observe that the resulting system of order equations is under-determined, so you have some freedom in constructing a method.
The first step in the approach of Kutta, and also the one usually followed today, is to single out the order equations that do not contain the matrix coefficients, that is, those that only contain the coefficients relevant to the integration of $y'(x)=f(x)$, a simple quadrature. It is then well known how to use interpolation to obtain quadrature rules for any selection of sample points. So select one of the named rules or make up your own. Kutta selected the Simpson rule and the 3/8 rule to construct examples, additionally demanding symmetry to reduce the degrees of freedom further. Thus the sample points are selected as $(k_1,k_2,k_3,k_4)=(0,1/2,1/2,1)$ or $=(0,1/3,2/3,1)$ with coefficients $(a_1,a_2,a_3,a_4)=(1/6,1/3,1/3,1/6)$ or $=(1/8, 3/8,3/8,1/8)$. Kutta also explored generalizations of the Simpson rule other than the symmetric one.*2
*2 Why use Classic fourth-order Runge-Kutta over the 3/8-rule?
This then drastically reduces the variability in the remaining order conditions.
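One can check with exact rational arithmetic that both named rules satisfy the quadrature conditions $\sum_i a_ik_i^m=\frac1{m+1}$, $m=0,\dots,3$, that an order-4 method must fulfill. A small sketch (helper name is mine):

```python
from fractions import Fraction as Fr

def quad_conditions(a, k, order=4):
    """Residuals of the quadrature conditions sum_i a_i k_i^m = 1/(m+1)
    for m = 0 .. order-1; all must vanish for a method of that order."""
    return [sum(ai * ki**m for ai, ki in zip(a, k)) - Fr(1, m + 1)
            for m in range(order)]

# Simpson-type choice (node 1/2 repeated) and the 3/8 rule
simpson = ([Fr(1, 6), Fr(1, 3), Fr(1, 3), Fr(1, 6)],
           [Fr(0), Fr(1, 2), Fr(1, 2), Fr(1)])
rule38  = ([Fr(1, 8), Fr(3, 8), Fr(3, 8), Fr(1, 8)],
           [Fr(0), Fr(1, 3), Fr(2, 3), Fr(1)])
```

Both choices give all-zero residuals, confirming that they are valid starting points for an order-4 construction.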
The classical RK4 method has the additional design decision that the matrix only is non-zero on the sub-diagonal, thus the non-zero entries are $q_{21}=k_2$, $q_{32}=k_3$, $q_{43}=k_4$. The quadrature conditions are
\begin{align}
a_1+a_2+a_3+a_4&=1\\
a_2k_2+a_3k_3+a_4k_4&=\frac12\\
a_2k_2^2+a_3k_3^2+a_4k_4^2&=\frac13\\
a_2k_2^3+a_3k_3^3+a_4k_4^3&=\frac14\\
\end{align}
and the order conditions for linear ODE reduce to
\begin{align}
a_3k_3k_2+a_4k_4k_3&=\frac16\\
a_4k_4k_3k_2&=\frac1{24}
\end{align}
and for nonlinear ODE
\begin{align}
a_3k_3k_2^2+a_4k_4k_3^2&=\frac1{12}\\
a_3k_3^2k_2+a_4k_4^2k_3&=\frac18\\
\end{align}
Leaving out the first equation, which determines $a_1$, the remaining 7 equations connect the 6 quantities $k_2,k_3,k_4$ and $a_2k_2,a_3k_3,a_4k_4$. The latter occur only linearly, so we have a linear system of 7 equations in these 3 quantities. Requiring the rank of the extended system matrix to be less than 4 then gives 3 determinant equations for the 3 variables $k_2,k_3,k_4$, which completely determines them to one of finitely many solutions.
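As a sanity check, the classical RK4 coefficients can be verified against all eight conditions above using exact rational arithmetic (the function name is mine; the equations are exactly the ones displayed):

```python
from fractions import Fraction as Fr

def rk4_conditions(k2, k3, k4, a1, a2, a3, a4):
    """Residuals of the 8 order conditions for the sub-diagonal family
    q21 = k2, q32 = k3, q43 = k4; all must vanish for order 4."""
    return [a1 + a2 + a3 + a4 - 1,                      # quadrature, m = 0
            a2*k2 + a3*k3 + a4*k4 - Fr(1, 2),           # quadrature, m = 1
            a2*k2**2 + a3*k3**2 + a4*k4**2 - Fr(1, 3),  # quadrature, m = 2
            a2*k2**3 + a3*k3**3 + a4*k4**3 - Fr(1, 4),  # quadrature, m = 3
            a3*k3*k2 + a4*k4*k3 - Fr(1, 6),             # linear ODE
            a4*k4*k3*k2 - Fr(1, 24),                    # linear ODE
            a3*k3*k2**2 + a4*k4*k3**2 - Fr(1, 12),      # nonlinear ODE
            a3*k3**2*k2 + a4*k4**2*k3 - Fr(1, 8)]       # nonlinear ODE

# Classical RK4: (k2, k3, k4) = (1/2, 1/2, 1), a = (1/6, 1/3, 1/3, 1/6)
res = rk4_conditions(Fr(1, 2), Fr(1, 2), Fr(1),
                     Fr(1, 6), Fr(1, 3), Fr(1, 3), Fr(1, 6))
```

All eight residuals come out zero, confirming that the classical coefficients are one of the finitely many solutions.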
Best Answer
All Runge-Kutta methods and all multi-step methods can be easily extended to vector-valued problems, that is, to systems of ODEs. Some of the order conditions for Runge-Kutta methods collapse for scalar equations, which means that the order for vector ODEs may be smaller than for scalar ODEs. Methods with that defect are usually not considered; you can find one famous instance, an order-5 method, in the original paper of Kutta.
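To make the vector-valued extension concrete, here is a minimal sketch (names are mine) of a classical RK4 step applied componentwise to a system, tested on the harmonic oscillator $y''=-y$ written as a first-order system:

```python
import math

def rk4_step(f, x, y, h):
    """One classical RK4 step for vector-valued f; y is a list.
    The scalar formulas carry over componentwise."""
    ax = lambda u, c, v: [ui + c * vi for ui, vi in zip(u, v)]  # u + c*v
    k1 = f(x, y)
    k2 = f(x + h/2, ax(y, h/2, k1))
    k3 = f(x + h/2, ax(y, h/2, k2))
    k4 = f(x + h, ax(y, h, k3))
    return [yi + h/6 * (p + 2*q + 2*r + s)
            for yi, p, q, r, s in zip(y, k1, k2, k3, k4)]

# y'' = -y as the system (y, y')' = (y', -y); exact solution (cos x, -sin x)
f = lambda x, y: [y[1], -y[0]]
y, x, h = [1.0, 0.0], 0.0, 0.01
for _ in range(100):                 # integrate to x = 1
    y = rk4_step(f, x, y, h)
    x += h
err = abs(y[0] - math.cos(x))        # global error, O(h^4)
```

With $h=0.01$ the global error is far below $10^{-6}$, as expected for a fourth-order method.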