Answer to Part 1
The fractional coefficients in RK4 sum to 1. This is part of what are known as the order conditions of the integrator. Basically, we want RK4 to be a 4th-order method, and for that to be true its Butcher tableau coefficients (see the tableau at the top) have to satisfy relationships of the form
$$\mathbf{b}^T A^kC^{l-1}\mathbf{1} = \frac{(l-1)!}{(l+k)!}$$
where $A$ is the matrix of coefficients in the main part of the tableau, $\mathbf{b}$ is the vector of weights (in RK4, these are 1/6, 1/3, 1/3, 1/6), and $C$ is the diagonal matrix formed from the $c$ coefficients.
We also need to craft these coefficients so that the method is $0$-stable. One can prove that a $0$-stable method that is consistent to a given order converges with that order of accuracy -- in this case order 4, i.e., the error shrinks like $h^4$.
The reason we carry $k_1$, $k_2$, etc. through to the final update is that $k_4$ is not simply a sum of the previous $k$'s. Each $k_i$ can be thought of as a slope estimate, a finite-difference approximation of the derivative at an intermediate point. The RK method takes a weighted average of these estimates, and if you work through the Taylor expansions formally, you find that these particular weights give optimal accuracy for the number of function evaluations.
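As a concrete illustration (my own sketch, not part of the answer above), here is how the four slope estimates $k_1,\dots,k_4$ are combined with the weights 1/6, 1/3, 1/3, 1/6 in a single RK4 step:

```python
import math

def rk4_step(f, x, y, h):
    """One classical RK4 step for y' = f(x, y).

    The four slope estimates k1..k4 are combined with the weights
    1/6, 1/3, 1/3, 1/6 from the Butcher tableau.
    """
    k1 = f(x, y)
    k2 = f(x + h / 2, y + h / 2 * k1)
    k3 = f(x + h / 2, y + h / 2 * k2)
    k4 = f(x + h, y + h * k3)
    return y + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6

# Toy check: y' = y, y(0) = 1, exact solution exp(x)
y, x, h = 1.0, 0.0, 0.1
for _ in range(10):
    y = rk4_step(lambda t, v: v, x, y, h)
    x += h
print(abs(y - math.e))  # small: the global error shrinks like h^4
```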
I figured out the answer to my question, with the help of Peter in the comments of this question. I decided to post what I've found here in case it might help other people (because I can't seem to find a good explanation of this anywhere else online).
First of all, for those who do not know, the Runge-Kutta method is a method of solving first order differential equations numerically. Explicitly, our task is this:
Given a derivative $\frac{\mathrm{d}y}{\mathrm{d}x} = y'(x, y)$ and an
initial value $y(x_n) = y_n$ for its antiderivative, approximate $y(x_{n+1}) = y_{n+1}$, where the
approximation gets better as $x_{n+1} - x_n$ goes to $0$.
I've found that looking at the more general problem first makes this difficult to understand, so instead we will look at the slightly simpler case where the derivative is a function of $x$ only and not of $y$; in other words, $\frac{\mathrm{d}y}{\mathrm{d}x} = y'(x)$.
The most basic and straightforward way of doing this is called Euler's Method. Essentially, it works by moving in the direction of the slope of $y$ at $x_n$. Symbolically:
$$
y_{n+1} \approx y_n + (x_{n+1} - x_n)y'(x_n)
$$
As you can see from the image, this is okay at approximating the next point, but it isn't great. We'd like to somehow find a better approximation.
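Here is a minimal sketch of that update rule in code (the choice of $y' = 2x$ is my own toy example, not from the original post):

```python
def euler_step(yp, x_n, y_n, h):
    """Euler's method: follow the slope at x_n for a step of size h."""
    return y_n + h * yp(x_n)

# Toy example: y' = 2x with y(0) = 0; the exact antiderivative is y = x^2
yp = lambda x: 2 * x
y, x, h = 0.0, 0.0, 0.01
for _ in range(100):
    y = euler_step(yp, x, y, h)
    x += h
print(y)  # ≈ 0.99; the exact value y(1) = 1.0, so the error ≈ h
```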
So here is where I was getting confused. Remember how I said that Euler's method works by moving in the direction of the tangent line? That is true, but it isn't very helpful on its own: we are given the function that produces the exact tangent line at any $x$ value, so there is no room for improvement other than making $x_{n+1} - x_n$ smaller. Instead, we should notice that if we rearrange our approximation a bit, we get the following
$$
y_{n+1} - y_n \approx (x_{n+1} - x_n)y'(x_n)
$$
Why is this important? Well, $y_{n+1} - y_n$ is just the integral of $y'(x)$ from $x_n$ to $x_{n+1}$. So in reality, Euler's method is a direct result of the (left-endpoint) Riemann sum approximation of the integral!
$$
\int_{x_n}^{x_{n+1}} y'(x) \,\mathrm{d}x \approx (x_{n+1} - x_n)y'(x_n)
$$
From this, we can generalize Euler's method as a direct result of the Fundamental Theorem of Calculus. In other words,
$$
y_{n+1} = y_n + \int_{x_n}^{x_{n+1}} y'(x) \,\mathrm{d}x
$$
And better approximations of the integral lead to better approximations of $y_{n+1}$! This is the basis of the Runge-Kutta method.
How do we get better approximations of the integral? Well, a Riemann sum uses the area under a constant function that passes through one point of the integrand to approximate the integral.
But a better approximation would be the Trapezoidal method which uses the area under a straight line that goes through two points.
An even better approximation would be Simpson's method, which uses a parabola that goes through three points.
The pattern here is that we can use a degree-$n$ polynomial through $n+1$ points to get better and better approximations of the integral as $n$ increases (the fact that these approximations improve is a direct result of Taylor's Theorem). So what makes this better than using a smaller step size in Euler's method? Because these error bounds come from Taylor's Theorem, the approximation in Euler's method improves only linearly as we decrease the step size, while for a method of higher order (say one based on a polynomial of degree $n$), if we decrease the step size by a factor $\delta$, the approximation improves by approximately a factor $\delta^n$.
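The three rules above can be compared numerically. This sketch (my own toy example, using $\int_0^h e^x\,\mathrm{d}x = e^h - 1$ as the test integral) shows the errors shrinking as the polynomial degree rises:

```python
import math

# Single-interval quadrature rules of increasing polynomial degree.
def riemann(f, a, b):      # constant through one point (degree 0)
    return (b - a) * f(a)

def trapezoid(f, a, b):    # straight line through two points (degree 1)
    return (b - a) * (f(a) + f(b)) / 2

def simpson(f, a, b):      # parabola through three points (degree 2)
    return (b - a) * (f(a) + 4 * f((a + b) / 2) + f(b)) / 6

# Test integral: e^x over [0, h], whose exact value is e^h - 1.
h = 0.1
exact = math.exp(h) - 1.0
for rule in (riemann, trapezoid, simpson):
    print(rule.__name__, abs(rule(math.exp, 0.0, h) - exact))
```

Each rule's error drops by a higher power of the interval width than the previous one's, which is the pattern the paragraph above describes.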
However, when we try to apply these methods to the more general case of $\frac{\mathrm{d}y}{\mathrm{d}x} = y'(x, y)$, we run into a slight problem. We are only given one value of $y$ (namely $y(x_0)$), but we need $n$ points on $y'$ for an $n$th-order Runge-Kutta method. So what do we do?
Unfortunately, we have to approximate other values of $y$ with lower order methods before we can use the higher order methods. We lose a bit of accuracy here, but as it turns out, this is generally not too big of a deal, because the approximation still gets better with $\delta^n$.
So let's say $R_n(y', y_{n-1})$ is the function that produces the next value, $y_n$, using the $n$th-order Runge-Kutta method. Then the following conditions hold:
$$R_n(y', y_{n-1}) \approx R_n(y', R_{n-1}(y', y_{n-2}))$$
$$R_1(y', y_0) = (x_1 - x_0) y'(x_0, y_0)$$
And this gives rise to the "K" values mentioned in the question.
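To make the bootstrapping concrete, here is a sketch (my own, not from the original discussion) of the second-order midpoint method: $k_1$ is a lower-order (Euler-style) slope used to estimate $y$ at the midpoint, where the better slope $k_2$ is then evaluated:

```python
import math

def midpoint_step(yp, x_n, y_n, h):
    """Second-order Runge-Kutta (explicit midpoint) step.

    k1 is an Euler-style slope at (x_n, y_n); a half Euler step with it
    bootstraps an estimate of y at the midpoint, where k2 is evaluated.
    """
    k1 = yp(x_n, y_n)                       # lower-order estimate of the slope
    k2 = yp(x_n + h / 2, y_n + h / 2 * k1)  # slope at the bootstrapped midpoint
    return y_n + h * k2

# Toy check: y' = y, y(0) = 1, exact solution exp(x)
y, x, h = 1.0, 0.0, 0.01
for _ in range(100):
    y = midpoint_step(lambda t, v: v, x, y, h)
    x += h
print(abs(y - math.e))  # small: the global error shrinks like h^2
```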
Best Answer
Assuming you're using the method of lines.
Let the original initial-boundary value problem be $$ u_t = u_{xx}\\ u(0, x) = f(x)\\ u(t, 0) = a(t)\\ u(t, 1) = b(t). $$ Introduce a set of points $x_j = jh,\; j = 0,1,\dots, N,\;Nh = 1$. Let $u_j(t) = u(t, x_j)$. Note that $u_0(t) = a(t)$ and $u_N(t) = b(t)$ are already known; the unknowns are $u_j(t),\; j = 1, 2, \dots, N-1$. Then $u_{xx}(t, x_j)$ can be approximated as $$ u_{xx}(t, x_j) \approx \frac{u_{j-1}(t) - 2u_j(t) + u_{j+1}(t)}{h^2}. $$ Plugging that into the PDE gives $$ u'_j(t) = \frac{u_{j-1}(t) - 2u_j(t) + u_{j+1}(t)}{h^2}, $$ a system of $N-1$ ODEs with initial conditions $u_j(0) = f(x_j)$. That system can be solved using any RK method (provided the method is stable). For the explicit Euler method that would be $$ \frac{u_j(t_{n+1}) - u_j(t_n)}{\tau} = \frac{u_{j-1}(t_n) - 2u_j(t_n) + u_{j+1}(t_n)}{h^2}, \; j = 1, 2, \dots, N-1\\ u_j(0) = f(x_j)\\ u_0(t) = a(t), \; u_N(t) = b(t). $$ This well-known explicit scheme is stable when $\frac{\tau}{h^2} \leqslant \frac{1}{2}$.
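A minimal sketch of this explicit Euler scheme in code, assuming the toy data $f(x) = \sin(\pi x)$ and $a(t) = b(t) = 0$ (my choice, not part of the answer), for which the exact solution is $e^{-\pi^2 t}\sin(\pi x)$:

```python
import math

# Method of lines for u_t = u_xx on [0, 1], explicit Euler in time.
# Assumed toy data: f(x) = sin(pi x), a(t) = b(t) = 0, so the exact
# solution is exp(-pi^2 t) sin(pi x).
N = 20
h = 1.0 / N
tau = 0.4 * h * h                  # obeys the stability bound tau/h^2 <= 1/2
u = [math.sin(math.pi * j * h) for j in range(N + 1)]

steps = 100
for _ in range(steps):
    new = u[:]                     # u[0] and u[N] keep the boundary value 0
    for j in range(1, N):
        new[j] = u[j] + tau * (u[j-1] - 2*u[j] + u[j+1]) / (h*h)
    u = new

T = steps * tau
exact = math.exp(-math.pi**2 * T) * math.sin(math.pi * 0.5)
print(abs(u[N // 2] - exact))      # small: O(h^2) + O(tau) discretization error
```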
Edit. For those who ask how to use this method with higher-order RK methods: let's take as an example an RK2 method (explicit midpoint) with the following Butcher tableau: $$ \begin{array}{c|cc} 0 & 0 & 0\\ 1/2 & 1/2 & 0\\ \hline & 0 & 1 \end{array} $$ Applied to an ODE in the form $\mathbf u' = \mathbf F(t, \mathbf u)$, this method expands to $$ \mathbf r = \mathbf F(t_n, \mathbf u_n)\\ \mathbf s = \mathbf F\left(t_n + \frac{\tau}{2}, \mathbf u_n + \frac{\tau}{2} \mathbf r\right)\\ \frac{\mathbf u_{n+1} - \mathbf u_n}{\tau} = \mathbf s $$ I've used $\mathbf r$ and $\mathbf s$ instead of the common $\mathbf k_{1,2}$ to reduce the number of indices involved. Here $\mathbf r$ and $\mathbf s$ are intermediate values that depend solely on the values of $u_j$ at $t = t_n$.
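This two-stage update is easy to express for a general system; here is a sketch (my own, with a made-up toy system for the check) using the $\mathbf r$, $\mathbf s$ notation above:

```python
import math

def rk2_step(F, t_n, u_n, tau):
    """One explicit-midpoint step for a system u' = F(t, u).

    r is the slope at t_n; s is the slope at the bootstrapped
    midpoint state u_n + (tau/2) r, matching the tableau above.
    """
    r = F(t_n, u_n)
    s = F(t_n + tau / 2, [u + (tau / 2) * ri for u, ri in zip(u_n, r)])
    return [u + tau * si for u, si in zip(u_n, s)]

# Toy system: u' = [-u1, -2*u2] with u(0) = [1, 1]; exact solution
# [exp(-t), exp(-2t)].
u, t, tau = [1.0, 1.0], 0.0, 0.01
for _ in range(100):
    u = rk2_step(lambda t, v: [-v[0], -2.0 * v[1]], t, u, tau)
    t += tau
print(abs(u[0] - math.exp(-1.0)), abs(u[1] - math.exp(-2.0)))
```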
For our case the right hand side of the ODE is $$ F_j(t, u_1, u_2, \dots, u_{N-1}) = \frac{1}{h^2}\begin{cases} u_0(t) - 2 u_1 + u_2, &j = 1\\ u_{j-1} - 2 u_j + u_{j+1}, &1 < j < N-1\\ u_{N-2} - 2 u_{N-1} + u_N(t), &j = N-1 \end{cases} $$ Note that $u_0(t)$ and $u_N(t)$ are given explicitly as $a(t)$ and $b(t)$. This is why I have separated cases $j=1$ and $j = N-1$ in the definition of $F_j$.
Putting this all together gives $$ r_j = \frac{u_{j-1}(t_n) - 2 u_j(t_n) + u_{j+1}(t_n)}{h^2}\\ s_j = \frac{1}{h^2}\begin{cases} u_0\left(t_n + \frac{\tau}{2}\right) - 2 \left(u_{1}(t_n) + \frac{\tau}{2}r_{1}\right) + \left(u_2(t_n) + \frac{\tau}{2}r_2\right), &j = 1\\ \left(u_{j-1}(t_n) + \frac{\tau}{2}r_{j-1}\right) - 2 \left(u_j(t_n) + \frac{\tau}{2}r_j\right) + \left(u_{j+1}(t_n) + \frac{\tau}{2}r_{j+1}\right), &1 < j < N-1\\ \left(u_{N-2}(t_n) + \frac{\tau}{2}r_{N-2}\right) - 2 \left(u_{N-1}(t_n) + \frac{\tau}{2}r_{N-1}\right) + u_N\left(t_n + \frac{\tau}{2}\right), &j = N-1 \end{cases}\\ \frac{u_j(t_{n+1}) - u_j(t_n)}{\tau} = s_j $$ Note that $r_j$ and $s_j$ are helper values used to step from $u_j(t_n)$ to $u_j(t_{n+1})$ and are different for each time step. One may want to attribute the values $r_j$ and $s_j$ to some moment of time. A reasonable choice is to associate each value $\mathbf k_i$ with the moment $t_n + c_i \tau$, where $c_i$ is the first column of the Butcher tableau.
$$ r_j(t_n) = \frac{u_{j-1}(t_n) - 2 u_j(t_n) + u_{j+1}(t_n)}{h^2}\\ s_j\left(t_n + \frac{\tau}{2}\right) = \frac{1}{h^2} \times \\ \times \begin{cases} u_0\left(t_n + \frac{\tau}{2}\right) - 2 \left(u_{1}(t_n) + \frac{\tau}{2}r_{1}(t_n)\right) + \left(u_2(t_n) + \frac{\tau}{2}r_2(t_n)\right), &j = 1\\ \left(u_{j-1}(t_n) + \frac{\tau}{2}r_{j-1}(t_n)\right) - 2 \left(u_j(t_n) + \frac{\tau}{2}r_j(t_n)\right) + \left(u_{j+1}(t_n) + \frac{\tau}{2}r_{j+1}(t_n)\right), &1 < j < N-1\\ \left(u_{N-2}(t_n) + \frac{\tau}{2}r_{N-2}(t_n)\right) - 2 \left(u_{N-1}(t_n) + \frac{\tau}{2}r_{N-1}(t_n)\right) + u_N\left(t_n + \frac{\tau}{2}\right), &j = N-1 \end{cases}\\ \frac{u_j(t_{n+1}) - u_j(t_n)}{\tau} = s_j\left(t_n + \frac{\tau}{2}\right) $$
While this answers the question "How to apply RK2 to this ODE", I really don't like the final form. Instead, let's write the same method in a slightly different form: $$ \frac{\mathbf u_{n+1/2} - \mathbf u_n}{\tau / 2} = \mathbf F(t_n, \mathbf u_n)\\ \frac{\mathbf u_{n+1} - \mathbf u_n}{\tau} = \mathbf F(t_n + \frac{\tau}{2}, \mathbf u_{n+1/2}). $$ One can check that the methods are the same by plugging in $\mathbf u_{n+1/2} = \mathbf u_n + \frac{\tau}{2} \mathbf r$. Just like $\mathbf r$ and $\mathbf s$, $\mathbf u_{n+1/2}$ is a helper value used to perform a step from $\mathbf u_n$ to $\mathbf u_{n+1}$.
Applied to our ODE this method gives $$ \frac{u_j(t_{n+1/2}) - u_j(t_n)}{\tau / 2} = \frac{u_{j-1}(t_n) - 2 u_j(t_n) + u_{j+1}(t_n)}{h^2}, \quad j = 1, 2, \dots, N-1\\ u_0(t_{n+1/2}) = a\left(t_n + \frac{\tau}{2}\right), \quad u_N(t_{n+1/2}) = b\left(t_n + \frac{\tau}{2}\right),\\ \frac{u_j(t_{n+1}) - u_j(t_n)}{\tau} = \frac{u_{j-1}(t_{n+1/2}) - 2 u_j(t_{n+1/2}) + u_{j+1}(t_{n+1/2})}{h^2}, \quad j = 1, 2, \dots, N-1\\ u_0(t_{n+1}) = a\left(t_n + \tau\right), \quad u_N(t_{n+1}) = b\left(t_n + \tau\right). $$ Though not strictly necessary, I have defined the values $u_0(t_{n+1/2})$ and $u_N(t_{n+1/2})$ to get rid of treating $j=1$ and $j=N-1$ as separate cases.
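The half-step form translates directly into code. This sketch reuses the toy data $f(x) = \sin(\pi x)$, $a(t) = b(t) = 0$ (my assumption, not from the answer), so both boundary values at $t_n + \tau/2$ are simply zero:

```python
import math

# Half-step form of explicit midpoint (RK2) for the semi-discretized heat
# equation. Assumed toy data: f(x) = sin(pi x), a(t) = b(t) = 0, so the
# exact solution is exp(-pi^2 t) sin(pi x).
N = 20
h = 1.0 / N
tau = 0.4 * h * h

def rhs(u):
    """F_j: the discrete second difference on the interior points."""
    return [(u[j-1] - 2*u[j] + u[j+1]) / (h*h) for j in range(1, N)]

u = [math.sin(math.pi * j * h) for j in range(N + 1)]
steps = 100
for _ in range(steps):
    f1 = rhs(u)
    # u_{n+1/2}: half Euler step; boundaries take their values at
    # t_n + tau/2, which are 0 for this choice of a and b
    half = [0.0] + [u[j] + (tau / 2) * f1[j-1] for j in range(1, N)] + [0.0]
    f2 = rhs(half)
    # full step from t_n using the midpoint slope
    u = [0.0] + [u[j] + tau * f2[j-1] for j in range(1, N)] + [0.0]

T = steps * tau
exact = math.exp(-math.pi**2 * T) * math.sin(math.pi * 0.5)
print(abs(u[N // 2] - exact))  # small; dominated by the O(h^2) spatial error
```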