Typically a cubic spline is given for a set of points such that for each interval a cubic function is fitted to the points, with matching slopes and curvature, as well as the endpoints, have zero curvature (2nd derivative). This is the definition of a natural spline.
In your case, you have 4 points and 3 intervals. The number of unknowns is 12, as for each interval there are 4 unknowns (the cubic part).
Your constraints are:
- Points on given nodes (4 equations)
- Interval end points match (2 equations)
- Interval end slopes match (2 equations)
- Interval end curvature match (2 equations)
- End point cuvature zero (2 equations)
In total, you have 12 equations with 12 unknowns.
See http://www.physics.utah.edu/~detar/phys6720/handouts/cubic_spline/cubic_spline/node1.html for more details. Also wikipedia has a nice write-up.
For your case the solution is
$$ y(x)=\begin{cases}
\frac{2x(7-2x^{2})}{5} & x=0\ldots1\\
\frac{5x^{3}-27x^{2}+41x-9}{5} & x=1\ldots2\\
\frac{-x^{3}+9x^{2}-31x+39}{5} & x=2\ldots3
\end{cases} $$
Details
The general equation for a cubic spline is an interpolation using the values $Y_i$ and the 2nd derivatives $Y''_i$ at each node. Once the 2nd derivatives are known, then the spline is solved for and the following equation can be used for interpolation between nodes $i$ and $i+1$.
This is done with a parameter $t=0\ldots 1$ that spans the interval
$$\begin{aligned}
x_i & = X_i + t\,h \\
y_i & = (1-t) Y_i + y\,Y_{i+1} + \tfrac{h^2}{6} \left( (1-t) (t^2-2 t) Y''_i + t (t^2-1) Y''_{i+1} \right) \end{aligned} $$
The system of equations to find a spline is formed using the following procedure:
Start Point Either the spline in a natural spline with $Y''_1 = 0$ or it has a specified slope $Y'_1$ at which point the following equation must be used
$$ Y''_2 + 2 Y''_1 = \tfrac{6}{h^2} \left( Y_2 - Y_1 - h Y'_1 \right) \tag{i = 1}$$
Internal Points Here the slope needs to match at nodes and this is ensured with the following equation for $i=2 \ldots n-1$
$$ Y''_{i-1} + Y''_{i+1} + 4 Y''_i = \tfrac{6}{h^2} \left( Y_{i-1} + Y_{i+1} - 2 Y_i \right) \tag{i = 2 to n-1}$$
End Point Again the spline is either natural with $Y''_n = 0$ or with the specified slope $Y'_n$, and the following equation must be used.
$$ Y''_{n-1} +2 Y''_n = \tfrac{6}{h^2} \left( Y_{n-1} - Y_n + h Y'_n \right) \tag{i = n}$$
For your example, I assume you want a natural spline so only the interior points are considered, resulting in these two equations
$$\begin{aligned}
4 Y''_2 + Y''_3 & = \tfrac{6}{h^2} \left( Y_3 - 2 Y_2 + Y_1 \right) \\
Y''_2 + 4 Y''_3 & = \tfrac{6}{h^2} \left( Y_4 - 2 Y_3 + Y_1 \right)
\end{aligned}$$
Using $h=1$ and $Y=[0,2,1,0]$ and solved for $Y''_2$ and $Y''_3$.
NOTE: I have assumed constant interval with $h = X_{i}-X_{i-1}$, but that above can be adapted to a varying grid also.
Cubic spline interpolation thru points will pass thru the given points exactly (subject to numeric errors though). But in general it will have deviation when compared against the original curve (from which you sample the points) except at those interpolated points. The fact that you are having a 0.1% deviation suggests that the cubic spline you found is a very good approximation to the original curve.
In fact, if you use the interpolated points (instead of the test points) in step 3, you should still find "loss of precision" while the difference should be theoretically zero. Such "loss of precision" comes from doing the numeric solving of the linear equation set. Such numeric errors could become notably large when your matrix value is extremely big (or small), such as in your case.
Best Answer
You're correct that a linear spline is just a sequence of straight lines. So, in between the data points $P_{i-1} = (x_{i-1}, y_{i-1})$ and $P_{i} = (x_{i}, y_{i})$, the equation of the spline is just $$ y = \frac{x_i - x}{x_i - x_{i-1}}y_{i-1} + \frac{x - x_{i-1}}{x_i - x_{i-1}}y_{i} $$ You can easily confirm that $y(x_{i-1}) = y_{i-1}$ and $y(x_{i}) = y_{i}$, so the linear pieces join properly, with no discontinuity.
Cubic splines are covered in many places. The basic idea is to assume that each segment of the curve (between two successive data points) is a cubic polynomial, whose coefficients are to be determined. We want these cubic pieces to join smoothly; specifically, where they meet, we want their first and second derivative values to match. You write down equations that express this derivative matching. You get a system of linear equations in which the unknowns are the coefficients of the polynomial pieces. The system of equations is nicely "banded", and therefore easy to solve.
For further details, see here, or here, or this answer.
Computing cubic splines is much easier if you express each segment in Hermite form, rather than algebraic form. This ensures from the outset that values and first derivatives match, and you only have to solve a linear system that forces second derivatives to match, too. The size of your system of equations is much smaller -- you have roughly $N$ unknowns, instead of $4N$.