[Math] Equations for Cubic Regression

cubics, regression

So, I'm making a simple program for drawing graphs, and I'm looking at making some simple best-fit curves using some basic regression analysis. I've happily got linear and quadratic regression working (thanks to this post), but those fits aren't quite detailed enough. I'm aware that cubic curves can be extremely good at this, within reason (which is why certain spline methods are constructed with them), so I've attempted to extend my approach to a cubic form, but it doesn't seem to work at all. I've looked around the internet for several hours, and found almost exclusively posts about how to do this with a graphing calculator or a program such as MATLAB. Obviously, this isn't much help, since I'm trying to implement it from scratch. I tried using some of the general forms available on the Wikipedia page for regression analysis, but they gave me an incredibly steep curve that was wildly off from the data presented. I'm sure I'm doing something wrong, but could someone help me discover what exactly that is?

Best Answer

I'm not a statistician, but here is a simple least squares implementation that works for me:

Suppose you have data $(t_1,y_1), \ldots, (t_n,y_n)$, where $t_i$ is the independent variable and $y_i$ the (measured) value at that data point. Construct the matrix \begin{equation}A=\begin{bmatrix} t_1^3 & t_1^2 & t_1 & 1 \\ \vdots & \vdots & \vdots & \vdots \\ t_n^3 & t_n^2 & t_n & 1 \end{bmatrix}\end{equation} and let \begin{equation}y=\begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}. \end{equation} Then \begin{equation} x=(A^*A)^{-1}A^*y=\begin{bmatrix}a\\b\\c\\d \end{bmatrix} \end{equation} is the least squares solution, where $A^*$ is the conjugate transpose of $A$ (simply the transpose $A^T$ for real data). The entries of $x$ are the coefficients of the fitted cubic, so the predicted value at any $t_0$ is $y_0=at_0^3+bt_0^2+ct_0+d$. The inverse exists as long as at least four of the $t_i$ are distinct. The error $E$ may be computed as $E=\|Ax-y\|^2$.
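Since the question asks for a from-scratch implementation, here is a minimal sketch of the formula above in plain Python, with no libraries. It forms the normal equations $(A^*A)x = A^*y$ and solves the resulting 4×4 system by Gaussian elimination with partial pivoting, rather than computing the inverse explicitly. The function names `fit_cubic`, `solve_linear` and `squared_error`, and the singularity threshold `1e-12`, are my own choices for illustration and do not come from the textbook.

```python
def solve_linear(m, v):
    """Solve the square system m @ x = v by Gaussian elimination
    with partial pivoting. m is a list of rows, v a list of numbers."""
    n = len(v)
    # Build the augmented matrix [m | v] so row operations hit both sides.
    aug = [list(row) + [rhs] for row, rhs in zip(m, v)]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:  # assumed threshold, illustrative only
            raise ValueError("singular system: need at least 4 distinct t values")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Eliminate this column from all rows below the pivot row.
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def fit_cubic(ts, ys):
    """Return [a, b, c, d] minimising sum((a*t^3 + b*t^2 + c*t + d - y)^2)."""
    if len(ts) != len(ys):
        raise ValueError("ts and ys must have the same length")
    # Rows of the matrix A: [t^3, t^2, t, 1] for each data point.
    rows = [[t ** 3, t ** 2, t, 1.0] for t in ts]
    # Normal equations: (A^T A) x = A^T y, a 4x4 system.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(4)] for i in range(4)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(4)]
    return solve_linear(ata, aty)


def squared_error(ts, ys, coeffs):
    """Return E = ||A x - y||^2 for the fitted coefficients."""
    a, b, c, d = coeffs
    return sum((a * t ** 3 + b * t ** 2 + c * t + d - y) ** 2
               for t, y in zip(ts, ys))


if __name__ == "__main__":
    # Made-up sample data lying exactly on y = 2t^3 - t^2 + 3t + 5.
    ts = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
    ys = [2 * t ** 3 - t ** 2 + 3 * t + 5 for t in ts]
    coeffs = fit_cubic(ts, ys)
    print("coefficients a, b, c, d:", coeffs)
    print("squared error:", squared_error(ts, ys, coeffs))
```

One caveat: the normal equations can become badly conditioned when the $t_i$ are large or span a wide range, because the matrix contains sums of powers up to $t^6$. Shifting and scaling $t$ (for example, to roughly $[-1, 1]$) before fitting usually helps.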

The source for my method is a linear algebra textbook: Linear Algebra, 4th edition, by Friedberg et al., pp. 360-364. The textbook uses it only as a demonstration of a practical application of the theory of inner product spaces, but ever since I came across it, it has worked well for me whenever I needed a basic polynomial fit.
