Write a possibly cubic equation given a set of coordinates

coordinate systems, learning

Given a specific set of coordinates, I'm trying to write an equation that will describe their curve.

By graphing the coordinates, I believe it should be a cubic equation. But there is only one root, at $(0,0)$.

The coordinates are as follows: $(0,0)$, $(2.1258,15)$, $(4.5809,30)$, $(7.8878,45)$, $(13.6083,60)$, $(27.2268,75)$, $(57.1741,85)$, $(90,90)$, $(121.4014,95)$, $(152.3215,105)$, $(166.4417,120)$, $(177.8625,165)$, $(175.4013,150)$, $(180,180)$

I have been reading and watching video tutorials about cubic equations but they're all either given the equation or given multiple roots.

The closest equation I've been able to write (guess, really) is as follows, but it doesn't quite follow the curve. $$p(x)=0.0001(x-90)^3+90$$

I am not looking for you to tell me the equation, I want to learn how to write them. Any help is greatly appreciated.

Edit below:

Perhaps it is relevant to mention that the $y$ values are constrained: when $x<90$ then $y<90$, and when $x>90$ then $y>90$.
I'm not sure whether this changes anything, though.

Best Answer

The least-squares polynomials below were produced by (essentially) the matrix computation described under polynomial regression.
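As a rough illustration of that matrix computation, here is a minimal Python sketch using NumPy. It is not the code used for the fits quoted in this answer (those came from Mathematica); the helper names `fit_polynomial` and `sum_squared_error` are made up for this example. It builds the design matrix whose columns are powers of $x$, solves the least-squares problem, and compares the result with the hand-tuned guess from the question.

```python
import numpy as np

# Data points copied from the question.
points = [
    (0, 0), (2.1258, 15), (4.5809, 30), (7.8878, 45), (13.6083, 60),
    (27.2268, 75), (57.1741, 85), (90, 90), (121.4014, 95),
    (152.3215, 105), (166.4417, 120), (177.8625, 165),
    (175.4013, 150), (180, 180),
]
x, y = np.array(points, dtype=float).T


def fit_polynomial(x, y, degree):
    """Least-squares coefficients c[0] + c[1]*x + ... + c[degree]*x**degree."""
    # Design (Vandermonde) matrix: column k holds x**k.
    A = np.vander(x, degree + 1, increasing=True)
    # Minimise ||A @ c - y||^2; lstsq avoids forming A.T @ A explicitly.
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coeffs


def sum_squared_error(predicted, observed):
    """Sum of squared differences between predictions and observations."""
    return float(np.sum((predicted - observed) ** 2))


cubic = fit_polynomial(x, y, 3)
fitted = np.vander(x, 4, increasing=True) @ cubic

# The hand-tuned guess from the question, for comparison.
guess = 0.0001 * (x - 90) ** 3 + 90

sse_fit = sum_squared_error(fitted, y)
sse_guess = sum_squared_error(guess, y)
print("cubic coefficients (constant term first):", cubic)
print("SSE of least-squares cubic:", sse_fit)
print("SSE of the guess from the question:", sse_guess)
```

Because the guess is itself a cubic, the least-squares cubic can never have a larger summed square error than it does. Changing the `degree` argument gives fits of other degrees, although for high degrees the raw powers of $x$ become badly scaled and a rescaled basis is safer.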

This data is not actually produced by a cubic. The procedure in Allain Remillard's answer can be used to show this. The cubic of best fit (least summed square errors) is shown with the data:

$$ 11.8893 + 3.33669 x - 0.041162 x^2 + 0.000152843 x^3 $$

[Figure: Mathematica plot of the cubic of best fit with the data]

The least-squares quintic gets closer. $$ 2.99379 + 6.11504 x - 0.173352 x^2 + 0.00225902 x^3 - 0.0000134475 x^4 + 2.98074 \times 10^{-8} x^5 $$

[Figure: Mathematica plot of the least-squares quintic with the data]

But the tiny coefficients and alternating signs (also present in the cubic), together with the shallow "wave" in the middle, suggest we are on the threshold of (or have just passed) overfitting. It seems likely that the data do not follow any polynomial.
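One way to see why "gets closer" is not by itself evidence of a better model is to tabulate the summed square error for a range of degrees. The sketch below is an illustration only, not part of the original computation; it uses NumPy's `Polynomial.fit`, and the range of degrees is an arbitrary choice.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Data points copied from the question.
points = [
    (0, 0), (2.1258, 15), (4.5809, 30), (7.8878, 45), (13.6083, 60),
    (27.2268, 75), (57.1741, 85), (90, 90), (121.4014, 95),
    (152.3215, 105), (166.4417, 120), (177.8625, 165),
    (175.4013, 150), (180, 180),
]
x, y = np.array(points, dtype=float).T

sse_by_degree = {}
for degree in range(1, 8):  # arbitrary range chosen for illustration
    # Polynomial.fit rescales x to [-1, 1] internally, which keeps the
    # least-squares problem well conditioned for higher degrees.
    p = Polynomial.fit(x, y, degree)
    sse_by_degree[degree] = float(np.sum((p(x) - y) ** 2))

for degree, sse in sse_by_degree.items():
    print(f"degree {degree}: SSE = {sse:.4f}")
```

Each higher degree contains the lower ones as special cases, so the error at the data points can only stay the same or shrink as the degree grows. A smaller error therefore cannot distinguish a model that captures the trend from one that merely chases the individual points.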

The Lagrange interpolating polynomial is guaranteed to pass through every point, but has degree $13$, so is very likely overfitted. (If there were a polynomial of lower degree that passed through all the points, cancellation in the sum of products that defines the Lagrange polynomial would reduce the degree.) We expect this polynomial to be overfitted, so we expect large deviations from the trend of the data in the gaps between the data points.
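The behaviour described above can be checked numerically. The following Python sketch evaluates the Lagrange interpolating polynomial directly from its defining formula, $L(t)=\sum_j y_j \prod_{m\neq j} \frac{t-x_m}{x_j-x_m}$; the function name `lagrange_eval` is invented for this example, and evaluating at the midpoints between neighbouring $x$ values is just one convenient way to probe the gaps.

```python
import numpy as np

# Data points copied from the question.
points = [
    (0, 0), (2.1258, 15), (4.5809, 30), (7.8878, 45), (13.6083, 60),
    (27.2268, 75), (57.1741, 85), (90, 90), (121.4014, 95),
    (152.3215, 105), (166.4417, 120), (177.8625, 165),
    (175.4013, 150), (180, 180),
]
xs, ys = np.array(points, dtype=float).T


def lagrange_eval(t, xs, ys):
    """Evaluate the Lagrange interpolating polynomial through (xs, ys) at t."""
    total = 0.0
    for j in range(len(xs)):
        # Basis polynomial l_j(t): equals 1 at xs[j] and 0 at every other node.
        others = np.delete(xs, j)
        basis = np.prod((t - others) / (xs[j] - others))
        total += ys[j] * basis
    return total


# The interpolant reproduces every data point.
at_nodes = np.array([lagrange_eval(t, xs, ys) for t in xs])
print("max error at the data points:", np.max(np.abs(at_nodes - ys)))

# Between neighbouring points it is free to swing away from the data.
sorted_x = np.sort(xs)
midpoints = (sorted_x[:-1] + sorted_x[1:]) / 2
for t in midpoints:
    print(f"x = {t:9.4f}  interpolant = {lagrange_eval(t, xs, ys):14.4f}")
```

Comparing the printed midpoint values with the neighbouring $y$ values shows how far the interpolant strays between data points, even though it matches every point exactly.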

[Figure: Mathematica plot of the Lagrange interpolating polynomial with the data]

You face a choice now. Do you allow more complicated models, hoping to find one that can produce your data, or do you settle for a simple model because it captures the features you know you want to capture? The answer to this question depends strongly on what you intend to do with the model fit, which is not in the scope of the Question as asked.