The present answer is based on data $(x_1,y_1),(x_2,y_2),...,(x_k,y_k),...,(x_n,y_n)$ coming from a scan of the graph published in Morten Bakkedal's question. A graphical scan is not an accurate technique, so the results below will be less accurate than if the data came from a numerical table.
In a preliminary inspection, roughly two distinct alignments of points can be observed. This leads us to separate the set of points into two subsets and perform a linear regression on each one.
Even with such a simplistic method, the result is surprisingly good. See Figure 1. The RMSD $\simeq 0.00115$ is not bad.
In fact, this is the fitting of a piecewise function:
$$y(x)=\begin{cases} px+q \qquad\text{if } x\leq X \\ rx+s \qquad\text{if } x>X \end{cases}$$
$(X,Y)$ is the intersection point between the two rays (as defined on Figure 1).
The method for computing the values of the adjustable parameters $p,q,r,s$ is outlined below.
Of course, this supposes an initial guess of $K$, the index which separates the two sets of points. This is an easy guess just by visual inspection. Moreover, if the guess is not the best, it is easy to check it afterwards: the value of $X$ is computed and must lie between $x_K$ and $x_{K+1}$. If not, there is no difficulty in correcting the guessed $K$.
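A minimal sketch of this two-segment regression (assuming the data are held in numpy arrays `x`, `y` sorted by abscissa, with `K` the guessed split index; these names are illustrative, not the author's original computation):

```python
import numpy as np

def two_segment_fit(x, y, K):
    """Fit y = p*x + q to the first K points and y = r*x + s to the remaining ones."""
    p, q = np.polyfit(x[:K], y[:K], 1)   # left ray
    r, s = np.polyfit(x[K:], y[K:], 1)   # right ray
    X = (s - q) / (p - r)                # abscissa of the intersection of the two rays
    Y = p * X + q
    return p, q, r, s, X, Y
```

As noted above, one then checks that the returned $X$ falls between the last point of the first subset and the first point of the second, and adjusts $K$ otherwise.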
An equivalent form of the equation to be fitted is :
$$y(x)= (px+q)\text{H}(X-x)+(rx+s)\text{H}(x-X)$$
where H is the Heaviside step function.
Again this supposes an initial guess, now $X$. Again, this is an easy guess just by visual inspection. Again, if the guess is not the best, it is easy to check it afterwards and correct it if necessary.
One can use an approximation of the Heaviside function. A constant $c$ appears in it, but this is not a parameter to be adjusted. One can set $c=10$, $20$, $50$ or more; it doesn't matter. The resulting fit remains the same and the curve on Figure 1 doesn't change.
The fitted function is:
$$y(x)= \frac{px+q}{1+e^{c(x-X)}}+\frac{rx+s}{1+e^{c(X-x)}}$$
where $c=10$, for example. The value of $c$ doesn't matter as long as $c$ is large.
The result of the linear regression for $p,q,r,s$ is the same as above and leads to the same Figure 1.
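For instance, with $X$ and $c$ held fixed, the model is linear in $p,q,r,s$ and an ordinary least squares solve suffices. A hedged sketch (the array and function names are illustrative):

```python
import numpy as np
from scipy.special import expit   # expit(t) = 1 / (1 + exp(-t)), a numerically safe sigmoid

def fit_fixed_transition(x, y, X, c=10.0):
    """Linear least squares for p, q, r, s with the transition abscissa X and c held fixed."""
    wL = expit(c * (X - x))                       # smooth stand-in for H(X - x)
    wR = expit(c * (x - X))                       # smooth stand-in for H(x - X)
    A = np.column_stack([x * wL, wL, x * wR, wR])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    p, q, r, s = coef
    return p, q, r, s
```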
The only imperfection is that the transition between the two straight lines isn't smooth. The advantage of the above approach is that smoothness can be obtained very easily by using a coarser approximation of the Heaviside function.
This consists of using a smaller value of $c$. Of course, this introduces one more adjustable parameter, but there is no need for an accurate value of $c$: taking it with an order of magnitude around $1$ is sufficient to achieve a smooth transition.
For example, see Figure 3. The fitting is very good: RMSD $\simeq 0.00045$.
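One possible way to carry out this last fit, with $c$ now adjustable, is scipy's `curve_fit`, seeded with the parameters found in the piecewise step above. This is a sketch under those assumptions, not the author's actual computation:

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.special import expit

def model(x, p, q, r, s, X, c):
    """Two blended straight lines with a smooth transition whose width is controlled by c."""
    return (p * x + q) * expit(c * (X - x)) + (r * x + s) * expit(c * (x - X))

# p0 comes from the two-segment fit above, with c started near 1 for a smooth transition:
# params, _ = curve_fit(model, x, y, p0=[p, q, r, s, X, 1.0])
```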
Case of the second graph published by Morten Bakkedal
In this case, the range where the function changes slope isn't reached, so one linear regression of the form $y(x)=px+q$ is sufficient to provide a significant answer.
Least squares
You want to find the parameters for a model which best describes the data. Furthermore, you have specified that you want the best fit with respect to the $l_{2}$ norm. Let's look at a simpler case which allows us to explore the consequences of these choices.
Find the average
Computing the average is computing a least squares solution. Mathematical details follow.
Input data
Start with a sequence of $m$ measurements $\left\{ x_{k} \right\}^{m}_{k=1}$. Perhaps these numbers are test scores for a class.
Model
How would you characterize the performance of the class? Your model is simple:
$$
y(x) = \mu
$$
We know this number $\mu$ will be the average. The free parameter in the least squares fit is the constant $\mu$.
Least squares problem
The least squares problem minimizes the sum of the squares of the differences between the measurement and the prediction. Formally,
$$
\mu_{LS} = \left\{
\mu \in \mathbb{R} \colon
\sum_{k=1}^{m} \left( x_{k} - \mu \right)^{2} \text{ is minimized}
\right\}
$$
The function
$$
\sum_{k=1}^{m} \left( x_{k} - \mu \right)^{2}
$$
is called a merit function. This is the target of minimization.
Least squares solution
We know how to find extrema for functions: we look for the points where the derivatives are $0$. Remember, the parameter of variation here is $\mu$.
$$
\frac{d}{d\mu} \sum_{k=1}^{m} \left( x_{k} - \mu \right)^{2} = 0
\tag{1}
$$
Sticklers may protest that this finds extrema, yet we need minima. These fears will be allayed by posting the question "How do we know that least squares solutions form a convex set?".
The derivative is
$$
\begin{align}
\frac{d}{d\mu} \sum_{k=1}^{m} \left( x_{k} - \mu \right)^{2} &= - 2 \sum_{k=1}^{m} \left( x_{k} - \mu \right)
\\ &= -2 \left ( \sum_{k=1}^{m} x_{k} - \mu \sum_{k=1}^{m} 1 \right )
\\ &= -2 \left ( \sum_{k=1}^{m} x_{k} - m \mu \right )
\end{align}
\tag{2}
$$
Using the results of $(2)$ in $(1)$ produces the answer
$$
m \mu = \sum_{k=1}^{m} x_{k}
\qquad \Rightarrow \qquad
\boxed{
\mu = \frac{1}{m} \sum_{k=1}^{m} x_{k}
}
$$
The answer is that the average best typifies a set of test scores.
Not surprising, but revealing.
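A quick symbolic check of this result for a small $m$, using sympy (purely illustrative):

```python
import sympy as sp

mu = sp.symbols('mu')
xs = sp.symbols('x1:6')                     # x1, ..., x5, i.e. m = 5
merit = sum((xk - mu) ** 2 for xk in xs)    # the merit function
print(sp.solve(sp.diff(merit, mu), mu))     # -> the average (x1 + ... + x5)/5
```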
Example
Sample data
$$
\begin{array}{cc}
k & x\\\hline
1 & 81 \\
2 & 11 \\
3 & 78 \\
4 & 18 \\
5 & 24 \\
\end{array}
$$
Solution
The merit function, the target of minimization, is
$$
\begin{align}
\sum_{k=1}^{m} \left( x_{k} - \mu \right)^{2} &= (11-\mu )^2+(18-\mu )^2+(24-\mu )^2+(78-\mu )^2+(81-\mu )^2
\\
&= 5 \mu ^2-424 \mu +13666
\end{align}
$$
Minimizing this function of $\mu$ would not give you a moment's hesitation.
$$
\frac{d}{d\mu}\left(5 \mu ^2-424 \mu +13666\right) = -424 + 10 \mu = 0
$$
The answer is the average
$$
\mu = \frac{\sum_{k=1}^{m} x_{k}}{m} = \frac{212}{5} = 42.4
$$
Visualization
The figure on the left shows the scores for students $1-5$, with the average a dashed line. The right panel shows equation $(1)$ and how it varies with $\mu$. Hopefully, this panel illustrates why you are looking for $0$s of the first derivative.
Notice that the sum of the squares of the errors is not $0$. The sum of the squares of the errors takes the minimum value of $4677.2$ when $\mu = 42.4$.
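A short numerical check of the worked example (the variable names are illustrative):

```python
import numpy as np

scores = np.array([81, 11, 78, 18, 24])
mu = scores.mean()                   # least squares solution: the average
sse = np.sum((scores - mu) ** 2)     # minimized value of the merit function
print(mu, sse)                       # 42.4 and 4677.2, up to floating point rounding
```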
In summary, step back from the linear regression case, and look at this example as a problem in calculus.
Final question
Your final question
Why does a coefficient like a or b become so important? Why and how is a coefficient so prominently related to error? How does it affect anything?
opens another door to deep insight. Let's defer that answer to a new question like "How stable are least squares solutions against variations in the data?"
Best Answer
I think the problem is not precisely stated: it's unclear what "minimize the degree of the polynomial, while fitting fairly tightly to the data" means (what is "fairly tightly"?).
Anyway, for fixed degree, this may be formulated as a semidefinite program which can be solved approximately in polynomial time; there is a lot of software out there that can solve semidefinite programs very efficiently, e.g., sedumi.
Indeed, suppose your data points are $(x_1,y_1), (x_2,y_2), \ldots, (x_n,y_n)$. Let $y$ be the vector that stacks up the $y_i$ and let $V$ be the Vandermonde matrix $$V = \begin{pmatrix} 1 & x_1 & x_1^2 & \cdots& x_1^n \\ 1 & x_2 & x_2^2 & \cdots & x_2^n \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^n \end{pmatrix}$$ Then, you'd like to minimize $||y-Vp||_2^2$ subject to the constraint that the entries of the vector $p$, which we will call $p_i$, correspond to a monotonic polynomial on your domain $[a,b]$, or, in other words, $p'(x) \geq 0$ on $[a,b]$: $$ p_1 + 2p_2 x + 3 p_3 x^2 + \cdots + n p_n x^{n-1} \geq 0 ~~~ \mbox{ for all } x \in [a,b].$$ This last constraint can be written as a semidefinite program, as outlined in these lecture notes. I will briefly outline the idea.
A univariate polynomial which is nonnegative has a representation as a sum of squares of polynomials. In particular, if $p'(x) \geq 0$ on $[a,b]$, and its degree $d$ is, say, even, then it can be written as $$p'(x) = s(x) + (x-a)(b-x) t(x), \tag{1}$$ where $s(x), t(x)$ are sums of squares of polynomials (this is Theorem 6 in the above-linked lecture notes; a similar formula is available for odd degree). The condition that a polynomial $s(x)$ is a sum of squares is equivalent to saying there is a nonnegative definite matrix $Q$ such that $$ s(x) = \begin{pmatrix} 1 & x & \cdots & x^{d/2} \end{pmatrix} Q \begin{pmatrix} 1 \\ x \\ x^2 \\ \vdots \\ x^{d/2} \end{pmatrix}.$$ This is Lemma 3 in the same lecture notes.
Putting it all together, what we optimize are the entries of the matrices $Q_1,Q_2$ that represent the polynomials $s(x)$ and $t(x)$ as sums of squares, which means imposing the constraints $Q_1 \succeq 0$, $Q_2 \succeq 0$. Then Eq. (1) is a linear constraint on the entries of the matrices $Q_1,Q_2$ (and on the coefficients of $p$). This gives you a semidefinite program you can input to your SDP solver.
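For concreteness, here is one way this construction could be set up with cvxpy. It is a sketch under the assumptions stated in the comments (even derivative degree, an SDP-capable default solver such as SCS), with illustrative names, not a reference implementation:

```python
import numpy as np
import cvxpy as cp

def fit_monotone_poly(x_data, y_data, deg, a, b):
    """Least squares polynomial fit of degree `deg`, constrained so that p'(x) >= 0 on [a, b].

    Assumes deg >= 3 and deg odd, so that d = deg - 1 (the degree of p') is even,
    matching the even-degree representation p'(x) = s(x) + (x - a)(b - x) t(x).
    """
    d = deg - 1
    assert d >= 2 and d % 2 == 0, "this sketch handles even derivative degree only"

    V = np.vander(np.asarray(x_data, dtype=float), deg + 1, increasing=True)
    y = np.asarray(y_data, dtype=float)
    p = cp.Variable(deg + 1)                       # coefficients p_0, ..., p_deg

    m1 = d // 2                                    # s(x) uses monomials 1, x, ..., x^m1
    m2 = (d - 2) // 2                              # t(x) uses monomials 1, x, ..., x^m2
    Q1 = cp.Variable((m1 + 1, m1 + 1), PSD=True)   # Gram matrix of s(x)
    Q2 = cp.Variable((m2 + 1, m2 + 1), PSD=True)   # Gram matrix of t(x)

    def gram_coeff(Q, size, k):
        # coefficient of x^k in z(x)^T Q z(x), with z(x) = (1, x, ..., x^size)
        return cp.sum(cp.hstack([Q[i, k - i]
                                 for i in range(max(0, k - size), min(size, k) + 1)]))

    s = [gram_coeff(Q1, m1, k) for k in range(d + 1)]   # coefficients of s(x), degree d
    t = [gram_coeff(Q2, m2, k) for k in range(d - 1)]   # coefficients of t(x), degree d - 2

    constraints = []
    for k in range(d + 1):
        # coefficient of x^k in p'(x)
        lhs = (k + 1) * p[k + 1]
        # coefficient of x^k in s(x) + (x - a)(b - x) t(x),
        # using (x - a)(b - x) = -x^2 + (a + b) x - a b
        rhs = s[k]
        if 0 <= k - 2 <= d - 2:
            rhs = rhs - t[k - 2]
        if 0 <= k - 1 <= d - 2:
            rhs = rhs + (a + b) * t[k - 1]
        if k <= d - 2:
            rhs = rhs - a * b * t[k]
        constraints.append(lhs == rhs)

    prob = cp.Problem(cp.Minimize(cp.sum_squares(y - V @ p)), constraints)
    prob.solve()
    return p.value                                  # fitted coefficients, lowest degree first
```

The equality constraints simply match, power by power, the coefficients of $p'(x)$ against those of $s(x) + (x-a)(b-x)t(x)$, which is exactly the linear constraint (1) above.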