[Math] Fitting 2nd Order multivariate quadratic with matrices

matricesregression

Hopefully you at least entertain this question as it took forever to construct the below matrix using TeX. Any ways, so I have a list of data points ($X_1$,$X_2$,Y), with the X's being independent variables and Y being dependent. What I am attempting to find is the best fit multivariate quadratic using sum of the least squares.

The general 2nd order polynomial I am attempting to find is in the form of:
$$
y = B_0 + B_1 X_1 + B_2 X_2 + B_{11} X_1^2 + B_{22} X_2^2 + B_{12}X_1X_2
$$

1) Can anyone tell me if the below matrix is correct? In comparison, a traditional uni-variate quadratic fit can be seen on page 9.

$$
\begin{vmatrix}
n & \sum_{} X_1 & \sum_{} X_2 & \sum_{} X_1 X_2 & \sum_{} X_1^2 & \sum_{} X_2^2 \\
\sum X_1 & \sum_{} X_1^2 & \sum_{} X_1 X_2 & \sum_{} X_1^2X_2 & \sum_{} X_1^3 & \sum_{} X_1 X_2^2 \\
\sum X_2 & \sum_{} X_1 X_2 & \sum_{} X_2^2 & \sum_{} X_1 X_2^2 & \sum_{} X_1^2 X_2 & \sum_{} X_2^3 \\
\sum X_1 X_2 & \sum_{} X_1^2 X_2 & \sum_{} X_1 X_2^2 & \sum_{} X_1^2 X_2^2 & \sum_{} X_1^3 X_2 & \sum_{} X_1 X_2^3 \\
\sum X_1^2 & \sum_{} X_1^3 & \sum_{} X_1^2 X_2 & \sum_{} X_1^3X_2 & \sum_{} X_1^4 & \sum_{} X_1^2 X_2^2 \\
\sum X_2^2 & \sum_{} X_1 X_2^2 & \sum_{} X_2^3 & \sum_{} X_1 X_2^3 & \sum_{} X_1^2X_2^2 & \sum_{} X_2^4 \\
\end{vmatrix}
\begin{vmatrix}
B_0\\
B_1\\
B_2\\
B_{12}\\
B_{11}\\
B_{22}\\
\end{vmatrix}
=
\begin{vmatrix}
\sum_{} Y_i\\
\sum_{} X_1 Y_i\\
\sum_{} X_2 Y_i\\
\sum_{} X_1 X_2 Y_i\\
\sum_{} X_1^2 Y_i\\
\sum_{} X_2^2 Y_i\\
\end{vmatrix}
$$

2) When I solve for the $B_{12}$ which is the $X_1X_2$ coefficient, if that is ~ 0 or very small, then that means there is very little interaction between X & Y?

Best Answer

Yes, your derivation is correct.

$B_{12}$ being small (compared to $B_{11}$ and $B_{22}$) indeed indicates that the axis of the paraboloid are close to the coordinate axis.

Related Solutions

Multivariate Quadratic Regression, Surface Fitting and the Hessian

The basis for two dimensions would be

$$ p(x) = \left[\begin{array}{ccccc} x_1 & x_2 & x_1^2 & x_1x_2 & x_2^2 \end{array}\right]^T $$

$f(x) = p(x)^T a$

where $a = [a_0, a_1, ...]$. This would be akin to calculating

$$ f(x) = a_0 + a_1 x_1 + a_2 x_2 + a_3 x_1 x_2 + a_4 x_2^2 $$

Then with given data points

$$ \left[\begin{array}{ccccc} (x_1^1) & (x_2^1) & (x_1^1)^2 & (x_1^1)(x_2^1) & (x_2^1)^2 \\ \vdots & & & & \vdots \\ (x_1^n) & (x_2^n) & (x_1^n)^2 & (x_1^n)(x_2^n) & (x_2^n)^2 \\ \end{array}\right] \mathbf{a} = \left[\begin{array}{c} y^1 \\ \vdots \\ y^n \end{array}\right] $$

and solve for $a$.

But I need the

$$ f(x) = x^T A x $$

form, so I can easily get $\nabla f (x) = 2 A x$ and the Hessian $\nabla^2 f(x) = 2 A$.

It turns out if we start with 3 dimensions, and use $x_3 = 1$ this gives the necessary form in two dimensions,

$$ x^T A x = \left[\begin{array}{ccc} x_1 & x_2 & x_3 \end{array}\right] \left[\begin{array}{ccc} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{array}\right] \left[\begin{array}{c} x_1 \\ x_2 \\ x_3 \end{array}\right] $$

$$ = \left[\begin{array}{c} x_1 a_{11} + x_2 a_{21} + x_3 a_{31} \\ x_1 a_{12} + x_2 a_{22} + x_3 a_{32} \\ x_1 a_{13} + x_2 a_{23} + x_3 a_{33} \end{array}\right]^T \left[\begin{array}{c} x_1 \\ x_2 \\ x_3 \end{array}\right] $$

$$ = x_1 x_1 a_{11} + x_1 x_2 a_{21} + x_1 x_3 a_{31} + $$ $$ x_1 x_2 a_{12} + x_2 x_2 a_{22} + x_3 x_2 a_{32} + $$ $$ x_1 x_3 a_{13} + x_2 x_2 a_{23} + x_3 x_3 a_{33} $$

Using $x_3 = 1$

$$ x^T A x = \left[\begin{array}{ccc} x_1 & x_2 & 1 \end{array}\right] \left[\begin{array}{ccc} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{array}\right] \left[\begin{array}{c} x_1 \\ x_2 \\ 1 \end{array}\right] $$

$$ = a_{11} x_1^2 + a_{21} x_1 x_2 + a_{31}x_1 + $$ $$ a_{12}x_1 x_2 + a_{22}x_2^2 + a_{32} x_2+ $$ $$ a_{13} x_1 + a_{23} x_2 x_3 + a_{33} $$

from scipy.interpolate import Rbf
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
from matplotlib import cm
import numpy as np
import autograd.numpy as anp
import autograd

def random_ball(num_points, dimension, radius=1):
    from numpy import random, linalg
    random_directions = random.normal(size=(dimension,num_points))
    random_directions /= linalg.norm(random_directions, axis=0)
    random_radii = random.random(num_points) ** (1/dimension)
    return radius * (random_directions * random_radii).T

np.random.seed(0)
N = 20

def rosenbrock(x):
    return (1 + x[0])**2 + 100*(x[1] - x[0]**2)**2

def Rosenbrock(x,y):
    return (1 + x)**2 + 100*(y - x**2)**2

def get_fvals_in_region(xcurr, f, radius):    
    b = random_ball(N, 2, radius)
    pts = xcurr+b
    vals = [f(p) for p in pts]
    return xcurr+b, np.array(vals)

x0 = [1.5,0]
xs,vs = get_fvals_in_region(x0, rosenbrock, 0.5)

res = []
for i in range(vs.shape[0]):
    res.append((xs[i,0],xs[i,1],vs[i]))
res = np.array(res).reshape(vs.shape[0], 3)

x = np.linspace(-2,2,250)
y = np.linspace(-1,3,250)
X, Y = np.meshgrid(x, y)
Z = Rosenbrock(X, Y)

import itertools
import numpy.linalg as lin

def quad_interpolate(xi, yi):
    xi = np.hstack((xi, np.ones((1,len(xi))).T  ))
    #print (xi)
    D = xi.shape[1]
    print (D)
    X_train = []
    for row in xi:
        X_train.append([row[i]*row[j] for i,j in itertools.product(range(D),range(D)) ])
    X_train = np.array(X_train)
    print (X_train.shape)
    print (yi.shape)
    coef,_,_,_ = lin.lstsq(X_train, yi)
    return coef

xi = res[:,[0,1]]
yi = res[:,[2]]
coef = quad_interpolate(xi,yi)

print (coefs)

x = np.linspace(-2,2,250)
y = np.linspace(-1,3,250)
X, Y = np.meshgrid(x, y)
Z = Rosenbrock(X, Y)

fig = plt.figure(figsize = (8,4))
ax = fig.gca(projection='3d')
ax.plot3D(res[:,0],res[:,1],res[:,2],'r.')
ax.plot_surface(X,Y,Z,rstride = 5, cstride = 5, cmap = 'jet', alpha = .4, edgecolor = 'none' )

def q_interp(x1,x2):
    x = np.array([[x1,x2,1]])
    A = coef.reshape(3,3)
    res = np.dot(np.dot(x,A),x.T)
    return np.float(res)

Zi = np.array([q_interp(xx,yy) for xx,yy in zip(X.flatten(),Y.flatten())])
Zi = Zi.reshape(X.shape)
ax.plot_wireframe(X,Y,Zi)

coefs = coef.reshape(3,3)

g = (2 * np.dot(coefs[:2,:2],np.array(x0).reshape(2,1)))

gnorm = g / np.sum(g)

ax.set_zlim(0,2500)

ax.quiver(x0[0], x0[1], 0, -gnorm[0], -gnorm[1], 0, color='red')

hess = 2*coefs[:2,:2]
print (hess)
newton_dir = -np.dot(lin.inv(hess),g)
print (newton_dir)

d = newton_dir
print (d)

ax.quiver(x0[0], x0[1], 0, d[0], d[1], 0, color='green')

ax.plot3D([x0[0]], [x0[1]], [0.0], 'b.')

ax.view_init(21, -133)

Curve fitting with polynomials and inner products

Per your edit: In your method, each data point gives a row to $A$. If you have a lot more data points, then you have a lot more rows. If $A$ has more rows than columns, then $A\vec c = \vec b$ is unlikely to have any solution, as it is overly constrained. You can solve this by increasing the maximum degree of your polynomial until you have at least as many columns as rows again. But you've given up the ability to limit the degree of your answer, and as mentioned in my original answer, that leads to wildly oscillating polynomials.

Note that in your professor's version, the size of $A$ is not dependent on the number of data points. Only the definition of the inner products change with that. $A$ is always a square matrix. And since the polynomials themselves are chosen to be independent of each other, you are guaranteed that $A\vec c = \vec b$ has at least one solution, even when $A$ is not invertible.

Also, you can prove that the professor's version comes up with a solution that minimizes squared error. Your method almost certainly will not.

Original Post:

How you produce a best-fit polynomial depends on your definition of "best-fit". Your example defines it as "exact match at these four points", and cares not the slightest how the polynomial behaves anywhere else. Such polynomials have a tendency to swing wildly between points. For functions of one variable that are almost linear the interpolating polynomial will generally end up looking sinusoidal over the range of interest instead of the nice flat line you would want.

How your professor's method defines "best-fit" is not clear, because you did not indicate which inner product of polynomials is being used. (And it has been ages since I've dealt with this, so I'd probably have to think hard about what it was doing before I could answer anyway.)

Best Answer

Related Solutions

Multivariate Quadratic Regression, Surface Fitting and the Hessian

Curve fitting with polynomials and inner products

Related Question