Gradient descent and gradient ascent are the same algorithm. More precisely:
Gradient ascent applied to $f(x)$, starting at $x_0$
is the same as
Gradient descent applied to $-f(x)$, starting at $x_0$.
This is true in the sense that gradient ascent in the first case and gradient descent in the second generate the same sequence of points; the first converges if and only if the second converges; and when they both converge, they converge to the same place.
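A quick sketch makes the equivalence concrete (the toy concave function and step size here are my own choices, not anything from the question):

def grad_f(x):
    # f(x) = -(x - 3)**2 is a toy concave function with its maximum at x = 3.
    return -2.0 * (x - 3.0)

def grad_neg_f(x):
    # Gradient of -f.
    return -grad_f(x)

x_asc = x_desc = 0.0        # same starting point x_0
step = 0.1
for _ in range(100):
    x_asc = x_asc + step * grad_f(x_asc)          # gradient ascent on f
    x_desc = x_desc - step * grad_neg_f(x_desc)   # gradient descent on -f
print(x_asc, x_desc)        # identical at every iteration; both converge to 3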
For logistic regression, the cost function is
$$ \pm \sum_i \left[ y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \right] $$
You get to choose one of these two signs; it doesn't matter which, as long as you are consistent.
Since $p_i$ is between zero and one, both $\log(p_i)$ and $\log(1 - p_i)$ are negative, hence
$ \sum_i \left[ y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \right] $ is always negative.
Further, by letting $p_i \rightarrow 0$ for a point with $y_i = 1$, we can drive this cost function all the way to $- \infty$ (which can also be accomplished by letting $p_i \rightarrow 1$ for a point with $y_i = 0$). So this cost function has the shape of an upside-down bowl, and it should be maximized using gradient ascent.
If we use the negative of this cost function, then
$ - \sum_i \left[ y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \right] $ is always positive.
We get exactly the opposite behavior: the same choices of $p_i$ force it to $+ \infty$. So this is a right-side-up bowl, and we should use gradient descent to minimize it.
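Here is a minimal numeric sketch of the two sign conventions (the y and p values below are made up):

import numpy as np

y = np.array([1, 0, 1, 1])
p = np.array([0.9, 0.2, 0.7, 0.6])   # hypothetical predicted probabilities

log_likelihood = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))  # always <= 0; maximize
cost = -log_likelihood                                            # always >= 0; minimize
print(log_likelihood, cost)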
Here's how I go about it in pure sklearn. There are probably ways to improve this workflow.
First I made a transformer that simply selects one or more columns from a DataFrame or matrix:
import numpy as np
import pandas as pd

class ColumnSelector(object):

    def __init__(self, idxs):
        self.idxs = np.asarray(idxs)

    # Fit here doesn't need to do anything. We already know the indices of the
    # columns we want to keep.
    def fit(self, *args, **kwargs):
        return self

    def transform(self, X, **transform_params):
        # Need to treat pandas data frames and numpy arrays slightly differently.
        if isinstance(X, pd.DataFrame):
            return X.iloc[:, self.idxs]
        return X[:, self.idxs]
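For example, on a small made-up frame (the column names are just illustrative) it slices out the requested columns:

df = pd.DataFrame({'arsenic': [0.5, 1.2, 2.3], 'dist': [16.8, 47.3, 21.0]})
print(ColumnSelector([0]).fit(df).transform(df))   # returns just the arsenic column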
Then I made a PolynomialExpansion class:
class PolynomialExpansion(object):

    def __init__(self, degree):
        self.degree = degree

    def fit(self, *args, **kwargs):
        return self

    def transform(self, X, **transform_params):
        # Initialize our return value as a matrix of all zeros.
        # We are going to overwrite all of these zeros in the code below.
        X_poly = np.zeros((X.shape[0], self.degree))
        # The first column in our transformed matrix is just the vector we started with.
        X_poly[:, 0] = X.squeeze()
        # Cleverness Alert:
        # We create the subsequent columns by multiplying the most recently created
        # column by X. This creates the sequence X -> X^2 -> X^3 -> etc...
        for i in range(1, self.degree):
            X_poly[:, i] = X_poly[:, i-1] * X.squeeze()
        return X_poly
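A quick check on a made-up vector shows the X -> X^2 expansion:

x = np.array([1.0, 2.0, 3.0])
print(PolynomialExpansion(2).fit(x).transform(x))
# [[1. 1.]
#  [2. 4.]
#  [3. 9.]]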
Then to use it, I combine it with Pipelines and FeatureUnions. This pipeline uses the arsenic (wells) data from Gelman and Hill:
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion, Pipeline

wells_pipeline = Pipeline([
    ('polynomial_expansions', FeatureUnion([
        ('arsenic_quadratic', Pipeline([
            ('arsenic_selector', ColumnSelector([0])),
            ('quadratic_expansion', PolynomialExpansion(2))
        ])),
        ('distance_quadratic', Pipeline([
            ('distance_selector', ColumnSelector([1])),
            ('quadratic_expansion', PolynomialExpansion(2))
        ]))
    ])),
    ('regression', LogisticRegression())
])
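Fitting then works like any other sklearn estimator. Here I'm assuming wells is a DataFrame holding the data, with arsenic and dist as the first two columns and switch as the binary response (the names are illustrative and may differ in your copy of the data):

# wells: hypothetical DataFrame of the Gelman & Hill wells data
wells_pipeline.fit(wells[['arsenic', 'dist']], wells['switch'])
predictions = wells_pipeline.predict_proba(wells[['arsenic', 'dist']])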
Another option is to use patsy, which allows you to specify model formulas, as in R.
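For example, something like the following (hypothetical column names again) builds an equivalent quadratic design matrix from a formula:

from patsy import dmatrices

# I(...) lets you embed Python arithmetic in the formula, as in R.
y, X = dmatrices('switch ~ arsenic + I(arsenic**2) + dist + I(dist**2)',
                 data=wells, return_type='dataframe')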
Best Answer
The gradient descent code looks fine to me (although you should not fix the number of steps, but rather exit when the gradient is small).
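A minimal sketch of that stopping rule (the function and parameter names here are my own, not the asker's):

import numpy as np

def gradient_descent(grad, theta0, step=0.01, tol=1e-6, max_steps=100000):
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_steps):           # keep a cap as a safety net
        g = grad(theta)
        if np.linalg.norm(g) < tol:      # exit when the gradient is small
            break
        theta = theta - step * g
    return theta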
The issue is that gradient descent converges to a local minimum, which is not necessarily the global minimum. The key points are:
(i) the polynomial fit converges VERY slowly to the sine function. For example, the 7th-order approximation does not converge outside the first period (see http://www.wolframalpha.com/input/?i=Taylor+sin+x), and the gradient descent method will not be able to find a much better solution (it would require a higher-order polynomial);
(ii) the coefficients of the Taylor series centered at 0.5 have very different magnitudes. You should find theta = [-197, +2766, -15877, +47085, -75701, +62590, -20863, +0]. Presumably, the same applies to the coefficients at the global minimum of the gradient descent method*. Starting from a guess of all ones is a very poor initial condition; you are too far away from the actual solution.
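To see those magnitudes yourself, a plain least-squares fit via np.polyfit gives the global minimizer directly (here I assume the target is sine over one period on [0, 1]; the asker's exact setup may differ):

import numpy as np

x = np.linspace(0.0, 1.0, 200)
coeffs = np.polyfit(x, np.sin(2 * np.pi * x), deg=7)  # 7th-order least-squares fit
print(coeffs)  # note the very different orders of magnitude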
Hope this helps.
*NB: the Taylor series coefficients are NOT the same coefficients that the gradient descent method should find. A Taylor series best approximates the function close to the expansion point, with the error increasing away from it; the least-squares fit that underlies gradient descent tries to minimize the overall error, so there is no "favorite point" unless you use weighted sums.