Solved – Polynomial regression seems to give different coefficients in Python and in R

Tags: polynomial, python, r, regression

When I fit a polynomial regression to the Boston data set in R, I get different coefficients than when I do the same in Python. Example code in R:

library(MASS)  # for the Boston data set

# Cubic polynomial in dis (data = Boston makes attach() unnecessary)
lm.fit <- lm(nox ~ poly(dis, 3), data = Boston)
summary(lm.fit)

This yields the coefficients

Coefficients:
               Estimate
(Intercept)    0.554695
poly(dis, 3)1 -2.003096
poly(dis, 3)2  0.856330
poly(dis, 3)3 -0.318049

With Python:

import pandas as pd
import statsmodels.api as sm
from sklearn import datasets
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Note: load_boston() was removed in scikit-learn 1.2, so this needs an
# older scikit-learn version (or another source for the Boston data)
boston = datasets.load_boston()
df = pd.DataFrame(boston['data'], columns=boston['feature_names'])
df_x = df[['DIS']]
df_y = df[['NOX']]

# Raw polynomial basis: columns 1, x, x^2, x^3
poly = PolynomialFeatures(3)
df_x_transform = poly.fit_transform(df_x)

lin_regressor = LinearRegression()
lin_regressor.fit(df_x_transform, df_y)
print(lin_regressor.intercept_, lin_regressor.coef_)

Yields:

0.9341280720211879 [ 0.         -0.18208169  0.02192766 -0.000885  ]
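(Incidentally, the leading 0 in coef_ is just the redundant bias column: PolynomialFeatures(3) includes a constant column by default, and LinearRegression fits its own intercept on top of it, so the coefficient on that column ends up at zero. A minimal sketch that avoids the duplication, using PolynomialFeatures' real include_bias parameter and illustrative variable names:

# Sketch: drop PolynomialFeatures' bias column so LinearRegression's
# own intercept is the only constant term (variable names are illustrative)
poly_no_bias = PolynomialFeatures(3, include_bias=False)
X_no_bias = poly_no_bias.fit_transform(df_x)   # columns x, x^2, x^3 only
reg = LinearRegression().fit(X_no_bias, df_y)
print(reg.intercept_, reg.coef_)               # no leading zero coefficient
)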

And with statsmodels:

# df_x_transform already contains the bias column from PolynomialFeatures,
# so there is no need for sm.add_constant() here
model = sm.OLS(df_y, df_x_transform)
fitted = model.fit()
print(fitted.summary())

We get the same result as with sklearn, as expected, since both solve the same least-squares problem on the same design matrix:

                 coef
const          0.9341
x1            -0.1821
x2             0.0219
x3            -0.0009

How is this possible?

Best Answer

R's poly() generates an orthogonal polynomial basis by default, while scikit-learn's PolynomialFeatures generates the raw powers 1, x, x², x³. The two bases span the same column space, so both models produce identical fitted values; only the coefficients, which are basis-dependent, differ. Try passing raw = TRUE to poly(), i.e. lm(nox ~ poly(dis, 3, raw = TRUE), data = Boston); you should then recover the Python coefficients.
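
To see concretely that an orthogonal basis and the raw basis describe the same fit with different coefficients, here is a minimal self-contained Python sketch. The data are synthetic (standing in for Boston), and the QR step is only in the spirit of R's poly(), which additionally centers x and fixes a particular scaling, so the coefficients below will not numerically match R's output:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)                  # synthetic stand-in for the Boston data
x = np.sort(rng.uniform(1.0, 12.0, size=200))   # roughly the range of DIS
y = 0.55 - 0.04 * x + 0.002 * x**2 + rng.normal(scale=0.02, size=200)

# Raw basis, as PolynomialFeatures(3) builds it: columns 1, x, x^2, x^3
X_raw = np.vander(x, N=4, increasing=True)

# Orthogonalize the raw columns with a QR decomposition, in the spirit
# of R's poly(): same column space, mutually orthogonal columns
Q, _ = np.linalg.qr(X_raw)

fit_raw = sm.OLS(y, X_raw).fit()
fit_orth = sm.OLS(y, Q).fit()

print(fit_raw.params)    # one set of coefficients
print(fit_orth.params)   # a completely different set
print(np.allclose(fit_raw.fittedvalues, fit_orth.fittedvalues))  # True: same fitted curve

Since the two design matrices span the same column space, the least-squares projection of y, and hence every fitted value and prediction, is identical; only the coordinates (the coefficients) in each basis differ.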
