When I fit a polynomial on the Boston
data set with R, I seem to get different results than when I use Python. Example code with R:
library(MASS)
attach(Boston)
lm.fit = lm(nox ~ poly(dis, 3), data = Boston)
summary(lm.fit)
This yields the coefficients
Coefficients:
Estimate
(Intercept) 0.554695
poly(dis, 3)1 -2.003096
poly(dis, 3)2 0.856330
poly(dis, 3)3 -0.318049
With Python:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
from sklearn import datasets
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
boston = datasets.load_boston()
df = pd.DataFrame(boston['data'], columns=boston['feature_names'])
df = pd.concat([df, pd.Series(boston['target'], name='MEDV')], axis=1)
df_x = df[['DIS']]
df_y = df[['NOX']]
poly = PolynomialFeatures(3)
df_x_transform = poly.fit_transform(df_x)
lin_regressor = LinearRegression()
lin_regressor.fit(df_x_transform, df_y)
print(lin_regressor.intercept_, lin_regressor.coef_)
Yields:
0.9341280720211879 [ 0. -0.18208169 0.02192766 -0.000885 ]
And with statsmodels
:
model = sm.OLS(df_y, df_x_transform)
fitted = model.fit()
print(fitted.summary())
We get the same result as with sklearn
:
coef
const 0.9341
x1 -0.1821
x2 0.0219
x3 -0.0009
How is this possible?
Best Answer
R uses an orthogonal basis expansion while PolynomialFeautres does not. Try passing
raw=TRUE
inpoly
. What are the results?