My guess is that you have accidentally transformed y_train somewhere in the code you have not posted, because this reproducible snippet works:
import numpy as np
import pandas as pd
import math
from sklearn import preprocessing
dat = pd.read_csv("/home/steffen/workspaces/airfoil/airfoil_self_noise.dat",sep="\t",low_memory=False,header=None)
apply_scaler = True
# split into train 2/3 and test 1/3
rng = np.random.RandomState(42)
n_rows = dat.shape[0]
n_train = math.floor(0.66*n_rows)
permutated_indices = rng.permutation(n_rows)
train_dat = dat.loc[permutated_indices[:n_train],:]
test_dat = dat.loc[permutated_indices[n_train:],:]
# separate the response variable (last column) from the predictor variables
x_train = train_dat.iloc[:,1:-1]
y_train = train_dat.iloc[:,-1].values[:, np.newaxis]
x_test = test_dat.iloc[:,1:-1]
y_test = test_dat.iloc[:,-1].values[:, np.newaxis]
# train
# fit the scaler to predictor variables and apply it afterwards
scaler = preprocessing.StandardScaler().fit(x_train)
if apply_scaler:
    x_train = pd.DataFrame(scaler.transform(x_train))
# add constant one for the intercept parameter
x_train = pd.concat([pd.DataFrame(np.ones(shape=(x_train.shape[0],1)),index=x_train.index),x_train],axis=1)
# fit parameters of linear regression using batch gradient descent
# Hands-On Machine Learning with Scikit-Learn & Tensorflow, page 115
eta = 0.1 # learning rate
n_iterations = 1000
m = x_train.shape[0]
theta = rng.randn(x_train.shape[1],1)
for iteration in range(n_iterations):
    gradients = (2 / m) * x_train.T.dot(x_train.dot(theta) - y_train)
    theta = theta - eta * gradients
# to apply the fitted parameters, first we have to transform the test-data in the same way
# apply scaler
if apply_scaler:
    x_test = pd.DataFrame(scaler.transform(x_test))
# add constant one for the intercept parameter
x_test = pd.concat([pd.DataFrame(np.ones(shape=(x_test.shape[0],1)),index=x_test.index),x_test],axis=1)
# apply fitted parameters
y_predict = x_test.dot(theta)
# compare output
out=np.column_stack((y_test, y_predict))
print(pd.DataFrame(out).head())
# root mean squared error
print("error %f"% np.sqrt(np.power(y_test-y_predict,2).mean()))
This leads to the following output:
         0           1
0  120.573  127.108268
1  127.220  123.492931
2  113.045  122.393120
3  119.606  122.570836
4  131.971  127.270743
error 6.175637
which is fine.
It is interesting to see that with a learning rate of 0.1 this simple batch gradient descent implementation fails to converge if no normalization is performed (apply_scaler=False, eta=0.1), while scikit-learn's LinearRegression still finds a solution. Reducing the learning rate dramatically (eta=0.0001) leads to convergence again.
This is one example of where gradient descent is limited, as discussed here: Do we need gradient descent to find the coefficients of a linear regression model.
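For comparison, here is a minimal sketch of that check (assuming x_train, y_train, x_test and y_test as defined right after the train/test split, i.e. before scaling and before the column of ones is added; the names lin_reg and y_predict_lr are mine). scikit-learn's LinearRegression uses a least-squares solver rather than gradient descent, so it needs neither a learning rate nor feature scaling to converge:
from sklearn.linear_model import LinearRegression
# fit on the raw, unscaled predictors; the intercept is handled by fit_intercept=True,
# so no constant column has to be added
lin_reg = LinearRegression()
lin_reg.fit(x_train, y_train)
y_predict_lr = lin_reg.predict(x_test)
# root mean squared error, for comparison with the gradient descent result
print("error %f" % np.sqrt(np.power(y_test - y_predict_lr, 2).mean()))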
To be exact, Embarked and Survived are perfectly correlated. Drop it.
To see this:
df = pd.read_csv("test.csv")
df = df.drop(['PassengerId','Ticket','Fare','Cabin','Name'], axis=1)
df['Age'] = df['Age'].fillna(value=29)
df['Embarked'] = df.fillna('C')
#y = df['Survived']
#X = df.drop(['Survived'],axis=1)
X = df.copy()
X.loc[X['SibSp'] >= 2, 'SibSp'] = 2
X.loc[X['Parch'] >= 3, 'Parch'] = 3
X.loc[X['Age'] < 15, 'Age'] = 0
X.loc[(X['Age'] >= 15) & (X['Age'] < 60), 'Age'] = 1
X.loc[X['Age'] >= 60, 'Age'] = 2
X = pd.get_dummies(X,columns=['Embarked'])
X = pd.get_dummies(X,columns=['Pclass'])
X = pd.get_dummies(X,columns=['Sex'])
print(X.corr())
Or another way to see this would be:
df = pd.read_csv("test.csv")
df = df.drop(['PassengerId','Ticket','Fare','Cabin','Name'], axis=1)
df['Age'] = df['Age'].fillna(value=29)
df['Embarked'] = df.fillna('C')
df['S==E'] = df.Survived == df.Embarked
print(df.loc[df['S==E'] == False])
Here is a 3D surface fitter using your equation and my test data that makes a 3D scatter plot, a 3D surface plot, and a contour plot. You should be able to click-drag the 3D plots with the mouse and rotate them in 3-space for visual inspection.
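As a rough sketch of what such a fitter looks like (the model function and data below are placeholders, not the equation and test data referred to above), scipy.optimize.curve_fit plus matplotlib can produce the three views:
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the 3D projection on older matplotlib
from scipy.optimize import curve_fit

# placeholder model, NOT the equation from the question
def func(data, a, b, c):
    x, y = data
    return a + b * x + c * y * y

# placeholder test data
rng = np.random.RandomState(0)
x = rng.uniform(0.0, 10.0, 50)
y = rng.uniform(0.0, 10.0, 50)
z = func((x, y), 1.0, 2.0, -0.3) + rng.normal(scale=2.0, size=50)

# fit the model parameters to the scattered (x, y, z) data
popt, pcov = curve_fit(func, (x, y), z)

# evaluate the fitted surface on a regular grid
xg, yg = np.meshgrid(np.linspace(x.min(), x.max(), 30),
                     np.linspace(y.min(), y.max(), 30))
zg = func((xg, yg), *popt)

fig = plt.figure(figsize=(12, 4))
ax1 = fig.add_subplot(131, projection='3d')
ax1.scatter(x, y, z)                     # 3D scatter plot of the raw data
ax2 = fig.add_subplot(132, projection='3d')
ax2.plot_surface(xg, yg, zg, alpha=0.7)  # 3D surface plot of the fit
ax3 = fig.add_subplot(133)
ax3.contourf(xg, yg, zg)                 # contour plot of the fit
plt.show()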