Should Python's
statsmodels.api.GLM(train_y, train_X, family=sm.families.Binomial()).fit().predict(test_X)
always produce the same results as R's
predict(glm(y ~ ., data=train_X, family=binomial), newdata=test)
where train_y is a pandas DataFrame containing the y column of the corresponding R data.frame, train, and where test_X and train_X are DataFrames containing the remaining columns of the test and train data frames respectively?
If not, are there parameters that I can supply to statsmodels' GLM to make it produce the same results as R's glm?
Best Answer
Yes, they should give the same answers if you fit the same model. Compare
R code
Python code
If there's a convergence issue here, I wouldn't trust either answer without knowing what causes it. I'd be interested to have a look at some data that can reproduce these convergence failures in R.
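One common cause of convergence trouble in logistic regression is perfect separation, where the maximum-likelihood estimate does not exist and the coefficients grow with every iteration. The sketch below is a hand-written IRLS loop in NumPy on hypothetical, perfectly separated data. It is not the code of either library, and whether separation is the cause in the asker's case is an assumption. It only illustrates why the reported coefficients then depend on where each implementation happens to stop.

```python
import numpy as np

def irls_logistic(X, y, n_iter):
    """Run a fixed number of IRLS steps for logistic regression."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        mu = 1.0 / (1.0 + np.exp(-eta))
        # Clip to keep the working weights strictly positive.
        mu = np.clip(mu, 1e-10, 1 - 1e-10)
        w = mu * (1.0 - mu)
        z = eta + (y - mu) / w  # working response
        XtW = X.T * w
        beta = np.linalg.solve(XtW @ X, XtW @ z)
    return beta

# Hypothetical, perfectly separated data: y is 1 exactly when x > 0.
x = np.array([-2.0, -1.0, 1.0, 2.0])
y = np.array([0.0, 0.0, 1.0, 1.0])
X = np.column_stack([np.ones_like(x), x])

slope_after_6 = irls_logistic(X, y, 6)[1]
slope_after_12 = irls_logistic(X, y, 12)[1]
print(slope_after_6, slope_after_12)
```

The slope keeps increasing as more iterations are allowed, so two implementations with different iteration limits or tolerances would report different coefficients on such data.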