Differences Between Logistic Regression in Statistics and in Machine Learning

logistic, machine learning

I just found out that machine learning also has logistic regression as one of its methods. Can someone please tell me the differences between logistic regression in statistics and in machine learning? I've seen lecture slides on logistic regression from a machine learning course, but I can't see how it differs from the coverage of logistic regression in a statistics course.

Is there no need to check for multicollinearity when logistic regression is used in machine learning?

I ask because I ran a dataset through R's glm function with a binomial logit model, and then ran the same dataset through Apache Mahout's trainlogistic, but the resulting coefficients are different.

This is the command I use in R:

w1.glm <- glm(anw ~ cs, data = w1, family = "binomial")

This is the result of summary(w1.glm):

glm(formula = anw ~ cs, family = "binomial", data = w1)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.5400   0.1073   0.1924   1.0047   1.0047  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  0.42077    0.02588   16.26   <2e-16 ***
cs           1.89342    0.06427   29.46   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 11762.5  on 10660  degrees of freedom
Residual deviance:  9250.3  on 10659  degrees of freedom

And this is the command I use in Mahout:

/usr/local/mahout/bin/mahout trainlogistic --input w1.csv --output ./model --target anw --categories 2 --predictors cs --types numeric --features 20 --passes 100 --rate 50

Running on hadoop, using /usr/local/hadoop/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /usr/local/mahout/mahout-examples-0.8-job.jar
20
anw ~ 
-19.553*cs + -7.512*Intercept Term
            cs -19.55265
      Intercept Term -7.51155
    0.000000000     0.000000000     0.000000000     0.000000000     0.000000000     0.000000000   -19.552646543     0.000000000     0.000000000     0.000000000     0.000000000     0.000000000     0.000000000    -7.511546797     0.000000000     0.000000000     0.000000000     0.000000000     0.000000000     0.000000000 
13/11/01 02:04:47 INFO driver.MahoutDriver: Program took 22118 ms (Minutes: 0.3686333333333333)

Edited: Added the reason I asked the question in the title. Added the commands used to execute glm in R and trainlogistic in Mahout.

Best Answer

Logistic regression refers to the same model in both fields. Mahout, however, seems to do some things by default that make its implementation a little more than plain logistic regression. First, Mahout seems to be regularizing the coefficients. If it's doing this by default, I would also expect it to be standardizing (centering and scaling) the inputs. Passing it a value of lambda=0 should prevent regularization, but you still have to make sure that the inputs are not being standardized.
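To see how a penalty or input standardization alone can change the reported coefficients, here is a minimal base-R sketch. Everything in it is an assumption made for illustration: the data are simulated (only the variable names anw and cs are borrowed from the question), fit_penalized is a hypothetical helper, and the ridge-style penalty is just one possible form of regularization, not necessarily what Mahout does internally.

```r
# Simulated data; the "true" coefficients here are made up for illustration.
set.seed(1)
n   <- 2000
cs  <- runif(n)
anw <- rbinom(n, 1, plogis(0.4 + 1.9 * cs))

# Plain maximum-likelihood fit, as in the question.
fit_glm <- glm(anw ~ cs, family = "binomial")

# Hypothetical helper: minimise the negative log-likelihood plus a
# ridge penalty (lambda * slope^2) on the slope only.
fit_penalized <- function(x, y, lambda) {
  nll <- function(beta) {
    eta <- beta[1] + beta[2] * x
    -sum(y * eta - log1p(exp(eta))) + lambda * beta[2]^2
  }
  optim(c(0, 0), nll, method = "BFGS")$par
}

# Same model, but with the predictor centred and scaled first.
cs_std <- as.numeric(scale(cs))

coefs <- rbind(
  glm          = coef(fit_glm),
  lambda_0     = fit_penalized(cs, anw, 0),      # should match glm closely
  lambda_50    = fit_penalized(cs, anw, 50),     # slope shrunk toward 0
  standardized = fit_penalized(cs_std, anw, 0)   # same fit, different scale
)
print(coefs)
```

With lambda = 0 the optimizer should land very close to glm's estimates. A positive lambda pulls the slope toward zero. Standardizing the input leaves the fitted probabilities unchanged but rescales the slope and shifts the intercept, so the printed numbers no longer match glm's even though the model is the same.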

If you want to fit a regularized GLM in R, check out the glmnet package.
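A rough sketch of what that could look like, assuming glmnet is installed. The data are simulated, and because glmnet requires a predictor matrix with at least two columns, a second, purely hypothetical predictor x2 is added alongside cs. The value lambda = 0.1 is arbitrary.

```r
library(glmnet)  # assumes the glmnet package is installed

# Simulated data; x2 is a hypothetical extra predictor, since glmnet
# needs a matrix with two or more columns.
set.seed(42)
n <- 1000
x <- cbind(cs = runif(n), x2 = rnorm(n))
y <- rbinom(n, 1, plogis(0.4 + 1.9 * x[, "cs"]))

# Ridge-penalised logistic regression (alpha = 0) with a fixed lambda.
# standardize = FALSE keeps the inputs on their original scale.
ridge <- glmnet(x, y, family = "binomial", alpha = 0,
                lambda = 0.1, standardize = FALSE)

# Unpenalised fit on the same data for comparison.
unpenalized <- glm(y ~ cs + x2, data = data.frame(y = y, x),
                   family = "binomial")

print(coef(ridge))        # shrunken coefficients
print(coef(unpenalized))  # maximum-likelihood coefficients
```

Note that glmnet standardizes the inputs by default (standardize = TRUE) and reports coefficients back on the original scale, so it is worth setting that argument explicitly when comparing against glm.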
