Why Logistic Regression Sometimes Outperforms Neural Networks

Tags: gini · logistic · neural-networks · regression

I have 5 samples (each containing ~380K records, 33 predictive variables, and 1 binary Target):

  • one sample is used to train the models
  • the remaining 4 samples are used to validate the models
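
For illustration, one way such a split can be produced (a minimal sketch; full_data is a hypothetical name for the pooled data, not the actual pipeline):

import numpy as np

# Shuffle the pooled data, then cut it into five roughly equal samples.
shuffled = full_data.sample(frac=1, random_state=42).reset_index(drop=True)
parts = np.array_split(shuffled, 5)

train = parts[0]          # one sample to train the models
validations = parts[1:]   # the remaining four samples to validate them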

The following table compares the Gini coefficients of the logistic regression against those of the multilayer perceptron (MLP):

                      Logistic Regression    MLP
Train sample                 35.8            34.9
Validation sample 1          40.0            34.4
Validation sample 2          37.7            32.0
Validation sample 3          37.5            31.5
Validation sample 4          36.4            34.2

As you can see, the Gini coefficients of the logistic regression are consistently higher than those of the MLP.
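
For reference, the Gini here is the usual rescaling of the ROC AUC (Gini = 2·AUC − 1, reported on a 0–100 scale). A minimal sketch of the computation, assuming scikit-learn is available and y_val and p_val (assumed names) hold a validation sample's targets and predicted probabilities:

from sklearn.metrics import roc_auc_score

# Gini = 2 * AUC - 1, from predicted probabilities on a validation sample.
auc = roc_auc_score(y_val, p_val)
gini = 100 * (2 * auc - 1)   # on the 0-100 scale used in the table
print(round(gini, 1))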

Why could that be?

Before running both the logistic regression and the MLP, I categorized the categorical variables and also scaled the numeric variables.
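
For illustration, the preprocessing amounts to something like the following (a sketch; num_cols and cat_cols are assumed names, and integer coding is only one possible reading of "categorized"):

from sklearn.preprocessing import StandardScaler

# Scale the numeric variables.
data[num_cols] = StandardScaler().fit_transform(data[num_cols])

# One possible reading of "categorized": map each category to an integer code.
for c in cat_cols:
    data[c] = data[c].astype('category').cat.codes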

The code of the logistic regression is really simple and straightforward:

import statsmodels.api as sm

Y = data['Target']   # the binary target
X = data[col_list]   # the list of 33 predictive features

X1 = sm.add_constant(X)   # add the intercept column

logit = sm.Logit(Y, X1)
result = logit.fit()
print(result.summary())
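
Out-of-sample probabilities for the validation samples can then be obtained from the fitted model (a sketch; X_val is an assumed name for one validation sample's features):

# The constant must be added again so the columns match the training design matrix.
X_val1 = sm.add_constant(X_val)
p_val = result.predict(X_val1)   # predicted probabilities of Target = 1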

The code of the MLP is this one:

# Imports assumed; the standalone `keras` package works equivalently.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

def build_model():
    model = Sequential()
    model.add(Dense(5, input_dim=33, activation='relu'))   # hidden layer 1
    model.add(Dense(5, activation='sigmoid'))              # hidden layer 2
    model.add(Dense(1, activation='sigmoid'))              # output: P(Target = 1)
    # Compile with binary cross-entropy loss and the Adam optimizer
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

model = build_model()
model.fit(X, Y, epochs=4, batch_size=30, verbose=1)   # X = predictive features; Y = target
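
The MLP's probabilities for a validation sample can be obtained analogously (a sketch; X_val is an assumed name):

# Probabilities from the sigmoid output unit, flattened to 1-D
# so they can feed the same Gini computation as above.
p_val = model.predict(X_val).ravel()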

I don't understand why the MLP underperforms the logistic regression.

Best Answer

If the response, conditioned on your predictors, roughly follows a logistic curve, then logistic regression will be superior. Despite the ML hype, deep learning and neural networks do not always outperform simpler models.
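
That is, logistic regression is exactly the right model when

$$P(Y = 1 \mid x) = \frac{1}{1 + e^{-(\beta_0 + \beta^\top x)}}$$

holds for the predictors $x$.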

Have you examined the output of the logistic model? What do the residuals look like? Note that statsmodels' Logit.fit() performs plain maximum likelihood with no penalty; scikit-learn's LogisticRegression, by contrast, applies an L2 penalty by default, which could well help given the number of predictors.
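
A quick way to check whether a penalty matters here (a sketch using scikit-learn, whose LogisticRegression is L2-penalized by default):

from sklearn.linear_model import LogisticRegression

# C is the inverse regularization strength; C=1.0 is the default.
clf = LogisticRegression(penalty='l2', C=1.0, max_iter=1000)
clf.fit(X, Y)
p_val = clf.predict_proba(X_val)[:, 1]   # probabilities for the Gini comparison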

I'm not sure what "categorized the categorical variables" means. If it's something like one-hot encoding, that is not recommended for regression; the model implementation should handle categorical variables automatically.