Solved – Logistic regression with an L-BFGS solver

logistic · machine-learning

Is there any open-source library or code that implements logistic regression using an L-BFGS solver?

I would prefer Python, but other languages are welcome, too.

Best Answer

Here is an example of logistic regression estimation using the limited-memory BFGS (L-BFGS) optimization algorithm. I will be using optim and the optimx function from the optimx package in R, and scipy.optimize.minimize and scipy.optimize.fmin_l_bfgs_b from SciPy in Python.

Python

The example I am using is from Sheather (2009, p. 264). The following Python code estimates the logistic regression using the BFGS algorithm:

# load required libraries
import numpy as np
import scipy as sp
import scipy.optimize
import pandas as pd

# URL of the data file
urlSheatherData = "http://www.stat.tamu.edu/~sheather/book/docs/datasets/MichelinNY.csv"

# read in the data to a NumPy array
arrSheatherData = np.asarray(pd.read_csv(urlSheatherData))

# slice the data to get the dependent variable
vY = arrSheatherData[:, 0].astype('float64')

# slice the data to get the matrix of predictor variables,
# skipping column 1 (the restaurant name)
mX = arrSheatherData[:, 2:].astype('float64')

# add an intercept to the predictor variables
intercept = np.ones(mX.shape[0]).reshape(mX.shape[0], 1)
mX = np.concatenate((intercept, mX), axis = 1)

# the number of variables and observations
iK = mX.shape[1]
iN = mX.shape[0]

# logistic transformation
def logit(mX, vBeta):
    return np.exp(np.dot(mX, vBeta)) / (1.0 + np.exp(np.dot(mX, vBeta)))
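
# note: np.exp can overflow for a large linear predictor np.dot(mX, vBeta);
# scipy.special.expit computes the same sigmoid in a numerically robust way,
# e.g. (an alternative sketch, not part of Sheather's example):
#   from scipy.special import expit
#   def logit(mX, vBeta):
#       return expit(np.dot(mX, vBeta))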

# stable parametrisation of the cost function
# Note: The negative of the log-likelihood is returned, since we will be
#       /minimising/ the function.
def logLikelihoodLogitStable(vBeta, mX, vY):
    return -np.sum(vY*(np.dot(mX, vBeta) -
                       np.log(1.0 + np.exp(np.dot(mX, vBeta)))) +
                   (1 - vY)*(-np.log(1.0 + np.exp(np.dot(mX, vBeta)))))
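
# the two terms above simplify algebraically:
#   vY*(Xb - log(1 + exp(Xb))) + (1 - vY)*(-log(1 + exp(Xb)))
#     = vY*Xb - log(1 + exp(Xb))
# so an equivalent, slightly cheaper form would be
#   -np.sum(vY*np.dot(mX, vBeta) - np.log(1.0 + np.exp(np.dot(mX, vBeta))))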

# score function: gradient of the negative log-likelihood, X'(p - y)
def likelihoodScore(vBeta, mX, vY):
    return np.dot(mX.T, logit(mX, vBeta) - vY)

#====================================================================
# optimize to get the MLE using the BFGS optimizer (numerical derivatives)
#====================================================================
optimLogitBFGS = sp.optimize.minimize(logLikelihoodLogitStable,
                                      x0=np.array([10, 0.5, 0.1, -0.3, 0.1]),
                                      args=(mX, vY), method='BFGS',
                                      options={'gtol': 1e-3, 'disp': True})

print(optimLogitBFGS) # print the results of the optimisation
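
Before handing the analytical gradient to the L-BFGS routine below, it is worth checking it against a numerical approximation. A minimal sketch using scipy.optimize.check_grad, which returns the norm of the difference between the analytical and numerical gradients (it should be small at any test point):

# sanity-check the analytical score against numerical derivatives
vBetaTest = np.array([10, 0.5, 0.1, -0.3, 0.1])
print(sp.optimize.check_grad(logLikelihoodLogitStable, likelihoodScore,
                             vBetaTest, mX, vY))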

This can easily be adapted to the scipy.optimize.fmin_l_bfgs_b function, this time supplying the analytical gradient:

#====================================================================
# optimize to get the MLE using the L-BFGS optimizer (analytical derivatives)
#====================================================================
optimLogitLBFGS = sp.optimize.fmin_l_bfgs_b(logLikelihoodLogitStable,
                                            x0=np.array([10, 0.5, 0.1, -0.3, 0.1]),
                                            args=(mX, vY), fprime=likelihoodScore,
                                            pgtol=1e-3, disp=True)

print(optimLogitLBFGS) # print the results of the optimisation
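
As a cross-check, the coefficient estimates can be compared with a standard maximum-likelihood fit. A minimal sketch, assuming statsmodels is installed:

# compare against statsmodels' Logit (Newton-Raphson by default)
import statsmodels.api as sm
resLogit = sm.Logit(vY, mX).fit()
print(resLogit.params)  # should agree with the L-BFGS estimates above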

R

Using the L-BFGS-B optimizer in R is just as simple. First, the BFGS version using base R's optim with numerical derivatives:

library(optimx)

# read in the data
urlSheatherData = "http://www.stat.tamu.edu/~sheather/book/docs/datasets/MichelinNY.csv"
dfSheatherData = as.data.frame(read.csv(urlSheatherData, header = TRUE))

# create the design matrices
vY = as.matrix(dfSheatherData['InMichelin'])
mX = as.matrix(dfSheatherData[c('Service','Decor', 'Food', 'Price')])

# add an intercept to the predictor variables
mX = cbind(rep(1, nrow(mX)), mX)

# the number of variables and observations
iK = ncol(mX)
iN = nrow(mX)

# define the logistic transformation
logit = function(mX, vBeta) {
  return(exp(mX %*% vBeta)/(1+ exp(mX %*% vBeta)) )
}

# stable parametrisation of the log-likelihood function
# Note: The negative of the log-likelihood is being returned, since we will be
#       /minimising/ the function.
logLikelihoodLogitStable = function(vBeta, mX, vY) {
  return(-sum(
    vY*(mX %*% vBeta - log(1+exp(mX %*% vBeta)))
    + (1-vY)*(-log(1 + exp(mX %*% vBeta)))
  )  # sum
  )  # return
}

# score function
likelihoodScore = function(vBeta, mX, vY) {
  return(t(mX) %*% (logit(mX, vBeta) - vY) )
}

# initial set of parameters
vBeta0 = c(10, -0.1, -0.3, 0.001, 0.01)  # arbitrary starting parameters

#====================================================================
# optimize to get the MLE using the BFGS optimizer (numerical derivatives)
#====================================================================
optimLogitBFGS = optim(vBeta0, logLikelihoodLogitStable,
                       mX = mX, vY = vY, method = 'BFGS', hessian = TRUE)
optimLogitBFGS # get the results of the optimisation

and then the L-BFGS-B version from the optimx package, with analytical derivatives:

#====================================================================
# optimize to get the MLE using the L-BFGS optimizer (analytical derivatives)
#====================================================================
optimLogitLBFGS = optimx(vBeta0, logLikelihoodLogitStable, method = 'L-BFGS-B',
                         gr = likelihoodScore, mX = mX, vY = vY, hessian = TRUE)

summary(optimLogitLBFGS)
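
Finally, to answer the original question about an off-the-shelf implementation: scikit-learn exposes an L-BFGS solver through sklearn.linear_model.LogisticRegression. A minimal sketch, reusing the mX and vY arrays from the Python section above; note that scikit-learn applies an L2 penalty by default, so a large C is used here to approximate the unpenalised MLE:

# off-the-shelf L-BFGS logistic regression via scikit-learn
from sklearn.linear_model import LogisticRegression

# mX already contains an intercept column, hence fit_intercept=False;
# a large C effectively switches off the default L2 penalty
clf = LogisticRegression(solver='lbfgs', C=1e10, fit_intercept=False)
clf.fit(mX, vY)
print(clf.coef_)  # should be close to the MLEs above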