R Logistic Regression – How to Code the Likelihood Function for Logistic Regression

computational-statistics, likelihood, logistic, r

I would appreciate help in understanding whether my interpretation and coding of the likelihood function for logistic regression are correct.

Background: For a task I need to write an R function implementing the likelihood function for a logistic regression model. It is given that $p(x_i)=\frac{1}{1+\exp(-x_i\beta)}$, where $\beta=(\beta_1,\ldots,\beta_k)$ and $x_i=(x_{i,1},\ldots,x_{i,k})$, and that the design matrix $\mathbf{x}$ is an $N \times k$ matrix.

My understanding: I did some research and found that the likelihood function for a logistic regression is given by $L(\beta)=\prod_{i=1}^N p_i(\beta)^{y_i}(1-p_i(\beta))^{1-y_i}$, so I wrote the following code for this formula:

L <- function(beta, y, X){
  p <- numeric()
  for(i in 1:nrow(X)) {
    p <- c(p, 1/(1 + exp(-sum(X[i,]*beta))))
  }
  return(prod(p^y * (1-p)^(1-y)))
}

I first computed all the values $p_i$ from the rows of the matrix X with the help of a for loop, and saved these in p. This allowed me to then insert the resulting vectors into the likelihood function given above. Observe that $\mathbf{y}=(y_1,\ldots,y_N)^T$ and that $\mathbf{p}=(p_1,\ldots,p_N)^T.$

Question: I have no way to check whether my function is correct. I have based my code on the treatment of the logistic regression likelihood in the course literature, so I would really appreciate help with whether my interpretation is correct. Is it true that a logistic regression has the likelihood function given above? And is my code a correct way to implement it?

Best Answer

You need to do probability computations like this in log-space

Your computation uses a naïve method that is not how likelihood functions (or other probability functions) are coded in practice. When computing with likelihood functions, it is much better to work in log-space, to avoid arithmetic underflow and numerical instability. This requires you to deal exclusively with log-probabilities in all your intermediate steps, transforming back to the likelihood only at the end of your computation.
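
To see the underflow concretely, here is a toy illustration (arbitrary numbers, chosen only to demonstrate the effect):

#Multiplying many small probabilities directly underflows to zero,
#but summing their logarithms stays well within double precision
p <- rep(0.01, 200)
prod(p)             #0 -- the true value 1e-400 is below the smallest double
sum(log(p))         #-921.034 -- easily representable in log-space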

To do this, you need to compute the log-likelihood function using log-probabilities in all the intermediate calculations. The log-likelihood function for logistic regression is:

$$\ell_{\mathbf{x},\mathbf{y}}(\boldsymbol{\beta}) = \sum_{i=1}^N [y_i \log(p_i(\boldsymbol{\beta})) + (1-y_i) \log(1-p_i(\boldsymbol{\beta}))],$$

where we can write the log-probabilities in this expression, using the function $\text{log1pexp}(x) \equiv \log(1+\exp(x))$, as:

$$\begin{align} \log(p_i(\boldsymbol{\beta})) &= \log \Bigg( \frac{1}{1 + \exp(-\mathbf{x}_i \boldsymbol{\beta})} \Bigg) \\[6pt] &= -\log \Bigg( 1 + \exp(-\mathbf{x}_i \boldsymbol{\beta}) \Bigg) \\[12pt] &= -\text{log1pexp} (- \mathbf{x}_i \boldsymbol{\beta}), \\[12pt] \log(1-p_i(\boldsymbol{\beta})) &= \log \Bigg( 1 - \frac{1}{1 + \exp(-\mathbf{x}_i \boldsymbol{\beta})} \Bigg) \\[6pt] &= \log \Bigg( \frac{\exp(-\mathbf{x}_i \boldsymbol{\beta})}{1 + \exp(-\mathbf{x}_i \boldsymbol{\beta})} \Bigg) \\[6pt] &= -\mathbf{x}_i \boldsymbol{\beta} - \log \Bigg( 1 + \exp(-\mathbf{x}_i \boldsymbol{\beta}) \Bigg) \\[12pt] &= -\mathbf{x}_i \boldsymbol{\beta} -\text{log1pexp} (- \mathbf{x}_i \boldsymbol{\beta}). \\[6pt] \end{align}$$
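
A quick numerical check shows why a stable implementation of this function matters; this sketch assumes the VGAM package is installed, since that is the implementation used in the function below:

#The naive formula log(1+exp(x)) overflows once exp(x) exceeds the
#largest double, whereas log1pexp uses log(1+exp(x)) ~ x for large x
x <- 800
log(1 + exp(x))     #Inf -- exp(800) overflows
VGAM::log1pexp(x)   #800 -- numerically stable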

This allows us to write the entire log-likelihood in log-space as:

$$\ell_{\mathbf{x},\mathbf{y}}(\boldsymbol{\beta}) = - \sum_{i=1}^N [\text{log1pexp} (- \mathbf{x}_i \boldsymbol{\beta}) + (1-y_i) \mathbf{x}_i \boldsymbol{\beta}].$$

Below I have knocked up an R function that computes the likelihood function for logistic regression using log-space computation (it uses log1pexp from the VGAM package). This method of programming the function is far superior to what you have in your question, since it avoids the arithmetic underflow that occurs when you multiply lots of small probabilities. In this function I have included a logical option loglike that lets the user specify whether to return the likelihood or the log-likelihood; observe that even if the user asks for the likelihood, all intermediate computation is done in log-space, and we transform back to the likelihood only in the last step.

likelihood.logistic.regression <- function(beta, y, x, loglike = FALSE) {

  #Get parameters
  n <- length(y)

  #Compute the log-likelihood term for each observation
  TERMS <- rep(-Inf, n)
  for (i in 1:n) {
    XB <- sum(x[i, ]*beta)
    TERMS[i] <- -VGAM::log1pexp(-XB) - (1-y[i])*XB
  }
  LOGLIKE <- sum(TERMS)

  #Return the likelihood or log-likelihood
  if (loglike) { LOGLIKE } else { exp(LOGLIKE) }
}
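
As a usage illustration, here is a call on some made-up data (the values are hypothetical, shown only to demonstrate the interface):

#Toy data: design matrix with an intercept column and one covariate
x    <- cbind(1, c(0.5, -1.2, 2.0, 0.3))
y    <- c(1, 0, 1, 0)
beta <- c(0.2, 0.8)

likelihood.logistic.regression(beta, y, x, loglike = TRUE)   #log-likelihood
likelihood.logistic.regression(beta, y, x)                   #likelihood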

As pointed out in the comments, one way to sanity-check the code is to simulate data from a logistic regression and then use the log-likelihood function in an optimiser to compute the MLE of the parameter vector. If your function is written correctly (and if all the other functions you are using in your analysis are written correctly) then the MLE computed from a large simulation ought to be reasonably close to the true parameter values used to generate the data. I will leave the details as an exercise for you, but a minimal sketch is below.
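
A sketch of that check might look like the following (the simulation settings are arbitrary; note that optim minimises by default, so we hand it the negative log-likelihood):

#Simulate data from a logistic regression with known parameters,
#then recover them by maximising the log-likelihood numerically
set.seed(1)
n    <- 10000
beta <- c(-1, 2)                          #true parameter vector
x    <- cbind(1, rnorm(n))                #intercept plus one covariate
p    <- 1/(1 + exp(-as.vector(x %*% beta)))
y    <- rbinom(n, size = 1, prob = p)

NEGLOGLIKE <- function(b) { -likelihood.logistic.regression(b, y, x, loglike = TRUE) }
MLE <- optim(c(0, 0), NEGLOGLIKE)$par
MLE                                        #should be close to c(-1, 2)
coef(glm(y ~ x[, 2], family = binomial))   #reference fit for comparison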