R Logistic Regression – How to Code the Likelihood Function for Logistic Regression

computational-statistics, likelihood, logistic, r

I would appreciate help in understanding whether my interpretation and coding of the likelihood function for logistic regression are correct.

Background: For a task I need to write an R function implementing the likelihood function for a logistic regression model. It is given that $p(x_i)=\frac{1}{1+\exp(-x_i\beta)}$, where $\beta=(\beta_1,\ldots,\beta_k)$ and $x_i=(x_{i,1},\ldots,x_{i,k})$, and that the design matrix $\mathbf{x}$ is an $N \times k$ matrix.

My understanding: I did some research and found that the likelihood function for a logistic regression is given by $L(\beta)=\prod_{i=1}^N p_i(\beta)^{y_i}(1-p_i(\beta))^{1-y_i}$, so I wrote the following code for this formula:

L <- function(beta, y, X){
  p <- numeric()
  for(i in 1:nrow(X)) {
    p <- c(p, 1/(1 + exp(-sum(X[i,]*beta))))
  }
  return(prod(p^y * (1-p)^(1-y)))
}

I first computed all the values $p_i$ from the rows of the matrix X with the help of a for loop, and saved these in p. This allowed me to then insert the resulting vectors into the likelihood function given above. Observe that $\mathbf{y}=(y_1,\ldots,y_N)^T$ and that $\mathbf{p}=(p_1,\ldots,p_N)^T.$

Question: I have no way to check whether my function is correct. I have based my code on the treatment of the logistic regression likelihood in the course literature, so I would really appreciate help with whether my interpretation is correct. Is it true that a logistic regression has the likelihood function given above? And is my code a correct way to implement it?

Best Answer

You need to do probability computations like this in log-space

Your computation uses a naïve method that is not how likelihood functions (or other probability functions) are coded in practice. When computing with likelihood functions, it is much better to work in log-space, to avoid arithmetic underflow and numerical instability. This requires you to deal exclusively with log-probabilities in all your intermediate steps, transforming back to the likelihood only at the end of your computation.
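
To see the underflow concretely, here is a toy illustration (arbitrary numbers, chosen only to demonstrate the effect):

#Multiplying many small probabilities directly underflows to zero,
#but summing their logarithms stays well within double precision
p <- rep(0.01, 200)
prod(p)             #0 -- the true value 1e-400 is below the smallest double
sum(log(p))         #-921.034 -- easily representable in log-space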

To do this, you need to compute the log-likelihood function using log-probabilities in all the intermediate calculations. The log-likelihood function for logistic regression is:

$$\ell_{\mathbf{x},\mathbf{y}}(\boldsymbol{\beta}) = \sum_{i=1}^N [y_i \log(p_i(\boldsymbol{\beta})) + (1-y_i) \log(1-p_i(\boldsymbol{\beta}))],$$

where we can write the log-probabilities in this expression, using the function $\text{log1pexp}(x) \equiv \log(1+\exp(x))$, as:

$$\begin{align} \log(p_i(\boldsymbol{\beta})) &= \log \Bigg( \frac{1}{1 + \exp(-\mathbf{x}_i \boldsymbol{\beta})} \Bigg) \\[6pt] &= -\log \Bigg( 1 + \exp(-\mathbf{x}_i \boldsymbol{\beta}) \Bigg) \\[12pt] &= -\text{log1pexp} (- \mathbf{x}_i \boldsymbol{\beta}), \\[12pt] \log(1-p_i(\boldsymbol{\beta})) &= \log \Bigg( 1 - \frac{1}{1 + \exp(-\mathbf{x}_i \boldsymbol{\beta})} \Bigg) \\[6pt] &= \log \Bigg( \frac{\exp(-\mathbf{x}_i \boldsymbol{\beta})}{1 + \exp(-\mathbf{x}_i \boldsymbol{\beta})} \Bigg) \\[6pt] &= -\mathbf{x}_i \boldsymbol{\beta} - \log \Bigg( 1 + \exp(-\mathbf{x}_i \boldsymbol{\beta}) \Bigg) \\[12pt] &= -\mathbf{x}_i \boldsymbol{\beta} -\text{log1pexp} (- \mathbf{x}_i \boldsymbol{\beta}). \\[6pt] \end{align}$$
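
A quick numerical check shows why a stable implementation of this function matters; this sketch assumes the VGAM package is installed, since that is the implementation used in the function below:

#The naive formula log(1+exp(x)) overflows once exp(x) exceeds the
#largest double, whereas log1pexp uses log(1+exp(x)) ~ x for large x
x <- 800
log(1 + exp(x))     #Inf -- exp(800) overflows
VGAM::log1pexp(x)   #800 -- numerically stable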

This allows us to write the entire log-likelihood in log-space as:

$$\ell_{\mathbf{x},\mathbf{y}}(\boldsymbol{\beta}) = - \sum_{i=1}^N [\text{log1pexp} (- \mathbf{x}_i \boldsymbol{\beta}) + (1-y_i) \mathbf{x}_i \boldsymbol{\beta}].$$

Below I have knocked up an R function that computes the likelihood function for logistic regression using log-space computation (it uses log1pexp from the VGAM package). This method of programming the function is far superior to what you have in your question, since it avoids the arithmetic underflow that occurs when you multiply lots of small probabilities. In this function I have included a logical option loglike that lets the user specify whether to return the likelihood or the log-likelihood; observe that even if the user asks for the likelihood, all intermediate computation is done in log-space, and we transform back to the likelihood only in the last step.

likelihood.logistic.regression <- function(beta, y, x, loglike = FALSE) {

  #Get parameters
  n <- length(y)

  #Compute the log-likelihood term for each observation
  TERMS <- rep(-Inf, n)
  for (i in 1:n) {
    XB <- sum(x[i, ]*beta)
    TERMS[i] <- -VGAM::log1pexp(-XB) - (1-y[i])*XB
  }
  LOGLIKE <- sum(TERMS)

  #Return the likelihood or log-likelihood
  if (loglike) { LOGLIKE } else { exp(LOGLIKE) }
}
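
As a usage illustration, here is a call on some made-up data (the values are hypothetical, shown only to demonstrate the interface):

#Toy data: design matrix with an intercept column and one covariate
x    <- cbind(1, c(0.5, -1.2, 2.0, 0.3))
y    <- c(1, 0, 1, 0)
beta <- c(0.2, 0.8)

likelihood.logistic.regression(beta, y, x, loglike = TRUE)   #log-likelihood
likelihood.logistic.regression(beta, y, x)                   #likelihood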

As pointed out in the comments, one way to sanity-check the code is to simulate data from a logistic regression and then use the log-likelihood function in an optimiser to compute the MLE of the parameter vector. If your function is written correctly (and if all the other functions you are using in your analysis are written correctly) then the MLE computed from a large simulation ought to be reasonably close to the true parameter values used to generate the data. I will leave the details as an exercise for you, but a minimal sketch is below.
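
A sketch of that check might look like the following (the simulation settings are arbitrary; note that optim minimises by default, so we hand it the negative log-likelihood):

#Simulate data from a logistic regression with known parameters,
#then recover them by maximising the log-likelihood numerically
set.seed(1)
n    <- 10000
beta <- c(-1, 2)                          #true parameter vector
x    <- cbind(1, rnorm(n))                #intercept plus one covariate
p    <- 1/(1 + exp(-as.vector(x %*% beta)))
y    <- rbinom(n, size = 1, prob = p)

NEGLOGLIKE <- function(b) { -likelihood.logistic.regression(b, y, x, loglike = TRUE) }
MLE <- optim(c(0, 0), NEGLOGLIKE)$par
MLE                                        #should be close to c(-1, 2)
coef(glm(y ~ x[, 2], family = binomial))   #reference fit for comparison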