Wald test and likelihood ratio test: where do the confidence intervals on the regression coefficients come from?

confidence interval, hypothesis testing, likelihood-ratio, logistic, regression coefficients

So I'm trying to build my own Wald test and likelihood ratio test code within a machine learning pipeline. I can get the final fitted logistic regression coefficients from liblinear. I'm coding in MATLAB.

How would I get the variance of the regression coefficients and also the confidence intervals of the regression coefficients? Clearly for a variance and confidence interval, you need a sample of multiple sets of coefficients. But I thought you only get a single set of coefficients at the end of the log likelihood optimization.

Basically trying to replicate the results in the following link
http://www.ats.ucla.edu/stat/mult_pkg/faq/general/nested_tests.htm

Best Answer

When you fit a logistic regression model, there is no closed-form solution for the parameter estimates, unlike in linear regression. So instead, you search over the parameter space for a set of parameter estimates that maximizes the log likelihood or, equivalently, minimizes the deviance. (I usually prefer to think in terms of the deviance, but in this case it might be better to think of maximizing the log likelihood.) The most common search procedure is the Newton-Raphson algorithm.
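For reference, the search described above can be written out as follows. This is only a sketch in notation that is assumed here, not taken from the question: X is the design matrix with rows x_i, y is the vector of 0/1 outcomes, p is the vector of fitted probabilities, and W is the diagonal matrix with entries p_i(1 - p_i).

```latex
% Log likelihood of the logistic regression model
\ell(\beta) = \sum_{i=1}^{n} \left[ y_i \, x_i^\top \beta - \log\!\left(1 + e^{x_i^\top \beta}\right) \right],
\qquad
p_i = \frac{1}{1 + e^{-x_i^\top \beta}}

% Gradient (score) and Hessian
\nabla \ell(\beta) = X^\top (y - p),
\qquad
H(\beta) = -X^\top W X,
\quad W = \operatorname{diag}\bigl(p_i (1 - p_i)\bigr)

% One Newton-Raphson step
\beta^{(t+1)} = \beta^{(t)} - H\!\left(\beta^{(t)}\right)^{-1} \nabla \ell\!\left(\beta^{(t)}\right)
             = \beta^{(t)} + \left(X^\top W X\right)^{-1} X^\top (y - p)
```

For ungrouped binary data the deviance is -2 times this log likelihood, which is why maximizing one is the same as minimizing the other.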

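As you search, you could map out the shape of the log likelihood, but this isn't really done. In the process of running the Newton-Raphson algorithm, you calculate the Hessian matrix of the log likelihood (and if your search algorithm doesn't use the Hessian, it shouldn't be difficult to compute it at the final estimates). That provides a picture of the shape of the log likelihood in the region of the parameter space where you currently are. The Wald test of a parameter is based on the assumption that the likelihood has the shape of a normal distribution around its maximum, i.e., that the log likelihood is quadratic there (which it would be with infinite data but may not be with small samples). The curvature at the maximum then gives an estimate of the standard deviation: the inverse of the negative Hessian is the estimated variance-covariance matrix of the coefficients, and the square roots of its diagonal elements are used as the standard errors. So you don't need multiple sets of coefficients; the single fit supplies the variance. This is typically what is used to form confidence intervals for betas (the estimate plus or minus 1.96 standard errors for a 95% interval).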
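As an illustration of where the Wald standard errors come from, here is a minimal sketch in plain Python (the question uses MATLAB, but the arithmetic carries over directly). Everything in it is an assumption made for the example: the toy data, the single-predictor model logit(p) = b0 + b1*x, and the helper names `score_and_information` and `invert_2x2`. It also assumes an unpenalized maximum likelihood fit; coefficients from a regularized solver would not match it exactly.

```python
import math

# Toy data (hypothetical, chosen only so the classes overlap and the MLE exists).
x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
y = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def score_and_information(b0, b1):
    # Gradient of the log likelihood and the negative Hessian
    # (observed information) for logit(p) = b0 + b1 * x.
    g0 = g1 = i00 = i01 = i11 = 0.0
    for xi, yi in zip(x, y):
        p = sigmoid(b0 + b1 * xi)
        w = p * (1.0 - p)
        g0 += yi - p
        g1 += (yi - p) * xi
        i00 += w
        i01 += w * xi
        i11 += w * xi * xi
    return (g0, g1), (i00, i01, i11)

def invert_2x2(i00, i01, i11):
    # Inverse of the symmetric matrix [[i00, i01], [i01, i11]].
    det = i00 * i11 - i01 * i01
    return i11 / det, -i01 / det, i00 / det

# Newton-Raphson: beta <- beta + (information)^-1 * gradient.
b0, b1 = 0.0, 0.0
for _ in range(50):
    (g0, g1), info = score_and_information(b0, b1)
    c00, c01, c11 = invert_2x2(*info)
    step0 = c00 * g0 + c01 * g1
    step1 = c01 * g0 + c11 * g1
    b0, b1 = b0 + step0, b1 + step1
    if abs(step0) + abs(step1) < 1e-10:
        break

# Covariance matrix = inverse of the information at the converged estimate.
_, info = score_and_information(b0, b1)
var_b0, cov_b0_b1, var_b1 = invert_2x2(*info)
se_b0, se_b1 = math.sqrt(var_b0), math.sqrt(var_b1)

# Wald z statistic, two-sided p-value, and 95% confidence interval for the slope.
z = b1 / se_b1
wald_p = math.erfc(abs(z) / math.sqrt(2.0))
z_crit = 1.959963984540054  # 97.5th percentile of the standard normal
ci = (b1 - z_crit * se_b1, b1 + z_crit * se_b1)

print("b1 =", b1, "SE =", se_b1)
print("Wald z =", z, "p =", wald_p)
print("95% Wald CI for b1:", ci)
```

If the coefficients come from an external optimizer, only the last part is needed: evaluate the information matrix once at the reported estimates and invert it. With more than one predictor the same steps apply with a general matrix inverse in place of the 2x2 formula.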

The likelihood ratio test works differently. The log of the ratio of the likelihoods is the difference of the log likelihoods. That is, the test is based on the difference between the log likelihoods of the model when a parameter is set at two different values (with the remaining parameters re-estimated in each case). Essentially always, these two values are the maximum likelihood estimate and the null value (0). Twice the difference between these two log likelihoods should be asymptotically distributed as chi-squared, with degrees of freedom equal to the number of parameters fixed under the null. It is less common to use this approach to determine the confidence interval for a beta, but it can be done. You need to search over possible beta values and work backward to find the values that constitute the limits of the interval, i.e., the values at which the test would just reject.
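The likelihood ratio test and the "work backward" interval can be sketched in plain Python as follows. All names and data here are assumptions for illustration: the toy data, the model logit(p) = b0 + b1*x, and the helpers `bisect`, `best_b0`, `profile_loglik` and `profile_score`. To keep the example short, it uses bisection for every search instead of Newton-Raphson; that works here because the log likelihood is concave, so each function being solved is monotone.

```python
import math

# Toy data (hypothetical, chosen only so the classes overlap and the MLE exists).
x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
y = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def loglik(b0, b1):
    # Bernoulli log likelihood: sum of y*eta - log(1 + exp(eta)).
    total = 0.0
    for xi, yi in zip(x, y):
        eta = b0 + b1 * xi
        total += yi * eta - (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))
    return total

def bisect(f, lo, hi, iters=80):
    # Root of a monotone function f that changes sign on [lo, hi].
    f_lo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def best_b0(b1):
    # Intercept that maximizes the log likelihood with the slope held fixed
    # (root of the intercept score, which is decreasing in b0).
    score = lambda b0: sum(yi - sigmoid(b0 + b1 * xi) for xi, yi in zip(x, y))
    return bisect(score, -100.0, 100.0)

def profile_loglik(b1):
    # Log likelihood at a fixed slope, with the intercept re-estimated.
    return loglik(best_b0(b1), b1)

def profile_score(b1):
    # Derivative of the profile log likelihood with respect to the slope.
    b0 = best_b0(b1)
    return sum((yi - sigmoid(b0 + b1 * xi)) * xi for xi, yi in zip(x, y))

# Full model: maximum likelihood estimates of slope and intercept.
b1_hat = bisect(profile_score, -10.0, 10.0)
b0_hat = best_b0(b1_hat)
ll_full = loglik(b0_hat, b1_hat)

# Null model: slope fixed at 0, intercept re-estimated.
ll_null = profile_loglik(0.0)

# Likelihood ratio statistic and its chi-squared (1 df) p-value.
lr_stat = 2.0 * (ll_full - ll_null)
lr_pvalue = math.erfc(math.sqrt(max(lr_stat, 0.0) / 2.0))

# Profile likelihood 95% CI: slopes at which the LR statistic equals the
# chi-squared (1 df) critical value.
CHI2_95_DF1 = 3.841458820694124

def excess(b1):
    return 2.0 * (ll_full - profile_loglik(b1)) - CHI2_95_DF1

ci_lo = bisect(excess, b1_hat - 10.0, b1_hat)
ci_hi = bisect(excess, b1_hat, b1_hat + 10.0)

print("LR statistic =", lr_stat, "p =", lr_pvalue)
print("95% profile likelihood CI for b1:", (ci_lo, ci_hi))
```

The search brackets (plus or minus 10 around the estimate, and -100 to 100 for the intercept) are arbitrary choices that happen to be wide enough for this toy data; real data may need different ones. Unlike the Wald interval, the interval found this way need not be symmetric around the estimate.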

There is a figure (actually taken from John Fox) at the bottom of the linked page that is extremely helpful for understanding this topic.