Logistic Regression – Interpreting Zero-Inflated Ordered Logit Model

generalized linear model, logistic, stata

Consider this Stata code and selected results:

use https://stats.idre.ucla.edu/stat/data/hsb2, clear

generate honcomp = (write >=60)
logit honcomp female read science

Iteration 0:   log likelihood = -115.64441
Iteration 1:   log likelihood = -84.558481
Iteration 2:   log likelihood = -80.491449
Iteration 3:   log likelihood = -80.123052
Iteration 4:   log likelihood = -80.118181
Iteration 5:   log likelihood = -80.11818

This is a listing of the log likelihoods at each iteration. (Remember that logistic regression uses maximum likelihood, which is an iterative procedure.) The first iteration (called iteration 0) is the log likelihood of the “null” or “empty” model; that is, a model with no predictors. At the next iteration, the predictor(s) are included in the model. At each iteration, the log likelihood increases because the goal is to maximize the log likelihood. When the difference between successive iterations is very small, the model is said to have “converged”, iteration is stopped and the results are displayed.

The above is from https://stats.idre.ucla.edu/stata/output/logistic-regression-analysis/.
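To make the iteration log concrete, here is a minimal Python sketch of Newton-Raphson for a logit with one predictor. The toy data, the function names and the tolerance are assumptions made for illustration, and Stata's own optimizer settings are not reproduced. Starting from the intercept-only fit means the first printed value plays the role of iteration 0.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def log_likelihood(b0, b1, xs, ys):
    """Bernoulli log likelihood of a logit with intercept b0 and slope b1."""
    ll = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        ll += y * math.log(p) + (1 - y) * math.log(1 - p)
    return ll

def fit_logit(xs, ys, tol=1e-8, max_iter=25):
    # Iteration 0: the "null" model (intercept only, slope fixed at 0).
    ybar = sum(ys) / len(ys)
    b0, b1 = math.log(ybar / (1 - ybar)), 0.0
    history = [log_likelihood(b0, b1, xs, ys)]
    for _ in range(max_iter):
        # Gradient g and information matrix h (the negative Hessian).
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        # Newton step: solve the 2x2 system h * step = g by hand.
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
        history.append(log_likelihood(b0, b1, xs, ys))
        # Stop once successive log likelihoods barely differ.
        if abs(history[-1] - history[-2]) < tol:
            break
    return b0, b1, history

# Hypothetical toy data, not the hsb2 file.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 1, 0, 1, 1, 0, 1]
b0, b1, history = fit_logit(xs, ys)
for i, ll in enumerate(history):
    print(f"Iteration {i}:   log likelihood = {ll:.6f}")
```

The printed values rise and then level off, which is the same pattern as in the Stata log.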

There is also an option of fitting a zero-inflated ordered logit model.

. ziologit tobacco education income i.female, inflate(income education i.parent)

Iteration 0:   log likelihood = -15977.364  (not concave)
Iteration 1:   log likelihood =  -13149.83  (not concave)
Iteration 2:   log likelihood = -12467.245
Iteration 3:   log likelihood = -11039.218
Iteration 4:   log likelihood = -9929.2298
Iteration 5:   log likelihood = -9715.1143
Iteration 6:   log likelihood = -9703.2464
Iteration 7:   log likelihood = -9703.2168
Iteration 8:   log likelihood = -9703.2168

This output is from https://www.stata.com/new-in-stata/zero-inflated-ordered-logit/.

There appears to be one model computation being performed here, since there is only one iteration output.

Is there just one model being fitted here that simultaneously provides all the coefficients for the tobacco and inflate portions? If so, how is this done?
What is the relevance of “not concave”? Does it matter whether the log likelihood is concave or not?
In the link, there are also /cut1, /cut2 and /cut3. What do these represent?

Best Answer

The zero-inflated ordered logit has a single likelihood that covers all the parameters being estimated. There are not two likelihoods for the two portions of the model, and there is a single estimation procedure over all the parameters. This is why you see only one log likelihood at each iteration. The likelihood for all the parameters is written down as a single expression, and that expression is maximized. The two components of the model are estimated simultaneously and depend on each other. It's not the case that each unit is used twice; each unit contributes exactly once to the likelihood.
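As a rough illustration of what "a single expression" means, here is a Python sketch of a joint log likelihood for a zero-inflated ordered logit. The function names, the toy data and the parameter values are hypothetical. The parameterization is also an assumption: here `gamma` drives the probability of not being in the always-zero group, which may not match the convention Stata uses.

```python
import math

def logistic_cdf(t):
    return 1.0 / (1.0 + math.exp(-t))

def ordered_probs(xb, cuts):
    """Ordered-logit probabilities for categories 0..len(cuts)."""
    bounds = [0.0] + [logistic_cdf(c - xb) for c in cuts] + [1.0]
    return [hi - lo for lo, hi in zip(bounds, bounds[1:])]

def zio_loglik(beta, gamma, cuts, data):
    """Joint log likelihood; data holds (y, x, z) with x and z as lists.

    beta and cuts belong to the ordered part, gamma to the inflation part,
    and all of them appear in the same sum.
    """
    total = 0.0
    for y, x, z in data:
        xb = sum(b * v for b, v in zip(beta, x))
        zg = sum(g * v for g, v in zip(gamma, z))
        participate = logistic_cdf(zg)  # assumed: P(not always-zero)
        prob = participate * ordered_probs(xb, cuts)[y]
        if y == 0:
            # A zero can also come from the always-zero group.
            prob += 1.0 - participate
        total += math.log(prob)  # each unit contributes exactly once
    return total

# Hypothetical values: one outcome covariate, constant plus one inflation covariate.
beta, gamma, cuts = [0.4], [-0.2, 0.7], [-0.5, 0.8, 1.9]
data = [(0, [1.3], [1.0, 0.6]), (2, [0.2], [1.0, -1.1]), (3, [2.0], [1.0, 0.4])]
print(zio_loglik(beta, gamma, cuts, data))
```

An optimizer would vary `beta`, `gamma` and `cuts` together to maximize this one number, which is why the log shows a single log likelihood per iteration.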

The /cut1, /cut2, and /cut3 in the output are the cutpoints in the ordinal logit model. See help ologit for more details. These are the equivalents of the intercept in a binary logistic regression. The probability that the linear predictor plus error falls between the cutpoints is the probability of receiving the corresponding value of the outcome (given that you are not in the inflation group).
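A small Python sketch of how cutpoints turn a linear predictor into category probabilities; the cutpoint values and the linear predictor below are made up for illustration, not taken from the Stata output. With a single cutpoint the model collapses to a binary logit whose intercept is the negative of that cutpoint, which is the sense in which cutpoints stand in for the intercept.

```python
import math

def logistic_cdf(t):
    return 1.0 / (1.0 + math.exp(-t))

def category_probs(xb, cuts):
    """P(y = k) = F(cut_k - xb) - F(cut_{k-1} - xb), with F the logistic CDF."""
    bounds = [0.0] + [logistic_cdf(c - xb) for c in cuts] + [1.0]
    return [hi - lo for lo, hi in zip(bounds, bounds[1:])]

# Three hypothetical cutpoints give four outcome categories.
cuts = [-1.0, 0.5, 2.0]
xb = 0.3
probs = category_probs(xb, cuts)
print([round(p, 4) for p in probs], "sum =", round(sum(probs), 4))

# Binary special case: one cutpoint equal to minus the intercept.
intercept = 0.7
p_one = category_probs(xb, [-intercept])[1]
print(p_one, logistic_cdf(intercept + xb))  # the two values agree
```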

Related Solutions

Generalized Linear Model – Using EM Algorithm to Calculate MLEs for a Zero Inflated Poisson Model

The root of the difficulty you are having lies in the sentence:

Then using the EM algorithm, we can maximize the second log-likelihood.

As you have observed, you can't. Instead, what you maximize is the expected value of the second log likelihood (known as the "complete data log likelihood"), where the expected value is taken over the $z_i$.

This leads to an iterative procedure, where at the $k^{th}$ iteration you calculate the expected values of the $z_i$ given the parameter estimates from the $(k-1)^{th}$ iteration (the "E-step"), then substitute them into the complete data log likelihood (see EDIT below for why we can do this in this case), and maximize that with respect to the parameters to get the estimates for the current iteration (the "M-step").

The complete-data log likelihood for the zero-inflated Poisson in the simplest case - two parameters, say $\lambda$ and $p$ - allows for substantial simplification when it comes to the M-step, and this carries over to some extent to your form. I'll show you how that works in the simple case via some R code, so you can see the essence of it. I won't simplify as much as possible, since that might cause a loss of clarity when you think of your problem:

# Generate data: Poisson counts with lambda = 1,
# then force the first 10% to be structural zeros (p = 0.1)
x <- rpois(10000,1)
x[1:1000] <- 0

# Sufficient statistic for the ZIP
sum.x <- sum(x)

# (Poor) starting values for parameter estimates
phat <- 0.5
lhat <- 2.0

zhat <- rep(0,length(x))
for (i in 1:100) {
  # zhat[x>0] <- 0 always, so no need to make the assignment at every iteration
  zhat[x==0] <- phat/(phat +  (1-phat)*exp(-lhat))

  lhat <- sum.x/sum(1-zhat) # in effect, removing E(# zeroes due to z=1)
  phat <- mean(zhat)   

  cat("Iteration: ",i, "  lhat: ",lhat, "  phat: ", phat,"\n")
}

Iteration:  1   lhat:  1.443948   phat:  0.3792712 
Iteration:  2   lhat:  1.300164   phat:  0.3106252 
Iteration:  3   lhat:  1.225007   phat:  0.268331 
...
Iteration:  99   lhat:  0.9883329   phat:  0.09311933 
Iteration:  100   lhat:  0.9883194   phat:  0.09310694

In your case, at each step you'll do a weighted Poisson regression where the weights are 1-zhat to get the estimates of $\beta$ and therefore $\lambda_i$, and then maximize:

$\sum (\mathbb{E}z_i\log{p_i} + (1-\mathbb{E}z_i)\log{(1-p_i)})$

with respect to the coefficient vector of your matrix $\mathbf{G}$ to get the estimates of $p_i$. For observations with a zero count, the expected values are $\mathbb{E}z_i = p_i/(p_i+(1-p_i)\exp{(-\lambda_i)})$, recalculated at each iteration; for observations with a positive count, $\mathbb{E}z_i = 0$.
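The E-step for the regression case can be sketched in a few lines of Python. The function name `e_step` and the example numbers are hypothetical; `p` and `lam` stand for the current fitted $p_i$ and $\lambda_i$, however they were obtained.

```python
import math

def e_step(y, p, lam):
    """Expected value of z_i given the data and current p_i, lambda_i."""
    zhat = []
    for yi, pi, li in zip(y, p, lam):
        if yi > 0:
            # A positive count cannot be a structural zero.
            zhat.append(0.0)
        else:
            zhat.append(pi / (pi + (1 - pi) * math.exp(-li)))
    return zhat

# Hypothetical current fits for four observations.
y = [0, 3, 0, 1]
p = [0.10, 0.10, 0.40, 0.25]
lam = [1.0, 1.0, 2.5, 0.8]
zhat = e_step(y, p, lam)
weights = [1 - z for z in zhat]  # weights for the Poisson regression step
print(zhat)
print(weights)
```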

If you want to do this for real data, as opposed to just understanding the algorithm, R packages already exist; here is an example using the pscl package: http://www.ats.ucla.edu/stat/r/dae/zipoisson.htm

EDIT: I should emphasize that what we are doing is maximizing the expected value of the complete-data log likelihood, NOT maximizing the complete-data log likelihood with the expected values of the missing data/latent variables plugged in. As it happens, if the complete-data log likelihood is linear in the missing data, as it is here, the two approaches are the same, but otherwise, they aren't.
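To see the linearity in the simple two-parameter case, here is a sketch of the complete-data log likelihood. It writes $x_i$ for the observed counts, following the R code (an assumption about notation, since the original problem may label its data differently), and lets $z_i = 1$ mark a structural zero, so that $z_i = 0$ whenever $x_i > 0$:

$\ell_c(\lambda, p) = \sum_i \left[ z_i \log p + (1 - z_i)\left(\log(1-p) - \lambda + x_i \log\lambda - \log x_i!\right) \right]$

Each $z_i$ enters only to the first power, so taking the expectation over the $z_i$, given the data and the current parameter estimates, works term by term:

$\mathbb{E}\,\ell_c(\lambda, p) = \sum_i \left[ \mathbb{E}z_i \log p + (1 - \mathbb{E}z_i)\left(\log(1-p) - \lambda + x_i \log\lambda - \log x_i!\right) \right]$

This is the original expression with $\mathbb{E}z_i$ in place of $z_i$. Setting its derivatives to zero gives $\hat{p} = \frac{1}{n}\sum_i \mathbb{E}z_i$ and $\hat{\lambda} = \sum_i x_i / \sum_i (1 - \mathbb{E}z_i)$, which are the phat and lhat updates in the R loop. If $z_i$ appeared nonlinearly, for example inside a logarithm, the expectation could not be moved inside and the plug-in shortcut would fail.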

Solved – Interpretation of binomial glm coefficients when response variable is not binary

This is one of the ways you can provide data for logistic regression in R (see also here). In this case you are modeling the probability of selling a ticket given the predictors. You are still estimating probabilities. Moreover, your data are still (conditionally) binomial: you are predicting $k$ successes vs $n-k$ failures, provided as $(k, n-k)$ tuples -- this is just another way of representing the same data.
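A quick Python check of why the two representations are equivalent; the counts and function names are hypothetical. The grouped binomial log likelihood and the log likelihood of the same data expanded into individual 0/1 outcomes differ only by a constant that does not involve $p$, so both are maximized by the same estimate.

```python
import math

def grouped_loglik(p, k, n):
    """Binomial log likelihood for k successes out of n trials."""
    return math.log(math.comb(n, k)) + k * math.log(p) + (n - k) * math.log(1 - p)

def bernoulli_loglik(p, outcomes):
    """Log likelihood of the same data written as individual 0/1 outcomes."""
    return sum(math.log(p) if y else math.log(1 - p) for y in outcomes)

# Hypothetical cell: 7 tickets sold out of 10 offers.
k, n = 7, 10
outcomes = [1] * k + [0] * (n - k)

diffs = []
for p in (0.3, 0.5, 0.7):
    diff = grouped_loglik(p, k, n) - bernoulli_loglik(p, outcomes)
    diffs.append(diff)
    print(p, round(diff, 6))  # the same constant, log C(10, 7), for every p
```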
