Solved – glmnet: How to make sense of multinomial parameterization

Tags: categorical-data, glmnet, multinomial-distribution

I have the following problem: I want to predict a categorical response variable from one (or more) categorical predictor variables using glmnet().

However, I cannot make sense of the output glmnet gives me.

Ok, first let's generate two related categorical variables:

Generate Data

p <- 2 #number variables
mu <- rep(0,p)
sigma <- matrix(rep(0,p^2), ncol=p)
sigma[1,2] <- .8 #some relationship ..
diag(sigma) <- 1
sigma <- pmax(sigma, t(sigma))
n <- 100
set.seed(1)
library(MASS)
dat <- mvrnorm(n, mu, sigma)
#discretize
k <- 3 # number of categories
d <- apply(dat, 2, function(x) {
  q <- quantile(x, probs=seq(0,1, 1/k))[-c(1, k+1)]
  out <- numeric(length(x))
  for(i in 1:(k-1))
  {  out[x<q[k-i]] <- i } 
  return(out)
})
d <- data.frame(apply(d, 2, as.factor))
d[,2] <- relevel(d[,2], ref="0")
d[,1] <- relevel(d[,1], ref="0")
colnames(d) <- c("X1", "X2")

We get:

> table(d)
   X2
X1   0  1  2
  0 22 11  1
  1  9 14 10
  2  3  8 22

Prediction: multinom()

Then let's predict X1 from X2 using multinom() from the nnet package:

library(nnet)
mod1 <- multinom(X1~X2, data=d)
mod1

which gives us:

Call:
multinom(formula = X1 ~ X2, data = d)

Coefficients:
  (Intercept)      X21      X22
1  -0.8938246 1.134993 3.196476
2  -1.9924124 1.673949 5.083518

Manual check

Now let's check whether we can reproduce that manually:

tb <- table(d)
log(tb[2,1] / tb[1,1]) #intercept category1
[1] -0.8938179
log(tb[3,1] / tb[1,1]) #intercept category2
[1] -1.99243
log((tb[1,1]*tb[2,2]) / (tb[1,2]*tb[2,1])) #logodds-ratio cat X1 0vs1 in X2 0vs1
[1] 1.13498
#same for the three remaining log odds ratios
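#spelled out (sketch; each should match the corresponding multinom() coefficient above)
log((tb[1,1]*tb[2,3]) / (tb[1,3]*tb[2,1])) #logodds-ratio X1 0vs1 in X2 0vs2: ~3.196 (X22, cat 1)
log((tb[1,1]*tb[3,2]) / (tb[1,2]*tb[3,1])) #logodds-ratio X1 0vs2 in X2 0vs1: ~1.674 (X21, cat 2)
log((tb[1,1]*tb[3,3]) / (tb[1,3]*tb[3,1])) #logodds-ratio X1 0vs2 in X2 0vs2: ~5.084 (X22, cat 2)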

We reproduce the same numbers, good!

Prediction: glmnet()

Now let's do the same with glmnet:

library(glmnet)
y <- d[,1]
X <- model.matrix(X1~X2, data=d)[,-1]
mod2 <- glmnet(X, y, family="multinomial", lambda=c(0))
coef(mod2, s=0) #meaning of coefficients unclear!
$`0`
3 x 1 sparse Matrix of class "dgCMatrix"
                     1
(Intercept)  0.9620216
X21         -1.1349130
X22         -3.1958293   

$`1`
3 x 1 sparse Matrix of class "dgCMatrix"
                     1
(Intercept) 0.06825755
X21         .         
X22         .         

$`2`
3 x 1 sparse Matrix of class "dgCMatrix"
                     1
(Intercept) -1.0302792
X21          0.5388814
X22          1.8870363

Note that I set s=0, so there is no regularisation, and the parameters should contain exactly the same information as the parameters of the multinom() fit.

Still, we get very different parameters. This is due to the different parameterization glmnet uses; see e.g.:

http://web.stanford.edu/~hastie/glmnet/glmnet_alpha.html (heading: Multinomial models)
or the corresponding paper:
http://www.jstatsoft.org/v33/i01/paper (heading: 4. Regularized multinomial regression)

But no matter exactly how one parameterizes the model, one should get the same $P(Y=k | X)$, the probability of category $k$ conditional on $X$.
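As far as I understand the links above, multinom() uses the baseline-category parameterization (here with category 0 as reference, whose coefficients are fixed at zero),

$$P(Y=k \mid x) = \frac{\exp(\beta_{0k} + x^\top \beta_k)}{1 + \sum_{l \neq 0} \exp(\beta_{0l} + x^\top \beta_l)},$$

whereas glmnet uses the symmetric form

$$P(Y=k \mid x) = \frac{\exp(\beta_{0k} + x^\top \beta_k)}{\sum_{l} \exp(\beta_{0l} + x^\top \beta_l)},$$

in which a coefficient vector is estimated for every category; this form is only identified up to adding a constant vector to all categories' coefficients, an indeterminacy which the penalty (for $\lambda > 0$) resolves.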

Conditional probabilities: multinom()

So first I calculate these probabilities from multinom():

p.fit <- predict(mod1, type="probs")
head(d)
head(p.fit)
ccp <- matrix(0,3,3)
ccp[,3] <- p.fit[1,]
ccp[,2] <- p.fit[2,]
ccp[,1] <- p.fit[4,]
ccp
           [,1]      [,2]       [,3]
[1,] 0.64705896 0.3333332 0.03030114
[2,] 0.26470416 0.4242450 0.30303140
[3,] 0.08823688 0.2424218 0.66666746
colSums(ccp) #sum to 1, ok; sorry for the awful code ...
[1] 1 1 1

As we have a saturated model here, these should be the same as what we can calculate directly from the data:

emp <- table(d)/100
cemp <- apply(emp, 2, function(x) {
  x / sum(x)
})
cemp 
   X2
             0         1          2
  0 0.64705882 0.3333333 0.03030303
  1 0.26470588 0.4242424 0.30303030
  2 0.08823529 0.2424242 0.66666667

which is indeed the case.

Conditional probabilities: glmnet()

Now the same from glmnet:

c1 <- coef(mod2, s=0)
c <- matrix(rapply(c1, function(x) as.matrix(x), how="unlist"), 3, 3, byrow=TRUE)

ccp2 <- matrix(0,3,3)
config <- rbind(c(0,0), c(1,0), c(0,1))

for(l in 1:3) #loop through categories
{
  denom <- numeric(3)
  for(i in 1:3) # loop through possible predictor combinations
  { 
    x1 <- config[i, 1]
    x2 <- config[i, 2]
    denom[i] <- exp(c[l,1] + x1 * c[l,2]  + x2 * c[l,3])
  }
  ccp2[l,1] <- denom[1] / sum(denom)
  ccp2[l,2] <- denom[2] / sum(denom)
  ccp2[l,3] <- denom[3] / sum(denom)
}
ccp2
          [,1]      [,2]       [,3]
[1,] 0.7340082 0.2359470 0.03004484
[2,] 0.3333333 0.3333333 0.33333333
[3,] 0.1073668 0.1840361 0.70859708
colSums(ccp2)
[1] 1.1747083 0.7533165 1.0719753

The cell conditional probabilities are somewhat related, but different. Also, they do not sum to one.

So we have two problems here:

a) the conditional probabilities do not sum to 1, and

b) the parameters do not describe what we see in the data: e.g. in row 2 there are differences across the columns, but glmnet estimates both slope coefficients (though not the intercept) as zero.

I used a linear regression problem and compared glm and glmnet with s=0 to make sure that s=0 means zero regularisation (the solutions were almost identical).
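For reference, a minimal sketch of that sanity check (my reconstruction; the data here are purely illustrative):

set.seed(2)
x.lin <- matrix(rnorm(200), ncol=2)
y.lin <- drop(1 + x.lin %*% c(2, -1) + rnorm(100))
coef(glm(y.lin ~ x.lin))                                 #unpenalised fit
coef(glmnet(x.lin, y.lin, family="gaussian", lambda=0))  #should be nearly identical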

Any help & ideas would be highly appreciated!

Best Answer

Regarding the parameters from multinom and glmnet, I found this answer helpful: Can I use glm algorithms to do a multinomial logistic regression?

especially, "Yes, with a Poisson GLM (log linear model) you can fit multinomial models. Hence multinomial logistic or log linear Poisson models are equivalent."

So I'll show the reparametrization of the glmnet coefficients to multinom coefficients.

n.subj <- 1000
x1 <- rnorm(n.subj)
x2 <- rnorm(n.subj)
prob <- matrix(c(rep(1, n.subj), exp(3 + 2*x1 + x2), exp(-1 + x1 - 3*x2)), ncol=3)
prob <- sweep(prob, 1, apply(prob, 1, sum), "/")

y <- c()
for (i in 1:n.subj)
  y[i] <- sample(3, 1, replace = TRUE, prob = prob[i,])

multinom(y~x1+x2)

x <- cbind(x1,x2); y2 <- factor(y)
fit <- glmnet(x, y2, family="multinomial", lambda=0, type.multinomial="grouped")
cf <- coef(fit)

cf[[2]]@x - cf[[1]]@x   # coefficients for category 2 (relative to category 1, the multinom reference)
cf[[3]]@x - cf[[1]]@x   # coefficients for category 3 (relative to category 1)
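As a check on problem (a) from the question, a sketch (refitting multinom() here, since the call above was not stored): both models should give essentially the same predicted probabilities, because the softmax is invariant to shifting every category's coefficients by the same constant.

m1 <- multinom(y ~ x1 + x2)                        #refit and store
p.multinom <- predict(m1, type="probs")
p.glmnet <- drop(predict(fit, newx=x, type="response", s=0))
max(abs(p.glmnet - p.multinom))                    #should be very small (different optimizers)
head(rowSums(p.glmnet))                            #each row sums to 1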

Hope this helps. But I don't think I understand the equivalence of the Poisson generalized linear (log-linear) model and the multinomial logistic model inside and out.

Tell me if there's a good, readable, and "easily" understandable source.