Solved – R glmnet and elasticnet give different results, why?

elastic net, machine learning, r

My question is simple: when glmnet is used with alpha between 0 and 1 (i.e. elastic net), does it return the naive elastic net or the adjusted one? In particular, is each built-in method (coef, cv.glmnet, predict) naive or adjusted?

The reason I ask is that I cannot get glmnet to agree with enet.

require(ISLR)
library(glmnet)
library(elasticnet)
x=model.matrix(Apps~.,College)[,-1]
y=College$Apps
enet.model=enet(x,y,lambda=1)
predict(enet.model,type="coef",s=2,mode="penalty")
glmnet.model=glmnet(x,y,alpha=0.5)
coef(glmnet.model,s=4,exact=TRUE)

They give different results, why?

Best Answer

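library(ISLR)        # College data
library(glmnet)
library(elasticnet)

x <- model.matrix(Apps ~ ., College)[, -1]
y <- College$Apps

# elasticnet: lambda is the quadratic penalty, s the L1 penalty (mode = "penalty")
enet.model <- enet(x, y, lambda = 1)
mx_x <- data.frame(coef_enet = round(predict(enet.model, type = "coef", s = 2, mode = "penalty", naive = FALSE)$coefficients, 2))
mx_x$coef_enet.naive <- round(predict(enet.model, type = "coef", s = 2, mode = "penalty", naive = TRUE)$coefficients, 2)

# glmnet: [-1] drops the intercept; x and y are re-supplied because
# recent glmnet versions require them when exact = TRUE
glmnet.model <- glmnet(x, y, alpha = 0.50)
mx_x$coef_glmnet <- round(coef(glmnet.model, s = 2, exact = TRUE, x = x, y = y), 2)[-1]

mx_x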

The script above gives:

            coef_enet coef_enet.naive coef_glmnet
PrivateYes   -1125.13         -562.56     -493.72
Accept           0.85            0.43        1.57
Enroll           1.44            0.72       -0.80
Top10perc       26.13           13.06       48.90
Top25perc       17.20            8.60      -13.47
F.Undergrad      0.25            0.12        0.05
P.Undergrad      0.21            0.10        0.04
Outstate         0.02            0.01       -0.08
Room.Board       0.31            0.15        0.15
Books            0.70            0.35        0.02
Personal         0.12            0.06        0.03
PhD             11.77            5.88       -8.53
Terminal        10.30            5.15       -3.31
S.F.Ratio       23.87           11.93       14.97
perc.alumni    -17.08           -8.54       -0.01
Expend           0.09            0.05        0.08
Grad.Rate       19.05            9.52        8.51

With naive=FALSE, enet simply rescales the naive=TRUE coefficients according to the formula coef(ENet) = (1 + lambda) * coef(NaiveENet). Here lambda = 1, so each adjusted coefficient is twice its naive counterpart.
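As a quick arithmetic check of that relation, the adjusted coefficients in the table above should equal (1 + lambda) times the naive ones, up to rounding. A short Python sketch using a few rows copied from the table (the variable names are illustrative only):

```python
lam = 1  # quadratic penalty passed to enet(x, y, lambda=1)

# (adjusted, naive) coefficient pairs copied from the table, rounded to 2 decimals
rows = {
    "PrivateYes": (-1125.13, -562.56),
    "Accept": (0.85, 0.43),
    "Top10perc": (26.13, 13.06),
    "PhD": (11.77, 5.88),
    "Grad.Rate": (19.05, 9.52),
}

# Both columns are rounded to 0.01, so a gap of up to about 0.015 is pure rounding
max_gap = max(abs(adj - (1 + lam) * naive) for adj, naive in rows.values())
print(max_gap <= 0.02)
```

Every row checked agrees within rounding error, which supports reading naive=FALSE as a plain rescaling rather than a separate fit.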

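glmnet returns "ready-to-use" coefficients. With the default settings, however, they differ from those of elasticnet (possibly because the two packages use different algorithms and normalization conventions).

You get elasticnet and glmnet "aligned" if you do not normalize the predictors; the glmnet coefficients then nearly match the naive elastic net ones: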

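# Same comparison with normalization switched off in both packages
enet.model <- enet(x, y, lambda = 1, normalize = FALSE)
mx_x <- data.frame(coef_enet = round(predict(enet.model, type = "coef", s = 2, mode = "penalty", naive = FALSE)$coefficients, 2))
mx_x$coef_enet.naive <- round(predict(enet.model, type = "coef", s = 2, mode = "penalty", naive = TRUE)$coefficients, 2)

# x and y are re-supplied because recent glmnet versions require them when exact = TRUE
glmnet.model <- glmnet(x, y, alpha = 0.50, standardize = FALSE)
mx_x$coef_glmnet <- round(coef(glmnet.model, s = 2, exact = TRUE, x = x, y = y), 2)[-1]

mx_x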



            coef_enet coef_enet.naive coef_glmnet
PrivateYes    -971.26         -485.63     -481.73
Accept           3.17            1.59        1.58
Enroll          -1.76           -0.88       -0.87
Top10perc       99.84           49.92       49.57
Top25perc      -28.47          -14.24      -14.03
F.Undergrad      0.12            0.06        0.06
P.Undergrad      0.09            0.04        0.04
Outstate        -0.17           -0.09       -0.09
Room.Board       0.30            0.15        0.15
Books            0.04            0.02        0.02
Personal         0.06            0.03        0.03
PhD            -17.29           -8.65       -8.54
Terminal        -6.59           -3.30       -3.35
S.F.Ratio       31.04           15.52       15.48
perc.alumni      0.31            0.16        0.09
Expend           0.16            0.08        0.08
Grad.Rate       17.31            8.66        8.63
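
One way to read this table is through the criteria the two packages are built around. What follows is a sketch based on the textbook definitions, not on the packages' source code. In particular, treating enet's lambda as $\lambda_2$ and s (in "penalty" mode) as $\lambda_1$, and ignoring the intercept and any internal rescaling, are assumptions.

$$
\hat\beta_{\text{naive}} = \arg\min_{\beta}\; \|y - X\beta\|_2^2 + \lambda_2 \|\beta\|_2^2 + \lambda_1 \|\beta\|_1,
\qquad
\hat\beta_{\text{enet}} = (1 + \lambda_2)\,\hat\beta_{\text{naive}}
$$

$$
\hat\beta_{\text{glmnet}} = \arg\min_{\beta}\; \frac{1}{2n}\,\|y - X\beta\|_2^2 + \lambda\left[\frac{1-\alpha}{2}\,\|\beta\|_2^2 + \alpha\,\|\beta\|_1\right]
$$

The glmnet criterion has no $(1 + \lambda_2)$ rescaling step, which fits the observation that coef_glmnet tracks coef_enet.naive rather than coef_enet. The $1/(2n)$ factor and the $\alpha$ weighting also mean that the same numeric penalty value need not represent the same amount of shrinkage in both packages, so small discrepancies between the two columns are to be expected.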

How does predict work?

# Predictions on the training data from the three variants
mx_x2 <- data.frame(
  pred_enet = predict(enet.model, newx = x, s = 2, mode = "penalty", naive = FALSE)$fit,
  pred_enet.naive = predict(enet.model, newx = x, s = 2, mode = "penalty", naive = TRUE)$fit,
  pred_glmnet = as.vector(predict(glmnet.model, newx = x, s = 2, exact = TRUE, x = x, y = y)))
head(mx_x2)

                              pred_enet pred_enet.naive pred_glmnet
Abilene Christian University  -200.7408       1400.4488   1403.1934
Adelphi University            3757.6684       3379.6534   3376.7797
Adrian College                -456.4379       1272.6002   1272.1445
Agnes Scott College            994.5582       1998.0983   1997.6611
Alaska Pacific University    -3453.9529       -226.1573   -221.7383
Albertson College            -1664.2285        668.7049    668.3047

The enet prediction is a linear function of the naive enet prediction:

fit.lm <- lm(pred_enet~pred_enet.naive, data=mx_x2)
coef(fit.lm)
(Intercept) pred_enet.naive 
  -3001.638           2.000 
summary(fit.lm)$r.squared
[1] 1
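
The slope of 2 and the nonzero intercept are what one would expect if both predictions share the same centering and only the coefficients are scaled by (1 + lambda). The Python sketch below illustrates the algebra on made-up numbers. It assumes, without checking the elasticnet source, that predictions have the form y_bar + (x - x_bar) . beta; all names and values are hypothetical.

```python
# Hypothetical toy data: 3 observations, 2 predictors
X = [[1.0, 4.0], [2.0, 1.0], [6.0, 7.0]]
y_bar = 10.0                  # assumed intercept offset (mean of the response)
x_bar = [3.0, 4.0]            # column means of X
beta_naive = [0.5, -1.5]      # made-up naive coefficients
lam = 1.0                     # quadratic penalty, as in enet(x, y, lambda=1)


def predict(beta):
    """Centered linear prediction: y_bar + (x - x_bar) . beta (an assumed form)."""
    return [y_bar + sum((xi - mi) * b for xi, mi, b in zip(row, x_bar, beta))
            for row in X]


beta_adj = [(1 + lam) * b for b in beta_naive]   # adjusted = (1 + lam) * naive
pred_naive = predict(beta_naive)
pred_adj = predict(beta_adj)

# Algebra: pred_adj = (1 + lam) * pred_naive - lam * y_bar
implied = [(1 + lam) * p - lam * y_bar for p in pred_naive]
```

With lam = 1 the relation reduces to pred_adj = 2 * pred_naive - y_bar, that is, slope 2 and intercept -y_bar, the same form as the fitted line above (slope 2.000, intercept -3001.638). Under the centering assumption, that intercept would be minus the mean of y.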

The naive enet prediction is almost identical to the glmnet prediction:

fit.lm <- lm(pred_enet.naive~-1+pred_glmnet, data=mx_x2)
coef(fit.lm)
pred_glmnet 
1.000126
summary(fit.lm)$r.squared
[1] 0.9999994

Related Solutions

Elastic Net Regression – Why Do Regression and Elastic Net Provide Different Results?

There is no free lunch in statistics. Elastic Net reduces overfitting (lowers variance) at the cost of increasing bias. With OLS, you could fit a model with all 50 variables. This OLS model would have very low bias (under certain assumptions, the coefficient estimates may be unbiased) but suffer from high variance (overfitting).

In your case, you mentioned that the OLS coefficients look very different from the Elastic Net coefficients, even though both models use the same 10 variables. The difference may come from the bias introduced by the penalty: OLS computes its coefficients by minimizing the residual sum of squares, whereas Elastic Net minimizes a "penalized" residual sum of squares.
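
To make the effect of the penalty concrete, here is a minimal Python sketch for the simplest possible case: a single centered predictor scaled so that mean(x^2) = 1. In that case the glmnet-style penalized criterion has a closed-form minimizer, namely the soft-thresholded OLS slope divided by 1 + lambda * (1 - alpha). The function names and the data are illustrative assumptions, not part of any package.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; return 0 when |z| <= gamma."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def enet_1d(x, y, lam, alpha):
    """Minimize (1/(2n)) * sum((y - x*b)^2) + lam * ((1-alpha)/2 * b^2 + alpha * |b|)
    for one predictor with mean(x) = 0 and mean(x^2) = 1."""
    n = len(x)
    z = sum(xi * yi for xi, yi in zip(x, y)) / n   # equals the OLS slope here
    return soft_threshold(z, lam * alpha) / (1 + lam * (1 - alpha))


x = [-1.0, 1.0, -1.0, 1.0]          # mean 0, mean(x^2) = 1
y = [-2.1, 1.9, -1.9, 2.1]          # roughly y = 2 * x
ols = enet_1d(x, y, lam=0.0, alpha=0.5)     # no penalty: plain OLS slope
shrunk = enet_1d(x, y, lam=0.5, alpha=0.5)  # penalized: pulled toward zero
print(ols, shrunk)
```

The penalized slope is smaller in magnitude than the OLS slope for any lam > 0, which is the bias referred to above; with several correlated predictors the shrinkage is not uniform, so individual coefficients can move by quite different amounts.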

Alternatively, the coefficient estimates may differ between OLS and Elastic Net because of sample size. With small sample sizes, p-values from OLS may not be reliable, and the bias from Elastic Net may also be high.

Here's a simulated example using $n=25$. The "true model" contains only two variables, $x1$ and $x2$, with the "true coefficients" of 2 and 3. Due to small sample size and high irreducible error, the p-value for x1 is high (>22%). The coefficient differences between the two models are also high.

set.seed(1983)

nobs <- 25

x1 <- rnorm(nobs, 10, 10)
x2 <- rnorm(nobs, 20, 20)
x3 <- rnorm(nobs, 30, 30)
x4 <- rnorm(nobs, 40, 40)

y <- 100 + 2*x1 + 3*x2 + rnorm(nobs,0,100)

df <- data.frame(y=y, x1=x1, x2=x2, x3=x3, x4=x4)

### fit a linear model

lm.mod <- lm(y ~ ., data=df)

summary(lm.mod)

### fit an elastic net model using 5-fold CV

library(caret)

set.seed(1984)

enet.mod <- train(y ~ ., data=df, method="glmnet", tuneLength=5, trControl=trainControl(method="cv", number=5))

coef(enet.mod$finalModel, enet.mod$bestTune$lambda)

### compute diffs between coefs

lm.mod$coefficients - t(coef(enet.mod$finalModel, enet.mod$bestTune$lambda))[1,]

When the sample size is increased to $n = 1000$, the p-value for $x1$ is very low and the coefficient differences between the two models are small.

set.seed(1983)

nobs <- 1000

x1 <- rnorm(nobs, 10, 10)
x2 <- rnorm(nobs, 20, 20)
x3 <- rnorm(nobs, 30, 30)
x4 <- rnorm(nobs, 40, 40)

y <- 100 + 2*x1 + 3*x2 + rnorm(nobs,0,100)

df <- data.frame(y=y, x1=x1, x2=x2, x3=x3, x4=x4)

### fit a linear model

lm.mod <- lm(y ~ ., data=df)

summary(lm.mod)

### fit an elastic net model using 5-fold CV

library(caret)

set.seed(1984)

enet.mod <- train(y ~ ., data=df, method="glmnet", tuneLength=5, trControl=trainControl(method="cv", number=5))

coef(enet.mod$finalModel, enet.mod$bestTune$lambda)

### compute diffs between coefs

lm.mod$coefficients - t(coef(enet.mod$finalModel, enet.mod$bestTune$lambda))[1,]

Solved – Difference between ElasticNet in scikit-learn Python and Glmnet in R

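Finally, I got the same values with the following code:

Python

import numpy as np
import pandas as pd
from sklearn import linear_model

# Standardize the way R's scale() does (sample standard deviation, ddof=1)
def mystandardize(D):
    S = np.std(D, axis=0, ddof=1)
    M = np.mean(D, axis=0)
    D_norm = (D - M) / S
    return [D_norm, M, S]

# X_train and Y_train are assumed to be defined already
Y_norm_train = pd.DataFrame(mystandardize(Y_train)[0])
# No predictor normalization: the original call passed normalize=False,
# an argument that later scikit-learn releases no longer accept
glmnet_regr = linear_model.ElasticNet(alpha=1, l1_ratio=0.01,
                                      fit_intercept=True, tol=0.0000001, max_iter=100000)
glmnet_regr.fit(X_train, Y_norm_train)

R

# x_train, y and train_idx are assumed to be defined already
y_norm_train <- scale(y[train_idx])
glmnet_obj_norm <- glmnet(x_train, y_norm_train, alpha = 0.01, lambda = 1,
                   thresh = 1e-07, standardize = FALSE, intercept = TRUE, standardize.response = FALSE)
coef(glmnet_obj_norm)  # the original called print_coef(), a helper that is not shown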
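
The ddof=1 in the standardization function matters because R's scale() divides by the sample standard deviation (n - 1 denominator), whereas np.std defaults to the population version (n denominator). A small illustration with made-up values, using only the Python standard library:

```python
import statistics

values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # made-up data

sd_sample = statistics.stdev(values)        # n - 1 denominator, like R's sd() and scale()
sd_population = statistics.pstdev(values)   # n denominator, like np.std(..., ddof=0)

# Standardizing with the sample standard deviation reproduces R's convention
mean = statistics.mean(values)
scaled_like_r = [(v - mean) / sd_sample for v in values]
print(sd_sample, sd_population)
```

The two standard deviations differ by a factor of sqrt(n / (n - 1)), so a response standardized with the wrong one is rescaled slightly, and with a fixed penalty the fitted coefficients no longer match. In the paired calls above, scikit-learn's alpha plays the role of glmnet's lambda and l1_ratio the role of glmnet's alpha (alpha=1, l1_ratio=0.01 against lambda = 1, alpha = 0.01).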
