Please think very carefully about why you want confidence intervals for the LASSO coefficients and how you will interpret them. This is not an easy problem.

The predictors chosen by LASSO (as for any feature-selection method) can be highly dependent on the data sample at hand. You can examine this in your own data by repeating your LASSO model-building procedure on multiple bootstrap samples of the data. If you have predictors that are correlated with each other, the specific predictors chosen by LASSO are likely to differ among models based on the different bootstrap samples. So what do you mean by a confidence interval for a coefficient for a predictor, say predictor $x_1$, if $x_1$ wouldn't even have been chosen by LASSO if you had worked with a different sample from the same population?
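A minimal sketch of that bootstrap check, written in base R so it runs without extra packages. The coordinate-descent helper `lasso_cd()`, the simulated data (three noisy copies of one latent variable plus three pure-noise predictors) and the fixed penalty `lambda = 0.3` are all illustrative assumptions. With real data you would rerun your actual procedure on each bootstrap sample, including however you choose the penalty (for example, cross-validation).

```r
# Soft-thresholding operator used by the lasso coordinate update
soft <- function(z, g) sign(z) * pmax(abs(z) - g, 0)

# Lasso by cyclic coordinate descent (hypothetical helper, not a package API).
# Minimizes (1/(2n)) * sum((y - X b)^2) + lambda * sum(abs(b)) on standardized X.
lasso_cd <- function(X, y, lambda, n_iter = 100) {
  X <- scale(X)            # centered, unit-variance predictors
  r <- y - mean(y)         # centered response = residual at b = 0
  n <- nrow(X); b <- numeric(ncol(X))
  d <- colSums(X^2) / n
  for (it in seq_len(n_iter)) {
    for (j in seq_along(b)) {
      zj <- sum(X[, j] * r) / n + d[j] * b[j]   # fit to the partial residual
      bj <- soft(zj, lambda) / d[j]
      r <- r - X[, j] * (bj - b[j])             # keep residuals in sync
      b[j] <- bj
    }
  }
  setNames(b, colnames(X))
}

set.seed(1)
n <- 100
z <- rnorm(n)
# x1..x3 are noisy, highly correlated proxies for the latent z; x4..x6 are noise
X <- cbind(x1 = z + 0.3 * rnorm(n), x2 = z + 0.3 * rnorm(n),
           x3 = z + 0.3 * rnorm(n), x4 = rnorm(n), x5 = rnorm(n), x6 = rnorm(n))
y <- 2 * z + rnorm(n)

B <- 100
selected <- t(replicate(B, {
  idx <- sample.int(n, n, replace = TRUE)     # bootstrap resample of cases
  lasso_cd(X[idx, ], y[idx], lambda = 0.3) != 0
}))
# Proportion of bootstrap models that kept each predictor
print(round(colMeans(selected), 2))
```

With correlated proxies like `x1`..`x3`, the selection is typically split among them across bootstrap samples rather than one of them being chosen every time; the exact frequencies depend on the seed.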

The quality of predictions from a LASSO model is typically of more interest than confidence intervals for the individual coefficients. Despite the instability in feature selection, LASSO-based models can be useful for prediction. The selection of one predictor from among several correlated predictors might be somewhat arbitrary, but the one selected serves as a rough proxy for the others and thus can still lead to valid predictions. You can test the performance of your LASSO approach by seeing how well the models built on multiple bootstrap samples perform on the full original data set.
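Here is a sketch of that performance check in base R, under stated assumptions: simulated data, a fixed penalty, and hypothetical helpers `fit_lasso()` and `predict_lasso()` rather than a package such as glmnet. Each bootstrap model is scored both on its own bootstrap sample ("apparent" error) and on the full original data. The average difference between the two is the bootstrap estimate of optimism, which you can subtract from the apparent error of the model built on the full data.

```r
soft <- function(z, g) sign(z) * pmax(abs(z) - g, 0)

# Lasso with intercept (hypothetical helper): standardize internally,
# solve by coordinate descent, return coefficients on the original scale.
fit_lasso <- function(X, y, lambda, n_iter = 100) {
  mu <- colMeans(X); s <- apply(X, 2, sd)
  Xs <- scale(X, center = mu, scale = s)
  n <- nrow(X); b <- numeric(ncol(X)); r <- y - mean(y)
  d <- colSums(Xs^2) / n
  for (it in seq_len(n_iter)) {
    for (j in seq_along(b)) {
      zj <- sum(Xs[, j] * r) / n + d[j] * b[j]
      bj <- soft(zj, lambda) / d[j]
      r <- r - Xs[, j] * (bj - b[j])
      b[j] <- bj
    }
  }
  slope <- b / s                                   # undo the standardization
  list(intercept = mean(y) - sum(mu * slope), slope = slope)
}
predict_lasso <- function(fit, X) drop(fit$intercept + X %*% fit$slope)
mse <- function(y, yhat) mean((y - yhat)^2)

set.seed(2)
n <- 100; p <- 10
X <- matrix(rnorm(n * p), n, p)
y <- drop(X[, 1:3] %*% c(2, -1, 1)) + rnorm(n)

B <- 100
res <- t(replicate(B, {
  idx <- sample.int(n, n, replace = TRUE)
  fit <- fit_lasso(X[idx, ], y[idx], lambda = 0.1)  # rerun the whole procedure
  c(apparent = mse(y[idx], predict_lasso(fit, X[idx, ])),
    full     = mse(y, predict_lasso(fit, X)))         # score on the original data
}))
optimism <- mean(res[, "full"] - res[, "apparent"])
print(colMeans(res)); print(optimism)
```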

That said, there is recent work on principled ways to obtain confidence intervals and on related issues in inference after LASSO. This page and its links are a good place to start. The issues are discussed in more detail in Section 6.3 of Statistical Learning with Sparsity. There is also an R package, selectiveInference, that implements these methods. But these methods rely on specific assumptions that might not hold for your data. If you do choose this approach, make sure you understand the conditions under which it is valid and exactly what those confidence intervals mean. That statistical issue, rather than the R coding, is what is crucial here.

The problem is that you cannot use the confidence intervals for the individual coefficients in that way, for several reasons, including that doing so ignores the dependence (covariance) among the coefficient estimates. The fact that the lines cross, which would imply a zero-width 95% confidence interval at the crossing point, is a clue to the mistake.

Instead, (i) find the logits and their standard errors (this involves finding the asymptotic variance of a linear combination of the coefficient estimates), (ii) find the 95% intervals for the true logits, and (iii) back-transform to the probability scale, like this:

```
# (i) Predicted logits (linear predictor) and their standard errors
pred1 <- predict(m1, newdata = new, type = "link", se.fit = TRUE)
logit <- pred1$fit
z <- qnorm(0.975)  # about 1.96 for a 95% interval
# (ii) 95% Wald limits on the logit scale
upper.logit <- logit + z * pred1$se.fit
lower.logit <- logit - z * pred1$se.fit
# (iii) Back-transform with the inverse logit: plogis(u) = exp(u) / (1 + exp(u))
fit.prob <- plogis(logit)
upper.prob <- plogis(upper.logit)
lower.prob <- plogis(lower.logit)
lines(new$x, lower.prob, col = "red")
lines(new$x, upper.prob, col = "green")
```

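Now the picture makes more sense: because the inverse logit is increasing and the standard errors are positive, the back-transformed lower and upper limits can no longer cross.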
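To see step (i) concretely, here is a sketch on simulated data. The data-generating values are assumptions, and `x`, `y`, `m1` and `new` are stand-ins chosen to mirror the snippet above. Each logit is a linear combination $c^\top\hat\beta$ with $c = (1, x)$, so its variance is $c^\top V c$, where $V$ is the estimated covariance matrix of the coefficients, including the intercept/slope covariance. Computing that by hand reproduces `se.fit` from `predict()`, while dropping the covariance term does not.

```r
set.seed(42)
# Simulated stand-ins for the question's data (assumed, not the original)
x <- runif(200, -3, 3)
y <- rbinom(200, 1, plogis(-0.5 + 1.2 * x))
m1 <- glm(y ~ x, family = binomial)
new <- data.frame(x = seq(-3, 3, length.out = 50))

pred1 <- predict(m1, newdata = new, type = "link", se.fit = TRUE)

# Step (i) by hand: Var(c' beta_hat) = c' V c for each row c = (1, x)
C <- model.matrix(~ x, data = new)
V <- vcov(m1)
se_manual <- sqrt(rowSums((C %*% V) * C))       # diag(C V C'), row by row
se_no_cov <- sqrt(V[1, 1] + new$x^2 * V[2, 2])  # wrongly ignores Cov(b0, b1)

# Steps (ii) and (iii): limits on the logit scale, then inverse logit
z <- qnorm(0.975)
fit.prob <- plogis(pred1$fit)
lower.prob <- plogis(pred1$fit - z * pred1$se.fit)
upper.prob <- plogis(pred1$fit + z * pred1$se.fit)

print(all.equal(unname(se_manual), unname(pred1$se.fit)))
print(summary(se_no_cov - se_manual))
```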

## Best Answer

The penalization of coefficients with methods like lasso, adaptive lasso, and ridge regression means that you can model data even when the number of predictors exceeds the number of observations. You certainly have enough observations to use adaptive lasso, although this doesn't mean that the results will necessarily be as good as you might get with a larger data set.

If you had 100 times as many cases, you might consider a train/test split. With a data set of this size, splitting only leads to trouble, because neither the training set nor the test set is large enough for stable estimates. Instead, you can validate your model-building process by repeating it on multiple bootstrap samples of your data and evaluating those models on the full data set.

Categorical predictors have to be handled carefully in penalized regressions, although there might be some simplification with adaptive lasso.

First, with standard lasso and ridge regression you want all predictors to be on comparable scales, because you penalize all regression coefficients equally according to their magnitudes (lasso) or squared magnitudes (ridge). For continuous predictors that's accomplished by scaling to unit variance. But there's no single simple way to put categorical predictors on scales comparable to each other or to continuous predictors. Adaptive lasso additionally weights each coefficient's penalty inversely to the magnitude of an initial estimate of that coefficient, which might tend to minimize that problem.
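A minimal sketch of both ideas for continuous predictors, in base R. Assumptions: simulated data with deliberately mismatched raw scales, a ridge fit as the initial estimate (OLS would also work when there are more cases than predictors), weights $w_j = 1/|\tilde\beta_j|$, and a hypothetical helper `lasso_cd()`. The adaptive penalty $\lambda\sum_j w_j|\beta_j|$ becomes a plain lasso penalty after dividing column $j$ by $w_j$; the fitted coefficients are then divided by $w_j$ to undo the rescaling. (In glmnet, per-coefficient weights of this kind can be supplied through the `penalty.factor` argument.)

```r
soft <- function(z, g) sign(z) * pmax(abs(z) - g, 0)

# Plain lasso by coordinate descent on a centered design, no intercept
# (hypothetical helper, not a package API)
lasso_cd <- function(X, y, lambda, n_iter = 200) {
  n <- nrow(X); b <- numeric(ncol(X)); r <- y
  d <- colSums(X^2) / n
  for (it in seq_len(n_iter)) for (j in seq_along(b)) {
    zj <- sum(X[, j] * r) / n + d[j] * b[j]
    bj <- soft(zj, lambda) / d[j]
    r <- r - X[, j] * (bj - b[j]); b[j] <- bj
  }
  b
}

set.seed(4)
n <- 200; p <- 6
# Columns on very different raw scales
X <- matrix(rnorm(n * p), n, p) %*% diag(c(1, 10, 0.1, 5, 1, 2))
y <- drop(X[, 1:2] %*% c(3, -0.2)) + rnorm(n)

# 1. Standardize continuous predictors to unit variance; center y
Xs <- scale(X); yc <- y - mean(y)

# 2. Initial estimate: ridge with an arbitrary small penalty (an assumption)
alpha <- 1
b_init <- drop(solve(crossprod(Xs) + alpha * diag(p), crossprod(Xs, yc)))

# 3. Adaptive weights, applied by rescaling columns: X_j / w_j
w <- 1 / abs(b_init)
theta <- lasso_cd(sweep(Xs, 2, w, "/"), yc, lambda = 0.1)
b_ada <- theta / w        # coefficients on the standardized scale
print(round(b_ada, 3))
```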

Second, simple lasso by itself doesn't know that multiple regression coefficients correspond to the same multi-category predictor. You can specify that with the group lasso. So if you want all coefficients associated with a multi-category predictor to be retained or excluded together you would need to use an adaptive group lasso.
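As a sketch of what the group lasso does, assuming simulated data and a hypothetical base-R helper `group_lasso()` (proximal gradient descent, not the API of any package): `model.matrix()` expands each factor into several dummy columns, and its `"assign"` attribute records which term each column came from. The group lasso penalizes the Euclidean norm of each group's coefficients, so the group soft-thresholding step either zeros a whole group or shrinks it as a block. The $\sqrt{p_g}$ group-size scaling is a common convention, and centering the dummies without rescaling them is an assumption that reflects the scaling ambiguity above. Per-group `weights` from an initial fit would turn this into an adaptive group lasso. For real work, dedicated CRAN packages such as grpreg or gglasso exist.

```r
set.seed(3)
n <- 200
dat <- data.frame(
  f = factor(sample(c("a", "b", "c", "d"), n, replace = TRUE)),  # 4 levels -> 3 dummies
  g = factor(sample(c("u", "v", "w"), n, replace = TRUE)),       # pure noise, 2 dummies
  x = rnorm(n)
)
y <- with(dat, 1.5 * (f == "b") - 1 * (f == "d") + 0.8 * x + rnorm(n))

mm <- model.matrix(~ f + g + x, data = dat)
group <- attr(mm, "assign")[-1]       # term index per column: 1 = f, 2 = g, 3 = x
X <- scale(mm[, -1], scale = FALSE)   # drop intercept column; center only
yc <- y - mean(y)

# Minimizes (1/(2n)) ||y - X b||^2 + lambda * sum_g w_g * sqrt(p_g) * ||b_g||
group_lasso <- function(X, y, group, lambda, weights = NULL, n_iter = 2000) {
  n <- nrow(X)
  w <- if (is.null(weights)) rep(1, max(group)) else weights  # one weight per term
  step <- 1 / max(eigen(crossprod(X) / n, symmetric = TRUE, only.values = TRUE)$values)
  b <- numeric(ncol(X))
  for (it in seq_len(n_iter)) {
    v <- b + step * drop(crossprod(X, y - X %*% b)) / n   # gradient step
    for (g in unique(group)) {                              # group soft-thresholding
      idx <- which(group == g)
      thr <- step * lambda * w[g] * sqrt(length(idx))
      nv <- sqrt(sum(v[idx]^2))
      v[idx] <- if (nv > thr) (1 - thr / nv) * v[idx] else 0
    }
    b <- v
  }
  setNames(b, colnames(X))
}

fit <- group_lasso(X, yc, group, lambda = 0.1)
print(round(fit, 3))
fit_big <- group_lasso(X, yc, group, lambda = 100, n_iter = 10)  # all groups zeroed
```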