This is not a solution, but some reflections on the possibilities and difficulties that I know of.
Whenever it is possible to specify a time-series model as
$$Y_{t+1} = \mathbf{x}_t \beta + \epsilon_{t+1}$$
with $\mathbf{x}_t$ computable from covariates and time-lagged observations, it is also possible to compute the least-squares elastic net penalized estimator of $\beta$ using glmnet in R. This requires writing code that computes $\mathbf{x}_t$ and assembles the model matrix passed to glmnet. It works for AR models but not directly for ARMA models, say. Moreover, the built-in cross-validation procedures of glmnet are not, as such, sensible for time-series data.
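For concreteness, here is a minimal sketch of this approach for an AR model; the simulated series, the lag order and the `alpha` value are arbitrary choices for illustration:

```r
library(glmnet)

# Simulate an AR(2) series to stand in for real data
set.seed(1)
y <- arima.sim(list(ar = c(0.5, -0.2)), n = 200)
p <- 5  # lag order of the fitted AR model

# embed() gives a matrix whose first column is Y_{t+1} and whose
# remaining columns are the p lagged observations forming x_t
Z <- embed(as.numeric(y), p + 1)
y_resp <- Z[, 1]
X <- Z[, -1, drop = FALSE]

# Elastic net fit; alpha = 0.5 mixes the lasso and ridge penalties.
# Note: do not use cv.glmnet naively here, since its random folds
# ignore the serial dependence in the data.
fit <- glmnet(X, y_resp, alpha = 0.5)
```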
For more general models
$$Y_{t+1} = f(\mathbf{x}_t, \beta) + \epsilon_{t+1}$$
an implementation of an algorithm for computing the non-linear least-squares elastic net penalized estimator of $\beta$ is needed. To the best of my knowledge there is no such implementation in R. I am currently writing an implementation to solve the case where
$$Y_{t+1} = \mathbf{x}_t g(\beta) + \epsilon_{t+1}$$
the point being that for model selection it is paramount that the lasso penalization is applied to $\beta$ and not to $g(\beta)$. If I recall the ARIMA parametrization correctly, it also takes this form, but I cannot offer any code at the moment. The implementation is (will be) based on "A Coordinate Gradient Descent Method for Nonsmooth Separable Minimization" by Tseng and Yun.
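To fix notation (this display is my formulation of the standard elastic net objective, not a quote from that paper), the estimator would minimize

$$\hat{\beta} = \operatorname*{arg\,min}_{\beta} \; \sum_{t} \big(Y_{t+1} - \mathbf{x}_t\, g(\beta)\big)^2 + \lambda \Big( \alpha \|\beta\|_1 + \tfrac{1-\alpha}{2} \|\beta\|_2^2 \Big),$$

a smooth loss plus a nonsmooth but separable penalty, which is exactly the problem class that coordinate gradient descent method is designed for.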
Another issue is selecting the amount of penalization (the tuning parameters). This will generally require some form of cross-validation suited to time series, but I hope to be able to work out less computationally demanding methods for specific models.
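As a hedged sketch of what time-series cross-validation could look like, the following rolling-origin scheme scores each $\lambda$ by one-step-ahead squared error, reusing `X`, `y_resp` and `fit` from the AR sketch above (the 70% initial window is an arbitrary choice):

```r
lambdas <- fit$lambda
n <- nrow(X)
origins <- seq(floor(0.7 * n), n - 1)  # expanding training windows

err <- matrix(NA_real_, length(origins), length(lambdas))
for (i in seq_along(origins)) {
  tr <- seq_len(origins[i])
  f  <- glmnet(X[tr, , drop = FALSE], y_resp[tr],
               alpha = 0.5, lambda = lambdas)
  # One-step-ahead prediction at every lambda on the path
  pred <- predict(f, newx = X[origins[i] + 1, , drop = FALSE])
  err[i, seq_len(ncol(pred))] <- (pred - y_resp[origins[i] + 1])^2
}
best_lambda <- lambdas[which.min(colMeans(err, na.rm = TRUE))]
```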
Suppose you have two highly correlated predictor variables $x, z$, both centered and scaled (to mean zero, variance one). Then the ridge penalty on the parameter vector is $\beta_1^2 + \beta_2^2$, while the lasso penalty is $\mid \beta_1 \mid + \mid \beta_2 \mid$.

Since the model is supposed to be highly collinear, $x$ and $z$ can more or less substitute for each other in predicting $Y$, so the many linear combinations of $x$ and $z$ in which we partly substitute $x$ for $z$ will work very similarly as predictors: for example, $0.2 x + 0.8 z$, $0.3 x + 0.7 z$ and $0.5 x + 0.5 z$ will be about equally good. For these three examples the lasso penalty is equal in all cases, namely 1, while the ridge penalties differ: they are 0.68, 0.58 and 0.5 respectively. So the ridge penalty will prefer equal weighting of collinear variables, while the lasso penalty will not be able to choose.

This is one reason ridge (or, more generally, the elastic net, which is a linear combination of the lasso and ridge penalties) works better with collinear predictors: when the data give little reason to choose between different linear combinations of collinear predictors, lasso will just "wander," while ridge tends to choose equal weighting. Equal weighting might be the better guess for future data, and if it is so with the present data, that could show up in cross-validation as better results with ridge.
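A quick simulation (my own illustration, not from the original answer) makes this behavior visible:

```r
library(glmnet)
set.seed(42)
n <- 500
x <- rnorm(n)
z <- x + rnorm(n, sd = 0.05)       # z is nearly a copy of x
y <- 0.5 * x + 0.5 * z + rnorm(n)
X <- scale(cbind(x, z))            # center and scale, as in the text

# Ridge spreads the weight roughly equally across x and z ...
coef(glmnet(X, y, alpha = 0), s = 0.1)
# ... while lasso tends to put the weight on one of them
coef(glmnet(X, y, alpha = 1), s = 0.1)
```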
We can view this in a Bayesian way: ridge and lasso imply different prior information, and the prior information implied by ridge tends to be more reasonable in such situations. (I learned this explanation, more or less, from the book *Statistical Learning with Sparsity: The Lasso and Generalizations* by Trevor Hastie, Robert Tibshirani and Martin Wainwright, but at the moment I was not able to find a direct quote.)
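Concretely, the two penalized estimators are MAP estimates under different independent priors on the coefficients,

$$\text{ridge:}\ \beta_j \sim \mathcal{N}(0, \tau^2), \qquad \text{lasso:}\ \beta_j \sim \mathrm{Laplace}(0, b),$$

and the Gaussian prior, unlike the Laplace, places no premium on sparse solutions, which matches ridge's preference for spreading weight across collinear predictors.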
But the OP seems to have a different problem:
> However, my results show that the mean absolute error of Lasso or Elastic is around 0.61 whereas this score is 0.97 for the ridge regression
Now, the lasso also effectively performs variable selection: it can set some coefficients exactly to zero, which ridge cannot do (except with probability zero). So it might be that in the OP's data, among the collinear variables, some are effective and others do not act at all (and the degree of collinearity is sufficiently low that this can be detected). See When should I use lasso vs ridge? where this is discussed. A detailed analysis would need more information than is given in the question.
Best Answer
It's not immediately clear what exactly you're doing when fitting a model and what your goal is. I'll answer as best I can with the information provided.
GLMNET has two tuning parameters. A sequence of $\lambda$s is generated internally; the user supplies a value of $\alpha$.
The stated question is how to choose a GLMNET model that has 10-15 predictors. The number of nonzero predictors is tracked by the software, so for the supplied value of $\alpha$, just pick the solution corresponding to a $\lambda$ value that gives the desired number of predictors. On the assumption that the supplied value of $\alpha$ is "known," you're done. If you're uncertain about $\alpha$ (perhaps due to a desire to also account for collinearity), you'll have to tune over $\alpha$ and compare the alternative models according to some appropriate out-of-sample metric in the usual way.
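As a hedged sketch of that selection step (with simulated data standing in for the OP's), the `df` component of the fitted glmnet object gives the nonzero count at each $\lambda$ on the path:

```r
library(glmnet)

# Simulated stand-in for the OP's data: 50 predictors, sparse truth
set.seed(7)
n <- 200; p <- 50
X <- matrix(rnorm(n * p), n, p)
beta <- c(rep(1, 8), rep(0, p - 8))
y <- X %*% beta + rnorm(n)

fit <- glmnet(X, y, alpha = 0.5)   # alpha taken as known here

# fit$df is the number of nonzero coefficients at each lambda;
# keep the lambdas whose solutions have 10-15 predictors
ok <- which(fit$df >= 10 & fit$df <= 15)
lambda_sel <- fit$lambda[ok]

# Coefficients of, e.g., the sparsest solution in that range
coef(fit, s = lambda_sel[1])
```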
Also of interest may be my answer here. It's worth noting that this answer is highly controversial among several highly-ranked CV contributors, and I'm not certain about how to correctly approach the issue.