There are several distinctions between linear and nonlinear regression models, but the primary mathematical one is that linear models are linear in the parameters, whereas nonlinear models are nonlinear in the parameters. Pinheiro and Bates (2000, pp. 284-285), authors of the nlme
R package, elegantly described the more substantive considerations in model selection:
When choosing a regression model to describe how a response variable varies with covariates, one always has the option of using models, such as polynomial models, that are linear in the parameters. By increasing the order of a polynomial model, one can get increasingly accurate approximations to the true, usually nonlinear, regression function, within the observed range of the data. These empirical models are based only on the observed relationship between the response and the covariates and do not include any theoretical considerations about the underlying mechanism producing the data.
Nonlinear models, on the other hand, are often mechanistic, i.e., based on a model for the mechanism producing the response. As a consequence, the model parameters in a nonlinear model generally have a natural physical interpretation. Even when derived empirically, nonlinear models usually incorporate known, theoretical characteristics of the data, such as asymptotes and monotonicity, and in these cases, can be considered as semi-mechanistic models. A nonlinear model generally uses fewer parameters than a competitor linear model, such as a polynomial, giving a more parsimonious description of the data. Nonlinear models also provide more reliable predictions for the response variable outside the observed range of the data than, say, polynomial models would.
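As a quick, hypothetical illustration of that last point (simulated data, base R only), compare a cubic polynomial with a three-parameter asymptotic model fit by nls(): the nonlinear fit is more parsimonious and its asymptote parameter has a direct interpretation.

set.seed(1)
x   <- seq(0, 10, length.out = 50)
y   <- 5 * (1 - exp(-0.6 * x)) + rnorm(50, sd = 0.2)   # saturating response
sim <- data.frame(x, y)

# Empirical model: cubic polynomial, linear in the parameters (4 coefficients)
fit_poly <- lm(y ~ poly(x, 3), data = sim)

# Semi-mechanistic model: self-starting asymptotic regression, nonlinear in the parameters (3 coefficients)
fit_nls <- nls(y ~ SSasymp(x, Asym, R0, lrc), data = sim)
summary(fit_nls)   # Asym is the estimated asymptote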
There are also some big differences between the nlme and lme4 packages that go beyond the linearity issue. For example, using nlme you can fit linear or nonlinear models and, for either type, specify the variance and correlation structures for within-group errors (e.g., autoregressive); lme4 can't do that. In addition, random effects can be nested or crossed in either package, but it's much easier (and more computationally efficient) to specify and model crossed random effects in lme4.
I would advise first considering a) whether you will need a nonlinear model, and b) whether you will need to specify the within-group variance or correlation structures. If the answer to either is yes, then you have to use nlme (given that you're sticking with R). If you work a lot with linear models that have crossed random effects, or complicated combinations of nested and crossed random effects, then lme4 is probably a better choice. You may need to learn to use both packages. I learned lme4 first and then realized I had to use nlme because I almost always work with autoregressive error structures. However, I still prefer lme4 when I analyze data from experiments with crossed factors. The good news is that a great deal of what I learned about lme4 transferred well to nlme. Either way, Pinheiro and Bates (2000) is a great reference for mixed-effects models, and I'd say it's indispensable if you're using nlme.
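To make the trade-off concrete, here is a minimal sketch of the two situations, assuming a hypothetical data frame dd with columns y, x, time, subject, and item:

library(nlme)
library(lme4)

# nlme: random intercept for subject plus an AR(1) correlation structure
# for the within-subject errors
fit_nlme <- lme(y ~ time,
                random = ~ 1 | subject,
                correlation = corAR1(form = ~ 1 | subject),
                data = dd)

# lme4: crossed random intercepts for subject and item
# (no within-group correlation structure available here)
fit_lme4 <- lmer(y ~ x + (1 | subject) + (1 | item), data = dd)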
References
Pinheiro, J.C., & Bates, D.M. (2000). Mixed-effects models in S and S-PLUS. New York: Springer-Verlag.
For VIF calculation, the usdm package can also be used (you will need to install "usdm" first):
library(usdm)                     # install.packages("usdm") if it is not already installed

df <- my_data[, predictor_cols]   # placeholder: a data frame containing only the numeric predictors
vif(df)                           # returns the VIF for each column of df
If VIF > 4.0, I generally assume multicollinearity and remove those predictor variables before fitting the model.
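usdm also provides vifstep(), which repeatedly drops the predictor with the highest VIF until every remaining value is below a chosen threshold; a minimal sketch using the 4.0 cutoff and the same placeholder df:

v <- vifstep(df, th = 4)   # identify predictors whose VIF exceeds 4
exclude(df, v)             # keep only the predictors that pass the threshold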
Best Answer
In theory, Shapley values can be applied to any model, but I can't imagine a good reason to use them when you are already using a linear (mixed) model.
Shapley values are associated with the word explainability. Properly interpreted, explainability has nothing to do with describing the relationship $x \rightarrow y$; it describes $d(x) \rightarrow y$, where $x$ denotes the covariates, $y$ the outcome, and $d(x)$ a set of rules on $x$ that produces predictions of $y$.
With a linear mixed model you have everything you could ever want: both $x \rightarrow y$ and $d(x) \rightarrow y$, where $d(x)$ is now a smooth function $f(x)$. Shapley values would give you coefficient-like estimates based on $d(x)$ that have no lower variance than those from your linear mixed model $f(x)$. If you don't believe me, take bootstrap samples of your data, fit the model to 1,000 replications, pull out the coefficients, and compare them to your Shapley values; you will see (on average) that $var(coef_{shap}) \geq var(coef_{lm})$.
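If you want to run that check yourself, here is a rough sketch. For simplicity it uses a plain lm() on simulated data, and it assumes the fastshap package for the SHAP side (any SHAP implementation would do); to mirror the comparison exactly you would also recompute the SHAP values within each bootstrap replication.

set.seed(42)
n   <- 200
x1  <- rnorm(n); x2 <- rnorm(n)
y   <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n)
dat <- data.frame(y, x1, x2)

# Bootstrap the regression coefficients (1,000 replications)
boot_coefs <- replicate(1000, {
  idx <- sample(n, replace = TRUE)
  coef(lm(y ~ x1 + x2, data = dat[idx, ]))
})
apply(boot_coefs, 1, var)    # sampling variance of the lm coefficients

# SHAP values for the same fitted model (assumes the fastshap package)
library(fastshap)
fit  <- lm(y ~ x1 + x2, data = dat)
shap <- explain(fit,
                X = dat[, c("x1", "x2")],
                pred_wrapper = function(object, newdata) predict(object, newdata = newdata),
                nsim = 50)
colMeans(abs(shap))          # average absolute attribution per predictor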
The reason people use Shapley values is that they do not care about $x \rightarrow y$; they just want to know why the model (the set of rules) predicted what it did, i.e., $d(x) \rightarrow y$. This is helpful for diagnosing a random forest's predictions, for example.
Also, there are much better methods than SHAP importance for pulling out important coefficients. See the LASSO and its variations.
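For example, a cross-validated LASSO via glmnet (just one of several options) gives sparse coefficient estimates directly; a minimal sketch reusing the simulated data above:

library(glmnet)

X     <- as.matrix(dat[, c("x1", "x2")])   # numeric predictor matrix
cvfit <- cv.glmnet(X, dat$y, alpha = 1)    # alpha = 1 is the LASSO penalty
coef(cvfit, s = "lambda.min")              # coefficients at the CV-selected lambda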