SHAP Values – Can SHAP Be Used for Linear Mixed Models?

importance, mixed model, r, shapley-value

Can SHAP importance be used for linear mixed models? I've seen it used with a variety of modeling methods and am curious whether it also works for linear mixed models. I am using the lme4 package in R.

Any help at all is greatly appreciated!

Best Answer

In theory, Shapley values can be applied to any model, but I can't imagine a good reason to use them when you are already using a linear (mixed) model.

Shapley values are associated with the word explainability. The correct interpretation of explainability has nothing to do with describing the relationship $x \rightarrow y$; it concerns $d(x) \rightarrow y$, where $x$ are the covariates, $y$ is the outcome, and $d(x)$ is a set of rules on $x$ that produces predictions of $y$.

With a linear mixed model you have everything you could ever want. You have both $x \rightarrow y$ and $d(x) \rightarrow y$, except that $d(x)$ is now a smooth function $f(x)$. Shapley values would give you effect estimates derived from $d(x)$ whose variance is no lower than that of the coefficients of your linear mixed model $f(x)$. If you don't believe me, bootstrap your data: fit the model to 1,000 bootstrap replicates, pull out the coefficients from each fit, and compare them to the corresponding Shapley values. You will see that (on average) $\operatorname{Var}(coef_{shap}) \geq \operatorname{Var}(coef_{lm})$.
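To make this concrete, here is a minimal sketch (in Python, standard library only) of why Shapley values add nothing to a linear predictor. It assumes only the fixed-effects part of the model is explained, ignores random effects, and uses the marginal ("interventional") value function that averages over a background data set. The coefficients, background rows, and function names are all hypothetical and chosen purely for illustration. Under those assumptions the exact Shapley value of feature $j$ reduces to $\beta_j (x_j - \bar{x}_j)$, which is just the coefficient rescaled.

```python
from itertools import combinations
from math import factorial

# Hypothetical fixed-effect coefficients and background data (illustrative only).
intercept = 1.0
coefs = [2.0, -0.5, 0.25]
background = [
    [0.0, 1.0, 4.0],
    [1.0, 3.0, 2.0],
    [2.0, 2.0, 0.0],
    [3.0, 6.0, 2.0],
]

def predict(row):
    """Linear predictor using the fixed effects only."""
    return intercept + sum(b * v for b, v in zip(coefs, row))

def coalition_value(x, subset):
    """Mean prediction with features in `subset` fixed to x, the rest taken from background rows."""
    total = 0.0
    for bg in background:
        mixed = [x[j] if j in subset else bg[j] for j in range(len(x))]
        total += predict(mixed)
    return total / len(background)

def shapley_values(x):
    """Exact Shapley values by enumerating every coalition (feasible only for few features)."""
    p = len(x)
    phi = []
    for j in range(p):
        others = [k for k in range(p) if k != j]
        contrib = 0.0
        for size in range(p):
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for subset in combinations(others, size):
                s = set(subset)
                contrib += weight * (coalition_value(x, s | {j}) - coalition_value(x, s))
        phi.append(contrib)
    return phi

x = [3.0, 1.0, 2.0]
means = [sum(col) / len(background) for col in zip(*background)]
closed_form = [b * (v - m) for b, v, m in zip(coefs, x, means)]
phi = shapley_values(x)

print("Shapley values:     ", phi)
print("coef * (x - mean(x)):", closed_form)
```

The two printed vectors agree up to floating-point error, so for this kind of model the Shapley values carry no information that the coefficients and the covariate means do not already give you.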

People use Shapley values because they do not care about $x \rightarrow y$. They just want to know why the model (the set of rules) predicted what it did, $d(x) \rightarrow y$. This is helpful when diagnosing a random forest's predictions, for example.

Also, there are much better methods than SHAP importance for picking out the important coefficients. See LASSO and its variations.
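As a rough illustration of how LASSO selects coefficients, here is a small coordinate-descent sketch in plain Python. It is an assumption-laden toy, not a mixed-model implementation: it handles fixed effects only, assumes centered data with no intercept, and minimizes $\frac{1}{2n}\lVert y - X\beta \rVert^2 + \lambda \lVert \beta \rVert_1$. The data, the penalty `lam`, and the function names are hypothetical.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; return exactly 0.0 when |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0
            for i in range(n):
                # Prediction for row i with feature j left out.
                pred_wo_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_wo_j)
            col_norm = sum(X[i][j] ** 2 for i in range(n))
            beta[j] = soft_threshold(rho / n, lam) / (col_norm / n)
    return beta

# Hypothetical centered design: feature 0 has a strong effect, feature 1 none,
# and feature 2 a very weak one.
X = [
    [1.0, 1.0, 1.0],
    [1.0, -1.0, -1.0],
    [-1.0, 1.0, -1.0],
    [-1.0, -1.0, 1.0],
]
y = [3.0 * r[0] + 0.1 * r[2] for r in X]

beta = lasso_cd(X, y, lam=0.5)
print(beta)
```

With this penalty the strong coefficient is shrunk but kept, while the absent and weak effects are set exactly to zero, which is the selection behavior that makes LASSO useful for ranking covariates.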