I see that we have a concept called `expected value` being used in machine learning (ML) models. For example, SHAP has a concept called the expected value. It means that, in the absence of any feature information, we consider the model to output this expected value (the baseline value).

Similarly, in linear regression we have a term called the intercept. The intercept means much the same thing: it is the expected model output when all input features are zero.
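To make the intercept claim concrete, here is a minimal NumPy sketch (the data and coefficients below are made up for illustration): fitting ordinary least squares with an explicit intercept column, the prediction at the all-zero feature vector is exactly the fitted intercept.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = 5.0 + X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

# Fit ordinary least squares with an explicit intercept (all-ones) column
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
intercept, weights = coef[0], coef[1:]

# The prediction at the all-zero feature vector is exactly the intercept
pred_at_zero = intercept + np.zeros(3) @ weights
assert np.isclose(pred_at_zero, intercept)
```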

That said, I understand that in the real world we may never encounter this scenario of all input features being zero.

Why do we consider this baseline performance? Is it because we assume that every model has some inherent power to predict the output (even at random), so we assign it that base value?

Like a student who doesn't prepare for an exam but can still score some marks (>0). Is that the right way to understand it?

## Best Answer

It is the simplest (or among the simplest) possible model(s). If your explanatory data (feature list) is $\mathcal D$ and the target variable is $Y$, what models usually try to predict is the conditional expectation, i.e. $\mathbb E[Y\mid\mathcal D]$. But in the absence of data/features, you are left with the unconditional expectation, i.e. $\mathbb E[Y]$. That is the simplest thing you can do without looking at the data/features.
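The point above can be checked numerically (a small NumPy sketch with synthetic data; the coefficients are arbitrary): under squared loss, the best constant prediction is the sample mean of $Y$, i.e. the empirical $\mathbb E[Y]$, and with mean-centered features the fitted OLS intercept equals exactly that mean.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 2))
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=n)

# The best constant predictor under squared loss is the unconditional mean E[Y]
baseline = y.mean()
mse_baseline = np.mean((y - baseline) ** 2)
# Any other constant gives a strictly larger mean squared error
assert mse_baseline < np.mean((y - (baseline + 0.5)) ** 2)

# With mean-centered features, the fitted OLS intercept equals E[Y] exactly,
# because the normal equations decouple the intercept from the slopes
Xc = X - X.mean(axis=0)
A = np.column_stack([np.ones(n), Xc])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
assert np.isclose(coef[0], baseline)
```

This is why SHAP uses the average model output over a background dataset as its baseline: it is the model's prediction when no feature information has been attributed yet.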