Intuitive Explanation of Friedman’s H-Statistic – How to Understand

distributions, interaction, intuition

What is the cleanest, easiest way to explain the concept of Friedman's H-statistic to someone without a STEM background? What does it mean intuitively?

While exploring feature interactions, I came across Friedman's H-statistic.

Mathematically, the H-statistic proposed by Friedman and Popescu for the interaction between features $j$ and $k$ is:

$H^2_{jk}=\sum_{i=1}^n\left[PD_{jk}(x_{j}^{(i)},x_k^{(i)})-PD_j(x_j^{(i)})-PD_k(x_{k}^{(i)})\right]^2/\sum_{i=1}^n{PD}^2_{jk}(x_j^{(i)},x_k^{(i)})$

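Here $PD$ denotes the partial dependence function, centered to have mean zero. For a set of features $S$ and the remaining features $C$, the partial dependence function for regression is defined as: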

$\hat{f}_{x_S}(x_S)=E_{x_C}\left[\hat{f}(x_S,x_C)\right]=\int\hat{f}(x_S,x_C)d\mathbb{P}(x_C)$
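
As far as I understand, the expectation is usually not computed exactly but estimated by averaging the model's predictions over the observed data. A sketch of that estimate, assuming $x_C^{(i)}$ denotes the values of the remaining features in row $i$ (notation chosen here to match the formula for $H^2_{jk}$):

$\hat{f}_{x_S}(x_S)\approx\frac{1}{n}\sum_{i=1}^n\hat{f}(x_S,x_C^{(i)})$

So $PD_j$ would be this average with $S=\{j\}$, and $PD_{jk}$ the same with $S=\{j,k\}$.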

It's a concept I have difficulty articulating.

Can someone please explain it using simple examples?

Best Answer

Since you want to address non-STEM people, I assume that you want to convey the meaning of $H$ in an intuitive manner. In that case, the sum, the square inside it, and the denominator are not the most important parts: they only add up the result over all data points and rescale it so that it is comparable across models.

Let's assume that your model predicts one response variable and has two independent variables or features. Simply put, the statistic $H$ measures whether one of the independent variables has a different effect on the response variable, depending on the value of the other independent variable.

But if you really want to reach non-STEM people, nothing beats a real-world example that everyone is familiar with. How does the sweetness of a coffee depend on sugar and stirring? Is sweetness simply the sum of a contribution from sugar plus a contribution from stirring? (That would give a low $H$.) Obviously not: adding sugar and stirring the coffee increases the sweetness more than adding sugar without stirring or stirring without sugar, so this case has a higher $H$.
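
To make the coffee example concrete, here is a minimal sketch in plain Python. Everything in it is made up for illustration: the two toy "models" (`additive` and `interacting`), the helper names, and the random sugar and stirring amounts between 0 and 1. It computes $H^2$ from the formula in the question, using centered partial dependence values estimated by averaging over the data.

```python
import random

# Two hypothetical sweetness models (assumptions for illustration only).
def additive(sugar, stir):
    # Sugar and stirring each contribute on their own: no interaction.
    return 2.0 * sugar + 1.0 * stir

def interacting(sugar, stir):
    # Sugar only sweetens the coffee to the extent that it is stirred in.
    return sugar * stir

def centered(values):
    # Shift a list of values so that its mean is zero.
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def h_squared(model, sugars, stirs):
    n = len(sugars)
    # Partial dependence on sugar: fix sugar, average over observed stirring.
    pd_sugar = [sum(model(s, t) for t in stirs) / n for s in sugars]
    # Partial dependence on stirring: fix stirring, average over observed sugar.
    pd_stir = [sum(model(s, t) for s in sugars) / n for t in stirs]
    # With only two features there is nothing left to average over,
    # so the joint partial dependence is the prediction itself.
    pd_joint = [model(s, t) for s, t in zip(sugars, stirs)]

    pd_sugar, pd_stir, pd_joint = map(centered, (pd_sugar, pd_stir, pd_joint))
    numerator = sum((pj - ps - pt) ** 2
                    for pj, ps, pt in zip(pd_joint, pd_sugar, pd_stir))
    denominator = sum(pj ** 2 for pj in pd_joint)
    return numerator / denominator

random.seed(0)
sugars = [random.random() for _ in range(200)]
stirs = [random.random() for _ in range(200)]

h_add = h_squared(additive, sugars, stirs)
h_int = h_squared(interacting, sugars, stirs)
print("additive model:    H^2 =", h_add)
print("interacting model: H^2 =", h_int)
```

For the additive model the result is zero up to floating-point rounding, because the joint effect is exactly the sum of the two separate effects. For the interacting model it is clearly positive: part of how sweetness varies cannot be explained by sugar alone plus stirring alone.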