I am trying to compare two models using statsmodels.stats.anova_lm. The output table I get is:
   df_resid         ssr  df_diff   ss_diff         F  Pr(>F)
0      72.0  113.319956      0.0       NaN       NaN     NaN
1      74.0  115.497953     -2.0 -2.177997  0.697726     NaN
I appreciate that there will always be NaNs in the 0th row, but I don't understand the NaN in the later row. Is it because it ran out of floating-point resolution?
Best Answer
This looks like it could be an error in how statsmodels produces p-values. An F-test needs the degrees of freedom for the test, and these degrees of freedom must be positive. Here df_diff is -2, so the p-value is undefined and comes out as NaN; it is not a floating-point resolution problem. statsmodels should arguably take the absolute value of the degrees of freedom and sums of squares automatically, but apparently it doesn't. Try switching the order of the models, so that the one with more residual degrees of freedom (the smaller model) comes first. That should give differences of the same magnitude but with positive degrees of freedom.
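To see why the order matters, here is a minimal sketch of the arithmetic behind the second row of the table. It is not the statsmodels implementation: the function compare and the (ssr, df_resid) pairs are hypothetical names, and it assumes the differences are taken as first model minus second model, with the scale taken from the model passed last. Under those assumptions it reproduces the df_diff, ss_diff and F values shown in the question. The closed-form p-value is only valid when the numerator degrees of freedom equal 2; for anything else you would need something like scipy.stats.f.sf.

```python
import math


def compare(first, second):
    """Second row of a two-model F comparison table.

    Each argument is a hypothetical (ssr, df_resid) pair. Assumed arithmetic:
    differences are first minus second, and the scale is the residual
    variance of the model passed last.
    """
    ssr_1, df_1 = first
    ssr_2, df_2 = second
    df_diff = df_1 - df_2
    ss_diff = ssr_1 - ssr_2
    scale = ssr_2 / df_2
    f_stat = ss_diff / df_diff / scale

    if df_diff <= 0:
        # No F distribution exists for non-positive degrees of freedom,
        # so the p-value is undefined.
        p_value = math.nan
    elif df_diff == 2:
        # Closed form of the F survival function, valid only when the
        # numerator degrees of freedom equal 2.
        p_value = (1 + 2 * f_stat / df_2) ** (-df_2 / 2)
    else:
        raise NotImplementedError("use scipy.stats.f.sf for the general case")

    return {"df_diff": df_diff, "ss_diff": ss_diff, "F": f_stat, "p": p_value}


larger = (113.319956, 72.0)   # model with more parameters (row 0 in the question)
smaller = (115.497953, 74.0)  # model with fewer parameters (row 1 in the question)

original = compare(larger, smaller)  # order used in the question
swapped = compare(smaller, larger)   # smaller model first

print(original)
print(swapped)
```

With the original order, df_diff is negative and the p-value is NaN even though F itself is a perfectly ordinary number. With the order swapped, df_diff and ss_diff become positive and the p-value is defined. Note that in this sketch F changes slightly after swapping, because the scale now comes from the other model.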