Solved – Chi-squared test for histogram data after doing an averaged shifted histogram

Tags: chi-squared-test, distributions, goodness of fit, histogram, qq-plot

I have a data set of 903 continuous observations that I visualize with a histogram. The number of bins and the bin width could be optimized, but the shape of the distribution suggests that a Gaussian model is appropriate.

When I do the fitting, I use the frequency values of the data as the y-values. For example, if the observations are ${(2,3,3,3,4,4,5)}$ and the user-defined bin width is 1.0, then the corresponding y-values are $(1,3,3,3,2,2,1)$, assuming that the first bin starts at the minimum value.
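The mapping in that example can be written out as a minimal sketch. The function name `per_observation_frequencies` is hypothetical, and the sketch assumes half-open bins `[start + k*width, start + (k+1)*width)` with `start` equal to the smallest observation:

```python
from collections import Counter
import math


def per_observation_frequencies(observations, bin_width):
    """Return, for each observation, the number of observations in its bin.

    Assumption: bins are [start + k*w, start + (k+1)*w), where start is the
    minimum observation and w is the bin width.
    """
    start = min(observations)
    # Index of the bin that each observation falls into.
    bin_index = [math.floor((x - start) / bin_width) for x in observations]
    counts = Counter(bin_index)
    # Each observation inherits the count of its own bin.
    return [counts[k] for k in bin_index]


observations = [2, 3, 3, 3, 4, 4, 5]
print(per_observation_frequencies(observations, 1.0))  # [1, 3, 3, 3, 2, 2, 1]
```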

I am not obtaining an acceptable Gaussian fit in terms of goodness of fit ($Q$) with the chi-squared test ($\chi^2$). In other words, my null hypothesis is rejected, which means that a Gaussian model does not represent the experimental data.

When I repeat the test using an averaged shifted histogram of my x-values, with the frequencies of the averaged bins as the y-values for each observation, I obtain good results in terms of goodness of fit.

I need to clarify whether it is valid to perform a goodness-of-fit test on an averaged shifted histogram, or whether doing so clearly biases the result toward overfitting the data.

Here is a Q-Q plot of the data:

QQ Plot

Best Answer

The biggest problem is that an averaged shifted histogram has positive dependence between adjacent bins, so a test derived under an independence assumption (aside from the negative dependence induced by conditioning on the total count, which is adjusted for) won't have the right distribution for its test statistic.

It's possible to adapt a test for such dependence, but the vanilla version of the test will be wrong.
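To see the positive dependence concretely, here is a small simulation sketch. It assumes a standard triangular-weight averaged shifted histogram built on a fine grid, which may not match your exact construction. The function names, grid limits, sample size and seed are all illustrative choices. The sketch compares the correlation between two adjacent bins across repeated samples, first for raw counts and then for the averaged values:

```python
import random
import statistics


def fine_counts(sample, lo, delta, n_bins):
    """Count observations in fine bins [lo + k*delta, lo + (k+1)*delta)."""
    counts = [0] * n_bins
    for x in sample:
        k = int((x - lo) // delta)
        if 0 <= k < n_bins:
            counts[k] += 1
    return counts


def ash_values(counts, m):
    """Unnormalized averaged shifted histogram on the fine grid.

    Assumption: triangular weights 1 - |i|/m over the 2m - 1 neighboring
    fine bins. The normalizing constant is omitted because it does not
    affect correlations.
    """
    n_bins = len(counts)
    out = []
    for k in range(n_bins):
        total = 0.0
        for i in range(1 - m, m):
            if 0 <= k + i < n_bins:
                total += (1 - abs(i) / m) * counts[k + i]
        out.append(total)
    return out


def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(0)
lo, delta, n_bins, m = -4.0, 0.25, 32, 4
a, b = 15, 16  # adjacent fine bins [-0.25, 0) and [0, 0.25)

raw_a, raw_b, ash_a, ash_b = [], [], [], []
for _ in range(1000):
    sample = [rng.gauss(0.0, 1.0) for _ in range(200)]
    counts = fine_counts(sample, lo, delta, n_bins)
    ash = ash_values(counts, m)
    raw_a.append(counts[a])
    raw_b.append(counts[b])
    ash_a.append(ash[a])
    ash_b.append(ash[b])

raw_corr = correlation(raw_a, raw_b)
ash_corr = correlation(ash_a, ash_b)
print(f"adjacent raw counts: r = {raw_corr:.2f}")
print(f"adjacent ASH values: r = {ash_corr:.2f}")
```

Adjacent averaged values are built from almost the same fine-bin counts, so they move together. The usual chi-squared reference distribution does not account for this.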

[If you want to test for normality, doing it from a histogram isn't a particularly good way to do it. A Shapiro-Wilk or Shapiro-Francia test, an Anderson-Darling test, or perhaps a smooth test of the kind discussed in Rayner and Best's book Smooth Tests of Goodness of Fit would be better. The nice thing about a Shapiro-Francia test is that it's just based on the correlation in a normal scores plot (Q-Q plot for normality), which gives a visual assessment of the non-normality.]
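As a rough illustration of that last point, the Shapiro-Francia statistic can be sketched as the squared correlation between the ordered data and approximate expected normal order statistics. The sketch below uses Blom's approximation $\Phi^{-1}\big((i - 3/8)/(n + 1/4)\big)$ for the normal scores, which is an assumption on my part. The function name `shapiro_francia_w` and the two demo samples are illustrative. It computes only the statistic; a p-value would need a reference distribution, for example by simulation.

```python
import random
from statistics import NormalDist, fmean


def shapiro_francia_w(data):
    """Squared correlation between sorted data and approximate normal scores.

    Assumption: normal scores use Blom's approximation
    inv_cdf((i - 3/8) / (n + 1/4)) for i = 1..n.
    Values close to 1 are consistent with normality.
    """
    n = len(data)
    xs = sorted(data)
    std_normal = NormalDist()
    scores = [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    mx, ms = fmean(xs), fmean(scores)
    sxy = sum((x - mx) * (s - ms) for x, s in zip(xs, scores))
    sxx = sum((x - mx) ** 2 for x in xs)
    sss = sum((s - ms) ** 2 for s in scores)
    return sxy ** 2 / (sxx * sss)


rng = random.Random(1)
n = 903  # same sample size as in the question; the data here are simulated
normal_sample = [rng.gauss(10.0, 2.0) for _ in range(n)]
skewed_sample = [rng.expovariate(1.0) for _ in range(n)]

w_normal = shapiro_francia_w(normal_sample)
w_skewed = shapiro_francia_w(skewed_sample)
print(f"W' (simulated normal data): {w_normal:.4f}")
print(f"W' (simulated skewed data): {w_skewed:.4f}")
```

The statistic is the squared correlation you would read off the Q-Q plot, so a visibly curved plot corresponds directly to a value well below 1.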

--

Edit: looking at your Q-Q plot, the data are very far from normal. No reasonable test would fail to reject normality at that sample size. A Lilliefors test, an Anderson-Darling test, a Shapiro-Wilk test, or a smooth test with a standard number of terms ($k=4$ or $k=6$) will all reject easily; you don't even need to test that.