Solved – Chi-squared test for goodness of fit

chi-squared-test, curve fitting, goodness of fit, hypothesis testing, scipy

I have a signal coming from a measurement device (e.g., voltage vs. time) and perform a fit to the signal. I would like to do a goodness-of-fit test like the chi-squared test. However, I am not sure whether it is correct to apply it in my case.

In most online tutorials (e.g. here http://www.stat.yale.edu/Courses/1997-98/101/chigf.htm), the first step of the chi-squared test is to construct the test statistic $\sum_i (O_i - E_i)^2/E_i$, where $O_i$ is the observed value and $E_i$ the expected value. This is most often done with data that are simply counts, e.g., counts in bins.
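For the count-based version, scipy already provides the statistic and p-value; here is a minimal sketch with purely hypothetical bin counts (the observed and expected arrays are placeholders, not my data):

```python
# Minimal sketch of the count-based statistic with made-up bin counts.
# scipy.stats.chisquare computes sum((O_i - E_i)^2 / E_i) and, by default,
# a p-value against a chi-squared distribution with (number of bins - 1) dof.
import numpy as np
from scipy import stats

observed = np.array([18, 25, 30, 17])          # O_i: hypothetical counts per bin
expected = np.array([22.5, 22.5, 22.5, 22.5])  # E_i: expected counts under the null

chi2_stat, p_value = stats.chisquare(observed, f_exp=expected)
print(chi2_stat, p_value)
```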

On the other hand, I know that in science the residuals are often weighted by the measurement errors instead: $\sum_i (O_i - E_i)^2/\sigma_i^2$, where $\sigma_i$ denotes the error of data point $i$.
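Written out directly, that version would look something like this (again with placeholder values):

```python
# Minimal sketch of the error-weighted version; observed values, model
# predictions and per-point errors below are hypothetical placeholders.
import numpy as np

observed = np.array([1.02, 1.95, 3.10, 3.98])  # O_i: measured values
expected = np.array([1.00, 2.00, 3.00, 4.00])  # E_i: model predictions
sigma    = np.array([0.05, 0.05, 0.10, 0.10])  # known error of each data point

chi2_stat = np.sum((observed - expected) ** 2 / sigma ** 2)
print(chi2_stat)
```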

I have two questions:

  1. Is the first version only true when the noise on the data is Poisson, and thus $\sigma_i^2 = E_i$?
  2. Since the examples I found are always for counts in some categories, is the chi-squared test even the right one to measure the goodness of a fit for, e.g., a voltage vs. time signal? (Or, more mathematically: I don't draw multiple random variables according to some distribution and categorize them into bins.)

Edit:
Some more information about the fit and my data. For every time step dt I have exactly one voltage value, and I fit (using scipy's curve_fit) a hyperbolic curve of the form $f(x) = \frac{mx}{k+x}$ to it. I know the errors of the data points. Here is a plot of the points used for the fit (red) and the fitted curve (blue). The way I would think to apply the test is to calculate $\frac{1}{n_{\textrm{points}}-2}\sum_i (y_i - f(x_i))^2/\sigma_i^2$ (the $-2$ in the first denominator because we have two parameters in the fitting function). To my knowledge this should be around 1. As for the precise test, I have to say that I don't know which chi-squared distribution (i.e., how many degrees of freedom) to use for the hypothesis test.
[Plot: data points used for the fit (red) and the fitted hyperbolic curve (blue)]
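Assuming independent Gaussian noise with known per-point errors, the calculation I have in mind could be sketched as follows; the data, starting values, and noise level here are hypothetical, and the degrees of freedom are taken as $n_{\textrm{points}} - 2$ for the two fitted parameters $m$ and $k$:

```python
# Sketch of the proposed test, assuming independent Gaussian noise with
# known per-point errors sigma. Data and starting values are hypothetical.
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import chi2

def hyperbola(x, m, k):
    """Model f(x) = m*x / (k + x)."""
    return m * x / (k + x)

# hypothetical measurement: times t, voltages y, known errors sigma
t = np.linspace(0.1, 10.0, 50)
sigma = np.full_like(t, 0.05)
rng = np.random.default_rng(0)
y = hyperbola(t, 2.0, 1.5) + rng.normal(0.0, sigma)

# weighted least-squares fit; absolute_sigma=True treats sigma as true errors
popt, pcov = curve_fit(hyperbola, t, y, p0=[1.0, 1.0],
                       sigma=sigma, absolute_sigma=True)

# chi-squared statistic and reduced chi-squared
resid = (y - hyperbola(t, *popt)) / sigma
chi2_stat = np.sum(resid ** 2)
dof = len(t) - len(popt)       # n_points - 2 fitted parameters
chi2_red = chi2_stat / dof     # should be roughly 1 if model and sigmas are right

# p-value: probability that a chi2(dof) variable exceeds the observed statistic
p_value = chi2.sf(chi2_stat, dof)
print(chi2_red, p_value)
```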

Best Answer

Is the first version only true when the noise on the data is Poisson, and thus $\sigma_i^2 = E_i$?

Not quite; for example, it works for the multinomial (see Pearson 1900); $E_i$ is no longer the variance, but the dependence between cells in the multinomial exactly compensates for it; see also the chi-squared test of independence.
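One way to see this compensation at work is a quick simulation (illustrative only, not from Pearson's paper): draw multinomial counts and check that the Pearson statistic behaves like a chi-squared variable with $k-1$ degrees of freedom, even though each cell's variance $np_i(1-p_i)$ is smaller than $E_i$:

```python
# Illustrative simulation: Pearson's statistic on multinomial counts compared
# against a chi-squared distribution with k - 1 degrees of freedom.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(1)
n = 200                                    # sample size
p = np.array([0.1, 0.2, 0.3, 0.4])         # cell probabilities
k = len(p)
expected = n * p                           # E_i

stats_sim = []
for _ in range(5000):
    observed = rng.multinomial(n, p)       # O_i, one multinomial draw
    stats_sim.append(np.sum((observed - expected) ** 2 / expected))

# The empirical 95th percentile should be close to the chi2(k-1) critical
# value, even though Var(O_i) = n*p_i*(1-p_i) < E_i for every cell.
print(np.percentile(stats_sim, 95), chi2.ppf(0.95, k - 1))
```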

Since the examples I found are always for counts in some categories, is the chi-squared test even the right one to measure the goodness of a fit for, e.g., a voltage vs. time signal

Under some very particular assumptions, perhaps, including conditionally independent Gaussian responses and known $\sigma_i$. I often see it applied where it clearly doesn't apply (e.g. where there's substantial observation error in the $x$'s and it's a situation where an errors-in-variables model would apply, or where the supposedly known $\sigma$ values are clearly inconsistent with the spread of the data around the fit).

To my recollection it doesn't actually apply to nonlinear models when estimating parameters (except approximately/asymptotically).
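How rough that asymptotic approximation is for this particular hyperbolic model can be probed by simulation; the sketch below (hypothetical settings, same model as in the question) refits under the null many times and compares the resulting statistics with a $\chi^2_{n-2}$ distribution:

```python
# Illustrative Monte Carlo check (hypothetical settings): how closely does the
# chi-squared statistic from a fitted nonlinear model follow chi2 with n-2 dof?
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import chi2

def hyperbola(x, m, k):
    return m * x / (k + x)

rng = np.random.default_rng(2)
t = np.linspace(0.1, 10.0, 50)
sigma = np.full_like(t, 0.05)
true_params = (2.0, 1.5)
dof = len(t) - 2                          # two estimated parameters

stats_sim = []
for _ in range(1000):
    y = hyperbola(t, *true_params) + rng.normal(0.0, sigma)
    popt, _ = curve_fit(hyperbola, t, y, p0=[1.0, 1.0],
                        sigma=sigma, absolute_sigma=True)
    stats_sim.append(np.sum(((y - hyperbola(t, *popt)) / sigma) ** 2))

# If the chi2(n-2) reference distribution were exact, the empirical mean would
# be ~dof and the empirical 95th percentile would match chi2.ppf(0.95, dof).
print(np.mean(stats_sim), dof)
print(np.percentile(stats_sim, 95), chi2.ppf(0.95, dof))
```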
