First question. You are right about being able to use software instead of tables
of the chi-squared distribution. For example, if df = 9 and the
chi-squared statistic is 20.16, you could look at a chi-squared
table to see that $20.16 > 19.02,$ where 19.02 cuts area 0.025
from the upper tail of $\mathsf{Chisq}(\mathrm{df} = 9)$.
You would reject at the 2.5% level.
If you wanted a P-value, you could use software
to find the probability of the chi-squared statistic being
greater than 20.16. In R software this is computed as follows,
where pchisq
stands for the CDF of a chi-squared distribution:
1 - pchisq(20.16, 9)
## 0.01695026
Thus the P-value (probability of a value more extreme than 20.16)
is about 0.017. Some software will give you the P-value automatically.
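Equivalently, pchisq has a lower.tail argument, so you can get the upper-tail probability directly instead of subtracting from 1:

```r
# Upper-tail probability directly; same as 1 - pchisq(20.16, 9)
pchisq(20.16, df = 9, lower.tail = FALSE)
## 0.01695026
```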
Second question. As far as binning is concerned, you are right that in some
instances there are alternative possible ways of binning. You do not
want so many bins that the expected count in any bin falls below about 5;
otherwise the approximation of the chi-squared statistic
by the chi-squared distribution is not good. Subject to that restriction,
it is usually better to use more bins rather than fewer.
Also notice
that the df of the chi-squared distribution depends directly on
the number of $bins$ used (df = bins $-$ 1 for a goodness-of-fit test,
less one more for each parameter estimated from the data), not on the
overall number of $events$ counted.
(I do not understand what you say about 'approximately Gaussian'
in this context.)
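As a quick check with hypothetical numbers, the rule of thumb and the df can both be computed before settling on a binning:

```r
n    <- 60                # total events (hypothetical)
bins <- 6                 # candidate number of bins
n / bins                  # expected count per bin under equal probabilities: 10, comfortably >= 5
bins - 1                  # df for the goodness-of-fit test: set by bins, not by n
```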
Examples: Here is an example in which we simulate 60 rolls of a fair die, so that we expect 10 instances of each face. The observed numbers
of each face are tabulated. Finally, a chi-squared test that the
die is fair has a chi-squared goodness-of-fit statistic of 3.0,
and a P-value of 70% (consistent with a fair die).
face = sample(1:6, 60, replace=TRUE)  # simulate 60 rolls of fair die
table(face)
## face
## 1 2 3 4 5 6
## 9 6 12 10 10 13
chisq.test(table(face))
## Chi-squared test for given probabilities # default is equal probabilities
## data: table(face)
## X-squared = 3, df = 5, p-value = 0.7
In the test, the default is that faces have equal probabilities
unless some other probability vector is specified. The test procedure
chisq.test
finds the P-value as follows (and rounds):
1 - pchisq(3, 5)
## 0.6999858
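You can also verify the statistic itself by hand from the tabulated counts: with expected count 10 in each of the 6 bins,

```r
obs <- c(9, 6, 12, 10, 10, 13)   # observed counts from the table above
exp <- rep(10, 6)                # expected counts for a fair die
X2  <- sum((obs - exp)^2 / exp)  # goodness-of-fit statistic
X2
## 3
1 - pchisq(X2, df = 6 - 1)       # same P-value as before
## 0.6999858
```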
In our second example, we simulate 600 rolls of a die that
is heavily biased in favor of faces 4, 5, and 6 (see prob
vector). Here
the null hypothesis that the die is fair is soundly rejected
with an extremely small P-value.
face = sample(1:6, 600, replace=TRUE, prob=c(1,1,1,2,2,2)/9 )
table(face)
## face
## 1 2 3 4 5 6
## 59 67 80 123 135 136
chisq.test(table(face))
## Chi-squared test for given probabilities # default is test for 'fair' die
## data: table(face)
## X-squared = 62.2, df = 5, p-value = 4.263e-12
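By contrast, if you pass the probabilities actually used to generate the data via the p argument of chisq.test, the null hypothesis is true and the test should (usually) not reject. A sketch with freshly simulated rolls:

```r
set.seed(2021)                    # arbitrary seed, for reproducibility
face <- sample(1:6, 600, replace = TRUE, prob = c(1,1,1,2,2,2)/9)
out  <- chisq.test(table(factor(face, levels = 1:6)),
                   p = c(1,1,1,2,2,2)/9)  # test against the true (biased) probabilities
out$parameter                     # df is still 5: it depends only on the number of bins
```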
Best Answer
Remember that every hypothesis test is carried out under the assumption that the null model is correct; any chi-square distributions that may arise (or even exist) under the alternative model are not considered at all. So it is not correct that we are assuming the independence of chi-square distributions built under different model assumptions.
What we do in the typical 'lack-of-fit'-type hypothesis test, where we compare the null model to a more complex alternative model, is decompose some sum-of-squares expression into a sum of two or more other sums of squares (example: residual sum of squares = pure-error sum of squares + lack-of-fit sum of squares); it is the terms on the RHS of this decomposition that are asserted to be independent and to have chi-square distributions. The justification of this assertion is usually achieved through the magic of Cochran's Theorem.
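To make the decomposition concrete, here is a hypothetical sketch in R: with replicated x values, comparing the straight-line model to the saturated one-mean-per-x model splits the line's residual sum of squares into a pure-error piece and a lack-of-fit piece, and anova carries out the resulting F test (the data, seed, and variable names are illustrative, not from the question):

```r
set.seed(1)
x <- rep(1:5, each = 4)          # replicated x values give a pure-error estimate
y <- 2 + 3 * x + rnorm(20)       # data generated by the straight-line (null) model
fit_line <- lm(y ~ x)            # null model: straight line
fit_sat  <- lm(y ~ factor(x))    # alternative: a separate mean at each x
anova(fit_line, fit_sat)         # F test for lack of fit
```

Here RSS(fit_line) = pure-error SS + lack-of-fit SS, with pure-error SS = RSS(fit_sat); the F statistic compares the lack-of-fit piece to the pure-error piece.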