Solved – Can chi-square test be used on non-integer observed frequencies

chi-squared-test, count-data

I am using a model to calculate observed frequencies, which sometimes gives non-integer values. I could round these frequencies, but that seems to artificially distort the information I have.
For example:

Example Data
          Yes        No
Male       11        19
Female     16        17

Assume my model just divides everything by 3, so model data becomes:

           Yes        No
Male       3.67        6.33
Female     5.33        5.67

These data have to be used as the "observed frequencies". A chi-square test on them gives a p-value of 0.58. However, if I round the data to integers, the chi-square test gives a p-value of 0.8, which is very different. My question is: is the chi-square test theoretically valid on non-integer observed frequencies?
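The two p-values can be checked with a short standard-library sketch. It assumes the figures were obtained from the plain Pearson statistic without a continuity correction; the helper names `pearson_chi2` and `p_value_df1` are illustrative and not part of the original post.

```python
import math

def pearson_chi2(table):
    """Pearson chi-square statistic for a contingency table (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected value under independence of rows and columns
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

def p_value_df1(stat):
    # Upper tail of a chi-square distribution with 1 degree of freedom
    # (the 2x2 case): P(X > stat) = P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2))
    return math.erfc(math.sqrt(stat / 2))

model_output = [[3.67, 6.33], [5.33, 5.67]]  # the divided-by-3 table
rounded = [[4, 6], [5, 6]]                   # the same table rounded to integers

p_model = p_value_df1(pearson_chi2(model_output))
p_rounded = p_value_df1(pearson_chi2(rounded))
print(f"model output: p = {p_model:.3f}")
print(f"rounded:      p = {p_rounded:.3f}")
```

Under that assumption the results should land close to the 0.58 and 0.8 quoted above. `scipy.stats.chi2_contingency(table, correction=False)` is a library alternative that also accepts non-integer input, which is part of why the problem is easy to miss.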

Edit: Please note that the data and model specified in the question are not real; they are only meant to illustrate the problem I am facing. The real data look like this:

            Male     Female
Source1     10.8      18.2
Source2     16        17

The real data is the prediction of males and females according to the job roles and City from Bureau of Labor Statistics.
I have no control over the data coming from Source1, which (surprisingly) contains decimal values. All I can do is round the data from Source1.

Best Answer

"observed count have decimal points."

If you have fractions, you don't have observed counts but something else. Counts actually count things, 0, 1, 2...

"The real data is the prediction of males and females according to the job roles and City from Bureau of Labor Statistics"

Predictions don't have the same properties (including the same uncertainty) as count data.

The chi-squared test relies on the data being actual observed counts, not predictions of counts or any other manipulation of counts. This is needed to obtain the correct scaling of $O_i-E_i$ (the denominator is based on particular assumptions that in general won't hold for things that are not counts).
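For reference, the statistic being discussed is Pearson's, written here in the usual notation (the symbol $X^2$ is an addition, not used in the original post):

$$X^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$

Dividing by $E_i$ is the appropriate scaling when the $O_i$ are genuine counts, because the variance of such a count is then roughly proportional to its expected value (for a Poisson count, the variance equals the mean). A prediction, or a count that has been divided by 3, has no reason to have that variance, so the sum no longer follows the chi-squared reference distribution.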

As a result, your test won't work - you can't just treat predictions as observed counts. It's irrelevant whether they were rounded integers or not (the only difference of any consequence is that the non-integer values made it obvious you didn't have actual observed counts; if the predictions had been rounded you might never have known there was a problem).
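One way to see why the scale matters: the Pearson statistic grows in proportion to the total, so identical proportions give different statistics (and p-values) depending on what the numbers are taken to represent. A minimal sketch using the example table from the question; the helper `pearson_chi2` and the scale factors are illustrative assumptions, not from the original post.

```python
def pearson_chi2(table):
    """Pearson chi-square statistic for a contingency table (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

counts = [[11, 19], [16, 17]]  # example table from the question
base = pearson_chi2(counts)

for k in (1 / 3, 1, 3):
    # Multiply every cell by k: the proportions stay the same, the "sample size" changes
    scaled = [[k * x for x in row] for row in counts]
    stat = pearson_chi2(scaled)
    print(f"scale {k:.2f}: X^2 = {stat:.4f}  (ratio to unscaled: {stat / base:.2f})")
```

The ratio column tracks the scale factor, which is the point: the test treats the size of the numbers as information about how much data was observed, and a model output or prediction does not carry that information.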
