I am using a model to calculate observed frequencies, which sometimes gives non-integer values. I can round these frequencies, but that seems to artificially distort the information I have.
For example:
Example data
         Yes   No
Male     11    19
Female   16    17
Assume my model just divides everything by 3, so the model data become:
         Yes    No
Male     3.67   6.33
Female   5.33   5.67
These data have to be used as "observed frequencies". A chi-square test on them gives a p-value of 0.58. However, if I round the data to integers, the chi-square test gives a p-value of 0.8, which is very different. My question is: is the chi-square test theoretically valid on non-integer observed frequencies?
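For reference, here is a minimal sketch that reproduces the two p-values quoted above. It assumes a plain Pearson chi-square test of independence without continuity correction (the question does not say which variant was used, but this one matches the quoted numbers), and it divides the example counts by exactly 3 rather than using the two-decimal values. The function names `pearson_chi2` and `p_value_df1` are made up for this illustration.

```python
import math

def pearson_chi2(table):
    """Pearson chi-square statistic for a two-way table (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected value under independence of rows and columns
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

def p_value_df1(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))

counts = [[11, 19], [16, 17]]                            # example data from the question
model = [[x / 3 for x in row] for row in counts]         # "model" output: everything divided by 3
rounded = [[round(x) for x in row] for row in model]     # [[4, 6], [5, 6]]

p_model = p_value_df1(pearson_chi2(model))
p_rounded = p_value_df1(pearson_chi2(rounded))
print(f"non-integer table: p = {p_model:.2f}")   # about 0.58
print(f"rounded table:     p = {p_rounded:.2f}") # about 0.80
```

The survival function is written with `math.erfc` because a 2x2 table has one degree of freedom; for larger tables a general chi-square distribution function would be needed instead.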
Edit: Please note that the data and model specified in the question are not real; they are only meant to illustrate the problem I am facing. The real data look like this:
          Male   Female
Source1   10.8   18.2
Source2   16     17
The real data are predicted numbers of males and females by job role and city, from the Bureau of Labor Statistics.
I have no control over the data coming from Source1, which (surprisingly) contain decimal values. All I can do is round the data from Source1.
Best Answer
If you have fractions, you don't have observed counts but something else. Counts actually count things, 0, 1, 2...
Predictions don't have the same properties (including the same uncertainty) as count data.
The chi-squared test relies on the data being actual observed counts, not predictions of counts or any other manipulation of counts. This is needed to obtain the correct scaling of $O_i-E_i$ (the denominator is based on particular assumptions that in general won't hold for things that are not counts).
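To make the scaling point concrete, here is a short sketch; the constant $c$ is introduced purely for illustration and is not part of the question. The usual Pearson statistic is

$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i},$$

where the denominator $E_i$ plays the role of the variance of a genuine count whose mean is $E_i$ (roughly as for a Poisson count). Now suppose every cell is multiplied by $c$, so that $O_i' = c\,O_i$. The expected values computed from the table margins scale the same way, $E_i' = c\,E_i$, and therefore

$$\chi'^2 = \sum_i \frac{(c\,O_i - c\,E_i)^2}{c\,E_i} = c \sum_i \frac{(O_i - E_i)^2}{E_i} = c\,\chi^2 .$$

So the statistic changes with the arbitrary scale of the numbers: in the question's toy example $c = 1/3$, and the statistic is one third of what the original counts give, even though the proportions in the table are identical. With real counts the scale is not arbitrary, because it reflects how many observations there actually were.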
As a result, your test won't work - you can't just treat predictions as observed counts. It's irrelevant whether they were rounded integers or not (the only difference of any consequence is that the non-integer values made it obvious you didn't have actual observed counts; if the predictions had been rounded you might never have known there was a problem).