Solved – Reporting chi-square tests for weighted data

chi-squared-test, descriptive statistics, survey, weighted-sampling

I was wondering if you could share your experience of reporting chi-square tests for complex survey data in journal publications. Normally, chi-square tests are reported as, for example, $\chi^2(1, N = 90) = 0.89,\ p = .35$ (although I guess there can be some variation). However, the concept of N becomes rather tricky with a complex survey design, and I'm not sure whether I should report the probability-weighted sample size or just the unweighted sample size. I'm also interested in whether people would report different Ns if the sample were subject only to probability weighting (for example, if strata or cluster information was unfortunately missing) rather than probability weighting with clustering and stratification. And if the sample size is reported in the format above, should it be written as n rather than N?

Just to give an example: Lumley's R package returns the following:

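library(survey)
data(api)
# One-stage cluster sample of school districts, with sampling weights and fpc
dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)
svychisq(~sch.wide + stype, dclus1, statistic = "Chisq")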


    Pearson's X^2: Rao & Scott adjustment

data:  svychisq(~sch.wide + stype, dclus1, statistic = "Chisq") 
X-squared = 11.94, df = 2, p-value = 0.005553

Which of these would you report as N?

sum(xtabs(~sch.wide + stype, data = apiclus1))  # unweighted sample size, or
sum(svytable(~sch.wide + stype, dclus1))        # weighted (estimated population) count

See this page for details on the computation and on the Rao-Scott corrections to the Pearson chi-squared test.

Very interested in your opinion.
Many thanks

Best Answer

Personally, I always report the unweighted n, not the weighted values. Some restricted surveys ask you to round n to the nearest 50, but that depends on the survey. My 2 cents. CS

Related Solutions

Solved – Is a chisquare on a (nearly) complete population data necessary

The answer is "it depends".

There is some discussion of this in several related questions. Basically, if you are interested only in describing this particular population, you could report just your proportions (possibly after imputing values for the children you don't have) and be done with it. Some hard-liners insist there is no statistical inference to be made (other than the imputation), as you have all the data already.

If, however, you wish to answer a question that is not just about an actual finite population but about the data-generating process that produced it, then it is often sensible to treat the "population" as though it were a sample from an infinite set generated by that process. Often these questions will be the ones of most theoretical or policy-relevant interest. In that case you can do all the "usual" inference, including chi-square statistics.

I personally am of the view that for many purposes it is extremely useful to know whether the observed relationship in the actual population could plausibly have been generated by random chance. For example, we may well be interested in semi-hypothetical populations (other states or times) that are important but too difficult to characterise exactly. Considering the hyper-population from which the population you actually have was drawn can be a good starting point.

Solved – Transforming data for chi square — squaring negative value difference scores

You say you did paired t-tests on the original data, before dichotomizing it, and that males increased significantly from the old form to the new but the female change was not significant. Unfortunately, that cannot be taken as showing that the male change was bigger than the female change: "significant" in one group and "not significant" in the other does not test the difference between the two changes. You need to do an independent-groups t-test on the two sets of change scores. (Better yet, you could replace all the t-tests with confidence intervals for the corresponding means and mean differences, which would give you more information.)
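As a rough sketch of what that comparison could look like in R (the data here are simulated purely for illustration; the sample sizes, means and standard deviations are assumptions, not values from the question):

```r
# Hypothetical data: old- and new-form scores for 40 males and 40 females.
set.seed(1)
n   <- 40
sex <- rep(c("M", "F"), each = n)
old <- rnorm(2 * n, mean = 50, sd = 10)
# Assumed true mean change: 4 for males, 2 for females (illustrative only).
new <- old + rnorm(2 * n, mean = ifelse(sex == "M", 4, 2), sd = 8)
change <- new - old

# Independent-groups (Welch) t-test on the two sets of change scores.
res <- t.test(change[sex == "M"], change[sex == "F"])
res$conf.int   # confidence interval for the difference in mean change
res$p.value
```

The confidence interval from `t.test` speaks directly to the question of interest, namely how much bigger (or smaller) the male change is than the female change.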

For the dichotomized data, the situation is similar. You have two contingency tables, one for males and one for females.

Males
        Yes   No    Total
  Yes   Myy   Myn   My.
  No    Mny   Mnn   Mn.
  Total M.y   M.n   M.. = M = total number of males

Females
        Yes   No    Total
  Yes   Fyy   Fyn   Fy.
  No    Fny   Fnn   Fn.
  Total F.y   F.n   F.. = F = total number of females

For each table, the analog of the paired t-test is the McNemar test (http://en.wikipedia.org/wiki/McNemar%27s_test).
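For reference, the McNemar statistic uses only the two discordant cells of a table. For the male table, in its uncorrected form (a continuity-corrected variant is also in common use), it is:

```latex
\chi^2_{\text{McNemar}} = \frac{(M_{yn} - M_{ny})^2}{M_{yn} + M_{ny}}, \qquad \text{df} = 1
```

Under the null hypothesis of no change in endorsement rate, this is approximately chi-square distributed with 1 degree of freedom when $M_{yn} + M_{ny}$ is reasonably large. The female version replaces $M$ with $F$ throughout.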

I know of no simple standard test of the difference between the changes in endorsement rates, but if all of Myn, Mny, Myy+Mnn, Fyn, Fny, Fyy+Fnn are "large" then an asymptotic test might be justified.
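One possible form such an asymptotic test could take, offered as a sketch rather than a standard named procedure: estimate each group's change in endorsement rate from its two discordant cells, use the usual large-sample (Wald) variance for a difference of paired proportions, and compare the two groups with a z statistic. This assumes the first subscript refers to the old form and the second to the new form, and that the male and female samples are independent. The function name `change_rate` and all the counts are hypothetical.

```r
# Change in endorsement rate (new minus old) for one paired 2x2 table.
# Assumption: yn = "Yes" on the old form and "No" on the new; ny = the reverse;
# n = total number of people in the table.
change_rate <- function(yn, ny, n) {
  d <- (ny - yn) / n                          # new-form rate minus old-form rate
  v <- ((yn + ny) - (ny - yn)^2 / n) / n^2    # large-sample variance of d
  list(d = d, v = v)
}

# Hypothetical counts, for illustration only.
m <- change_rate(yn = 10, ny = 30, n = 100)   # males
f <- change_rate(yn = 15, ny = 20, n = 100)   # females

# z statistic for the difference between the two changes, assuming the
# male and female samples are independent.
z <- (m$d - f$d) / sqrt(m$v + f$v)
p <- 2 * pnorm(-abs(z))                       # two-sided p-value
c(diff = m$d - f$d, z = z, p = p)
```

The normal approximation is only as good as the cell counts allow, which is why the "large" condition above matters.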
