Hypothesis testing: difference between proportions

categorical data, hypothesis testing, proportion, t-test

I am investigating staff inequality between genders and ethnicities in an institution. I have data on contract types (permanent or fixed term) and pay grades for almost all employees. I want to test the hypothesis that there is no inequality across genders and ethnicities in contract types or pay grades.

For example, 592 of 1972 white employees are on a fixed-term contract, while 57 of 114 black or minority ethnic (BME) employees are. Which statistical test should I use to test the hypothesis that there is no difference between these proportions?
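Before choosing a test, it can help to put the two observed proportions side by side. A minimal Python sketch using only the counts quoted in the question; the variable names are illustrative, not from the original post:

```python
# Counts from the question: fixed-term staff / total staff in each group
white_fixed, white_total = 592, 1972
bme_fixed, bme_total = 57, 114

p_white = white_fixed / white_total  # ≈ 0.300
p_bme = bme_fixed / bme_total        # = 0.5

# Absolute gap (difference in proportions) and relative gap (ratio)
diff = p_bme - p_white
ratio = p_bme / p_white
print(f"white: {p_white:.3f}, BME: {p_bme:.3f}")
print(f"difference: {diff:.3f}, ratio: {ratio:.2f}")
```

If the data really cover the whole institution, these are the population proportions themselves rather than estimates of them.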

I was thinking of using a t-test, but the dependent variable is not continuous (it is a proportion). I then considered the two-proportion z-test, but neither of my populations is at least 10 times as large as its sample (at least in the gender case). Would categorical data analysis help? I suspect not, because I would still need to account for the populations being of different sizes. Which test best matches these assumptions?
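For reference, the two-proportion z-test mentioned above can be sketched with the standard library alone. This is the usual pooled version based on the normal approximation; the helper name `two_proportion_z_test` is hypothetical, and the sketch assumes independent binomial samples with no finite-population correction, which is exactly the assumption the question is worried about:

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportion z-test (normal approximation, two-sided).

    Hypothetical helper for illustration; it assumes independent
    binomial samples and applies no finite-population correction.
    """
    p1, p2 = x1 / n1, x2 / n2
    # Under H0 (p1 == p2) both groups share one proportion, estimated by pooling
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    # Two-sided p-value from the standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Counts from the question: white vs BME staff on fixed-term contracts
z, p = two_proportion_z_test(592, 1972, 57, 114)
print(f"z = {z:.2f}, two-sided p = {p:.2g}")
```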

Moreover, since I only want to generalise to the institution itself, I effectively have the whole population in my data (non-response was minor). How does that help or hinder the analysis?

Best Answer

If you truly have the whole population of interest, there's no need for a hypothesis test at all. The point of a hypothesis test is to make inferences about a population. If you have the whole population, you don't need to infer its characteristics from a sample; you simply look at them. The null is either true or false, and you can say which, with certainty.

(If you have a large fraction of the population of interest, for some tests you have to worry about finite-population effects.)
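One way to see why a large sampling fraction matters is the textbook finite population correction, which scales the standard error of a proportion estimated by sampling without replacement by sqrt((N - n) / (N - 1)). A small sketch; the helper `se_proportion` and the sizes n = 100 out of N = 110 are hypothetical, since the question does not report population sizes:

```python
import math

def se_proportion(p_hat, n, N=None):
    """Standard error of a sample proportion.

    If the population size N is given, apply the finite population correction
    sqrt((N - n) / (N - 1)) for sampling without replacement.
    """
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))
    return se

# Hypothetical: 100 people sampled from a group of 110 staff
print(se_proportion(0.5, 100))          # ignores the finite population
print(se_proportion(0.5, 100, N=110))   # much smaller once the correction is applied
print(se_proportion(0.5, 110, N=110))   # whole population observed: 0.0
```

As the sample approaches the whole population the corrected standard error shrinks to zero, which matches the point that with the entire population there is nothing left to infer.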

To compare two proportions, commonly used tests include the two-proportion z-test, the chi-square test and Fisher's exact test.
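As an illustration of the chi-square option, here is a stdlib-only sketch of Pearson's chi-square test of independence on the 2x2 table built from the question's counts, without Yates' continuity correction. The helper name is hypothetical, and the table assumes every employee is either fixed-term or permanent, as the question describes. For a 2x2 table this statistic equals the square of the pooled two-proportion z statistic, so the two tests give the same p-value:

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table (no continuity correction)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if group and contract type were independent
            expected = rows[i] * cols[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom chi2 = Z**2, so P(X >= chi2) = erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Rows: white, BME; columns: fixed-term, permanent (counts from the question)
table = [[592, 1972 - 592], [57, 114 - 57]]
chi2, p = chi_square_2x2(table)
print(f"chi-square = {chi2:.2f}, p = {p:.2g}")
```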

Fisher's exact test conditions on both margins of the 2x2 table (and, for that matter, the chi-square test is also usable in that situation). Because it conditions on both margins, the finite-population issue should be a non-issue: it is taken care of by the conditioning.
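Conditioning on both margins means the count in one cell follows a hypergeometric distribution, which is what makes the test exact. A minimal stdlib sketch of a two-sided Fisher's exact test; the helper name is hypothetical, and the two-sided p-value uses one common convention (summing the probabilities of all tables no more likely than the observed one):

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    With both margins fixed, the top-left count follows a
    hypergeometric distribution.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # P(top-left cell = x) given the fixed row and column totals
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # Sum tables no more likely than the observed one; the small relative
    # tolerance guards against floating-point ties
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))

# Rows: white, BME; columns: fixed-term, permanent (counts from the question)
p = fisher_exact_2x2(592, 1972 - 592, 57, 114 - 57)
print(f"two-sided p = {p:.2g}")
```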
