Solved – Logistic regression and the 2 by 2, or 3 by 2 contingency table


I have a question about logistic regression and the 2 by 2 (or 3 by 2, or more generally n by 2) contingency table; an example of such a table can be found here:

http://en.wikipedia.org/wiki/Contingency_table

My question is when I have a table like this:

      Right-handed   Left-handed    Total
Male            43             9       52 
Female          44             4       48 
Total           87            13      100

if I want to know whether gender is associated with being right-handed or left-handed, I can calculate the odds ratio as an indication of whether males are more likely to be left-handed:

(9/52) / (4/48) = 2.08

Then I can also use the Pearson chi-square test to see whether there is a significant association (say, p-value < 0.05), right?

But I can also create an indicator variable in a data set of 100 people, 52 male and 48 female, with 1 = left-handed and 0 = right-handed, and then get the odds ratio from the output of a logistic regression.
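That indicator-variable construction can be sketched in Python (my own illustration, not part of the original question): expand the 2 by 2 table into an individual-level data set and check that the cell counts are recovered.

```python
# Expand the 2x2 table into an individual-level dataset of 100 people,
# with an indicator: 1 = left-handed, 0 = right-handed.
rows = ([("male", 1)] * 9 + [("male", 0)] * 43 +
        [("female", 1)] * 4 + [("female", 0)] * 44)

n_male = sum(1 for sex, _ in rows if sex == "male")
n_female = sum(1 for sex, _ in rows if sex == "female")
left_male = sum(lh for sex, lh in rows if sex == "male")
left_female = sum(lh for sex, lh in rows if sex == "female")

print(n_male, n_female, left_male, left_female)  # 52 48 9 4
```

This individual-level layout is exactly what a logistic regression routine expects as input.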

So my question is: which is the right way to find out which group is more likely to be left-handed?

Also, what are the differences between the two approaches? Are they answering the same question?

1) Basically, what is the difference between the two approaches (what kind of question is each approach designed to answer)?

2) What is the difference between the p-value from the 2-by-2 contingency table test and the p-value from the significance test of the gender (M/F) parameter estimate in the logistic regression?

Could someone kindly explain?

Best Answer

The odds ratio in that table is:

(9/43)/(4/44) = 2.30

What John_w computed was a risk ratio.
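The two quantities can be compared directly from the cell counts; here is a short Python sketch (my addition, not part of the original answer):

```python
# 2x2 table: rows = sex, columns = handedness
left_m, right_m = 9, 43    # males:   9 left-handed, 43 right-handed (52 total)
left_f, right_f = 4, 44    # females: 4 left-handed, 44 right-handed (48 total)

# Risk ratio: compares the *proportions* left-handed in each group
rr = (left_m / (left_m + right_m)) / (left_f / (left_f + right_f))

# Odds ratio: compares the *odds* of left-handedness in each group
odds_ratio = (left_m / right_m) / (left_f / right_f)

print(round(rr, 2))          # 2.08 -- what the question computed
print(round(odds_ratio, 2))  # 2.3  -- what logistic regression estimates
```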

As the output below shows, the manually computed odds ratio is exactly the same as the one produced by logistic regression. Both tests you suggested test the null hypothesis that this odds ratio is equal to 1, and their p-values should be close enough that the difference does not matter. For example, doing this in Stata gives the following output:

. // prepare the data
. clear

. input female right freq

        female      right       freq
  1.         0          1         43
  2.         0          0          9
  3.         1          1         44
  4.         1          0          4
  5. end

. label define female 0 "male" 1 "female"

. label value female female

. label variable female "respondent's sex"

.
. label define right 0 "left-handed" 1 "right-handed"

. label value right right

. label variable right "respondent's handedness"

.
. // tabulate
. tab female right [fw=freq], lr chi2

           |     respondent's
respondent |      handedness
    's sex | left-hand  right-han |     Total
-----------+----------------------+----------
      male |         9         43 |        52
    female |         4         44 |        48
-----------+----------------------+----------
     Total |        13         87 |       100

          Pearson chi2(1) =   1.7774   Pr = 0.182
 likelihood-ratio chi2(1) =   1.8250   Pr = 0.177

.
. // the odds ratio:
. di (9/43)/(4/44)
2.3023256

.
. // logistic regression
. logit right female [fw=freq], or nolog

Logistic regression                               Number of obs   =        100
                                                  LR chi2(1)      =       1.82
                                                  Prob > chi2     =     0.1767
Log likelihood = -37.726174                       Pseudo R2       =     0.0236

------------------------------------------------------------------------------
       right | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |   2.302326   1.468974     1.31   0.191     .6592751    8.040199
       _cons |   4.777778   1.751347     4.27   0.000      2.32921    9.800386
------------------------------------------------------------------------------
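Those estimates can be recovered by hand from the cell counts; a quick Python check (my addition, not from the original answer):

```python
import math

left_m, right_m = 9, 43     # males
left_f, right_f = 4, 44     # females

# _cons: odds of right-handedness in the baseline group (males)
cons_or = right_m / left_m                             # 43/9

# exp(coefficient) for female: ratio of the two groups' odds of right-handedness
female_or = (right_f / left_f) / (right_m / left_m)

# log likelihood at the maximum (each group fitted by its own proportion)
ll = (left_m * math.log(left_m / 52) + right_m * math.log(right_m / 52) +
      left_f * math.log(left_f / 48) + right_f * math.log(right_f / 48))

print(round(cons_or, 4), round(female_or, 4), round(ll, 3))
# 4.7778 2.3023 -37.726
```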

For fun I also asked for the likelihood-ratio chi-square statistic after tab, to show that it is exactly the same as the one reported by the logistic regression (labeled LR chi2(1) and Prob > chi2 in the output of logit).
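Both statistics in the tab output can also be reconstructed from the observed counts and the expected counts under independence (again a Python sketch of mine, not from the original answer):

```python
import math

obs = {("male", "left"): 9,   ("male", "right"): 43,
       ("female", "left"): 4, ("female", "right"): 44}

n = sum(obs.values())
row = {"male": 52, "female": 48}     # row totals
col = {"left": 13, "right": 87}      # column totals

# Expected counts under independence: row total * column total / n
exp = {(r, c): row[r] * col[c] / n for r in row for c in col}

# Pearson chi-square: sum of (O - E)^2 / E over the four cells
pearson = sum((obs[k] - exp[k]) ** 2 / exp[k] for k in obs)

# Likelihood-ratio (G) statistic: 2 * sum of O * ln(O / E)
lr = 2 * sum(obs[k] * math.log(obs[k] / exp[k]) for k in obs)

print(round(pearson, 4))  # 1.7774
print(round(lr, 3))       # 1.825
```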
