Solved – Correlation between quantitative and qualitative variables

Tags: categorical data, correlation, qualitative

I have a dataset of 5,000 observations. Each observation contains a person's yearly income (ranging from 50 to 50,000,000) and whether that person owns a car (yes/no).

I would like to check if a correlation exists between these two features.
Which test should I run?

Thanks in advance.

Best Answer

As always, it depends on the data-generating mechanism you have in mind. That said, the customary choice here is a logit (logistic regression) model. In it, we assume that the log-odds of having a car are linear in the explanatory variables, i.e.

$\log\left(\frac{\Pr(\text{Car})}{1-\Pr(\text{Car})}\right)=\beta_0+\beta_1\,\text{income}$

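One can then test the null hypothesis $\beta_1=0$, for example with a Wald or likelihood-ratio test. Under this null, income has no effect on the probability of car ownership, so rejecting it is evidence that income is associated with owning a car.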
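
To make the procedure concrete, here is a minimal sketch in plain Python that fits the logit model by Newton-Raphson and computes a Wald test of $\beta_1=0$. Everything specific in it is an assumption for illustration and not part of the question: the data are simulated, income is expressed in a hypothetical rescaled unit (0 to 10) so the iterations stay numerically stable, and the names `fit_logit`, `income`, and `car` are made up. In practice one would more likely use a statistics package; this only shows what such a routine does under the hood.

```python
import math
import random


def fit_logit(x, y, iters=25, tol=1e-10):
    """Fit log(p / (1 - p)) = b0 + b1 * x by Newton-Raphson.

    Returns (b0, b1, se_b1), where se_b1 is the standard error of b1
    taken from the inverse of the Fisher information matrix.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = 0.0            # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0    # Fisher information (negative Hessian)
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information) * delta = gradient.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    # Var(b1) is the (1, 1) entry of the inverse information matrix.
    # It is evaluated at the last iterate before the final (tiny) step,
    # which is practically identical to the final estimate at convergence.
    se_b1 = math.sqrt(h00 / det)
    return b0, b1, se_b1


# Simulated data (assumption): 5000 people, income in a rescaled unit,
# true coefficients beta0 = -2 and beta1 = 0.5.
random.seed(0)
n = 5000
income = [random.uniform(0.0, 10.0) for _ in range(n)]
car = [
    1 if random.random() < 1.0 / (1.0 + math.exp(-(-2.0 + 0.5 * inc))) else 0
    for inc in income
]

b0, b1, se_b1 = fit_logit(income, car)

# Wald test of H0: beta1 = 0, using the standard normal approximation.
z = b1 / se_b1
p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value

print(f"beta1 = {b1:.3f}, se = {se_b1:.3f}, z = {z:.2f}, p = {p_value:.3g}")
```

A small p-value leads to rejecting $\beta_1=0$. With real incomes spanning 50 to 50,000,000, rescaling or log-transforming income before fitting may be worth considering, but note that a log transform changes the model from the one written above.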