I have a dataset of 5,000 observations. Each observation contains a person's yearly income (from 50 to 50,000,000) and whether they own a car (yes/no).
I would like to check if a correlation exists between these two features.
Which test should I run?
Thanks in advance.
Best Answer
As always, it depends on the data-generating mechanism you have in mind. That said, the customary choice is a logit (logistic regression) model. In it, we assume that the log-odds of having a car are linear in the explanatory variables, i.e.
$\log\left(\frac{\Pr(\text{Car})}{1-\Pr(\text{Car})}\right)=\beta_0+\beta_1\,\text{income}$
One can then test whether $\beta_1=0$, i.e. the null hypothesis that income has no effect on the odds of car ownership. Rejecting it is evidence that income is associated with car ownership.
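As a rough illustration of that test, here is a minimal sketch in plain Python. It fits the logit model by Newton-Raphson and runs a Wald test of $\beta_1=0$. Several things here are my assumptions rather than part of the question: the data are simulated, income enters on a centered log scale (a choice that seems sensible given the 50 to 50,000,000 range, but it is not the only option), and the helper names `fit_logit` and `score_and_information` are hypothetical. In practice a statistics package would do the fitting for you.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def score_and_information(b0, b1, x, y):
    # Gradient of the log-likelihood and the 2x2 Fisher information matrix.
    g0 = g1 = h00 = h01 = h11 = 0.0
    for xi, yi in zip(x, y):
        p = sigmoid(b0 + b1 * xi)
        w = p * (1.0 - p)
        g0 += yi - p
        g1 += (yi - p) * xi
        h00 += w
        h01 += w * xi
        h11 += w * xi * xi
    return g0, g1, h00, h01, h11

def fit_logit(x, y, max_iter=50, tol=1e-10):
    # Newton-Raphson for log(p / (1 - p)) = b0 + b1 * x.
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0, g1, h00, h01, h11 = score_and_information(b0, b1, x, y)
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    # Standard error of b1 from the inverse information at the estimate.
    _, _, h00, h01, h11 = score_and_information(b0, b1, x, y)
    se_b1 = math.sqrt(h00 / (h00 * h11 - h01 * h01))
    return b0, b1, se_b1

# Simulated stand-in for the real data (assumption: true slope 0.5 on log income).
random.seed(0)
n = 5000
log_income = [random.uniform(math.log(50), math.log(5e7)) for _ in range(n)]
mean_log = sum(log_income) / n
x = [v - mean_log for v in log_income]  # centered log income
y = [1 if random.random() < sigmoid(0.5 * xi) else 0 for xi in x]

b0, b1, se_b1 = fit_logit(x, y)
z = b1 / se_b1                                # Wald statistic for H0: beta_1 = 0
p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
print(f"beta_1 = {b1:.3f}, SE = {se_b1:.3f}, z = {z:.2f}, p = {p_value:.3g}")
```

With real data, `x` and `y` would be replaced by the observed (transformed) incomes and the 0/1 car indicator. A small p-value would lead you to reject $\beta_1=0$. A likelihood-ratio test is a common alternative to the Wald test used here.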