Solved – running a regression when the independent and dependent variables are all dichotomous

logisticregression

I have conducted a survey where all my questions are asked in a dichotomous manner (Yes/No).

E.g., IVs: "Are you a smoker?", "Are you obese?", "Is your gender male/female?", etc. DV: "Have you ever had a stroke?"

Therefore my dependent variable and my independent variables are all dichotomous (binary, coded as 0s and 1s).

My question is: is it appropriate to run a regression to determine which independent variables drive the dependent variable, given that every one of my variables (both dependent and independent) is dichotomous?

If so, what kind of regression is most appropriate (logistic regression?), and is there anything I should do to make the regression model more accurate?

I have only a rudimentary understanding of statistics and regression modelling and would be grateful if someone could point me in the right direction.

Best Answer

In this case, you are relating binary properties of a person (answers to questions) to a binary outcome (stroke/no stroke). A good place to start is to formulate this as a logistic regression problem, since it constrains the model's predicted value to lie between 0 and 1. Assuming the coding "Yes = 1, No = 0", that predicted value can be interpreted as the estimated probability that a person with a given set of survey answers has had a stroke. Binary independent variables need no special treatment: each enters the model as a 0/1 indicator.
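As a rough illustration of that setup, here is a minimal sketch in plain Python. Everything in it is assumed for the example: the data are simulated rather than taken from any survey, the "true" coefficients are made up, and `fit_logistic` is a hypothetical helper that fits the model by simple gradient ascent. In practice you would normally use a statistics package, which also reports standard errors and p-values.

```python
import math
import random

def sigmoid(z):
    """Map any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, row):
    """Estimated P(stroke = 1) for one respondent; w[0] is the intercept."""
    return sigmoid(w[0] + sum(b * x for b, x in zip(w[1:], row)))

def fit_logistic(X, y, lr=0.5, n_iter=1500):
    """Fit by batch gradient ascent on the mean log-likelihood (illustrative only)."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(n_iter):
        grad = [0.0] * len(w)
        for row, target in zip(X, y):
            err = target - predict_proba(w, row)
            grad[0] += err
            for j, x in enumerate(row):
                grad[j + 1] += err * x
        w = [b + lr * g / len(X) for b, g in zip(w, grad)]
    return w

# Simulated survey: three Yes/No predictors coded 1/0 (made-up effect sizes).
random.seed(0)
names = ["smoker", "obese", "male"]
true_w = [-2.0, 1.0, 0.8, 0.3]  # intercept, smoker, obese, male (assumptions)
X, y = [], []
for _ in range(500):
    row = [random.randint(0, 1) for _ in names]
    X.append(row)
    y.append(1 if random.random() < predict_proba(true_w, row) else 0)

w = fit_logistic(X, y)
for name, b in zip(names, w[1:]):
    # exp(coefficient) is the odds ratio for answering "Yes" versus "No".
    print(f"{name}: coefficient = {b:.2f}, odds ratio = {math.exp(b):.2f}")
print("P(stroke | all Yes) =", round(predict_proba(w, [1, 1, 1]), 3))
print("P(stroke | all No)  =", round(predict_proba(w, [0, 0, 0]), 3))
```

With 0/1 predictors, each exponentiated coefficient reads directly as an odds ratio: how much the odds of a stroke are multiplied for a "Yes" relative to a "No", holding the other answers fixed.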

Of course, you will need to (a) ensure your sample is representative of the group you intend to apply the model to (or of the general population being studied) and (b) cross-validate the model, i.e. repeatedly fit it on part of the data and evaluate it on the held-out part, to see how robust your findings are.
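Point (b) could be sketched along the following lines. This is only an illustration under stated assumptions: the data are simulated, `fit_logistic` and `cross_validate` are hypothetical helpers written for this example, and held-out log loss is just one reasonable choice of score (lower is better).

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, row):
    """Estimated P(outcome = 1); w[0] is the intercept."""
    return sigmoid(w[0] + sum(b * x for b, x in zip(w[1:], row)))

def fit_logistic(X, y, lr=1.0, n_iter=500):
    """Gradient-ascent fit of a logistic regression (illustrative only)."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(n_iter):
        grad = [0.0] * len(w)
        for row, target in zip(X, y):
            err = target - predict_proba(w, row)
            grad[0] += err
            for j, x in enumerate(row):
                grad[j + 1] += err * x
        w = [b + lr * g / len(X) for b, g in zip(w, grad)]
    return w

def log_loss(w, X, y):
    """Mean negative log-likelihood of the observed outcomes."""
    total = 0.0
    for row, target in zip(X, y):
        p = predict_proba(w, row)
        total -= math.log(p) if target == 1 else math.log(1.0 - p)
    return total / len(X)

def cross_validate(X, y, k=5, seed=0):
    """k-fold CV: fit on k-1 folds, score on the held-out fold."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        w = fit_logistic([X[i] for i in train], [y[i] for i in train])
        scores.append(log_loss(w, [X[i] for i in held_out],
                               [y[i] for i in held_out]))
    return scores

# Simulated Yes/No survey data (made-up effect sizes, not real results).
random.seed(1)
true_w = [-2.0, 1.0, 0.8, 0.3]  # intercept, smoker, obese, male (assumptions)
X, y = [], []
for _ in range(300):
    row = [random.randint(0, 1) for _ in range(3)]
    X.append(row)
    y.append(1 if random.random() < predict_proba(true_w, row) else 0)

scores = cross_validate(X, y)
print("held-out log loss per fold:", [round(s, 3) for s in scores])
print("mean held-out log loss:", round(sum(scores) / len(scores), 3))
```

If the per-fold scores vary widely, or are much worse than the score on the training data, that suggests the fitted coefficients are not very stable for a sample of this size.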
