I know the question is easy, but I couldn't find an answer (a link to a similar question would also help, thanks).
I have a simple logistic regression model with 2+ categorical predictors.
To keep it simple, let's make an example:
- predictor 1 = age group = young/normal/old
- predictor 2 = city = rome/paris/london
- target variable = the user converted (1) or didn't convert (0)
I have to use dummy variables (with the n-1 rule) so my model is:
logit(P(target = 1)) = b0 + b1*age_young + b2*age_old + b3*city_paris + b4*city_london
My reference category for the age group is normal and for the city is rome.
Let's say I get the following results:
- b0 (intercept) = -2.9429
- b1 (age_young) = -0.0624
- b2 (age_old) = -0.1618
- b3 (city_paris) = 0.4060
- b4 (city_london) = 1.0060
So e^b0 should be the odds ratio when all the variables are 0, i.e. when the user is from rome and belongs to the age group normal.
Here are the questions:
- Is e^b3 the odds ratio when the city is paris (easy interpretation)? Or is it the odds ratio of paris compared to rome (something like a marginal odds ratio)? I'm not sure how to state the interpretation in plain English in this case.
- Do the users from rome convert more or less than users from paris or london? For me it's hard to say, given that b0 "contains information" about users who are from rome and in the normal age group. It seems that I cannot extract information about rome alone from b0.
— EDIT 1 —
b3 (city_london) –> b4 (city_london)
— EDIT 2 —
b0 should be the odd ratio –> e^b0 should be the odds-ratio
Best Answer
The coefficients of a logistic regression cannot be interpreted directly as odds ratios. One way to interpret them is to go back to the definition of the logistic function. If the estimated coefficients are $\hat \beta$, the predicted probability for a user with characteristics $X_i$ is $$ \hat p(X_i) = F(X_i \hat \beta) = \frac{1}{1+e^{-X_i \hat \beta}} $$ Now, to get to your questions.
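As a side note, the formula above can be checked numerically. What follows is only a sketch in Python, assuming the coefficients quoted in the question and the reference levels normal and rome; the helper names `predicted_probability` and `odds` are made up for illustration and do not come from any library.

```python
import math

# Coefficients as quoted in the question
# (reference levels: age group = normal, city = rome).
coef = {
    "intercept": -2.9429,
    "age_young": -0.0624,
    "age_old": -0.1618,
    "city_paris": 0.4060,
    "city_london": 1.0060,
}


def predicted_probability(age, city):
    """Apply the logistic function to the linear predictor X_i * beta_hat."""
    linear = coef["intercept"]
    # A reference level has no dummy variable, so it contributes 0.
    linear += coef.get(f"age_{age}", 0.0)
    linear += coef.get(f"city_{city}", 0.0)
    return 1.0 / (1.0 + math.exp(-linear))


def odds(p):
    """Convert a probability into odds p / (1 - p)."""
    return p / (1.0 - p)


# Predicted conversion probability for each city within each age group.
for age in ("young", "normal", "old"):
    for city in ("rome", "paris", "london"):
        p = predicted_probability(age, city)
        print(f"{age:>6} / {city:<6}  p = {p:.4f}  odds = {odds(p):.4f}")

# Ratio of paris odds to rome odds, with the age group held fixed.
for age in ("young", "normal", "old"):
    ratio = odds(predicted_probability(age, "paris")) / odds(
        predicted_probability(age, "rome")
    )
    print(f"{age:>6}: odds(paris) / odds(rome) = {ratio:.4f}")

print(f"exp(b3) = {math.exp(coef['city_paris']):.4f}")
```

Whichever age group is held fixed, the ratio of the paris odds to the rome odds equals e^b3, whereas the predicted probabilities themselves depend on both predictors. Note that in this sketch an unknown level name silently falls back to the reference level, so it is only suitable for illustration.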