How to write down a logistic regression formula with multiple levels of a categorical variable

logisticregression

I don't know how to correctly present a logistic regression model as an expression or formula in a manuscript or report, especially with a multi-level categorical variable. For instance, I have a 3-level treatment (treatment) as the explanatory variable: control, low, and high. The outcome (Y) is alive or dead. Can someone suggest how to write down the formula? Is there a standard formula, or an easily understandable one, especially for non-statistical readers (biology or medicine)? I am a bit afraid that if I present the model in matrix form, readers would not realize that there are two beta coefficients. For my own benefit, though, it would be nice to see other forms of presentation too.

This is what I can think of:

Y_i ~ Binomial(1,p_i)

logit(p_i) = intercept + beta_k*treatment_i

where i indicates the ith sample. For beta_k, k=low when the ith sample has treatment low; and k=high when the ith sample has treatment high.

Thanks very much

Best Answer

If your audience is non-statistical, then your first line declaring the distribution of $Y_i$ to be binomial will tend to confuse more than help, so I would leave it out.

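With only three treatments and a non-technical audience, I don't see the added value of trying anything fancy. Instead, I would just mention the two indicator (dummy) variables directly: one equals 1 when a sample received the low treatment and 0 otherwise, the other is defined the same way for the high treatment, and control is the reference category, for which both indicators are 0.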

Since your audience is from the biomedical fields, they are likely to be familiar with odds, so you could formulate the model in those terms:

$\ln\left(\mathrm{odds}(Y_i=\mathrm{dead} \mid x_i)\right) = \beta_0 + \beta_1\,\mathrm{low}_i + \beta_2\,\mathrm{high}_i$

You could also write it in terms of the probability:

$\ln\left(\frac{p(Y_i=\mathrm{dead} \mid x_i)}{1-p(Y_i=\mathrm{dead} \mid x_i)}\right) = \beta_0 + \beta_1\,\mathrm{low}_i + \beta_2\,\mathrm{high}_i$

or

$p(Y_i=\mathrm{dead} \mid x_i) = \frac{\exp(\beta_0 + \beta_1\,\mathrm{low}_i + \beta_2\,\mathrm{high}_i)}{1+\exp(\beta_0 + \beta_1\,\mathrm{low}_i + \beta_2\,\mathrm{high}_i)}$
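To make the role of the two coefficients concrete, here is a minimal Python sketch that evaluates the formulas above for each treatment group. The coefficient values and the helper names (`indicators`, `log_odds`, `prob_dead`) are made up purely for illustration; they do not come from any fitted model.

```python
import math

# Hypothetical coefficients, chosen only for illustration (not estimates).
beta0 = -1.0  # log-odds of death in the control group
beta1 = 0.5   # difference in log-odds, low vs. control
beta2 = 1.2   # difference in log-odds, high vs. control


def indicators(treatment):
    """Return the dummy variables (low_i, high_i) for one sample."""
    return (int(treatment == "low"), int(treatment == "high"))


def log_odds(treatment):
    """Linear predictor: beta0 + beta1 * low_i + beta2 * high_i."""
    low, high = indicators(treatment)
    return beta0 + beta1 * low + beta2 * high


def prob_dead(treatment):
    """Inverse logit of the linear predictor."""
    eta = log_odds(treatment)
    return math.exp(eta) / (1.0 + math.exp(eta))


for t in ("control", "low", "high"):
    print(f"{t:8s} log-odds = {log_odds(t):+.2f}  p(dead) = {prob_dead(t):.3f}")

# Odds ratios relative to control are exp(beta1) and exp(beta2).
print(f"OR low vs. control:  {math.exp(beta1):.3f}")
print(f"OR high vs. control: {math.exp(beta2):.3f}")
```

For a control sample both indicators are 0, so only $\beta_0$ remains. For a low or high sample exactly one indicator is 1, which is why the model has two treatment coefficients, each read as a log odds ratio against control.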
