Solved – categorical predictors in a GLM

logistic, r, regression, self-study

I need some help answering a homework question.

I have entered the data into R as follows:

data <- data.frame(category=c(1:8),
                   obese =c(597,380,665,524,1014,365,942,552),
                   number = c(2346,1659,2576,1732,1499,639,1491,769), 
                   male=c(1,0,1,0,1,0,1,0),
                   white =c(1,1,0,0,1,1,0,0),
                   younger =c(1,1,1,1,0,0,0,0))

I then used

output = glm(obese ~ male + white + younger, family = binomial)

to model the data, but this doesn't seem to be working. I haven't used R much for GLMs or logistic regression, and I don't understand how to fit a GLM with categorical, binary predictor variables.
Could someone explain the theory behind this? Searching online, I have only found answers where the response variable is binary, which seems straightforward.

Best Answer

Your data already seems to be summarized: in the 1st category (younger white males), 597 out of 2346 subjects are obese. At least, that's my understanding - it's always good to really understand the data you are modeling.

If my interpretation is right, an easy way to do this is to expand each row into individual observations: the first row becomes 597 rows with obese=1 and 2346-597 = 1749 rows with obese=0, and similarly for the other seven rows. Your GLM should then work fine on the expanded data.
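A minimal sketch of that expansion, reusing the counts from the question. The names `dat`, `long`, `fit_long` and `fit_grouped` are made up for illustration. The second fit, which passes a two-column matrix of (obese, not obese) counts as the response, is not part of the answer above; it is an alternative that R's `glm` accepts for grouped binomial data and that should give the same coefficients without expanding anything.

```r
# Grouped counts copied from the question.
dat <- data.frame(
  obese   = c(597, 380, 665, 524, 1014, 365, 942, 552),
  number  = c(2346, 1659, 2576, 1732, 1499, 639, 1491, 769),
  male    = c(1, 0, 1, 0, 1, 0, 1, 0),
  white   = c(1, 1, 0, 0, 1, 1, 0, 0),
  younger = c(1, 1, 1, 1, 0, 0, 0, 0)
)

# Expand to one row per subject: repeat each group's predictors `number` times.
long <- dat[rep(seq_len(nrow(dat)), times = dat$number),
            c("male", "white", "younger")]

# Within each group, the first `obese` subjects get 1 and the rest get 0.
long$obese <- unlist(lapply(seq_len(nrow(dat)), function(i) {
  rep(c(1, 0), times = c(dat$obese[i], dat$number[i] - dat$obese[i]))
}))

# Logistic regression on the expanded (one row per subject) data.
fit_long <- glm(obese ~ male + white + younger,
                family = binomial, data = long)

# Alternative (an addition, not from the answer): fit the grouped data
# directly with a (successes, failures) response.
fit_grouped <- glm(cbind(obese, number - obese) ~ male + white + younger,
                   family = binomial, data = dat)

# The two sets of coefficients should agree up to numerical tolerance.
all.equal(coef(fit_long), coef(fit_grouped), tolerance = 1e-4)
```

Note that both calls pass `data =`, so the column names in the formula are looked up in the data frame rather than in the global environment.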

Related Solutions

Solved – Interact categorical variables in GLM in R

From your comments, it appears that you have not told R that these two variables are categorical (factor variables in R). Because they look numeric, R will assume they are continuous and fit the model accordingly.

To convert them to factor variables (with your data.frame d):

d$wealth_q <- factor(d$wealth_q)
d$maternal_q <- factor(d$maternal_q)
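To see what the conversion changes, here is a small sketch on simulated data. The data frame, the five-level coding of `wealth_q`, and the random `nutrition` outcome are all invented for illustration and say nothing about the asker's real data.

```r
set.seed(42)

# Hypothetical data: a 5-level quintile code stored as plain integers.
d <- data.frame(wealth_q = rep(1:5, each = 60))
d$nutrition <- rbinom(nrow(d), 1, 0.5)  # made-up binary outcome

# Numeric coding: R fits a single slope across the codes 1..5.
m_num <- glm(nutrition ~ wealth_q, data = d, family = binomial)

# Factor coding: R fits one dummy coefficient per non-reference level.
d$wealth_f <- factor(d$wealth_q)
m_fac <- glm(nutrition ~ wealth_f, data = d, family = binomial)

names(coef(m_num))  # intercept plus one slope
names(coef(m_fac))  # intercept plus four level contrasts
```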

Also, your formula is somewhat redundant:

~maternal_eq * wealth_q expands to main effects + interactions.

So the following should work:

form <- nutrition ~ maternal_eq * wealth_q + other_covars
model.results <- glm(form, data=d, family=quasibinomial)
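The expansion of `*` can be checked directly. This is a sketch on simulated data: the three-level coding, the random outcome, and the use of the column name `maternal_q` (taken from the conversion step) are assumptions, and `other_covars` is left out to keep it short.

```r
set.seed(1)

# Hypothetical balanced design: every combination of two 3-level factors,
# 30 observations per cell.
d <- expand.grid(maternal_q = 1:3, wealth_q = 1:3)[rep(1:9, each = 30), ]
d$maternal_q <- factor(d$maternal_q)
d$wealth_q   <- factor(d$wealth_q)
d$nutrition  <- rbinom(nrow(d), 1, 0.4)  # made-up binary outcome

# Short form: a * b
m_star <- glm(nutrition ~ maternal_q * wealth_q,
              data = d, family = quasibinomial)

# Long form: main effects plus the interaction written out
m_full <- glm(nutrition ~ maternal_q + wealth_q + maternal_q:wealth_q,
              data = d, family = quasibinomial)

# Same terms, same estimates.
all.equal(coef(m_star), coef(m_full))
```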

Solved – Which glm algorithm to use when predictors are numerical as well as categorical

When your dependent variable is binary ($1$ vs. $0$, "dead" vs. "alive"), then you might use logistic regression, which is a GLM with a binomial error distribution and a logit link function. When your dependent variable is ordinal (e.g. "bad" < "good" < "best"), you can use ordinal logistic regression. For a nominal dependent variable (e.g. transportation: "walk", "car", "bicycle"), you can use multinomial logistic regression. In all three cases the predictors can be a mix of numerical and categorical variables.
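A rough sketch of the three cases on simulated data. The choice of `MASS::polr` for the ordinal model and `nnet::multinom` for the multinomial model is an assumption on my part (both packages ship with standard R installations); the answer does not name any particular function. All variables here are made up, with one numerical predictor `x` and one categorical predictor `g`.

```r
library(MASS)  # polr(): proportional-odds (ordinal) logistic regression
library(nnet)  # multinom(): multinomial logistic regression

set.seed(7)
n <- 300

# Hypothetical predictors: one numerical, one categorical.
d <- data.frame(
  x = rnorm(n),
  g = factor(sample(c("a", "b"), n, replace = TRUE))
)

# Hypothetical outcomes of each type.
d$y_bin <- rbinom(n, 1, plogis(0.5 * d$x))
d$y_ord <- factor(sample(c("bad", "good", "best"), n, replace = TRUE),
                  levels = c("bad", "good", "best"), ordered = TRUE)
d$y_nom <- factor(sample(c("walk", "car", "bicycle"), n, replace = TRUE))

# Binary outcome: binomial GLM with logit link.
fit_bin <- glm(y_bin ~ x + g, data = d, family = binomial(link = "logit"))

# Ordinal outcome: the response must be an ordered factor.
fit_ord <- polr(y_ord ~ x + g, data = d, Hess = TRUE)

# Nominal outcome: one set of coefficients per non-reference category.
fit_nom <- multinom(y_nom ~ x + g, data = d, trace = FALSE)
```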

EDIT

Your approach of converting the disease status into a 0/1 variable seems correct. If your outcome is continuous, you could use a GLM with a Gaussian error distribution and an identity link function, which is equivalent to an ordinary multiple linear regression model (OLS).
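The equivalence with OLS is easy to verify. The sketch below uses R's built-in `mtcars` data purely as a stand-in, with one numerical predictor (`wt`) and one categorical predictor (`cyl` as a factor); the choice of dataset and variables is mine, not the asker's.

```r
# Ordinary least squares fit.
fit_lm <- lm(mpg ~ wt + factor(cyl), data = mtcars)

# Gaussian GLM with identity link on the same formula.
fit_glm <- glm(mpg ~ wt + factor(cyl), data = mtcars,
               family = gaussian(link = "identity"))

# The coefficient estimates should match.
all.equal(coef(fit_lm), coef(fit_glm))
```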
