Should I treat age as a continuous variable or a factor?

categorical data, continuous data, generalized linear model

Age is one of several predictors (along with gender, weight and height), and my response is whether a subject has a certain disease, modelled on the log-odds scale with a binomial GLM.

My age data range from 21 to 40. I am not sure whether to treat age as a continuous variable or as a factor with the age groups 21-25, 26-30, 31-35 and 36-40.

Are there any plots I can produce that would help determine which approach is better?

Best Answer

It depends on the context. For example, if you are studying the effect of age on children's height, it makes sense to treat age as a continuous (integer) value. If you are studying, e.g., the effect of age on oncogenesis, it makes sense to look at age groups: young vs. old, above 55 vs. below 55, and so on.
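A common plot for this decision is the empirical log-odds of disease against age: if the points fall roughly on a straight line, a single linear age term is plausible, while clear curvature or a step suggests grouping (or a more flexible term). Below is a minimal stdlib-only Python sketch, assuming integer ages and outcomes coded 0/1. The function names, the 0.5 correction and the toy data are illustrative assumptions, not part of the question.

```python
import math
from collections import defaultdict


def age_group(age):
    """Map an integer age in 21-40 to the 5-year groups from the question."""
    if not 21 <= age <= 40:
        raise ValueError(f"age {age} is outside 21-40")
    lower = 21 + 5 * ((age - 21) // 5)
    return f"{lower}-{lower + 4}"


def empirical_logits(ages, outcomes, grouper=age_group):
    """Observed log-odds of disease per group.

    Adds 0.5 to the case and non-case counts so that a group with no
    cases (or only cases) still gets a finite value.
    """
    cases = defaultdict(int)
    totals = defaultdict(int)
    for age, y in zip(ages, outcomes):
        g = grouper(age)
        totals[g] += 1
        cases[g] += y  # y is 1 for disease, 0 otherwise
    return {
        g: math.log((cases[g] + 0.5) / (totals[g] - cases[g] + 0.5))
        for g in sorted(totals)
    }


# Hypothetical toy data, for illustration only.
ages = [21, 22, 24, 27, 29, 30, 32, 34, 35, 37, 39, 40]
has_disease = [0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1]
for group, logit in empirical_logits(ages, has_disease).items():
    print(group, round(logit, 2))
```

Plot the returned values against the group midpoints (23, 28, 33, 38) with any plotting tool. Passing `grouper=lambda a: a` gives one point per year of age, which is noisier but shows more of the shape. Note that this is a marginal view: it ignores gender, weight and height.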

For your example, unless age is confounded with a hidden factor, such as being a college graduate versus still a student (a risk factor for STD infection in young adults), I'd bin the data into reasonably sized groups.
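Besides plotting, you can fit both codings and compare them with AIC, which penalises the factor version for its extra parameters (four group parameters versus an intercept and a slope). The sketch below is a deliberately stripped-down, stdlib-only Python illustration that uses age as the only predictor; in practice you would fit the full binomial GLM, including gender, weight and height, in a statistics package. The function names are hypothetical, and the plain Newton-Raphson fit assumes the data show no complete separation.

```python
import math


def age_group(age):
    """Map an integer age in 21-40 to the 5-year groups from the question."""
    if not 21 <= age <= 40:
        raise ValueError(f"age {age} is outside 21-40")
    lower = 21 + 5 * ((age - 21) // 5)
    return f"{lower}-{lower + 4}"


def fit_logistic_1d(x, y, max_iter=100, tol=1e-10):
    """Newton-Raphson fit of logit P(y=1) = b0 + b1*x.

    Returns (b0, b1, log-likelihood). x is centred internally for
    numerical stability; b0 is reported on the original x scale.
    """
    m = sum(x) / len(x)
    xc = [xi - m for xi in x]
    a, b = 0.0, 0.0  # intercept and slope on the centred scale
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(xc, y):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = p * (1.0 - p)
            g0 += yi - p          # score for the intercept
            g1 += (yi - p) * xi   # score for the slope
            h00 += w              # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a += da
        b += db
        if abs(da) + abs(db) < tol:
            break
    ll = 0.0
    for xi, yi in zip(xc, y):
        p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
        ll += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return a - b * m, b, ll


def factor_loglik(groups, outcomes):
    """Log-likelihood and parameter count with one probability per group."""
    cases, totals = {}, {}
    for g, y in zip(groups, outcomes):
        totals[g] = totals.get(g, 0) + 1
        cases[g] = cases.get(g, 0) + y
    ll = 0.0
    for g, n in totals.items():
        c = cases[g]
        p = c / n  # maximum-likelihood estimate for the group
        if c > 0:
            ll += c * math.log(p)
        if c < n:
            ll += (n - c) * math.log(1.0 - p)
    return ll, len(totals)


def compare_age_codings(ages, outcomes):
    """AIC for age as a linear term (2 parameters) vs. as a 5-year factor."""
    _, _, ll_cont = fit_logistic_1d(ages, outcomes)
    ll_fac, k_fac = factor_loglik([age_group(a) for a in ages], outcomes)
    return {
        "continuous": 2 * 2 - 2 * ll_cont,
        "factor": 2 * k_fac - 2 * ll_fac,
    }


# Hypothetical toy data, for illustration only.
ages = [21, 22, 24, 27, 29, 30, 32, 34, 35, 37, 39, 40]
has_disease = [0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1]
print(compare_age_codings(ages, has_disease))
```

The coding with the lower AIC is preferred; a common rule of thumb treats differences below about 2 as too small to choose between the two with any confidence.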
