Is a count variable with a large, but finite, number of possible values categorical or continuous?

categorical data, continuous data, logistic, regression

I have data containing some continuous and some categorical variables. I want to do logistic regression on them.
I am getting confused over the distinction between categorical and continuous variables.
I know that a categorical variable "is a variable that can take on one of a limited, and usually fixed, number of possible values". Still, I am having trouble distinguishing some of these variables.

If I understand the answers given here correctly, non-numerical data cannot be continuous, but are some numerical variables categorical?

For instance, one of the variables is "number of days on which something was done". This variable has many possible values (from 1 to 10,000 days). The number of possible values is limited, yet very large.
Is it a categorical or a continuous variable?

Best Answer

There is, as far as I know, no taxonomy of variables that captures all the contrasts that might be important for some theoretical or practical purpose, even for statistics alone. If such a taxonomy existed, it would probably be too complicated to be widely acceptable.

It is best to focus on examples rather than give numerous definitions. Number of days is a counted variable. It qualifies as discrete rather than continuous, and it is possible that the discreteness is important, particularly if most values are small. Some statistical people might want to insist that only models that apply to discrete variables should be used for such a variable.

At the same time, it is often the case that models and methods treat such a variable as approximately continuous. Population size is a yet more obvious example. Human populations can be in the billions, and many procedures effectively treat such variables as continuous, regardless of the familiar fact that people are individuals.
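To make the practical difference concrete for a regression, here is a minimal sketch of what the two treatments would cost in predictor columns. Everything in it is an assumption for illustration: the `days` values are simulated rather than real data, and `one_hot` is a hypothetical helper, not a function from any library.

```python
import random

# Hypothetical predictor: number of days, an integer between 1 and 10,000.
random.seed(0)  # fixed seed so the sketch is reproducible
days = [random.randint(1, 10_000) for _ in range(500)]

# Treated as numeric: a single column, hence a single slope coefficient.
numeric_columns = 1

# Treated as categorical: one indicator per distinct observed value,
# minus one level kept as the reference category.
levels = sorted(set(days))
dummy_columns = len(levels) - 1


def one_hot(value, levels):
    """Indicator coding of one value, with the first level as reference."""
    return [1 if value == level else 0 for level in levels[1:]]


row = one_hot(days[0], levels)
print("columns if numeric:    ", numeric_columns)
print("columns if categorical:", dummy_columns)
print("length of one coded row:", len(row))
```

With almost as many indicator columns as observations, the categorical coding leaves next to nothing to estimate each coefficient from, which is one reason a count with this many possible values is usually entered as a numeric predictor.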

In contrast, a variable such as temperature is in principle continuous, but as a matter of convention temperatures may only be reported to the nearest degree or tenth of a degree, so the number of possible values may be rather small in practice. This does not usually worry anyone; it would certainly be perverse to call such a variable categorical. There are some contexts in which the discreteness of reported temperature is important: in reading mercury thermometers by eye and guessing at the last digit, people show idiosyncratic preferences for or against certain digits of the ten possibilities 0 to 9.
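A quick way to look for that kind of digit preference is to tally the final recorded digit. The sketch below uses made-up readings, stored as strings so that the digit as written is what gets counted; the values are purely illustrative and deliberately exaggerated.

```python
from collections import Counter

# Hypothetical readings in degrees Celsius, recorded to one decimal place.
readings = ["21.0", "21.5", "22.0", "20.5", "21.0",
            "23.5", "22.0", "21.5", "20.0", "22.5"]

# Tally the last recorded digit of each reading.
last_digits = Counter(reading[-1] for reading in readings)

# Without digit preference, the ten digits would appear about equally often.
for digit in "0123456789":
    print(digit, last_digits.get(digit, 0))
```

In these invented readings only 0 and 5 occur as the final digit, the sort of pattern that would suggest observers rounding to the nearest half degree.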

Also, what do we do with categories? Answer: we count them. We count males, females; unemployed, employed, retired, students; whatever. So, often we are modelling category counts.

In short, discrete counts are a common kind of variable, as well as continuous and categorical variables.