Logistic Regression – Applying Logistic Regression for Bounds Different from 0 and 1

I have some data; it's a proportion $y$ of some stuff relative to everything, so it's bounded between 0 and 1 by definition. The proportion changes over time. Besides fairly high variance, there is a step-like change around the middle of the time period; the step isn't very large, but it's there, and it happens pretty fast relative to the whole time period. So I have an S-curve, and I want to (and was told to) do logistic regression.

But: (1) this is not monotone, not even close, because of the high variance. (2) It does not go from 0 to 1. Rather, it goes from about 0.2 to about 0.8, if you look at the means. So it would seem that the right thing to do would be to fit something like $a\phi(x)+b$, where $\phi$ is the usual logistic S-curve, so we have 4 parameters altogether ($a$, $b$, and the location and scale of $\phi$).
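For concreteness, here is a minimal sketch of what fitting $a\phi(x)+b$ by least squares could look like. Everything in it is an assumption for illustration: the synthetic data, the parameter names `x0` (location) and `s` (scale), and the grid ranges. For fixed `x0` and `s` the model is linear in $a$ and $b$, so those two are solved exactly and only the other two are searched over a grid.

```python
import numpy as np

def phi(x, x0, s):
    """Usual logistic S-curve, centred at x0 with scale s."""
    return 1.0 / (1.0 + np.exp(-(x - x0) / s))

def fit_scaled_logistic(x, y, x0_grid, s_grid):
    """Least-squares fit of y ~ a * phi(x; x0, s) + b.

    For each (x0, s) on the grid, a and b are found by linear least
    squares; the (x0, s) pair with the smallest residual sum wins.
    """
    best = None
    for x0 in x0_grid:
        for s in s_grid:
            design = np.column_stack([phi(x, x0, s), np.ones_like(x)])
            coef, *_ = np.linalg.lstsq(design, y, rcond=None)
            sse = float(np.sum((y - design @ coef) ** 2))
            if best is None or sse < best[0]:
                best = (sse, coef[0], coef[1], x0, s)
    sse, a, b, x0, s = best
    return {"a": a, "b": b, "x0": x0, "s": s, "sse": sse}

# Made-up data: a noisy step from about 0.2 to about 0.8 near t = 50.
rng = np.random.default_rng(0)
t = np.arange(100, dtype=float)
y = 0.6 * phi(t, 50.0, 2.0) + 0.2 + rng.normal(0.0, 0.05, t.size)
y = np.clip(y, 0.0, 1.0)  # a proportion cannot leave [0, 1]

fit = fit_scaled_logistic(
    t, y,
    x0_grid=np.linspace(30.0, 70.0, 81),
    s_grid=np.linspace(0.5, 10.0, 39),
)
print(fit)
```

This is plain curve fitting, not logistic regression in the GLM sense, which is part of why I'm unsure how the usual inference carries over.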

What bothers me is that I've never seen an example of logistic regression used like that (I'll admit to not seeing too many). It's not that I'm not sure how to implement this — I'm pretty confident I can figure that out, although specific pointers will be appreciated — but I am afraid that the assumption that the data is bounded by the limits of the S-curve is important for all the estimates afterward, like goodness of fit, significance of the step, confidence intervals, etc.

So:

  1. Are my fears justified?
  2. Can you point me to a relevant example in the literature? It doesn't have to be statistical literature; if some biologists (or whoever) use it, that's fine. Or even just someone mentioning this possibility.
  3. If there are indeed problems with this approach, what are the alternatives?

Update: Well, I was so ignorant that I could not even ask the questions properly. I was looking for multinomial (aka polytomous) regression.

Best Answer

To begin with, I think we have to distinguish between logistic regression and the (generalized) logistic function, though the latter may be viewed as a special case of the former with time as the only explanatory variable. In that special case it is straightforward to see that the fitted process follows an $S$-shaped path that approaches its upper (or possibly lower) limit as $t \rightarrow \infty$. In logistic regression proper, movement along the $S$ curve reflects the influence of other covariates (not time) that go up and down over time (in consumption structures these are income, tastes, prices, etc.). So there can be jumps or whatever, because nothing restricts the linear predictor to move only up or only down toward the $S$ curve's bounds.
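A small numerical sketch of this point, assuming a made-up covariate (`income`, here just a sine wave in time) and made-up coefficients `beta0` and `beta1`; none of these come from the question's data:

```python
import numpy as np

def inv_logit(eta):
    """Logistic function mapping a linear predictor to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-eta))

t = np.linspace(0.0, 10.0, 201)
income = np.sin(t)           # hypothetical covariate that rises and falls over time
beta0, beta1 = 0.0, 2.0      # hypothetical coefficients
share = inv_logit(beta0 + beta1 * income)

steps = np.diff(share)
print(share.min(), share.max())                 # both strictly inside (0, 1)
print((steps > 0).any() and (steps < 0).any())  # True: the path goes up and down
```

The fitted share is monotone in the linear predictor, not in time, so it can wander anywhere inside $(0, 1)$ without ever approaching either bound.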

Since you are working with structures (shares of a whole), $0$ and $1$ are the natural limits. You can never be sure that any other bounds won't be exceeded in either direction in the future when your conclusions are based only on an analysis of historical data, and arguing that the process has never done so is not sound reasoning. Therefore your fears are not justified: logistic regression (but not fitting the logistic curve, which arises as the solution of a deterministic differential equation!) will work just fine here. Note that there could be several categories that sum to 1, in which case you need a multinomial logit model to fit the structure.
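As a rough sketch of what this could look like in practice, the snippet below fits a logit model directly to proportions using Newton/IRLS steps. The data are synthetic, and the covariate `regime` (a step indicator switching on at $t = 50$) is purely an assumption standing in for whatever real covariate drives the change. The curve's bounds stay at $0$ and $1$; the levels near 0.2 and 0.8 come out of the linear predictor.

```python
import numpy as np

def fit_logit_proportions(X, y, n_iter=50, tol=1e-10):
    """Fit E[y] = inv_logit(X @ beta) for proportions y in [0, 1] by Newton/IRLS."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = mu * (1.0 - mu)  # binomial variance function
        # Newton step: solve (X' W X) delta = X' (y - mu)
        delta = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (y - mu))
        beta = beta + delta
        if np.max(np.abs(delta)) < tol:
            break
    return beta

# Made-up data: proportions near 0.2 before t = 50 and near 0.8 after.
rng = np.random.default_rng(1)
t = np.arange(100)
regime = (t >= 50).astype(float)  # hypothetical covariate, not from the question
logit_low, logit_high = np.log(0.2 / 0.8), np.log(0.8 / 0.2)
eta = logit_low + (logit_high - logit_low) * regime
y = 1.0 / (1.0 + np.exp(-(eta + rng.normal(0.0, 0.3, t.size))))

X = np.column_stack([np.ones_like(regime), regime])
beta = fit_logit_proportions(X, y)

low = 1.0 / (1.0 + np.exp(-beta[0]))
high = 1.0 / (1.0 + np.exp(-(beta[0] + beta[1])))
print(low, high)  # fitted levels before and after the step
```

With a single binary covariate the fitted levels are just the two group means of `y`. For several categories summing to 1, the analogous model is the multinomial logit mentioned above.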

As for alternatives, any model applicable to discrete choice could be used; probit and logit models are the common candidates. Even if you think "there is no decision in my model", all structures in the world are actually the result of decision processes solved by humans, nature, or the aliens ^_^.
