Solved – Logistic Regression on Time Series Data

linearlogisticregressiontime seriestrend

I would like to forecast the probability of a binary outcome using logistic regression at t+1, using all previous data points. I am new to forecasting so any help would be appreciated.

The raw data is in the form:

Time | Correct
1      0
1      1
2      1
3      0
3      0
3      1
4      1
5      0

I have averaged the data over each day to produce:

Time | AVG. Correct
1      0.5
2      1
3      0.3
4      1
5      0

I know that there is a linear trend in the data, such that as time progresses the average correct value increases.

Using this information would it be possible to use logistic regression to forecast the next time step (t = 6). How would you account for the linear trend in the data for a logistic regression model?.

Best Answer

You could fit a simple logistic regression model and include time as a covariate, this would imply a linear time trend.

Note that in the regression, the time trend is negative and insignificant – you simply have too few observations to make any statements regarding the coefficient of a linear time trend.

See this R-code:

# data input
dat <- data.frame(time=c(1,1,2,3,3,3,4,5),
                  correct=c(0,1,1,0,0,1,1,0))

res <- glm(correct ~ time + 1, data=dat, family=binomial)

summary(res)
# the time trend is negative and insignificant!

# predict 'correct' probability at time t=6
predict(res, newdata=data.frame(time=6), type="response")

# 0.2710599

Related Solutions

Solved – How to dampen forecast to improve accuracy

Dampening can be thought of as a special case of shrinkage methods; these methods as a whole tend to reduce uncertainty in estimates (yet another circumstance of trading bias for variance, an ever-recurring theme in statistics, though in some cases, such as many involving variable-selection, shrinkage can reduce both bias and variance).

There are many methods by which forecasts are produced, but they have a wide variety of characteristics.

Consider, for example, a case where we just observe independent random values with a constant mean. Then dampening would involve moving our forecasts a little toward zero from the sample mean.

In methods that estimate a local-mean effect element (some form of adaptive level, say), dampening could move the forecast of that recent level toward 0 (or, for slightly different models, toward the overall mean), so that forecast trends tended to 'fall back' from the most recent excursion over longer forecast horizons.

In methods that have an element of recent local linear trend (some form of adaptive trend, say), dampening could move the forecast of that recent trend toward 0, so that forecast trends tended to 'flatten out' over longer forecast horizons.

Now consider a case where, for example, there's seasonal effects around some other trend - dampening of the seasonal component would tend to shrink the forecast strength of the seasonality, "smoothing out" the wiggle over longer forecast horizons.

The text by Hyndman and Athanasopoulos, Forecasting principles and practice, (freely available on-line as well as in dead-tree form) has a section on dampening, but you need some of the preceding sections for context - the models in sec 7.4 are damped versions of the models in sec 7.2, Holt linear trend models). I highly recommend investigating this book. [You may find the older book on forecasting by Makridakis, Wheelwright and Hyndman in libraries or on forecaster's bookshelves. It's also very handy.]

That section of text on dampening includes something like the kind of thing you're asking for - a tunable smoothing parameter, $\phi$, between $0$ and $1$, to produce a dampening effect ... but as you'll see if you look at that section, it's of a different form for different models - the additive trend and multiplicative trend models use different formulas!

So how might we "dampen" a linear least squares fit*?

*(Note that I don't particularly regard this as a suitable approach for very many forecasting problems, but the Hyndman and Athanasopoulos text is a better place to investigate the merits of various approaches. Nevertheless, let us proceed, since a discussion of the issues covers many of the issues one must consider when trying to dampen models more generally)

A linear regression doesn't have any 'local' component to it at all, it's a global model. In any case, as I mentioned at the start, dampening can be thought of as shrinkage, and we can still shrink that estimate of the linear component toward 0.

But we then have the question of about which center do we 'tilt' the line? The obvious approach for an actually global model would be to do it about the mean:

The line in center-slope form would be $y-\bar{y} = \beta (x-\bar{x})+\varepsilon_t$.

The point forecasts are $\hat{y_t} = \bar{y} + b (x_t-\bar{x})$, where $b=\hat{\beta}$, the slope estimate.

If $T$ is the last observed time, then the point forecasts are

$\hat{y_{T+k}} = \bar{y} + b (x_{T+k}-\bar{x})$.

One simple and commonly used dampening is to multiply it at each step by a constant $\phi$, where $0<\phi<1$ which would let the slope shrink with time:

$\hat{y_{T+k}} = \bar{y} + b_{T+k} (x_{T+k}-\bar{x})$,

where $b_{T+k} = \phi b_{T+k-1} = \phi^k b_{T} = \phi^k b$.

enter image description here

However, as I said, a global model doesn't necessarily make as much sense as some other options if it's a situation in which we'd want to apply such dampening; we might consider instead pivoting the slope around a point somewhere near the end (nearer to $t=T$), to make it more local... but if we think the model should be local in that sense, we should probably be looking at locally-linear models to start with, and then dampen those trends.

There's some distinct similarities in general approach I took here to the one used for the additive trend model, though that's a local model, not a global one - and thereby more suited to forecasting trends that tend to be linear for a while, but where the linear trend isn't constant in the long term.

That approach of progressive multiplication of a model-component by $\phi$ could be used for any number of models if applied to the parts it makes sense to shrink to be smaller - it could be applied to seasonal components, or AR parameters, or any number of other things. Different components of forecasting models may even be shrunk at different rates.

[That progressive multiplication by a constant (geometric shrinkage) isn't the only way to shrink components of a model, but it's the most common.]

Solved – Random Forest Regression and trended time-series

RFs, of course, can identify and model a long-term trend in the data. However, the issue becomes more complicated when you are trying to forecast out to never seen before values, as you often are trying to do with time-series data. For example, if see that activity increases linearly over a period between 1915 and 2015, you would expect it to continue to do so in the future. RF, however, would not make that forecast. It would forecast all future variables to have the same activity as 2015.

from sklearn import ensemble
import numpy as np
years = np.arange(1916, 2016)
#the final year in the training data set is 2015
years = [[x] for x in years]
print 'Final year is %s ' %years[-1][0]
#say your ts goes up by 1 each year - a perfect linear trend
ts = np.arange(1,101)
est = ensemble.RandomForestClassifier().fit(years,ts)
print est.predict([[2013], [2014], [2015], [2016] , [2017], [2018]])

The above script will print 2013, 2014, 2015, 2015, 2015, 2015. Adding lag variables into the RF does not help in this regard. So careful. I'm not sure if adding trend data to your RF is gonna do what you think it will.

Best Answer

Related Solutions

Solved – How to dampen forecast to improve accuracy

Solved – Random Forest Regression and trended time-series

Related Question