Time Series Analysis – Handling Many Short Time Series with Exogenous Variables Simultaneously

arima, autoregressive, time series

I am trying to solve a forecasting problem on a dataset that contains many short time series along with exogenous variables.

I have read this, this, and this, along with many other resources, but I still cannot find a clear answer on whether I can build an autoregressive model at all.

Dataset characteristics:

  • a large number of independent entities, each with 3 annual observations
  • one time series per entity that I want to forecast forward 1-3 years (2020-2022)
  • multiple covariates, some of which are time series themselves and some static
  • I hypothesize that any predictive power of the covariates is similar across entities

Any ideas on how to approach this? I have researched it extensively, but nothing has worked so far.

Best Answer

With only three observations per id, fitting an autoregressive model is going to be problematic. Even with a single lag, the first observation of each id has no lagged value, so you essentially lose 1/3 of your data.
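To make the data loss concrete, here is a minimal sketch in base R. The panel is made up for illustration (four hypothetical entities, three annual observations each, random Score values); only the row counts matter.

```r
# Hypothetical panel: 4 entities, 3 annual observations each (values are made up)
set.seed(1)
panel <- data.frame(
  id    = rep(c("a", "b", "c", "d"), each = 3),
  Year  = rep(2017:2019, times = 4),
  Score = round(rnorm(12, mean = 50, sd = 10), 1)
)

# Lag-1 of Score within each id (rows are already sorted by Year within id).
# The first year of every id has no predecessor, so its lag is NA.
panel$Score_lag1 <- ave(panel$Score, panel$id,
                        FUN = function(x) c(NA, head(x, -1)))

# Only rows with an observed lag can enter a regression on the lagged value
usable    <- panel[!is.na(panel$Score_lag1), ]
n_total   <- nrow(panel)
n_usable  <- nrow(usable)
frac_lost <- 1 - n_usable / n_total
```

Here 4 of the 12 rows are dropped, leaving two usable (lag, response) pairs per entity. Each additional lag removes one more row per entity, so a second lag would leave a single pair each.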

This is really a longitudinal (panel) data problem, so I'd start there: look at the literature on mixed effects models, for example. You will need to account for the lack of independence between observations, since there are multiple observations per entity.

Here is a simple model to start with. It uses Articles and Year as covariates in modelling Score, with Articles having a random coefficient for each id and Year providing a fixed-effect time trend. I'm not sure that this model makes sense for your problem, because you haven't provided enough information about your data, but it at least shows some of the relevant modelling functions in R.

library(lme4)
#> Loading required package: Matrix
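# Download the example data to a temporary file and read it in
temp <- tempfile()
download.file("https://drive.google.com/uc?authuser=0&id=13ZeOnW2tjFcOiasSRlIZGUIuUconiprP&export=download",
              temp)
df <- readr::read_csv(temp)
df$id <- as.character(df$id)  # treat id as a grouping label, not a number
# Fixed effect for Year; random slope for Articles by id, with no random intercept
fit <- lmer(Score ~ Year + (0 + Articles | id), data = df)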
summary(fit)
#> Linear mixed model fit by REML ['lmerMod']
#> Formula: Score ~ Year + (0 + Articles | id)
#>    Data: df
#> 
#> REML criterion at convergence: 117.6
#> 
#> Scaled residuals: 
#>     Min      1Q  Median      3Q     Max 
#> -1.3558 -0.6324 -0.1746  0.4885  1.6620 
#> 
#> Random effects:
#>  Groups   Name     Variance Std.Dev.
#>  id       Articles  29.19    5.403  
#>  Residual          221.28   14.875  
#> Number of obs: 15, groups:  id, 5
#> 
#> Fixed effects:
#>              Estimate Std. Error t value
#> (Intercept)  -706.763  10437.660  -0.068
#> Year            0.355      5.171   0.069
#> 
#> Correlation of Fixed Effects:
#>      (Intr)
#> Year -1.000
predict(fit, newdata=data.frame(Articles=5, id="1", Year=2020))
#>        1 
#> 9.426341

Created on 2022-03-04 by the reprex package (v2.0.1)
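One possible variation, sketched on simulated data because I don't know your real structure: a random intercept per id, with the covariate effects as fixed effects shared by all entities (which matches your hypothesis that the covariates' predictive power is similar across entities). Year is centred here because an uncentred Year is what produces the huge intercept standard error and the -1.000 correlation in the output above. The data-generating values, the Year_c column and the "new" id are all assumptions made for illustration.

```r
library(lme4)

# Simulated panel (all values are assumptions): 50 entities, 3 annual observations
set.seed(42)
n_id <- 50
sim <- expand.grid(Year = 2017:2019,
                   id   = as.character(seq_len(n_id)),
                   stringsAsFactors = FALSE)
u <- rnorm(n_id, sd = 5)                       # entity-specific intercepts
sim$Articles <- rpois(nrow(sim), lambda = 5)   # time-varying covariate
sim$Year_c   <- sim$Year - 2018                # centre Year on the middle year
sim$Score    <- 50 + 2 * sim$Year_c + 1.5 * sim$Articles +
  u[as.integer(sim$id)] + rnorm(nrow(sim), sd = 3)

# Shared (fixed) effects for trend and covariate, random intercept per entity
fit2 <- lmer(Score ~ Year_c + Articles + (1 | id), data = sim)

# Forecast 2020 (Year_c = 2) for an entity that was in the training data
pred_known <- predict(fit2,
                      newdata = data.frame(Year_c = 2, Articles = 5, id = "1"))

# For an unseen entity, fall back to the population-level prediction
pred_new <- predict(fit2,
                    newdata = data.frame(Year_c = 2, Articles = 5, id = "new"),
                    allow.new.levels = TRUE)
```

Note that forecasting 2020-2022 this way requires values for any time-varying covariates in those years, so they would have to be known in advance or forecast separately. It also extrapolates a linear trend estimated from only three time points, so treat the forecasts with caution.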
