As a preliminary note, the IRT approach to this problem is computationally very demanding due to the higher dimensionality. It may be worthwhile to look into structural equation modeling (SEM) alternatives using the WLSMV estimator for ordinal data, since I expect fewer issues there. Including external covariates is also much easier within that framework. Both approaches I describe here are possible in SEM as well.
There are two ways I know of to estimate unidimensional longitudinal IRT models that are not Rasch in nature. The first approach requires a unique latent factor for each time block and a specific residual-variance term for each item. A different approach, similar to what one finds in the SEM literature, is a latent growth curve model, whereby only a fixed number of factors are estimated (three if the relationship over time is believed to be linear). Fixed loadings are used in this approach, so computationally it may be much more stable due to the reduced number of estimated parameters; for that reason I would tend to prefer the growth curve model, for both its smaller dimensionality and its fewer estimated parameters.
The idea for both approaches is to set up latent time factors indicating how person-level $\theta$ values change over each test administration, and to constrain their loadings across time so that the hyper-parameters (i.e., the latent means and covariances) can be estimated. Item parameters must also be constrained to be equal across time so that person differences are captured only in the hyper-parameters. Because this approach can require a huge number of integration dimensions, you'll need to use something like the dimension-reduction algorithm, which is available in mirt through the bfactor() function.
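As a rough sketch of the first specification (the two-occasion layout and item counts here are assumptions of mine; the linked data script below contains the authoritative version, including the cross-time item equality constraints):

```r
library(mirt)

# Hypothetical layout: the same 10 items given at 2 time points,
# stacked as 20 response columns (1-10 = time 1, 11-20 = time 2).
# COV frees the latent covariance between occasions; MEAN frees the
# time-2 latent mean (time 1 anchors the scale at 0).
model <- mirt.model('
    Time1 = 1-10
    Time2 = 11-20
    COV   = Time1*Time2
    MEAN  = Time2')

# mod <- mirt(data, model)   # plus across-time equality constraints on
#                            # the item parameters (see the linked script)
```

With more administrations the number of latent factors grows quickly, which is where the bfactor() dimension-reduction machinery becomes necessary.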
Instead of going through a worked example here, which would take a lot of code, I'll simply point to worked versions of these analyses. A word of warning, though: these are very computationally demanding and may take more than an hour to converge on your computer, since you have 4 dimensions of integration in the first case and 3 in the second. Also, if you don't have much RAM you could run into issues when increasing the number of quadpts.
Data simulation script: https://github.com/philchalmers/mirt/blob/gh-pages/data-scripts/Longitudinal-IRT.R
Analysis output: http://philchalmers.github.io/mirt/html/Longitudinal-IRT.html
In the first example, if you save the factor scores with fscores() you'll obtain estimates for each time point of how individual $\theta$ values are changing. In the second example, using the linear growth curve approach, the first column of the factor scores represents the initial $\theta$ estimates, while the second column indicates the average slope/change over time. In the example I set up a constant mean change of 0.5, so the slope values from fscores() should all be around 0.5 for each individual. Both analyses give roughly the same conclusions, though they are somewhat different approaches to the problem; if you are familiar with longitudinal models in SEM, they should be fairly natural to interpret.
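As a quick sketch of extracting these (mod1 and mod2 stand in for the hypothetical fitted models from the two approaches; see the linked analysis output for the real calls):

```r
library(mirt)

# First approach: one theta column per time point
fs1 <- fscores(mod1)            # columns = Time1, Time2, ...
change <- fs1[, 2] - fs1[, 1]   # per-person change between administrations

# Growth curve approach: intercept and slope factors
fs2 <- fscores(mod2)
head(fs2)        # column 1 = initial theta, column 2 = slope/change
mean(fs2[, 2])   # should be near the simulated mean change
```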
I don't believe I can offer exactly what you're looking for, but the first step is to use the repeated individual_id as a grouping variable to ensure that each individual lands in exactly one partition. For example, if you're using k-fold cross-validation, an individual should show up in only one fold rather than being spread across the others.
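In base R, a group-aware fold assignment might look like the following (a sketch; df and individual_id are assumed names — if you're working in Python, sklearn's GroupKFold does the same job):

```r
set.seed(1)
k <- 5

# assign each individual (not each row) to a fold
ids <- unique(df$individual_id)
fold_of_id <- sample(rep(1:k, length.out = length(ids)))
names(fold_of_id) <- as.character(ids)

# every row belonging to a given individual lands in the same fold
df$fold <- fold_of_id[as.character(df$individual_id)]
```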
As for which machine learning algorithms to try — that is ultimately up to the data. In my experience, though, your best results will likely come from some sort of boosted tree model such as LightGBM or XGBoost. That choice then leads to deciding how to encode the categorical variables; if you're working in Python, I recommend the category_encoders library for this.
I'm sure there are interesting and novel ideas around RNNs, but to be honest I don't think this problem is suited to that type of algorithm. This sounds like a classic regression problem to me.
There are a few approaches you could take.
First, you could project out the fixed effects, then run ridge or lasso:
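A minimal sketch of that idea, assuming a data frame df with outcome y, predictors x1 and x2, and a unit identifier id (lfe::demeanlist does the projection, glmnet the penalized fit):

```r
library(lfe)
library(glmnet)

X <- as.matrix(df[, c("x1", "x2")])

# sweep the unit fixed effects out of both the outcome and the predictors
dm <- demeanlist(cbind(y = df$y, X), list(factor(df$id)))

# penalized regression on the demeaned data
fit <- glmnet(dm[, -1], dm[, 1], alpha = 0)  # alpha = 0: ridge; alpha = 1: lasso
```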
You'd then go back and calculate the fixed effects, using getfe (from the lfe package) or something similar, to form the prediction. If your dataset is small, like in the example, you could make model matrices of all squares, cross-products, etc.
Also, if your data is small, you could simply put your cross-sectional unit into the design matrix of the ridge regression as a factor — (penalized) least squares dummy variables (LSDV). It turns out that L2-penalized LSDV is equivalent to random effects. If you don't care about unbiased parameter estimates, you should always prefer random effects to fixed effects.
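A sketch of the penalized-LSDV idea with glmnet, under assumed column names (df with y, x1, x2, and unit id):

```r
library(glmnet)

# least squares dummy variables: one indicator column per unit
X <- model.matrix(~ x1 + x2 + factor(id), data = df)[, -1]  # drop intercept

# alpha = 0 gives the L2 (ridge) penalty, i.e. penalized LSDV,
# with the penalty strength chosen by cross-validation
fit <- cv.glmnet(X, df$y, alpha = 0)
```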
You could also simply ignore the cross-sectional unit:
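For instance, a sketch with the randomForest package (df and the column names are assumptions; the unit identifier is simply left out of the predictor set):

```r
library(randomForest)

# fit on the raw predictors only, ignoring the cross-sectional unit id
rf <- randomForest(x = df[, c("x1", "x2", "year")], y = df$y)
```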
Here you'd want to take out year, because RF can't extrapolate. I assume you're making predictions for the next period. You could consider detrending your data before putting it into the random forest.
Finally, there is an experimental package here that projects out fixed effects from the top level of a neural network. It might not yet be very reliable, however.