Longitudinal Item Response – How to Apply Longitudinal Item Response Theory Models in R

item-response-theory, mixed-model, panel-data, r, repeated-measures

I'm trying to fit longitudinal item response theory (IRT) models in R. I have a test that was administered at multiple measurement occasions. I'd like to examine individuals' growth curves of factor scores (i.e., ability levels) from graded response models (GRMs). I have used the ltm package in R to fit cross-sectional GRM models in IRT, but it's not clear to me how (or whether it's even possible in ltm) to extend the models to handle repeated measures of the same items across time. How can I fit growth curves to longitudinal GRM factor scores to see changes in means/variances of ability levels over time? If this isn't possible with the ltm package, what packages/functions permit this? Specific code examples would be especially appreciated.

Here are empirical examples similar to what I'm looking to do (except the examples use a Rasch model for binary items where I'm using GRMs for polytomous data):

McArdle, J. J., Grimm, K. J., Hamagami, F., Bowles, R. P., & Meredith, W. (2009). Modeling life-span growth curves of cognition using longitudinal data with multiple samples and changing scales of measurement. Psychological Methods, 14, 126-149. doi: 10.1037/a0015857

McArdle, J. J., & Grimm, K. J. (2011). An empirical example of change analysis by linking longitudinal item response data from multiple tests. In A. A. von Davier (Ed.), Statistical models for test equating, scaling, and linking (pp. 71-88). New York: Springer.

For example, I'd like to be able to estimate and plot individuals' growth curves to examine mean-level change in ability levels over time (from McArdle & Grimm, 2011):
[Figure: individuals' growth curves in ability levels over time]

And, I'd like to be able to estimate an average or prototypical growth curve for the sample (from McArdle & Grimm, 2011):
[Figure: average/prototypical growth curve in ability levels over time]

Here's a simulated data set with 20 polytomous items (three response categories each) at 3 different time points:

library(mirt)
library(mvtnorm)

set.seed(1)

numberItems <- 20
numberItemLevels <- 2   # thresholds per item (i.e., 3 response categories)
sampleSize <- 1000

a <- matrix(rlnorm(numberItems, .2, .2))                       # discrimination parameters
d <- matrix(rnorm(numberItems*numberItemLevels), numberItems)  # item intercepts
d <- t(apply(d, 1, sort, decreasing=TRUE))                     # intercepts must be decreasing

Theta <- mvtnorm::rmvnorm(n=sampleSize, 0, matrix(1))  # latent abilities at time 1

t1 <- simdata(a, d, N=sampleSize, itemtype="graded", Theta=Theta)     # time 1
t2 <- simdata(a, d, N=sampleSize, itemtype="graded", Theta=Theta+.5)  # time 2: mean shifted +0.5
t3 <- simdata(a, d, N=sampleSize, itemtype="graded", Theta=Theta+1)   # time 3: mean shifted +1

dat <- data.frame(t1, t2, t3)  # 60 columns: the 20 items at each of the 3 occasions

Best Answer

As a precursor, the IRT approach to this problem is computationally very demanding because of the high dimensionality. It may be worthwhile to look into structural equation modeling (SEM) alternatives using the WLSMV estimator for ordinal data, since I imagine fewer issues will arise there. Plus, including external covariates is much easier within that framework. Both approaches I describe here are also possible in SEM.
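To make that SEM alternative concrete, here is a minimal lavaan sketch (my own illustration, not code from the linked materials) fit to the simulated `dat` above. The column renaming, label names, and identification choices are my assumptions. Loadings and thresholds are constrained equal across occasions so that the latent means and variances at times 2 and 3 are identified and freely estimated:

```r
library(lavaan)

# Assumes `dat` from the simulation above; data.frame() suffixes the
# repeated column names, so rename them to make the occasions explicit.
colnames(dat) <- paste0("i", 1:20, "_t", rep(1:3, each = 20))

items  <- paste0("i", 1:20)
labels <- paste0("l", 1:20)

# One factor per occasion; shared labels equate loadings across time
# (the first loading is freed with NA* because the factor variance is
# identified by fixing/freeing the latent variances below)
factor_syntax <- function(fname, suffix) {
  paste0(fname, " =~ NA*", items[1], suffix, " + ",
         paste(paste0(labels, "*", items, suffix), collapse = " + "))
}
# Thresholds also equated across time (needed to compare latent means)
threshold_syntax <- function(suffix) {
  paste(paste0(items, suffix, " | th", 1:20, "a*t1 + th", 1:20, "b*t2"),
        collapse = "\n")
}

model <- paste(
  factor_syntax("T1", "_t1"), factor_syntax("T2", "_t2"), factor_syntax("T3", "_t3"),
  threshold_syntax("_t1"), threshold_syntax("_t2"), threshold_syntax("_t3"),
  # identification: fix time 1, free later latent variances and means
  "T1 ~~ 1*T1", "T2 ~~ T2", "T3 ~~ T3",
  "T1 ~ 0*1",  "T2 ~ 1",   "T3 ~ 1",
  sep = "\n"
)
# (same-item residual correlations across occasions are omitted for brevity,
# though a fuller model would typically include them)

fit <- cfa(model, data = dat, ordered = TRUE, estimator = "WLSMV",
           parameterization = "theta")
summary(fit)  # the T2/T3 latent means estimate the mean change in theta
```

Fixing all residual variances at 1 under the theta parameterization amounts to strict invariance, which parallels the fully invariant item parameters in the IRT setup below.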

There are two ways that I know of to estimate unidimensional longitudinal IRT models that are not Rasch in nature. The first approach requires a unique latent factor for each time block and a specific residual-variation term for each item. A different approach, similar to what one would find in the SEM literature, is a latent growth curve model, whereby only a fixed number of factors is estimated (three if the relationship over time is believed to be linear). Fixed loadings are used in this approach, so it may be computationally much more stable due to the reduced number of estimated parameters. For that reason I would tend to prefer the growth curve model, for both its smaller dimensionality and its fewer estimated parameters.

The idea for both approaches is to set up latent time factors indicating how person-level $\theta$ values change over each test administration, and to constrain the loadings across time so that the hyperparameters (i.e., the latent means and covariances) can be estimated. Item parameters must also be constrained to be invariant across time, so that person differences are captured only in the hyperparameters. Because this approach can require a huge number of integration dimensions, you'll need something like the dimension reduction algorithm, which is available in mirt through the bfactor() function.
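As a skeleton of how the first approach would be laid out in mirt (my sketch, not the code from the linked script), assuming `dat` from the simulation above with items 1-20 at time 1, 21-40 at time 2, and 41-60 at time 3:

```r
library(mirt)

# Each item gets its own specific factor (repeated across the three
# occasions); bfactor() then applies dimension reduction over these
# 20 specific factors, leaving the time factors to integrate over.
specific <- rep(1:20, times = 3)

time_model <- mirt.model('
  Time1 = 1-20
  Time2 = 21-40
  Time3 = 41-60
  COV   = Time1*Time2, Time1*Time3, Time2*Time3, Time2*Time2, Time3*Time3
  MEAN  = Time2, Time3')

# Inspect the parameter table first; the across-time equality constraints
# (same item, same slope/intercept at every occasion) can then be supplied
# through the `constrain` argument as lists of parameter numbers.
sv <- bfactor(dat, specific, model2 = time_model, pars = 'values')
head(sv)
# mod <- bfactor(dat, specific, model2 = time_model,
#                constrain = list(...))  # built from sv$parnum
```

Time 1's latent mean and variance stay fixed (0 and 1) for identification, while the time 2 and time 3 means, variances, and covariances are freed via the `COV` and `MEAN` lines.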

Instead of going through a worked example here, which would take a lot of code, I'll simply point to worked versions of these analyses. A word of warning, though: these are very computationally demanding and may take more than an hour to converge on your computer, since you have 4 dimensions of integration in the first case and 3 in the second. And if you don't have much RAM, you may run into issues when increasing the number of quadrature points (quadpts).

Data simulation script: https://github.com/philchalmers/mirt/blob/gh-pages/data-scripts/Longitudinal-IRT.R

Analysis output: http://philchalmers.github.io/mirt/html/Longitudinal-IRT.html

In the first example, if you save the factor scores using fscores() you'll obtain estimates of how individual $\theta$ values change at each time point. In the second example, using the linear growth curve approach, the first column of the factor scores represents the initial $\theta$ estimates, while the second column indicates the slope/change occurring on average over time. In the simulation, I set up a constant mean change of 0.5 per occasion, so the slope values from fscores() should all be around 0.5 for each individual. Both analyses give roughly the same conclusions but approach the problem somewhat differently. If you are familiar with longitudinal models in SEM, both should feel fairly natural to interpret.
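As a small illustration of the plotting step for the first approach (a sketch: `mod` stands in for the fitted longitudinal model, and the factor names Time1-Time3 are assumed to match the mirt.model definition), the factor scores can be used to reproduce the kinds of figures shown in the question:

```r
# fscores() on a bfactor model also returns the item-specific factors,
# so keep only the three time factors
theta <- fscores(mod)[, c("Time1", "Time2", "Time3")]

# Individual growth curves for a random subsample of 50 persons,
# plus the average/prototypical curve for the whole sample
idx <- sample(nrow(theta), 50)
matplot(1:3, t(theta[idx, ]), type = "l", lty = 1, col = "grey",
        xlab = "Measurement occasion", ylab = expression(theta))
lines(1:3, colMeans(theta), lwd = 3)
```

With the simulated data above, the heavy average line should rise by roughly 0.5 per occasion, mirroring the mean shifts built into the simulation.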