Solved – IQ adaptive test items in 1pl, 2pl or 3pl IRT model

item-response-theory, psychometrics

Some adaptive test systems (e.g., school assessment tools) use the 1PL IRT model, while others use the 2PL or the 3PL. When developing an adaptive IQ test, is there a rule of thumb for choosing which model to use when calibrating item difficulty and test takers' ability?

I can't find any research that gives insight into the fit between IQ test items and the different kinds of IRT models.

Many thanks in advance!

Best Answer

I think the choice is primarily a philosophical one when deciding on Rasch/1PL models (the emphasis on what measurement means is slightly different in that literature, and hence researchers try their best to obtain items that conform to the model), and an empirical/design one when deciding between 2PL and 3PL models.

Since the slopes are all equal in 1PL models, determining a person's location amounts to finding the location where the respondent has a P = 0.5 chance of answering correctly, simply by choosing items with the best intercepts to get an estimate of $\theta$. In 2PL and 3PL models it's slightly more complicated because of the unequal slopes and the lower-bound parameters for guessing. As a consequence, 2PL and 3PL models often require more advanced adaptive item selection criteria, such as Kullback–Leibler or Fisher information, to select the next best item for homing in on $\theta$.
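To make the item selection idea concrete, here is a minimal base R sketch of maximum Fisher information selection under the 3PL. The item bank, its parameter values, and the function names (`p_3pl`, `info_3pl`, `select_next_item`) are all made up for illustration; a real adaptive test would also need ability re-estimation, exposure control, and a stopping rule, none of which are shown.

```r
# 3PL item response function: a = slope, b = difficulty, g = lower asymptote (guessing)
p_3pl <- function(theta, a, b, g = 0) {
  g + (1 - g) * plogis(a * (theta - b))
}

# Fisher information of a 3PL item at theta.
# With g = 0 this reduces to the 2PL form a^2 * P * (1 - P).
info_3pl <- function(theta, a, b, g = 0) {
  p <- p_3pl(theta, a, b, g)
  a^2 * ((p - g) / (1 - g))^2 * (1 - p) / p
}

# Hypothetical item bank (parameter values are invented, not calibrated)
bank <- data.frame(
  item = paste0("item", 1:5),
  a = c(0.8, 1.2, 1.9, 1.0, 1.5),
  b = c(-1.5, -0.5, 0.0, 0.7, 1.6),
  g = c(0.20, 0.25, 0.20, 0.25, 0.20),
  stringsAsFactors = FALSE
)

# Pick the not-yet-administered item with the most information at theta_hat
select_next_item <- function(theta_hat, bank, administered = character(0)) {
  candidates <- bank[!(bank$item %in% administered), ]
  info <- info_3pl(theta_hat, candidates$a, candidates$b, candidates$g)
  candidates$item[which.max(info)]
}

select_next_item(theta_hat = 0, bank)
select_next_item(theta_hat = 0, bank, administered = "item3")
```

When all slopes are equal and `g = 0`, the same function simply picks the item whose difficulty is closest to the current estimate, which is the 1PL shortcut described above.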

Speaking purely from a design perspective, if the adaptive test items offer a finite number of response options, then the 3PL seems like the better option; but if it's more of a fill-in-the-blank style answer (e.g., 2 + 3 = __), then the 1PL and 2PL models would, at least theoretically, be more reasonable.
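A small base R sketch of why the response format matters: with a fixed set of options, even very low-ability respondents can guess correctly, so the response curve never drops to zero. The slope, difficulty, and the lower asymptote of 1/4 for a four-option item are assumed values for illustration only; in practice the guessing parameter is estimated and need not equal one over the number of options.

```r
# Logistic item response function; g = 0 gives the 2PL, g > 0 the 3PL
irf <- function(theta, a, b, g = 0) {
  g + (1 - g) * plogis(a * (theta - b))
}

theta <- c(-4, -2, 0, 2)  # ability levels, from very low to above average
a <- 1.2                  # hypothetical slope
b <- 0                    # hypothetical difficulty

# Open-ended item (e.g., 2 + 3 = __): blind guessing rarely succeeds, so g = 0
open_ended <- irf(theta, a, b)

# Four-option multiple choice: g = 1/4 is an assumed value, not an estimate
multiple_choice <- irf(theta, a, b, g = 1 / 4)

round(cbind(theta, open_ended, multiple_choice), 3)
```

The `open_ended` column approaches 0 as `theta` decreases, while the `multiple_choice` column levels off just above 0.25, which is the behaviour the 3PL lower-bound parameter is meant to capture.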

Related Solutions

Solved – IRT in R: Does anyone know of an IRT item calibration function that can cope with NA’s

As I stated in the comments above, missing data can be handled by either the ltm or the mirt package when the data are missing completely at random (MCAR). Here is an example of how to use both on a dataset with missing values:

> library(ltm)
> library(mirt)
> set.seed(1234)
> dat <- expand.table(LSAT7)
> dat[sample(1:(nrow(dat)*ncol(dat)), 150)] <- NA
> head(dat)
     Item.1 Item.2 Item.3 Item.4 Item.5
[1,]      0      0      0      0      0
[2,]      0      0      0      0      0
[3,]      0      0      0      0      0
[4,]      0      0      0      0      0
[5,]      0      0      0      0      0
[6,]      0      0      0      0     NA
> (ltmmod <- ltm(dat ~ z1))

Call:
ltm(formula = dat ~ z1)

Coefficients:
        Dffclt  Dscrmn
Item.1  -1.891   0.967
Item.2  -0.720   1.147
Item.3  -1.008   1.885
Item.4  -0.671   0.760
Item.5  -2.554   0.729

Log.Lik: -2572.402

> (mirtmod <- mirt(dat, 1))
Iteration: 22, Log-Lik: -2572.402, Max-Change: 0.00010
Call:
mirt(data = dat, model = 1)

Full-information item factor analysis with 1 factors 
Converged in 22 iterations with 41 quadrature. 
Log-likelihood = -2572.402
AIC = 5164.805; AICc = 5165.027
BIC = 5213.882; SABIC = 5182.122
> coef(mirtmod)
$Item.1
       a1     d g u
par 0.967 1.829 0 1

$Item.2
       a1     d g u
par 1.148 0.826 0 1

$Item.3
       a1     d g u
par 1.886 1.902 0 1

$Item.4
      a1    d g u
par 0.76 0.51 0 1

$Item.5
       a1     d g u
par 0.729 1.863 0 1

$GroupPars
    MEAN_1 COV_11
par      0      1

It's also possible to impute the missing values given a good estimate of $\theta$, which is useful for obtaining things like model and item fit statistics. This should be done several times if the amount of missingness is non-trivial, and it's even better to jitter the $\hat{\theta}$ values as a function of their respective $SE_{\hat{\theta}}$ values for more reasonable imputation results.

> Theta <- fscores(mirtmod, full.scores = TRUE, scores.only = TRUE)
> fulldat <- imputeMissing(mirtmod, Theta)
> head(fulldat)
  Item.1 Item.2 Item.3 Item.4 Item.5
1      0      0      0      0      0
2      0      0      0      0      0
3      0      0      0      0      0
4      0      0      0      0      0
5      0      0      0      0      0
6      0      0      0      0      0
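The jittering suggestion could be sketched roughly as follows. This assumes a reasonably recent version of mirt in which `fscores()` accepts `full.scores.SE = TRUE` and returns the point estimates in the first column and their standard errors in the second; the number of imputations and the normal jitter are arbitrary illustrative choices, not a recommendation from the package.

```r
library(mirt)
set.seed(1234)

# Recreate the example data. as.matrix() is a precaution so that the linear
# indexing below works whether expand.table() returns a matrix or a data frame.
dat <- as.matrix(expand.table(LSAT7))
dat[sample(length(dat), 150)] <- NA

mirtmod <- mirt(dat, 1, verbose = FALSE)

# Ability estimates (column 1) and their standard errors (column 2)
est <- fscores(mirtmod, full.scores = TRUE, full.scores.SE = TRUE)

# Draw several jittered theta vectors and impute one complete data set per draw
n_imputations <- 5  # arbitrary choice for illustration
imputed <- lapply(seq_len(n_imputations), function(i) {
  theta_draw <- matrix(rnorm(nrow(est), mean = est[, 1], sd = est[, 2]), ncol = 1)
  imputeMissing(mirtmod, theta_draw)
})

# Check that no imputed data set still contains missing values
sapply(imputed, anyNA)
```

Fit statistics can then be computed on each completed data set and summarised across the imputations.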
