As I stated in the comments above, missing data can be handled by either the ltm or mirt package when the data are missing completely at random (MCAR). Here is an example of how to use both on a dataset with missing values:
> library(ltm)
> library(mirt)
> set.seed(1234)
> dat <- expand.table(LSAT7)
> dat[sample(1:(nrow(dat)*ncol(dat)), 150)] <- NA
> head(dat)
Item.1 Item.2 Item.3 Item.4 Item.5
[1,] 0 0 0 0 0
[2,] 0 0 0 0 0
[3,] 0 0 0 0 0
[4,] 0 0 0 0 0
[5,] 0 0 0 0 0
[6,] 0 0 0 0 NA
> (ltmmod <- ltm(dat ~ z1))
Call:
ltm(formula = dat ~ z1)
Coefficients:
Dffclt Dscrmn
Item.1 -1.891 0.967
Item.2 -0.720 1.147
Item.3 -1.008 1.885
Item.4 -0.671 0.760
Item.5 -2.554 0.729
Log.Lik: -2572.402
> (mirtmod <- mirt(dat, 1))
Iteration: 22, Log-Lik: -2572.402, Max-Change: 0.00010
Call:
mirt(data = dat, model = 1)
Full-information item factor analysis with 1 factors
Converged in 22 iterations with 41 quadrature.
Log-likelihood = -2572.402
AIC = 5164.805; AICc = 5165.027
BIC = 5213.882; SABIC = 5182.122
> coef(mirtmod)
$Item.1
a1 d g u
par 0.967 1.829 0 1
$Item.2
a1 d g u
par 1.148 0.826 0 1
$Item.3
a1 d g u
par 1.886 1.902 0 1
$Item.4
a1 d g u
par 0.76 0.51 0 1
$Item.5
a1 d g u
par 0.729 1.863 0 1
$GroupPars
MEAN_1 COV_11
par 0 1
It's also possible to impute missing values given a good estimate of $\theta$, which is useful for obtaining things like model- and item-fit statistics. (You should do this several times if the amount of missingness is non-trivial, and it's even better to jitter the $\hat{\theta}$ values as a function of their respective $SE_{\hat{\theta}}$ values for more reasonable imputation results.)
> Theta <- fscores(mirtmod, full.scores = TRUE, scores.only = TRUE)
> fulldat <- imputeMissing(mirtmod, Theta)
> head(fulldat)
Item.1 Item.2 Item.3 Item.4 Item.5
1 0 0 0 0 0
2 0 0 0 0 0
3 0 0 0 0 0
4 0 0 0 0 0
5 0 0 0 0 0
6 0 0 0 0 0
I think the difference is primarily a philosophical one when choosing Rasch/1PL models (the emphasis on what measurement means is slightly different in that literature, and hence researchers try their best to obtain these special items), and an empirical/design one when deciding between 2PL and 3PL models.
Since the slopes are all equal in 1PL models, determining a person's location amounts to finding the point where respondents have a P = 0.5 chance of answering correctly, which can be done by simply choosing the items with the best intercepts to estimate $\theta$. In 2- and 3PL models this is slightly more complicated due to the unequal slopes and the lower-bound guessing parameters; as a consequence, 2-3PL models often require more advanced adaptive item-selection procedures, such as Kullback–Leibler or Fisher information criteria, to select the next best item for homing in on $\theta$.
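To make the Fisher-information idea concrete, here is a minimal base-R sketch (the item bank, slopes, and $\hat{\theta}$ value are all hypothetical, not from the models fitted above) of selecting the next item in a 2PL adaptive test:

```r
# Sketch (hypothetical values): Fisher information for a 2PL item,
# P(theta) = 1 / (1 + exp(-a * (theta - b))), I(theta) = a^2 * P * (1 - P)
info_2pl <- function(theta, a, b) {
  p <- 1 / (1 + exp(-a * (theta - b)))
  a^2 * p * (1 - p)
}

# Hypothetical item bank: slopes (a) and difficulties (b) for five items
a <- c(1.0, 1.5, 0.8, 2.0, 1.2)
b <- c(-1.0, 0.0, 0.5, 0.2, 1.5)

# Given a provisional theta estimate, pick the most informative unused item
theta_hat <- 0.3
administered <- c(1)                      # items already given
avail <- setdiff(seq_along(a), administered)
next_item <- avail[which.max(info_2pl(theta_hat, a[avail], b[avail]))]
next_item                                 # the steep item near theta_hat wins
```

Note that under a 1PL model all the `a` values are equal, so maximizing information reduces to picking the item whose difficulty `b` is closest to `theta_hat`, which is exactly the "best intercept" rule described above.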
Speaking purely from a design perspective, if the adaptive-testing items contain a finite number of response options then the 3PL seems like the better option, but if the items are more of a fill-in-the-blank style (e.g., 2 + 3 = __) then the 1PL and 2PL models would, at least theoretically, be more reasonable.
Best Answer
Yes, there are several: eRm, mirt, ltm, etc. In fact, most IRT models can handle dichotomous items as a special case of polytomous items. For example, in ltm you can use grm() on a mix of polytomous and dichotomous items:
We see the Science data are polytomous (4 levels). Let's first fit the graded response model to the original items.
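The code for this step is not shown above; a sketch of what it might look like using the Science data that ship with ltm:

```r
library(ltm)

# Science: attitude items with 4 ordered response levels each
# ("strongly disagree" ... "strongly agree")
data(Science)
sapply(Science, nlevels)   # confirms every item is polytomous (4 levels)

# Fit the graded response model to the original polytomous items
fit_poly <- grm(Science)
fit_poly
```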
Now I will dichotomize the first item, Comfort, like so
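The dichotomization code was omitted above; one way to do it (assuming the Comfort factor's levels are ordered "strongly disagree", "disagree", "agree", "strongly agree") is to collapse the factor levels and refit:

```r
library(ltm)
data(Science)

# Collapse the 4-level Comfort item into a binary item:
# "strongly disagree"/"disagree" -> "negative",
# "agree"/"strongly agree"       -> "positive"
Science2 <- Science
levels(Science2$Comfort) <- c("negative", "negative", "positive", "positive")

# grm() accepts the mix of one dichotomous and several polytomous items
fit_mixed <- grm(Science2)
fit_mixed
```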
We see now that the first item has the levels "strongly disagree" and "disagree" combined into "negative", and the levels "agree" and "strongly agree" combined into "positive".
So, you see, this is possible with all packages that support polytomous items. See sections 2.1 to 2.4 of this review of IRT packages in R.