Reporting IRT Theta Scores – Is Percentile Common Practice?

item-response-theory psychometrics reporting

I have been teaching myself IRT. I think I get it; however, I have not yet reached the point where I would say I'm totally confident.

Assuming that the latent trait θ is normally distributed with mean 0 and SD 1, am I correct in believing that θ estimates can be thought of as z-scores on a standard scale and, as a corollary, converted to percentiles?

I ask because this is very rarely even mentioned in references. In fact, I can't recall ever coming across a single reference that discusses it.

The following article notes that although IRT estimation commonly assumes normality, the latent trait is not necessarily normally distributed:

It’s common in Item response theory (IRT) item parameter estimation to assume that the latent variable (the trait/concept/construct) being measured follows this standard normal distribution (the assumption of normality). The reason for this assumption is that it makes the calculations for getting items parameters a lot simpler, compared to if you didn’t make the assumption… While research has shown that assuming the standard normal distribution for a population that isn’t standard normally distributed isn’t necessarily the end of the world (De Ayala, 1995; Stone, 1992; Woods & Lin, 2009), if a population distribution is decidedly non-normal and decidedly non-symmetric, parameter estimates can become quite biased/wrong (e.g., Woods, 2015).

Furthermore, DeMars (2010) states in Item Response Theory:

The metric of the parameters is somewhat arbitrary. The metric is indeterminate until we fix the center (zero-point) of the scale and the unit size. Most frequently, in a reference population (perhaps represented by the norming sample) the estimated mean of θ is set to 0 and the estimated standard deviation of θ is set to 1. θ can then be interpreted similarly to a z-score. Because θ is not a linear function of the number-correct (raw or observed) score, θ will not be exactly the same as the z-score on the number-correct metric, but the interpretation is similar. When an examinee is 1 standard deviation above the mean in the reference population, the examinee’s θ = 1. There is no assumption that the scores are normally distributed, so the θ does not necessarily translate to a percentile score in the normal distribution. Values for θ theoretically range from –∞ to ∞; most examinees will have values between –3 and 3.

Further:

One additional note on data requirements: IRT models in no way assume a normal distribution. The estimation procedures do not require any assumption of normally distributed examinee ability or normally distributed item parameters. The standard errors of the estimated [θ] and item parameters are asymptotically normal, but the examinees and item parameters do not need to follow a normal distribution. Severely non-normal distributions may present some practical problems to estimation, as discussed earlier, but this is not due to model assumptions.

So I guess my question could be re-worded as follows:

I would like to report θ estimates and also convert them to the corresponding normal-curve (z-score) percentiles, primarily because percentiles make sense to people. Would doing so be considered misleading or otherwise inappropriate?

Best Answer

Yes. Assuming your IRT model is estimated by marginal maximum likelihood (MML), which is typically the setting in which $\mu_\theta = 0$ and $\sigma_\theta = 1$ are fixed to identify the model, IRT $\theta$ estimates (i.e., $\hat\theta$) may be converted to percentiles.
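As a minimal illustration of the conversion (a sketch only, assuming the reference population really is standard normal on the $\theta$ metric, which is the MML identification constraint just described), the percentile is the standard normal CDF evaluated at $\hat\theta$. The helper name `theta_to_percentile` is hypothetical; `statistics.NormalDist` is part of Python's standard library (3.8+).

```python
from statistics import NormalDist

# Standard normal reference distribution: the (mu_theta = 0, sigma_theta = 1)
# metric assumed to identify the model under MML.
STD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def theta_to_percentile(theta_hat: float) -> float:
    """Percentile (0-100) of theta_hat under a standard normal reference population."""
    return 100.0 * STD_NORMAL.cdf(theta_hat)


if __name__ == "__main__":
    for th in (-2.0, -1.0, 0.0, 1.0, 2.0):
        print(f"theta_hat = {th:+.1f} -> percentile {theta_to_percentile(th):5.1f}")
```

With this convention, $\hat\theta = 1$ corresponds to roughly the 84th percentile and $\hat\theta = 2$ to roughly the 98th.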

I also do not recall encountering many IRT references that discuss converting $\hat\theta$ to percentiles. One possible reason is that this conversion most likely takes place in operational settings (e.g., ACT- and SAT-type testing programs), and most of that work is not published.

If you decide to convert $\hat\theta$ to percentiles, I would also suggest reporting their uncertainty. For example, for each percentile you calculate, you could also report the percentiles corresponding to $\hat\theta - 2\,SE(\hat\theta)$ and $\hat\theta + 2\,SE(\hat\theta)$. Because the normal CDF is monotone, these endpoints give an approximate 95% confidence interval for each percentile. Every test measures $\theta$ more precisely in some regions than in others, since $SE(\hat\theta) \approx 1/\sqrt{I(\hat\theta)}$ where $I$ is the test information (this can be inspected by looking at the test information curve). Calculating percentile confidence intervals allows uncertainty due to measurement error to be incorporated when interpreting percentiles.
