Item response theory

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by Amead (talk | contribs) at 19:03, 5 January 2004 (changed npov mis-edit back, removed 'fallible'). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Item response theory designates a body of related psychometric theory that predicts outcomes of psychological testing, such as the difficulty of items or the ability of test-takers. Generally speaking, the aim of item response theory is to understand and improve the reliability of psychological tests.

Item response theory is very often referred to by its acronym, IRT. IRT may be regarded as roughly synonymous with latent trait theory. It is sometimes referred to using the word strong, as in strong true score theory, or modern, as in modern mental test theory, because IRT is a more recent body of theory and makes stronger assumptions than classical test theory.

Much of the literature on IRT revolves around item response models. These models relate a person parameter (or, in the case of multidimensional item response theory, a vector of person parameters) to one or more item parameters. For example:

p_i(\theta) = c_i + \frac{1 - c_i}{1 + e^{-D a_i (\theta - b_i)}}

where \theta is the person parameter and a_i, b_i, and c_i are item parameters. This logistic model relates the level of the person parameter and the item parameters to the probability of responding correctly. The constant D has the value 1.702, which rescales the logistic function to closely approximate the cumulative normal ogive. (This model was originally developed using the normal ogive, but the logistic model with the rescaling provides virtually the same model while greatly simplifying the computations.)
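The three-parameter logistic (3PL) model above can be sketched in a few lines of code. The following Python function is illustrative, not part of any particular IRT software; the parameter names mirror the conventional a, b, and c symbols:

```python
import math

D = 1.702  # scaling constant aligning the logistic curve with the normal ogive

def p_correct(theta, a, b, c):
    """Probability of a correct response under the 3PL model.

    theta: person parameter (latent trait level)
    a: item discrimination, b: item difficulty, c: lower asymptote (guessing)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

# When the person's trait level equals the item's difficulty (theta == b),
# the probability is halfway between c and 1.
print(round(p_correct(0.0, a=1.0, b=0.0, c=0.2), 3))  # 0.6
```

Note that when theta = b, the exponent is zero, so the probability reduces to c + (1 − c)/2 regardless of the discrimination a.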

The line that traces the probability for a given item across levels of the trait is called the item characteristic curve (ICC) or, less commonly, the item response function.

The person parameter, also called the latent trait, is the human capacity measured by the test. It might be a cognitive ability, physical ability, skill, knowledge level, attitude, personality characteristic, etc. In a unidimensional model such as the one above, this trait is considered to be a single factor (as in factor analysis). Individual items or individuals might have secondary factors, but these are assumed to be mutually independent and collectively orthogonal.

The item parameters simply determine the shape of the ICC and in some cases may not have a direct interpretation. In this case, however, the parameters are commonly interpreted as follows. The b parameter is considered to index an item's difficulty. Note that this model scales the item's difficulty and the person's trait onto the same metric. Thus it is valid to talk about an item being about as hard as Person A's trait level, or of a person's trait level being about the same as Item Y's difficulty. The a parameter controls how steeply the ICC rises and thus indicates the degree to which the item distinguishes individuals with trait levels above and below the rising slope of the ICC. This parameter is thus called the item discrimination and is correlated with the item's loading on the underlying factor, with the item-total correlation, and with the index of discrimination. The final parameter, c, is the asymptote of the ICC on the left-hand side. Thus it indicates the probability that very low ability individuals will get this item correct by chance.
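These interpretations can be checked numerically. The sketch below (again with illustrative names, assuming the standard 3PL form with D = 1.702) demonstrates that b shifts the curve, a controls its steepness near theta = b, and c sets the lower asymptote:

```python
import math

D = 1.702

def p_correct(theta, a, b, c):
    """3PL probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

# b shifts the curve: a harder item (larger b) is less likely to be
# answered correctly by a person at the same trait level.
assert p_correct(0.0, a=1.0, b=1.0, c=0.0) < p_correct(0.0, a=1.0, b=-1.0, c=0.0)

# a controls steepness: near theta = b, a more discriminating item
# changes probability faster per unit of trait.
rise_low_a  = p_correct(0.5, 0.5, 0.0, 0.0) - p_correct(-0.5, 0.5, 0.0, 0.0)
rise_high_a = p_correct(0.5, 2.0, 0.0, 0.0) - p_correct(-0.5, 2.0, 0.0, 0.0)
assert rise_high_a > rise_low_a

# c is the left-hand asymptote: even a very low-ability person answers
# correctly with roughly probability c (e.g. by guessing).
print(round(p_correct(-10.0, a=1.0, b=0.0, c=0.25), 3))  # 0.25
```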

This model assumes a single trait dimension and a binary outcome; it is a dichotomous, unidimensional model. Another class of models predicts polytomous outcomes. And a class of models exists to predict response data that arise from multiple traits.


It is worth noting the implications of IRT for test-takers. Tests are imprecise tools, and the score achieved by an individual (the observed score) is always the true score occluded by some degree of error. This error may push the observed score higher or lower.

It is also worth noting that nothing about these models refutes human development or improvement. A person may learn skills, knowledge, or even so-called "test-taking skills" which may translate to a higher true score.

See also psychometrics, standardized test.

Some examples of psychometric tests are found below: