The IRT Procedure

Notation for the Item Response Theory Model

This section introduces the mathematical notation that is used throughout the chapter to describe the item response theory (IRT) model. For a description of the fitting algorithms and the mathematical-statistical details, see the section Maximizing the Marginal Likelihood.

Models for Ordinal Response Items

A d-dimensional graded response IRT model that has K ordinal responses can be expressed by the equations

$$
\begin{aligned}
y_{ij} &= \boldsymbol{\lambda}_j \boldsymbol{\eta}_i + \epsilon_{ij} \\
p_{ijk} &= \Pr(u_{ij} = k) = \Pr\left(\alpha_{(j,k-1)} < y_{ij} < \alpha_{(j,k)}\right), \quad k = 1, \ldots, K
\end{aligned}
$$

where $u_{ij}$ is the observed ordinal response from subject $i$ for item $j$; $y_{ij}$ is a continuous latent response that underlies $u_{ij}$; $\boldsymbol{\alpha}_j = (\alpha_{(j,0)} = -\infty, \alpha_{(j,1)}, \ldots, \alpha_{(j,K-1)}, \alpha_{(j,K)} = \infty)$ is a vector of threshold parameters for item $j$; $\boldsymbol{\lambda}_j$ is a vector of slope (discrimination) parameters for item $j$; $\boldsymbol{\eta}_i = (\eta_{i1}, \ldots, \eta_{id})$ is a vector of latent factors for subject $i$, with $\boldsymbol{\eta}_i \sim N_d(\boldsymbol{\mu}, \boldsymbol{\Sigma})$; and $\boldsymbol{\epsilon}_i = (\epsilon_{i1}, \ldots, \epsilon_{iJ})$ is a vector of unique factors for subject $i$. All the unique factors in $\boldsymbol{\epsilon}_i$ are independent of one another, which implies that $y_{ij}$, $j = 1, \ldots, J$, are independent conditional on the latent factor $\boldsymbol{\eta}_i$. This is the so-called local independence assumption. Finally, $\boldsymbol{\eta}_i$ and $\boldsymbol{\epsilon}_i$ are also independent.

Based on the preceding model specification,

$$
p_{ijk} = \int_{\alpha_{(j,k-1)}}^{\alpha_{(j,k)}} p(y; \boldsymbol{\lambda}_j \boldsymbol{\eta}_i, 1)\, dy = \int_{\alpha_{(j,k-1)} - \boldsymbol{\lambda}_j \boldsymbol{\eta}_i}^{\alpha_{(j,k)} - \boldsymbol{\lambda}_j \boldsymbol{\eta}_i} p(y; 0, 1)\, dy
$$

where p is determined by the link function. It is the density function of the standard normal distribution if the probit link is used, or the density function of the logistic distribution if the logistic link is used.
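As an illustration, the category probabilities for one subject and one item can be computed as differences of the link distribution's cumulative distribution function evaluated at the shifted thresholds. The following Python sketch is not PROC IRT code; the function names `link_cdf` and `graded_response_probs` and the example values are hypothetical, and the logistic link is assumed to mean the standard logistic distribution.

```python
import math

def link_cdf(x, link="logistic"):
    """CDF of the standard logistic or standard normal distribution."""
    if link == "logistic":
        # Numerically stable form; also handles +/- infinity.
        if x >= 0:
            return 1.0 / (1.0 + math.exp(-x))
        e = math.exp(x)
        return e / (1.0 + e)
    if link == "probit":
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    raise ValueError(f"unknown link: {link!r}")

def graded_response_probs(lam, eta, thresholds, link="logistic"):
    """Category probabilities p_ij1, ..., p_ijK for one subject and item.

    lam, eta   : slope vector and latent factor vector (length d)
    thresholds : finite thresholds alpha_(j,1) < ... < alpha_(j,K-1)
    """
    linear = sum(l * e for l, e in zip(lam, eta))
    # Pad with alpha_(j,0) = -inf and alpha_(j,K) = +inf.
    alpha = [-math.inf] + list(thresholds) + [math.inf]
    cdf = [link_cdf(a - linear, link) for a in alpha]
    return [cdf[k] - cdf[k - 1] for k in range(1, len(alpha))]

# Hypothetical item with K = 4 categories and one latent factor.
probs = graded_response_probs([1.2], [0.5], [-1.0, 0.0, 1.5])
```

The returned list has one entry per response category, and the entries sum to 1 because the padded thresholds cover the whole real line.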

Let $\boldsymbol{\Lambda} = (\boldsymbol{\lambda}_1^T, \ldots, \boldsymbol{\lambda}_J^T)$ denote the slope matrix. To identify the model in exploratory analysis, the upper triangular elements of $\boldsymbol{\Lambda}$ are fixed at zero, the factor mean $\boldsymbol{\mu}$ is fixed as a zero vector, and the factor variance-covariance matrix $\boldsymbol{\Sigma}$ is fixed as an identity matrix. For confirmatory analysis, it is assumed that the identification problem is solved by user-specified constraints.
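The exploratory constraint on the slope matrix can be pictured with a small Python sketch. It assumes that the slope matrix is stored with one row per item and one column per factor, and that "upper triangular elements" means the entries whose column index exceeds their row index; both are assumptions made for illustration, and the function name `apply_exploratory_constraints` is hypothetical.

```python
def apply_exploratory_constraints(slopes):
    """Return a copy of a J x d slope matrix with the entries above the
    main diagonal (column index > row index) fixed at zero."""
    return [
        [0.0 if col > row else value for col, value in enumerate(row_values)]
        for row, row_values in enumerate(slopes)
    ]

# Hypothetical starting values for J = 3 items and d = 2 factors.
start = [[1.0, 2.0],
         [3.0, 4.0],
         [5.0, 6.0]]
constrained = apply_exploratory_constraints(start)
# Only the slope of item 1 on factor 2 is fixed at zero in this case.
```

Under these assumptions, a model with d factors has d(d-1)/2 slopes fixed at zero.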

The model that is specified in the preceding equation uses the latent response formulation. PROC IRT uses this parameterization for computational convenience. When there is only one latent factor, a mathematically equivalent parameterization for the model is

$$
p_{ijk} = \int_{-a_j(\eta_i - b_{j,k-1})}^{-a_j(\eta_i - b_{j,k})} p(y; 0, 1)\, dy
$$

where $a_j$ is called the slope (discrimination) parameter and $b_{j,k}$, $k = 1, \ldots, K$, are called the threshold parameters. The threshold parameters under these two parameterizations are related by $b_{j,k} = \alpha_{(j,k)} / \lambda_j$, where $k = 1, \ldots, K$. The quantity $\gamma_{j,k} = -\alpha_{(j,k)}$ is often called the intercept parameter.
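Matching the integration limits of the two unidimensional parameterizations gives a slope equal to the latent-response slope and each threshold equal to the latent-response threshold divided by that slope. The following Python sketch applies this translation to the finite thresholds of one item; the function name `to_slope_threshold` and the numeric values are hypothetical.

```python
def to_slope_threshold(lam, alphas):
    """Translate latent-response parameters (lam, alphas) for one item
    into slope a, thresholds b_k = alpha_k / lam, and intercepts
    gamma_k = -alpha_k. Only finite thresholds should be passed."""
    if lam == 0:
        raise ValueError("the slope must be nonzero to compute thresholds")
    a = lam
    b = [alpha / lam for alpha in alphas]
    gamma = [-alpha for alpha in alphas]
    return a, b, gamma

# Hypothetical item with slope 1.5 and three finite thresholds.
lam, alphas = 1.5, [-0.75, 0.3, 1.2]
a, b, gamma = to_slope_threshold(lam, alphas)

# The integration limits agree under both parameterizations:
# alpha_k - lam * eta equals -a * (eta - b_k) for any eta.
eta = 0.4
limits_latent = [alpha - lam * eta for alpha in alphas]
limits_slope_threshold = [-a * (eta - b_k) for b_k in b]
```

Because the limits coincide, both parameterizations yield the same category probabilities.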

The generalized partial credit (GPC) model is another popular IRT model for ordinal items besides the graded response model. Introduced by Muraki (1992), the GPC model is an extension of the partial credit (PC) model proposed by Masters (1982). In the PC model, the slope (discrimination) parameter is fixed as 1 for all the items. The GPC model releases this assumption by introducing the slope parameter for each item. The GPC model can be formulated as

$$
p_{ijk} = \frac{\exp\left(\sum_{h=1}^{k} a_j(\eta_i - b_{j,h})\right)}{\sum_{m=1}^{K} \exp\left(\sum_{h=1}^{m} a_j(\eta_i - b_{j,h})\right)}
$$

In this formulation, $a_j$ is called the slope (discrimination) parameter and $b_{j,h}$ is called the step parameter. For identification purposes, the first category of each item is treated as the reference level; that is, $b_{j,1}$ is fixed to zero for each item $j$. These fixed parameter values are omitted from the output.

The GPC model applies only to unidimensional analysis.
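The GPC category probabilities can be sketched in Python as a normalized exponential of cumulative sums. This is an illustrative calculation, not PROC IRT code; the function name `gpc_probs` and the example values are hypothetical, and the first step parameter is passed explicitly as zero to mirror the identification constraint.

```python
import math

def gpc_probs(a, eta, steps):
    """GPC category probabilities for one subject and one item.

    a     : slope (discrimination) parameter
    eta   : latent factor value
    steps : step parameters b_1, ..., b_K, with b_1 = 0 as the reference
    """
    # Cumulative sums z_k = sum over h = 1..k of a * (eta - b_h).
    z = []
    total = 0.0
    for b in steps:
        total += a * (eta - b)
        z.append(total)
    # Subtract the maximum before exponentiating to avoid overflow.
    z_max = max(z)
    weights = [math.exp(value - z_max) for value in z]
    denom = sum(weights)
    return [w / denom for w in weights]

# Hypothetical item with K = 4 categories.
probs = gpc_probs(a=1.1, eta=0.2, steps=[0.0, -0.8, 0.1, 1.0])
```

With two categories, the probability of the second category reduces to the logistic function of a(eta - b_2), which matches the two-parameter logistic form.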

Models for Binary Response Items

Popular models for binary response items include one-, two-, three-, and four-parameter models and the Rasch model. When the responses are binary, the graded response model introduced in the section Models for Ordinal Response Items reduces to the two-parameter model, which can be expressed as

$$
\begin{aligned}
y_{ij} &= a_j(\eta_i - b_j) + \epsilon_{ij} \\
p_{ij} &= \Pr(u_{ij} = 1) = \Pr(y_{ij} > 0)
\end{aligned}
$$

where $b_j$ is often called the item difficulty parameter.

The two-parameter model reduces to a one-parameter model when the slope parameters for all the items are constrained to be equal. When the logistic link is used, the one- and two-parameter models are often abbreviated as 1PL and 2PL. When all the slope parameters are fixed at 1 and the factor variance is a free parameter, the Rasch model is obtained.

You can obtain three- and four-parameter models by introducing the guessing and ceiling parameters. Let $g_j$ and $c_j$ denote the item-specific guessing and ceiling parameters, respectively. Then the four-parameter model can be expressed as

$$
p_{ij} = \Pr(u_{ij} = 1) = g_j + (c_j - g_j) \Pr(y_{ij} > 0)
$$

This model reduces to the three-parameter model when $c_j = 1$.
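The binary models can be compared in a short Python sketch. It relies on the symmetry of the standard normal and standard logistic distributions, so that the probability of a positive latent response equals the link CDF evaluated at a(eta - b). The function names `link_cdf` and `four_parameter_prob` and the example values are hypothetical, and the range check on the guessing and ceiling parameters is an assumption added for illustration.

```python
import math

def link_cdf(x, link="logistic"):
    """CDF of the standard logistic or standard normal distribution."""
    if link == "logistic":
        if x >= 0:
            return 1.0 / (1.0 + math.exp(-x))
        e = math.exp(x)
        return e / (1.0 + e)
    if link == "probit":
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    raise ValueError(f"unknown link: {link!r}")

def four_parameter_prob(eta, a, b, g=0.0, c=1.0, link="logistic"):
    """Probability of a response of 1 under the four-parameter model.

    g = 0 and c = 1 give the two-parameter model; c = 1 alone gives
    the three-parameter model.
    """
    if not 0.0 <= g <= c <= 1.0:
        raise ValueError("expected 0 <= g <= c <= 1")
    # Pr(y > 0) for y = a * (eta - b) + epsilon with symmetric epsilon.
    return g + (c - g) * link_cdf(a * (eta - b), link)

# Hypothetical item evaluated at a latent factor value equal to b.
p_2pl = four_parameter_prob(eta=0.7, a=1.3, b=0.7)
p_3pl = four_parameter_prob(eta=0.7, a=1.3, b=0.7, g=0.2)
p_4pl = four_parameter_prob(eta=0.7, a=1.3, b=0.7, g=0.2, c=0.9)
```

When the latent factor equals the difficulty parameter, the response probability sits halfway between the guessing and ceiling parameters.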

Model for Nominal Response Items

For nominal response items, you can apply the nominal response model to do item analysis. The nominal response model was introduced by Bock (1972) and can be formulated as

$$
p_{ijk} = \frac{\exp(a_{j,k}\eta_i + b_{j,k})}{\sum_{h=1}^{K} \exp(a_{j,h}\eta_i + b_{j,h})}
$$

In this formulation, $a_{j,k}$ is the slope parameter and $b_{j,k}$ is the intercept parameter. For identification purposes, the first category of each item is treated as the reference level; that is, both $a_{j,1}$ and $b_{j,1}$ are fixed to zero for each item $j$. These fixed parameter values are omitted from the output.

This nominal response model applies only to unidimensional analysis.
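The nominal response probabilities have the form of a softmax over category-specific linear predictors. The following Python sketch illustrates the calculation for one subject and one item; the function name `nominal_probs` and the example values are hypothetical, and the reference-category slope and intercept are passed explicitly as zeros.

```python
import math

def nominal_probs(eta, slopes, intercepts):
    """Nominal response category probabilities for one subject and item.

    slopes, intercepts : a_1..a_K and b_1..b_K, with a_1 = b_1 = 0
                         for the reference category
    """
    z = [a * eta + b for a, b in zip(slopes, intercepts)]
    # Subtract the maximum before exponentiating to avoid overflow.
    z_max = max(z)
    weights = [math.exp(value - z_max) for value in z]
    denom = sum(weights)
    return [w / denom for w in weights]

# Hypothetical item with K = 3 nominal categories.
probs = nominal_probs(eta=0.5, slopes=[0.0, 0.8, -0.4],
                      intercepts=[0.0, 0.3, 1.0])
```

If every slope and intercept is zero, all categories are equally likely regardless of the latent factor value.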

Last updated: December 09, 2022