The IRT Procedure

Notation for the Item Response Theory Model

This section introduces the mathematical notation that is used throughout the chapter to describe the item response theory (IRT) model. For a description of the fitting algorithms and the mathematical-statistical details, see the section Maximizing the Marginal Likelihood.

Models for Ordinal Response Items

A d-dimensional graded response IRT model that has K ordinal responses can be expressed by the equations

$$y_{ij} = \boldsymbol{\lambda}_j \boldsymbol{\eta}_i + \epsilon_{ij}$$

$$p_{ijk} = \Pr(u_{ij} = k) = \Pr(\alpha_{(j,k-1)} < y_{ij} < \alpha_{(j,k)}), \quad k = 1, \ldots, K$$

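where $u_{ij}$ is the observed ordinal response from subject i for item j, $y_{ij}$ is a continuous latent response that underlies $u_{ij}$, $\boldsymbol{\alpha}_j = (\alpha_{(j,0)} = -\infty, \alpha_{(j,1)}, \ldots, \alpha_{(j,K-1)}, \alpha_{(j,K)} = \infty)$ is a vector of threshold parameters for item j, $\boldsymbol{\lambda}_j$ is a vector of slope (discrimination) parameters for item j, $\boldsymbol{\eta}_i = (\eta_{i1}, \ldots, \eta_{id})$ is a vector of latent factors for subject i, $\boldsymbol{\eta}_i \sim N_d(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, and $\boldsymbol{\epsilon}_i = (\epsilon_{i1}, \ldots, \epsilon_{iJ})$ is a vector of unique factors for subject i, with J denoting the number of items. All the unique factors in $\boldsymbol{\epsilon}_i$ are independent from one another, which implies that $y_{ij}$, $j = 1, \ldots, J$, are independent conditional on the latent factor $\boldsymbol{\eta}_i$. This is the so-called local independence assumption. Finally, $\boldsymbol{\eta}_i$ and $\boldsymbol{\epsilon}_i$ are also independent.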

Based on the preceding model specification,

$$p_{ijk} = \int_{\alpha_{(j,k-1)}}^{\alpha_{(j,k)}} p(y; \boldsymbol{\lambda}_j \boldsymbol{\eta}_i, 1)\, dy = \int_{\alpha_{(j,k-1)} - \boldsymbol{\lambda}_j \boldsymbol{\eta}_i}^{\alpha_{(j,k)} - \boldsymbol{\lambda}_j \boldsymbol{\eta}_i} p(y; 0, 1)\, dy$$

where p is determined by the link function. It is the density function of the standard normal distribution if the probit link is used, or the density function of the logistic distribution if the logistic link is used.
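The category probabilities above can be evaluated as differences of the link's cumulative distribution function at the shifted thresholds. The following Python sketch illustrates this under stated assumptions; it is not PROC IRT's implementation. The function names and the convention of passing only the interior thresholds (with the first and last thresholds taken to be minus and plus infinity) are choices made for this example.

```python
import math


def std_normal_cdf(x):
    """CDF of the standard normal distribution (probit link)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def logistic_cdf(x):
    """CDF of the standard logistic distribution, written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def graded_response_probs(thresholds, slopes, factors, link="logistic"):
    """Return [p_1, ..., p_K] for one subject and one item.

    thresholds: interior thresholds in increasing order (K - 1 values);
                the outer thresholds are assumed to be -inf and +inf.
    slopes, factors: sequences of length d (lambda_j and eta_i).
    """
    cdf = logistic_cdf if link == "logistic" else std_normal_cdf
    linear = sum(l * e for l, e in zip(slopes, factors))  # lambda_j * eta_i
    # Cumulative probabilities at each threshold, including the infinite ends.
    cumulative = [0.0] + [cdf(t - linear) for t in thresholds] + [1.0]
    return [cumulative[k] - cumulative[k - 1] for k in range(1, len(cumulative))]


# Hypothetical item with K = 3 categories and one latent factor.
probs = graded_response_probs([-1.0, 0.5], [1.2], [0.3])
print(probs, sum(probs))
```

Because adjacent cumulative terms cancel, the K probabilities sum to 1 for either link.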

Let $\boldsymbol{\Lambda} = (\boldsymbol{\lambda}_1, \ldots, \boldsymbol{\lambda}_J)$ denote the slope matrix. To identify the model in exploratory analysis, the upper triangular elements of $\boldsymbol{\Lambda}$ are fixed as zero, the factor mean $\boldsymbol{\mu}$ is fixed as a zero vector, and the factor variance-covariance matrix $\boldsymbol{\Sigma}$ is fixed as an identity matrix. For confirmatory analysis, it is assumed that the identification problem is solved by user-specified constraints.

The model that is specified in the preceding equation uses the latent response formulation. PROC IRT uses this parameterization for computational convenience. When there is only one latent factor, a mathematically equivalent parameterization for the model is

$$p_{ijk} = \int_{-a_j(\eta_i - b_{j,k-1})}^{-a_j(\eta_i - b_{j,k})} p(y; 0, 1)\, dy$$

where $a_j$ is called the slope (discrimination) parameter and $b_{j,k}$ are called the threshold parameters. The threshold parameters under these two parameterizations can be translated as $b_{j,k} = \alpha_{(j,k)} / a_j$, where $a_j = \lambda_j$ and $\alpha_{(j,k)}$ is often called the intercept parameter.
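The equivalence of the two parameterizations can be checked numerically. The following Python sketch assumes a single latent factor and the logistic link, converts hypothetical intercepts to thresholds by dividing by the slope, and compares the category probability that each form produces. All names and parameter values are illustrative.

```python
import math


def logistic_cdf(x):
    """CDF of the standard logistic distribution, written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def prob_latent_form(alpha_lo, alpha_hi, lam, eta):
    """Category probability with limits alpha - lambda * eta."""
    return logistic_cdf(alpha_hi - lam * eta) - logistic_cdf(alpha_lo - lam * eta)


def prob_slope_threshold_form(b_lo, b_hi, a, eta):
    """Category probability with limits -a * (eta - b)."""
    return logistic_cdf(-a * (eta - b_hi)) - logistic_cdf(-a * (eta - b_lo))


# Hypothetical values for one item and one subject.
lam, eta = 1.5, 0.4
alphas = (-0.6, 0.9)                 # intercepts for adjacent categories
a = lam                              # the slope is the same in both forms
bs = tuple(alpha / a for alpha in alphas)  # thresholds: b = alpha / a

p_latent = prob_latent_form(alphas[0], alphas[1], lam, eta)
p_threshold = prob_slope_threshold_form(bs[0], bs[1], a, eta)
print(p_latent, p_threshold)
```

The two printed values agree up to floating-point rounding, because $-a_j(\eta_i - b_{j,k}) = a_j b_{j,k} - a_j \eta_i$ matches the latent response limit when the intercept equals $a_j b_{j,k}$.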

The generalized partial credit (GPC) model is another popular IRT model for ordinal items besides the graded response model. Introduced by Muraki (1992), the GPC model is an extension of the partial credit (PC) model proposed by Masters (1982). In the PC model, the slope (discrimination) parameter is fixed as 1 for all the items. The GPC model relaxes this assumption by introducing a slope parameter for each item. The GPC model can be formulated as

$$p_{ijk} = \frac{\exp\left(\sum_{h=1}^{k} a_j(\eta_i - b_{j,h})\right)}{\sum_{m=1}^{K} \exp\left(\sum_{h=1}^{m} a_j(\eta_i - b_{j,h})\right)}$$

In this formulation, $a_j$ is called the slope (discrimination) parameter and $b_{j,h}$ is called the step parameter. For identification purposes, the first category of each item is treated as the reference level; that is, $b_{j,1}$ is fixed to zero for each item j. These fixed parameter values are omitted from the output.
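The GPC formulation amounts to normalizing the exponentials of cumulative sums of the per-step terms. The following Python sketch shows one way to compute the probabilities; it is an illustration, not PROC IRT's algorithm. The function name is hypothetical, and the example assumes that the first step parameter is the one fixed to zero.

```python
import math


def gpc_probs(a, steps, eta):
    """Return [p_1, ..., p_K] under the generalized partial credit model.

    a:     slope (discrimination) parameter of the item.
    steps: step parameters [b_1, ..., b_K]; b_1 = 0 is assumed here
           as the identification constraint.
    eta:   latent factor value of the subject.
    """
    exponents = []
    running = 0.0
    for b in steps:
        running += a * (eta - b)      # cumulative sum over h = 1, ..., k
        exponents.append(running)
    # Subtract the maximum before exponentiating for numerical stability;
    # the shift cancels in the ratio.
    shift = max(exponents)
    weights = [math.exp(z - shift) for z in exponents]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical item with K = 3 categories.
probs = gpc_probs(1.3, [0.0, -0.5, 0.8], 0.2)
print(probs, sum(probs))
```

A useful check is that the ratio of adjacent category probabilities depends on a single step parameter: the probability of category k divided by that of category k - 1 equals $\exp(a_j(\eta_i - b_{j,k}))$.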

The GPC model applies only to unidimensional analysis.

Models for Binary Response Items

Popular models for binary response items include one-, two-, three-, and four-parameter models and the Rasch model. When the responses are binary, the graded response model introduced in the section Models for Ordinal Response Items reduces to the two-parameter model, which can be expressed as

$$y_{ij} = a_j(\eta_i - b_j) + \epsilon_{ij}$$

$$p_{ij} = \Pr(u_{ij} = 1) = \Pr(y_{ij} > 0)$$

where $b_j$ is often called the item difficulty parameter.

The two-parameter model reduces to a one-parameter model when slope parameters for all the items are constrained to be equal. When the logistic link is used, the one- and two-parameter models are often abbreviated as 1PL and 2PL. When all the slope parameters are set to 1 and the factor variance is set to a free parameter, the Rasch model is obtained.

You can obtain three- and four-parameter models by introducing the guessing and ceiling parameters. Let $g_j$ and $c_j$ denote the item-specific guessing and ceiling parameters, respectively. Then the four-parameter model can be expressed as

$$p_{ij} = \Pr(u_{ij} = 1) = g_j + (c_j - g_j) \Pr(y_{ij} > 0)$$

This model reduces to the three-parameter model when $c_j = 1$.
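The nesting of the binary models can be illustrated with a short Python sketch. It assumes the logistic link, for which the probability that the latent response exceeds zero equals the logistic distribution function evaluated at the slope times the difference between the factor and the difficulty, by symmetry of the distribution. The function names and parameter values are hypothetical.

```python
import math


def logistic_cdf(x):
    """CDF of the standard logistic distribution, written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def four_parameter_prob(a, b, g, c, eta):
    """P(u = 1) with slope a, difficulty b, guessing g, and ceiling c."""
    two_parameter = logistic_cdf(a * (eta - b))   # Pr(y > 0) in the 2PL model
    return g + (c - g) * two_parameter


eta = 0.7
p_4pl = four_parameter_prob(1.2, 0.0, 0.2, 0.9, eta)
p_3pl = four_parameter_prob(1.2, 0.0, 0.2, 1.0, eta)   # ceiling fixed at 1
p_2pl = four_parameter_prob(1.2, 0.0, 0.0, 1.0, eta)   # guessing also fixed at 0
print(p_4pl, p_3pl, p_2pl)
```

The guessing parameter acts as a lower asymptote and the ceiling parameter as an upper asymptote of the response probability, so the four-parameter probability always lies between them.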

Model for Nominal Response Items

For nominal response items, you can apply the nominal response model to do item analysis. The nominal response model was introduced by Bock (1972) and can be formulated as

$$p_{ijk} = \frac{\exp(a_{j,k} \eta_i + b_{j,k})}{\sum_{h=1}^{K} \exp(a_{j,h} \eta_i + b_{j,h})}$$

In this formulation, $a_{j,k}$ is the slope parameter and $b_{j,k}$ is the intercept parameter. For identification purposes, the first category of each item is treated as the reference level; that is, both $a_{j,1}$ and $b_{j,1}$ are fixed to zero for each item j. These fixed parameter values are omitted from the output.
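The nominal response model has the form of a multinomial logit (softmax) over category-specific linear scores. The following Python sketch computes the category probabilities for one item. It is an illustration under stated assumptions rather than PROC IRT's implementation: the function name is hypothetical, and the example fixes the slope and intercept of the first category to zero.

```python
import math


def nominal_probs(slopes, intercepts, eta):
    """Return [p_1, ..., p_K] under the nominal response model.

    slopes, intercepts: per-category parameters [a_1, ..., a_K] and
                        [b_1, ..., b_K]; a_1 = b_1 = 0 is assumed here.
    eta:                latent factor value of the subject.
    """
    scores = [a * eta + b for a, b in zip(slopes, intercepts)]
    # Subtract the maximum score before exponentiating for numerical
    # stability; the shift cancels in the ratio.
    shift = max(scores)
    weights = [math.exp(s - shift) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical item with K = 3 unordered categories.
probs = nominal_probs([0.0, 0.8, -0.4], [0.0, 0.3, -0.2], 0.5)
print(probs, sum(probs))
```

With the reference category fixed at zero, the log odds of category k relative to the first category equal $a_{j,k} \eta_i + b_{j,k}$.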

This nominal response model applies only to unidimensional analysis.

Last updated: December 09, 2022