The ICPHREG Procedure

EM Algorithm and Extensions

The expectation-maximization (EM) algorithm, as described in Wang et al. (2016) and Zeng, Mao, and Lin (2016), can be used to fit certain types of proportional hazards models to interval-censored data.

Suppose that the observations to be analyzed consist of interval-censored outcomes $\mathcal{D} = \{[L_i, R_i]; \mathbf{Z}_i\}$, $i = 1, \ldots, n$, where $n$ is the number of subjects. $\mathbf{Z}_i$ denotes a $p$-dimensional vector of covariates for the $i$th subject.

Assuming that there are no exact observations (that is, $L_i < R_i$ for every subject), the full likelihood function is

$$
\begin{aligned}
\mathrm{L}(\boldsymbol{\theta}) &= \prod_{i=1}^{n} \left[ S(L_i; \mathbf{Z}_i) - S(R_i; \mathbf{Z}_i) \right] \\
&= \prod_{i=1}^{n} \left[ 1 - S(R_i; \mathbf{Z}_i) \right]^{\Delta_{i1}} \left[ S(L_i; \mathbf{Z}_i) - S(R_i; \mathbf{Z}_i) \right]^{\Delta_{i2}} \left[ S(L_i; \mathbf{Z}_i) \right]^{\Delta_{i3}}
\end{aligned}
$$

where $\Delta_{i1}$ indicates whether the $i$th subject is left-censored ($L_i = 0$), $\Delta_{i2}$ indicates whether the $i$th subject is interval-censored ($0 < L_i < R_i < \infty$), and $\Delta_{i3}$ indicates whether the $i$th subject is right-censored ($R_i = \infty$).

Assume that the baseline cumulative hazard function has the form

$$
\Lambda_0(t) = \sum_{k=1}^{K} \gamma_k b_k(t)
$$

where the $b_k(t)$ are known nondecreasing, nonnegative functions and the $\gamma_k$ are nonnegative baseline parameters.
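One concrete choice, assumed here purely for illustration, is a piecewise constant baseline hazard: with cut points $0 = c_0 < c_1 < \cdots < c_K$ (the last may be $\infty$) and hazard $\gamma_k$ on $(c_{k-1}, c_k]$, take $b_k(t)$ to be the length of $(c_{k-1}, c_k] \cap [0, t]$, which is nondecreasing and nonnegative. The Python sketch below, which uses hypothetical function names and a single scalar covariate, evaluates $\Lambda_0(t)$ and the log of the full likelihood given earlier in this section. It is not the procedure's implementation.

```python
import math


def pc_basis(t, cuts):
    """b_k(t) for a piecewise constant hazard: length of (c_{k-1}, c_k] inside [0, t].

    cuts = [c_0 = 0, c_1, ..., c_K]; the last cut may be math.inf.
    """
    return [max(0.0, min(t, hi) - lo) for lo, hi in zip(cuts[:-1], cuts[1:])]


def cum_hazard0(t, gammas, cuts):
    """Lambda_0(t) = sum_k gamma_k * b_k(t)."""
    return sum(g * b for g, b in zip(gammas, pc_basis(t, cuts)))


def log_lik(data, beta, gammas, cuts):
    """Log full likelihood; data holds (L, R, z) with R = math.inf if right-censored."""
    total = 0.0
    for L, R, z in data:
        risk = math.exp(z * beta)  # exp(Z_i' beta) for a scalar covariate
        if L == 0:  # left-censored: log[1 - S(R_i)]
            total += math.log(-math.expm1(-cum_hazard0(R, gammas, cuts) * risk))
        elif math.isinf(R):  # right-censored: log S(L_i)
            total -= cum_hazard0(L, gammas, cuts) * risk
        else:  # interval-censored: log[S(L_i) - S(R_i)]
            s_left = math.exp(-cum_hazard0(L, gammas, cuts) * risk)
            s_right = math.exp(-cum_hazard0(R, gammas, cuts) * risk)
            total += math.log(s_left - s_right)
    return total


cuts = [0.0, 1.0, 2.0, math.inf]
gammas = [0.5, 0.3, 0.2]
data = [(0.0, 1.0, 0.0), (1.0, 2.5, 1.0), (2.5, math.inf, 0.0)]
print(cum_hazard0(1.5, gammas, cuts))  # 0.5 * 1 + 0.3 * 0.5
print(log_lik(data, 0.4, gammas, cuts))
```

Using `math.expm1` keeps $1 - e^{-x}$ accurate when the cumulative hazard is small.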

Let $\{W_{ik} : i = 1, \ldots, n;\ k = 1, \ldots, K\}$ be a set of latent variables that follow Poisson distributions with means $\gamma_k b_k(L_i) \exp(\mathbf{Z}_i'\boldsymbol{\beta})$. Let $\{U_{ik} : i = 1, \ldots, n;\ k = 1, \ldots, K\}$ be a set of latent variables that follow Poisson distributions with means $\gamma_k [b_k(R_i) - b_k(L_i)] \exp(\mathbf{Z}_i'\boldsymbol{\beta})$. All of these latent variables are mutually independent. For a left-censored subject, $R_i$ takes the place of $L_i$ in both means, and for a right-censored subject, $L_i$ takes the place of $R_i$; consequently, $U_{ik} = 0$ unless the $i$th subject is interval-censored. Define $W_i = \sum_{k=1}^{K} W_{ik}$ and $U_i = \sum_{k=1}^{K} U_{ik}$.

The full likelihood can be rewritten as

$$
\mathrm{L}(\boldsymbol{\theta}) = \prod_{i=1}^{n} P(W_i > 0)^{\Delta_{i1}}\, P(W_i = 0, U_i > 0)^{\Delta_{i2}}\, P(W_i = 0, U_i = 0)^{\Delta_{i3}}
$$
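For an interval-censored subject, this representation says that $P(W_i = 0, U_i > 0) = S(L_i; \mathbf{Z}_i) - S(R_i; \mathbf{Z}_i)$, because $W_i$ and $U_i$ are independent Poisson variables with means $\Lambda_0(L_i)e^{\mathbf{Z}_i'\boldsymbol\beta}$ and $[\Lambda_0(R_i) - \Lambda_0(L_i)]e^{\mathbf{Z}_i'\boldsymbol\beta}$. The Monte Carlo sketch below checks this numerically for one hypothetical subject, assuming a scalar covariate and a piecewise constant baseline hazard (cut points and parameter values are made up). It is only a sanity check of the identity, not part of any fitting algorithm.

```python
import math
import random


def pc_basis(t, cuts):
    """Piecewise constant hazard basis: length of (c_{k-1}, c_k] inside [0, t]."""
    return [max(0.0, min(t, hi) - lo) for lo, hi in zip(cuts[:-1], cuts[1:])]


def sample_poisson(mu, rng):
    """Knuth's multiplication method; adequate for the small means used here."""
    threshold = math.exp(-mu)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def mc_interval_prob(L, R, z, beta, gammas, cuts, n_sim=50_000, seed=2016):
    """Estimate P(W_i = 0, U_i > 0) by simulating the latent W_ik and U_ik."""
    rng = random.Random(seed)
    risk = math.exp(z * beta)
    b_left, b_right = pc_basis(L, cuts), pc_basis(R, cuts)
    w_means = [g * bl * risk for g, bl in zip(gammas, b_left)]
    u_means = [g * (br - bl) * risk for g, bl, br in zip(gammas, b_left, b_right)]
    hits = 0
    for _ in range(n_sim):
        w_total = sum(sample_poisson(m, rng) for m in w_means)
        u_total = sum(sample_poisson(m, rng) for m in u_means)
        hits += (w_total == 0 and u_total > 0)
    return hits / n_sim


def cum_hazard0(t, gammas, cuts):
    return sum(g * b for g, b in zip(gammas, pc_basis(t, cuts)))


cuts, gammas, beta, z, L, R = [0.0, 1.0, 2.0, math.inf], [0.5, 0.3, 0.2], 0.4, 1.0, 1.0, 2.5
risk = math.exp(z * beta)
exact = math.exp(-cum_hazard0(L, gammas, cuts) * risk) - math.exp(-cum_hazard0(R, gammas, cuts) * risk)
print(exact, mc_interval_prob(L, R, z, beta, gammas, cuts))
```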

The complete-data likelihood is

$$
\mathrm{L}_c(\boldsymbol{\theta}) = \prod_{i=1}^{n} \prod_{k=1}^{K} f_{W_{ik}}(W_{ik})\, f_{U_{ik}}(U_{ik})^{\Delta_{i2} + \Delta_{i3}}
$$

where $f_V(\cdot)$ denotes the Poisson probability mass function of the variable $V$. It is straightforward to verify that summing $\mathrm{L}_c(\boldsymbol{\theta})$ over all values of the latent variables that are consistent with the observed data (for example, $W_i > 0$ for a left-censored subject) yields the full likelihood $\mathrm{L}(\boldsymbol{\theta})$.

The EM algorithm proceeds as follows. Let the current parameter estimates be $\boldsymbol{\theta}^{(d)} = (\boldsymbol{\beta}^{(d)}, \boldsymbol{\gamma}^{(d)})'$. Define

$$
\begin{aligned}
w_{ik} &= E(W_{ik} \mid \mathcal{D}, \boldsymbol{\theta}^{(d)}) \\
u_{ik} &= E(U_{ik} \mid \mathcal{D}, \boldsymbol{\theta}^{(d)}) \\
w_i &= E(W_i \mid \mathcal{D}, \boldsymbol{\theta}^{(d)}) \\
u_i &= E(U_i \mid \mathcal{D}, \boldsymbol{\theta}^{(d)})
\end{aligned}
$$

The expected complete-data log likelihood, $Q(\boldsymbol{\theta}, \boldsymbol{\theta}^{(d)}) = E[\log \mathrm{L}_c(\boldsymbol{\theta}) \mid \mathcal{D}, \boldsymbol{\theta}^{(d)}]$, is computed as

$$
\begin{aligned}
Q(\boldsymbol{\theta}, \boldsymbol{\theta}^{(d)}) = \sum_{i=1}^{n} \sum_{k=1}^{K} \Big\{ & \left[ w_{ik} + (\Delta_{i2} + \Delta_{i3}) u_{ik} \right] \left[ \log(\gamma_k) + \mathbf{Z}_i'\boldsymbol{\beta} \right] \\
& - \gamma_k \exp(\mathbf{Z}_i'\boldsymbol{\beta}) \left[ (\Delta_{i2} + \Delta_{i1}) b_k(R_i) + \Delta_{i3} b_k(L_i) \right] \Big\} + B(\boldsymbol{\theta}^{(d)})
\end{aligned}
$$

where $B(\boldsymbol{\theta}^{(d)})$ does not depend on $\boldsymbol{\theta}$.

The quantities $w_{ik}$ and $u_{ik}$ are computed as follows:

$$
\begin{aligned}
w_{ik} &= \frac{\gamma_k^{(d)} b_k(R_i)\, w_i}{\Lambda_0^{(d)}(R_i)} \\
u_{ik} &= \frac{\gamma_k^{(d)} \left[ b_k(R_i) - b_k(L_i) \right] u_i}{\Lambda_0^{(d)}(R_i) - \Lambda_0^{(d)}(L_i)}
\end{aligned}
$$

where

$$
\begin{aligned}
w_i &= \frac{\Lambda_0^{(d)}(R_i) \exp(\mathbf{Z}_i'\boldsymbol{\beta}^{(d)})\, \Delta_{i1}}{1 - \exp\left[ -\Lambda_0^{(d)}(R_i) \exp(\mathbf{Z}_i'\boldsymbol{\beta}^{(d)}) \right]} \\
u_i &= \frac{\left[ \Lambda_0^{(d)}(R_i) - \Lambda_0^{(d)}(L_i) \right] \exp(\mathbf{Z}_i'\boldsymbol{\beta}^{(d)})\, \Delta_{i2}}{1 - \exp\left\{ -\left[ \Lambda_0^{(d)}(R_i) - \Lambda_0^{(d)}(L_i) \right] \exp(\mathbf{Z}_i'\boldsymbol{\beta}^{(d)}) \right\}}
\end{aligned}
$$

These are means of zero-truncated Poisson variables: if $V$ is Poisson with mean $\mu$, then $E(V \mid V > 0) = \mu / (1 - e^{-\mu})$.
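The E-step formulas translate directly into code. The sketch below computes $w_{ik}$ and $u_{ik}$ for one subject, assuming a scalar covariate and that the caller supplies the basis values $b_k(L_i)$ and $b_k(R_i)$ (for example, from a piecewise constant or spline basis). The censoring status is passed as a hypothetical string argument in place of the indicators $\Delta_{i1}, \Delta_{i2}, \Delta_{i3}$. Right-censored subjects contribute zeros because $w_i$ and $u_i$ carry the factors $\Delta_{i1}$ and $\Delta_{i2}$.

```python
import math


def truncated_poisson_mean(mu):
    """E(V | V > 0) for V ~ Poisson(mu), i.e. mu / (1 - exp(-mu))."""
    return mu / -math.expm1(-mu)


def e_step_subject(b_left, b_right, status, z, beta, gammas):
    """Return (w_ik, u_ik) lists for one subject at the current (beta, gammas).

    status is 'left', 'interval', or 'right' (hypothetical encoding of Delta_i1..Delta_i3).
    """
    K = len(gammas)
    risk = math.exp(z * beta)
    w_ik, u_ik = [0.0] * K, [0.0] * K
    if status == "left":
        parts = [g * b for g, b in zip(gammas, b_right)]  # gamma_k b_k(R_i)
        lam_r = sum(parts)  # Lambda_0(R_i)
        w_i = truncated_poisson_mean(lam_r * risk)
        w_ik = [p * w_i / lam_r for p in parts]
    elif status == "interval":
        parts = [g * (br - bl) for g, bl, br in zip(gammas, b_left, b_right)]
        d_lam = sum(parts)  # Lambda_0(R_i) - Lambda_0(L_i)
        u_i = truncated_poisson_mean(d_lam * risk)
        u_ik = [p * u_i / d_lam for p in parts]
    return w_ik, u_ik


w, u = e_step_subject([0.0, 0.0], [1.0, 0.5], "left", 0.0, 0.0, [0.5, 0.3])
print(w, u)
```

The split of $w_i$ across $k$ is proportional to each component's share of the Poisson mean, which is why the $w_{ik}$ always sum to $w_i$.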

Setting $\partial Q(\boldsymbol{\theta}, \boldsymbol{\theta}^{(d)}) / \partial \gamma_k = 0$ for $k = 1, \ldots, K$ and solving gives the update

$$
\gamma_k^{(d+1)} = \frac{\sum_{i=1}^{n} \left( w_{ik} + \Delta_{i2} u_{ik} \right)}{\sum_{i=1}^{n} \left[ (\Delta_{i1} + \Delta_{i2}) b_k(R_i) + \Delta_{i3} b_k(L_i) \right] \exp(\mathbf{Z}_i'\boldsymbol{\beta}^{(d)})}
$$
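The closed-form update is a ratio of two sums over subjects. This sketch assumes that the E-step expectations $w_{ik}$ and $u_{ik}$ have already been computed and are passed in as lists, uses a scalar covariate for brevity, and evaluates $\exp(\mathbf{Z}_i'\boldsymbol\beta)$ at the current $\boldsymbol\beta^{(d)}$. The subject encoding and function name are hypothetical.

```python
import math


def update_gamma(subjects, w, u, beta):
    """One M-step update of the baseline parameters gamma_k.

    subjects[i] = (b_left, b_right, status, z), status in {'left', 'interval', 'right'};
    w[i][k] and u[i][k] are the E-step expectations w_ik and u_ik.
    """
    K = len(w[0])
    num, den = [0.0] * K, [0.0] * K
    for (b_left, b_right, status, z), w_i, u_i in zip(subjects, w, u):
        risk = math.exp(z * beta)
        # Left- and interval-censored subjects use b_k(R_i); right-censored use b_k(L_i).
        b_use = b_left if status == "right" else b_right
        for k in range(K):
            num[k] += w_i[k] + (u_i[k] if status == "interval" else 0.0)
            den[k] += b_use[k] * risk
    # Guard: a basis function that is zero for every subject leaves gamma_k at 0.
    return [n / d if d > 0 else 0.0 for n, d in zip(num, den)]


subjects = [([0.0, 0.0], [1.0, 0.5], "left", 0.0),
            ([1.0, 1.0], [math.inf, math.inf], "right", 0.0)]
w = [[0.6, 0.2], [0.0, 0.0]]
u = [[0.0, 0.0], [0.0, 0.0]]
print(update_gamma(subjects, w, u, beta=0.0))
```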

The partial derivative of $Q(\boldsymbol{\theta}, \boldsymbol{\theta}^{(d)})$ with respect to $\boldsymbol{\beta}$ is

$$
\frac{\partial Q(\boldsymbol{\theta}, \boldsymbol{\theta}^{(d)})}{\partial \boldsymbol{\beta}} = \sum_{i=1}^{n} \left\{ \left[ w_i + (\Delta_{i2} + \Delta_{i3}) u_i \right] - \left[ (\Delta_{i1} + \Delta_{i2}) \Lambda_0^{(d)}(R_i) + \Delta_{i3} \Lambda_0^{(d)}(L_i) \right] \exp(\mathbf{Z}_i'\boldsymbol{\beta}) \right\} \mathbf{Z}_i
$$

After plugging in $\{\gamma_k^{(d+1)}, k = 1, \ldots, K\}$, you can update the parameters $\boldsymbol{\beta}$ by using the one-step Newton-Raphson method (Zeng, Mao, and Lin 2016).
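For a single covariate, the one-step Newton-Raphson update of $\boldsymbol\beta$ uses the score above and its derivative, $-\sum_i A_i \exp(\mathbf{Z}_i'\boldsymbol\beta)\mathbf{Z}_i\mathbf{Z}_i'$. In the sketch below, an illustration with hypothetical names rather than PROC ICPHREG's code, each subject is summarized by $e_i = w_i + (\Delta_{i2} + \Delta_{i3})u_i$ and $A_i = (\Delta_{i1} + \Delta_{i2})\Lambda_0(R_i) + \Delta_{i3}\Lambda_0(L_i)$, with $\Lambda_0$ assumed to be evaluated at the updated $\gamma_k^{(d+1)}$ as described above.

```python
import math


def newton_step_beta(subjects, beta):
    """One Newton-Raphson step for a scalar beta on Q(theta, theta^(d)).

    subjects[i] = (e_i, A_i, z_i), where e_i = w_i + (Delta_i2 + Delta_i3) u_i and
    A_i = (Delta_i1 + Delta_i2) Lambda_0(R_i) + Delta_i3 Lambda_0(L_i).
    """
    score, info = 0.0, 0.0
    for e_i, a_i, z in subjects:
        mu = a_i * math.exp(z * beta)
        score += (e_i - mu) * z  # dQ/dbeta
        info += mu * z * z  # -d2Q/dbeta2, nonnegative, so Q is concave in beta
    return beta + score / info


beta = 0.0
for _ in range(3):
    beta = newton_step_beta([(2.0, 1.0, 1.0)], beta)
    print(beta)
```

In the full EM algorithm only one such step is taken per iteration; the γ and β updates then alternate until convergence.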

The EM algorithm alternates between updating $\boldsymbol{\gamma}$ and updating $\boldsymbol{\beta}$ until convergence.

In PROC ICPHREG, you can use the EM algorithm to fit both the semiparametric model and the piecewise constant hazard model. To request it, specify the NLOPTIONS(TECH=EM) option in the PROC ICPHREG statement.

Semiparametric Model and Time-Dependent Covariates

A typical way that interval-censored data are generated is through a process of repeated assessments. Suppose that $V_1 < V_2 < \cdots < V_M$ is a random sequence of assessment times, and let $T$ denote the event time. Denote $\tilde{V} = (V_0 = 0, V_1, V_2, \ldots, V_M, V_{M+1} = \infty)$ and $\tilde{D} = (D_0 = 0, D_1, D_2, \ldots, D_M)$, where $D_m = I(V_{m-1} < T < V_m)$, $m = 1, \ldots, M$.
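The indicator vector $\tilde{D}$ and the pair of assessment times that bracket $T$ can be computed from a sorted list of assessment times. The sketch below follows the definitions in the text (including $D_0 = 0$ and the strict inequalities in $D_m$) and also returns the last assessment before $T$ and the first assessment at or after $T$, with $\infty$ when $T$ exceeds every assessment. The function name is hypothetical, and in real data $T$ itself is unobserved; this only illustrates how the censoring interval arises.

```python
import bisect
import math


def assessment_summary(T, V):
    """Return (D_tilde, L, R) for event time T > 0 and sorted assessment times V = [V_1, ..., V_M]."""
    V_tilde = [0.0] + list(V) + [math.inf]  # V_0 = 0, V_{M+1} = infinity
    D_tilde = [0] + [int(V_tilde[m - 1] < T < V_tilde[m]) for m in range(1, len(V) + 1)]
    idx = bisect.bisect_left(V_tilde, T)  # first index with V_tilde[idx] >= T
    return D_tilde, V_tilde[idx - 1], V_tilde[idx]


print(assessment_summary(3.0, [1.0, 2.0, 4.0]))  # event between the 2nd and 3rd assessments
print(assessment_summary(5.0, [1.0, 2.0, 4.0]))  # event after the last assessment
```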

For the $i$th subject, $i = 1, \ldots, n$, let $K_i$, $T_i$, $\tilde{V}_i = (V_{i0}, V_{i1}, \ldots, V_{i(K_i+1)})$, $\tilde{D}_i = (D_{i0}, D_{i1}, \ldots, D_{iK_i})$, and $\mathbf{Z}_i(\cdot)$ be the number of assessments, the event time, the assessment time vector, the indicator vector, and the time-dependent covariate process, respectively. Suppose that $T_i$ is interval-censored between two assessment times, $L_i$ and $R_i$, where

$$
L_i = \max_{j} \{ V_{ij} : V_{ij} < T_i,\ j = 0, \ldots, K_i \} \quad \text{and} \quad R_i = \min_{j} \{ V_{ij} : V_{ij} \ge T_i,\ j = 1, \ldots, K_i + 1 \}
$$

Let $s_1 < \cdots < s_J$ be the sorted right boundaries of the Turnbull intervals for $\{(L_i, R_i] : i = 1, \ldots, n\}$.
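The right boundaries $s_1 < \cdots < s_J$ can be obtained with the standard construction of Turnbull (innermost) intervals: sort all endpoints and report each left endpoint that is immediately followed by a right endpoint. Because the observed intervals are $(L_i, R_i]$, a right endpoint is placed before a left endpoint of the same value. This Python sketch uses hypothetical names and does not attempt to reproduce how PROC ICPHREG handles ties or an unbounded last interval.

```python
import math


def turnbull_intervals(intervals):
    """Innermost (Turnbull) intervals for half-open observation intervals (L, R]."""
    # Tag 1 = left endpoint, 0 = right endpoint. Sorting on (value, tag) puts a right
    # endpoint R = t before a left endpoint L = t, since t lies in (., t] but not in (t, .].
    points = sorted([(L, 1) for L, _ in intervals] + [(R, 0) for _, R in intervals])
    result = []
    for (a, tag_a), (b, tag_b) in zip(points, points[1:]):
        if tag_a == 1 and tag_b == 0:  # a left endpoint directly followed by a right endpoint
            result.append((a, b))
    return result


intervals = [(0.0, 2.0), (1.0, 3.0), (2.0, 5.0), (4.0, math.inf)]
tb = turnbull_intervals(intervals)
s = [p for _, p in tb]  # sorted right boundaries s_1 < ... < s_J
print(tb, s)
```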

Suppose that the time-dependent covariate process $\mathbf{Z}_i(\cdot)$ changes value only at assessment times. Let $(\mathbf{Z}_{i1}, \mathbf{Z}_{i2}, \ldots, \mathbf{Z}_{i(J+1)})$ be the observed covariate vectors at times $(0, s_1, \ldots, s_J)$.

Under the semiparametric model, the baseline cumulative hazard function is

$$
\Lambda_0(t) = \sum_{j:\, s_j \le t} \gamma_j, \qquad j = 1, \ldots, J
$$

For the ith subject, the cumulative hazard function is computed as

$$
\Lambda_i(t) = \sum_{j:\, s_j \le t} \gamma_j \exp(\mathbf{Z}_{ij}'\boldsymbol{\beta})
$$

The full likelihood function is

$$
\mathrm{L}(\boldsymbol{\theta}) = \prod_{i=1}^{n} \left\{ 1 - \exp\left[ -\Lambda_i(R_i) \right] \right\}^{\Delta_{i1}} \left\{ \exp\left[ -\Lambda_i(L_i) \right] - \exp\left[ -\Lambda_i(R_i) \right] \right\}^{\Delta_{i2}} \left\{ \exp\left[ -\Lambda_i(L_i) \right] \right\}^{\Delta_{i3}}
$$

where $\Delta_{i1}$, $\Delta_{i2}$, and $\Delta_{i3}$ are the left-, interval-, and right-censoring indicators defined earlier ($L_i = 0$, $0 < L_i < R_i < \infty$, and $R_i = \infty$, respectively).
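With a scalar time-dependent covariate, $\Lambda_i(t)$ and one subject's log-likelihood contribution can be sketched as follows. The sketch assumes that `z_path[j]` holds the covariate value recorded at times $0, s_1, \ldots, s_J$ (so the jump $\gamma_j$ at $s_j$ is weighted by the value recorded at the previous time point), and it counts the jump at $s_j$ whenever $s_j \le t$, consistent with the definition $b_j(t) = I(s_j \le t)$ used in the derivation that follows. Function names are hypothetical.

```python
import math


def cum_hazard_i(t, s, gammas, z_path, beta):
    """Lambda_i(t) = sum over j with s_j <= t of gamma_j * exp(Z_ij * beta) (scalar covariate)."""
    return sum(g * math.exp(zj * beta) for sj, g, zj in zip(s, gammas, z_path) if sj <= t)


def log_lik_i(L, R, s, gammas, z_path, beta):
    """Log-likelihood contribution of one subject; R = math.inf if right-censored."""
    if L == 0:  # left-censored: log{1 - exp[-Lambda_i(R_i)]}
        return math.log(-math.expm1(-cum_hazard_i(R, s, gammas, z_path, beta)))
    lam_left = cum_hazard_i(L, s, gammas, z_path, beta)
    if math.isinf(R):  # right-censored: -Lambda_i(L_i)
        return -lam_left
    lam_right = cum_hazard_i(R, s, gammas, z_path, beta)  # interval-censored
    return math.log(math.exp(-lam_left) - math.exp(-lam_right))


s, gammas = [1.0, 2.0, 3.0], [0.2, 0.3, 0.5]
z_path = [0.0, 1.0, 1.0, 1.0]  # value recorded at 0 is 0; from s_1 onward it is 1
print(cum_hazard_i(2.0, s, gammas, z_path, math.log(2.0)))
print(log_lik_i(1.0, 3.0, s, gammas, z_path, math.log(2.0)))
```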

As the following derivation shows, the EM algorithm extends directly to the semiparametric model that contains time-dependent covariates.

Let $b_j(t) = I(s_j \le t)$, and redefine the latent Poisson variables so that

$$
\begin{aligned}
E(W_{ij}) &= \gamma_j \exp(\mathbf{Z}_{ij}'\boldsymbol{\beta})\, b_j(R_i) = \gamma_j \exp(\mathbf{Z}_{ij}'\boldsymbol{\beta})\, I(s_j \le R_i) \\
E(U_{ij}) &= \gamma_j \exp(\mathbf{Z}_{ij}'\boldsymbol{\beta}) \left[ b_j(R_i) - b_j(L_i) \right] = \gamma_j \exp(\mathbf{Z}_{ij}'\boldsymbol{\beta})\, I(L_i < s_j \le R_i)
\end{aligned}
$$

The expected complete-data log likelihood $Q(\boldsymbol{\theta}, \boldsymbol{\theta}^{(d)}) = E[\log \mathrm{L}_c(\boldsymbol{\theta}) \mid \mathcal{D}, \boldsymbol{\theta}^{(d)}]$ becomes

$$
\begin{aligned}
Q(\boldsymbol{\theta}, \boldsymbol{\theta}^{(d)}) = \sum_{i=1}^{n} \sum_{j=1}^{J} \Big\{ & \left[ w_{ij} + (\Delta_{i2} + \Delta_{i3}) u_{ij} \right] \left[ \log(\gamma_j) + \mathbf{Z}_{ij}'\boldsymbol{\beta} \right] \\
& - \gamma_j \exp(\mathbf{Z}_{ij}'\boldsymbol{\beta}) \left[ (\Delta_{i2} + \Delta_{i1}) b_j(R_i) + \Delta_{i3} b_j(L_i) \right] \Big\} + B(\boldsymbol{\theta}^{(d)})
\end{aligned}
$$

where $B(\boldsymbol{\theta}^{(d)})$ does not depend on $\boldsymbol{\theta}$, and $w_{ij}$ and $u_{ij}$ are computed as follows:

$$
\begin{aligned}
w_{ij} &= \frac{\gamma_j^{(d)} b_j(R_i) \exp(\mathbf{Z}_{ij}'\boldsymbol{\beta}^{(d)})\, w_i}{\Lambda_i^{(d)}(R_i)} \\
u_{ij} &= \frac{\gamma_j^{(d)} \left[ b_j(R_i) - b_j(L_i) \right] \exp(\mathbf{Z}_{ij}'\boldsymbol{\beta}^{(d)})\, u_i}{\Lambda_i^{(d)}(R_i) - \Lambda_i^{(d)}(L_i)} \\
w_i &= \frac{\Lambda_i^{(d)}(R_i)\, \Delta_{i1}}{1 - \exp\left[ -\Lambda_i^{(d)}(R_i) \right]} \\
u_i &= \frac{\left[ \Lambda_i^{(d)}(R_i) - \Lambda_i^{(d)}(L_i) \right] \Delta_{i2}}{1 - \exp\left\{ -\left[ \Lambda_i^{(d)}(R_i) - \Lambda_i^{(d)}(L_i) \right] \right\}}
\end{aligned}
$$
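Compared with the earlier E-step, each jump is now weighted by $\exp(\mathbf{Z}_{ij}'\boldsymbol\beta)$ and the denominators use the subject-specific $\Lambda_i$. A minimal sketch for one subject follows, assuming a scalar covariate, hypothetical names, and at least one $s_j$ inside the relevant interval (otherwise the denominator is zero).

```python
import math


def e_step_td(L, R, s, gammas, z_path, beta):
    """Return (w_ij, u_ij) for one subject at the current (beta, gammas); R = math.inf if right-censored."""
    jumps = [g * math.exp(zj * beta) for g, zj in zip(gammas, z_path)]  # gamma_j exp(Z_ij beta)
    J = len(jumps)
    w_ij, u_ij = [0.0] * J, [0.0] * J
    if L == 0:  # left-censored (Delta_i1 = 1)
        parts = [x if sj <= R else 0.0 for x, sj in zip(jumps, s)]  # weights b_j(R_i)
        lam_r = sum(parts)  # Lambda_i(R_i)
        w_i = lam_r / -math.expm1(-lam_r)
        w_ij = [p * w_i / lam_r for p in parts]
    elif not math.isinf(R):  # interval-censored (Delta_i2 = 1)
        parts = [x if L < sj <= R else 0.0 for x, sj in zip(jumps, s)]  # b_j(R_i) - b_j(L_i)
        d_lam = sum(parts)  # Lambda_i(R_i) - Lambda_i(L_i)
        u_i = d_lam / -math.expm1(-d_lam)
        u_ij = [p * u_i / d_lam for p in parts]
    return w_ij, u_ij  # right-censored subjects keep all zeros


w, u = e_step_td(1.0, 3.0, [1.0, 2.0, 3.0], [0.2, 0.3, 0.5], [0.0, 0.0, 0.0, 0.0], 0.0)
print(w, u)
```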

To fit the semiparametric model that contains time-dependent covariates, use the ID statement. The levels of the ID variable identify the subjects to be analyzed.

Variance Estimation

Louis’s Method

Let $\hat{\boldsymbol{\theta}} = (\hat{\boldsymbol{\beta}}, \hat{\boldsymbol{\gamma}})$ be the maximum likelihood estimates found by the EM algorithm. Under suitable conditions, you can apply Louis’s method (Louis 1982) to obtain the covariance matrix of $\hat{\boldsymbol{\theta}}$.

The observed information matrix is computed as

$$
I(\hat{\boldsymbol{\theta}}) = -\frac{\partial^2 Q(\boldsymbol{\theta}, \hat{\boldsymbol{\theta}})}{\partial \boldsymbol{\theta}\, \partial \boldsymbol{\theta}'}\bigg|_{\boldsymbol{\theta} = \hat{\boldsymbol{\theta}}} - \mathrm{cov}\left\{ \frac{\partial \log \mathrm{L}_c(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}}\bigg|_{\boldsymbol{\theta} = \hat{\boldsymbol{\theta}}} \right\}
$$

where the covariance is taken over the conditional distribution of the latent variables given the observed data $\mathcal{D}$ at $\hat{\boldsymbol{\theta}}$. Its inverse, $I^{-1}(\hat{\boldsymbol{\theta}})$, is the estimated covariance matrix of $\hat{\boldsymbol{\theta}}$.

Louis’s method is the default method of calculating standard errors for the semiparametric model.

Profile Likelihood Method

You can use the profile likelihood method of Murphy and van der Vaart (2000) to estimate the covariance matrix of $\hat{\boldsymbol{\beta}}$. The profile log-likelihood function is defined as

$$
\mathrm{PL}(\boldsymbol{\beta}) = \max_{\boldsymbol{\gamma} \in \mathcal{G}} \log \mathrm{L}(\boldsymbol{\beta}, \boldsymbol{\gamma})
$$

where $\mathcal{G}$ is the parameter space of $\boldsymbol{\gamma}$.

The Hessian matrix of $\mathrm{PL}(\boldsymbol{\beta})$ can be computed by numerical differentiation. Let $\mathbf{e}_k$ be the $k$th unit vector (the $k$th column of the $p \times p$ identity matrix), $k = 1, \ldots, p$, and let $l$ be a small perturbation. The $(j, k)$th element of the Hessian matrix can be approximated by

$$
H_{jk} = \frac{\mathrm{PL}(\boldsymbol{\beta} + l \cdot \mathbf{e}_j + l \cdot \mathbf{e}_k) - \mathrm{PL}(\boldsymbol{\beta} + l \cdot \mathbf{e}_j) - \mathrm{PL}(\boldsymbol{\beta} + l \cdot \mathbf{e}_k) + \mathrm{PL}(\boldsymbol{\beta})}{l^2}
$$

The covariance matrix of $\hat{\boldsymbol{\beta}}$ is estimated by inverting the negative of the Hessian matrix evaluated at $\boldsymbol{\beta} = \hat{\boldsymbol{\beta}}$.
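The finite-difference Hessian and the covariance step can be sketched generically. The code below takes any callable standing in for $\mathrm{PL}(\boldsymbol\beta)$ (here a hypothetical quadratic placeholder; evaluating the real profile likelihood requires maximizing over $\boldsymbol\gamma$), uses `step` for the perturbation $l$, and inverts a $2 \times 2$ matrix by hand to stay dependency-free. As a rule of thumb, when $\mathrm{PL}$ is itself computed iteratively, its convergence tolerance should be small relative to $l^2$, or the differences are dominated by optimization error.

```python
def numerical_hessian(profile_loglik, beta, step=1e-4):
    """H_jk ~ [PL(b + l e_j + l e_k) - PL(b + l e_j) - PL(b + l e_k) + PL(b)] / l^2."""
    p = len(beta)

    def shifted(*indices):
        b = list(beta)
        for j in indices:
            b[j] += step  # j == k shifts the same coordinate twice
        return profile_loglik(b)

    base = profile_loglik(list(beta))
    single = [shifted(j) for j in range(p)]
    H = [[0.0] * p for _ in range(p)]
    for j in range(p):
        for k in range(j, p):
            H[j][k] = H[k][j] = (shifted(j, k) - single[j] - single[k] + base) / step ** 2
    return H


def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def toy_pl(b):
    # Hypothetical quadratic stand-in for PL(beta); its exact Hessian is [[-4, -1], [-1, -2]].
    return -(2 * b[0] ** 2 + b[0] * b[1] + b[1] ** 2)


H = numerical_hessian(toy_pl, [0.3, -0.2])
cov = inverse_2x2([[-h for h in row] for row in H])  # invert the negative Hessian
print(H, cov)
```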

You can use the profile likelihood method for the semiparametric model by specifying the PLVARIANCE option in the MODEL statement. Be aware that this computation can consume a relatively large amount of CPU time, because each evaluation of $\mathrm{PL}(\boldsymbol{\beta})$ requires an iterative maximization over $\boldsymbol{\gamma}$.

EMICM Algorithm

Pan (1999) proposes using the iterative convex minorant (ICM) algorithm to fit semiparametric proportional hazards models to interval-censored data.

Define $\alpha_j = \sum_{k=1}^{j} \gamma_k$. Denote $\alpha_0 = 0$ and $\boldsymbol{\alpha} = (\alpha_1, \ldots, \alpha_{J-1})'$. The full likelihood function can be rewritten in terms of $\boldsymbol{\alpha}$ and the regression coefficients $\boldsymbol{\beta}$.

Maximizing the likelihood with respect to $\boldsymbol{\theta} = (\boldsymbol{\gamma}, \boldsymbol{\beta})$ is equivalent to maximizing it with respect to $(\boldsymbol{\alpha}, \boldsymbol{\beta})$. Because the $\gamma_k$ are nonnegative, the $\alpha_j$ are nondecreasing in $j$, so the optimization is subject to the following constraint:

$$
C = \left\{ \mathbf{x} = (\alpha_1, \ldots, \alpha_{J-1}) : 0 \le \alpha_1 \le \cdots \le \alpha_{J-1} \le 1 \right\}
$$
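To make the reparameterization concrete, here is a small Python sketch that forms $\boldsymbol{\alpha}$ from $\boldsymbol{\gamma}$ by cumulative sums and checks membership in $C$. It assumes, for illustration only, that the first $J-1$ components of $\boldsymbol{\gamma}$ are stored in a vector; the helper names are hypothetical.

```python
import numpy as np

def gamma_to_alpha(gamma):
    """alpha_j = gamma_1 + ... + gamma_j for j = 1, ..., J-1 (hypothetical helper)."""
    return np.cumsum(np.asarray(gamma, dtype=float))

def in_constraint_set(alpha, tol=0.0):
    """True if 0 <= alpha_1 <= ... <= alpha_{J-1} <= 1, up to a tolerance tol."""
    alpha = np.asarray(alpha, dtype=float)
    return bool(alpha[0] >= -tol                       # lower bound on alpha_1
                and alpha[-1] <= 1.0 + tol             # upper bound on alpha_{J-1}
                and np.all(np.diff(alpha) >= -tol))    # nondecreasing order
```

With nonnegative increments, the cumulative sums are automatically ordered, so only the upper bound of 1 can fail.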

Denote the log-likelihood function as $l(\boldsymbol{\alpha}, \boldsymbol{\beta})$. Because the regression coefficients $\boldsymbol{\beta}$ are not constrained, you can update them by using the one-step Newton-Raphson method as in the EM algorithm. Pan (1999) suggests using the ICM algorithm to update the baseline parameters $\boldsymbol{\alpha}$; doing so essentially treats $\boldsymbol{\beta}$ as fixed and maximizes the function $l(\boldsymbol{\alpha}) = l(\boldsymbol{\alpha} \mid \boldsymbol{\beta})$. Suppose that the maximum of $l(\boldsymbol{\alpha})$ over $C$ occurs at $\hat{\boldsymbol{\alpha}}$. It can be shown that $\hat{\boldsymbol{\alpha}}$ is also the maximizer over $C$ of the following quadratic function,

$$
g^{*}(\mathbf{x} \mid \mathbf{y}, \mathbf{W}) = -\frac{1}{2} (\mathbf{x} - \mathbf{y})' \, \mathbf{W} \, (\mathbf{x} - \mathbf{y})
$$

where $\mathbf{y} = \hat{\boldsymbol{\alpha}} + \mathbf{W}^{-1} \nabla l(\hat{\boldsymbol{\alpha}})$, $\nabla l(\cdot)$ denotes the gradient (vector of first derivatives) of $l(\cdot)$ with respect to $\boldsymbol{\alpha}$, and $\mathbf{W}$ is a positive definite matrix of size $(J-1) \times (J-1)$ (Groeneboom and Wellner 1992).
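The update of $\boldsymbol{\beta}$ mentioned above is a single Newton-Raphson step. A minimal Python sketch, assuming the caller supplies the gradient and Hessian of the log likelihood in $\boldsymbol{\beta}$ as callables (the name `newton_step` is hypothetical):

```python
import numpy as np

def newton_step(grad, hess, beta):
    """One Newton-Raphson step toward a maximum: beta - H(beta)^{-1} grad(beta).

    grad(beta) returns the gradient vector and hess(beta) the Hessian matrix
    of the objective being maximized (both supplied by the caller).
    """
    beta = np.asarray(beta, dtype=float)
    # Solve H * delta = grad rather than forming the inverse explicitly.
    return beta - np.linalg.solve(hess(beta), grad(beta))
```

On a concave quadratic, one step lands exactly on the maximizer $\mathbf{A}^{-1}\mathbf{b}$ of $-\tfrac{1}{2}\boldsymbol{\beta}'\mathbf{A}\boldsymbol{\beta} + \mathbf{b}'\boldsymbol{\beta}$; on a real likelihood, a single step only moves toward the maximum, which is why it is repeated across iterations.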

The ICM algorithm updates $\boldsymbol{\alpha}$ as follows. At the $d$th iteration, the algorithm computes the quantity

$$
\mathbf{y}^{(d)} = \hat{\boldsymbol{\alpha}}^{(d-1)} + \mathbf{W}^{-1}\!\left(\hat{\boldsymbol{\alpha}}^{(d-1)}\right) \nabla l\!\left(\hat{\boldsymbol{\alpha}}^{(d-1)}\right)
$$

where $\hat{\boldsymbol{\alpha}}^{(d-1)}$ is the parameter estimate from the previous iteration and $\mathbf{W}(\hat{\boldsymbol{\alpha}}^{(d-1)}) = \mathrm{diag}(w_j,\ j = 1, \ldots, J-1)$ is a positive definite diagonal matrix that depends on $\hat{\boldsymbol{\alpha}}^{(d-1)}$. A convenient choice for $\mathbf{W}(\boldsymbol{\alpha})$ takes its diagonal elements to be the negatives of the second-order derivatives of the log-likelihood function $l(\boldsymbol{\alpha})$:

$$
w_j = w_j(\boldsymbol{\alpha}) = -\frac{\partial^2}{\partial \alpha_j^2} \, l(\boldsymbol{\alpha})
$$
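With a diagonal $\mathbf{W}$, computing $\mathbf{y}^{(d)}$ reduces to elementwise division. The Python sketch below assumes the caller supplies the gradient of $l(\boldsymbol{\alpha})$ and the diagonal of its Hessian as functions; `icm_working_response` is a hypothetical name. It uses the form $\mathbf{y} = \hat{\boldsymbol{\alpha}} + \mathbf{W}^{-1}\nabla l(\hat{\boldsymbol{\alpha}})$, which moves uphill when $\mathbf{W}$ holds the negative second derivatives.

```python
import numpy as np

def icm_working_response(alpha, grad, hess_diag):
    """Return (y, w) with y = alpha + W^{-1} grad l(alpha) and W = diag(w).

    w_j = -d^2 l / d alpha_j^2; grad and hess_diag are caller-supplied callables.
    """
    alpha = np.asarray(alpha, dtype=float)
    w = -np.asarray(hess_diag(alpha), dtype=float)   # diagonal of W
    if np.any(w <= 0):
        raise ValueError("W must be positive definite: every w_j must be > 0")
    y = alpha + np.asarray(grad(alpha), dtype=float) / w
    return y, w
```

If $l$ happened to be a separable quadratic $-\sum_j c_j(\alpha_j - m_j)^2$, the working response would equal $\mathbf{m}$ regardless of the starting $\boldsymbol{\alpha}$, because the diagonal Newton step is then exact.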

Given $\mathbf{y} = \mathbf{y}^{(d)} = (y_1^{(d)}, \ldots, y_{J-1}^{(d)})'$ and $\mathbf{W} = \mathbf{W}(\hat{\boldsymbol{\alpha}}^{(d-1)})$, the parameter estimate $\hat{\boldsymbol{\alpha}}^{(d)}$ maximizes the quadratic function $g^{*}(\mathbf{x} \mid \mathbf{y}, \mathbf{W})$ over $C$.
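Because $\mathbf{W}$ is diagonal, the quadratic separates into a weighted sum of squares, so maximizing $g^{*}$ over $C$ is a weighted least-squares fit of a nondecreasing sequence bounded by 0 and 1 (an isotonic regression):

$$
\begin{aligned}
g^{*}(\mathbf{x} \mid \mathbf{y}, \mathbf{W}) &= -\frac{1}{2} \sum_{j=1}^{J-1} w_j \,(x_j - y_j)^2, \\
\hat{\boldsymbol{\alpha}}^{(d)} &= \operatorname*{arg\,min}_{\mathbf{x} \in C} \ \sum_{j=1}^{J-1} w_j \left(x_j - y_j^{(d)}\right)^2 .
\end{aligned}
$$

Without the constraint the maximizer would simply be $\mathbf{x} = \mathbf{y}^{(d)}$; the ordering and bounds in $C$ are what the cumulative sum diagram construction handles.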

Define the cumulative sum diagram $\{P_k,\ k = 0, \ldots, J-1\}$ as a set of $J$ points in the plane, where $P_0 = (0, 0)$ and

$$
P_k = \left( \sum_{i=1}^{k} w_i, \ \sum_{i=1}^{k} w_i \, y_i^{(d)} \right)
$$
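The diagram consists of two running sums. A hypothetical helper in Python, assuming the weights $w_i$ and working responses $y_i^{(d)}$ are already available as arrays:

```python
import numpy as np

def cumulative_sum_diagram(w, y):
    """Points P_0, ..., P_{J-1} of the cumulative sum diagram, shape (J, 2).

    P_0 = (0, 0) and P_k = (sum_{i<=k} w_i, sum_{i<=k} w_i * y_i).
    """
    w = np.asarray(w, dtype=float)
    y = np.asarray(y, dtype=float)
    horizontal = np.concatenate(([0.0], np.cumsum(w)))      # cumulative weights
    vertical = np.concatenate(([0.0], np.cumsum(w * y)))    # cumulative weighted responses
    return np.column_stack((horizontal, vertical))
```

For weights (1, 2, 1) and responses (0.5, 0.2, 0.9), the points are (0, 0), (1, 0.5), (3, 0.9), and (4, 1.8).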

Technically, $\hat{\boldsymbol{\alpha}}^{(d)}$ is given by the left derivatives of the convex minorant of the diagram $\{P_k,\ k = 0, \ldots, J-1\}$, that is, of the largest convex function that lies below the diagram: $\hat{\alpha}_k^{(d)}$ is the left derivative of the convex minorant at $\sum_{i=1}^{k} w_i$, the horizontal coordinate of $P_k$. You can solve this optimization problem by using the pool-adjacent-violators algorithm (Groeneboom and Wellner 1992).
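A minimal Python sketch of the pool-adjacent-violators algorithm follows. It computes the weighted nondecreasing least-squares fit, whose values coincide with the left derivatives of the convex minorant, and then clips the result to $[0, 1]$ for the bounds in $C$. The clipping step is this sketch's choice for handling the bounds (for constant lower and upper bounds, clipping the unconstrained isotonic fit gives the bounded fit); the function names are hypothetical and not part of PROC ICPHREG.

```python
import numpy as np

def weighted_pava(y, w):
    """Minimize sum_j w_j (x_j - y_j)^2 subject to x_1 <= x_2 <= ... (weighted PAVA)."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    means, weights, sizes = [], [], []      # one entry per pooled block
    for yj, wj in zip(y, w):
        means.append(yj)
        weights.append(wj)
        sizes.append(1)
        # Pool the last two blocks while they violate the ordering.
        while len(means) > 1 and means[-2] > means[-1]:
            wt = weights[-2] + weights[-1]
            m = (weights[-2] * means[-2] + weights[-1] * means[-1]) / wt
            n = sizes[-2] + sizes[-1]
            means[-2:] = [m]
            weights[-2:] = [wt]
            sizes[-2:] = [n]
    # Expand each block mean back to one value per original position.
    return np.repeat(means, sizes)

def icm_update(y, w):
    """Maximize g*(x | y, W) over C: isotonic fit, then clip to [0, 1] (sketch)."""
    return np.clip(weighted_pava(y, w), 0.0, 1.0)
```

For weights (1, 2, 1) and responses (0.5, 0.2, 0.9), the first two values violate the ordering and are pooled to their weighted mean $(0.5 + 2 \times 0.2)/3 = 0.3$, giving (0.3, 0.3, 0.9). These are exactly the slopes of the convex minorant of the points (0, 0), (1, 0.5), (3, 0.9), (4, 1.8), which passes through (0, 0), (3, 0.9), and (4, 1.8).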

The EMICM algorithm combines the EM algorithm and the ICM algorithm by alternating the two steps in its iterations. Whereas the EM step updates both the baseline parameters and the regression coefficients, the ICM step updates only the baseline parameters. If the ICM step does not increase the likelihood value, the parameter changes are halved for the next iteration. This process is repeated at most five times, until an increase in the likelihood value is found.
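One way to read the step-halving rule is sketched below in Python: the ICM proposal is shrunk toward the previous estimate by halving the change, at most five times, until the log likelihood increases. Falling back to the previous estimate when no halving helps is an assumption of this sketch, as are the function and parameter names. Because $C$ is convex, every halved candidate $\hat{\boldsymbol{\alpha}}^{\text{old}} + t(\hat{\boldsymbol{\alpha}}^{\text{new}} - \hat{\boldsymbol{\alpha}}^{\text{old}})$ with $0 < t \le 1$ stays feasible.

```python
import numpy as np

def icm_step_with_halving(loglik, alpha_old, alpha_new, max_halvings=5):
    """Accept the ICM proposal, halving the change until the log likelihood increases.

    Tries the full step, then up to max_halvings halved steps. Returning alpha_old
    when none of them increases loglik is an assumption of this sketch.
    """
    alpha_old = np.asarray(alpha_old, dtype=float)
    step = np.asarray(alpha_new, dtype=float) - alpha_old
    base = loglik(alpha_old)
    candidate = alpha_old + step
    for _ in range(max_halvings):
        if loglik(candidate) > base:
            return candidate
        step = step / 2.0                 # halve the parameter change
        candidate = alpha_old + step
    return candidate if loglik(candidate) > base else alpha_old
```

For example, with $l(\alpha) = -(\alpha - 0.5)^2$, a proposal that moves from 0.4 to 0.7 overshoots and lowers the likelihood, so the change is halved once and 0.55 is accepted.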

The EMICM algorithm is the default method of fitting the semiparametric model. You can use it to fit the piecewise constant hazard model by specifying the NLOPTIONS(TECH=EMICM) option in the PROC ICPHREG statement.

Last updated: March 08, 2022