The ICLIFETEST Procedure

Statistical Methods

Nonparametric Estimation of the Survival Function

Suppose the event times for a total of $n$ subjects, $T_1, T_2, \ldots, T_n$, are independent random variables with an underlying cumulative distribution function $F(t)$. Denote the corresponding survival function as $S(t)$. Interval-censoring occurs when some or all $T_i$’s cannot be observed directly but are known to be within the interval $(L_i, R_i]$, $L_i \le R_i$.

The observed intervals might or might not overlap. If they do not overlap, then you can usually use conventional methods for right-censored data, with minor modifications. On the other hand, if some intervals overlap, you need special algorithms to compute an unbiased estimate of the underlying survival function.

To characterize the nonparametric estimate of the survival function, Peto (1973) and Turnbull (1976) show that the estimate can jump only at the right endpoint of a set of nonoverlapping intervals (also known as Turnbull intervals), $\{I_j = (q_j, p_j],\ j = 1, \ldots, m\}$. A simple algorithm for finding these intervals is to order all the boundary values $\{L_i, R_i,\ i = 1, \ldots, n\}$ with labels of $L$ and $R$ attached and then pick up the intervals that have $L$ as the left boundary and $R$ as the right boundary. For example, suppose that the data set contains only three intervals, $(1, 3]$, $(2, 4]$, and $(5, 6]$. The ordered values are $1(L), 2(L), 3(R), 4(R), 5(L), 6(R)$. Then the Turnbull intervals are $(2, 3]$ and $(5, 6]$.
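The endpoint-labeling procedure can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure's own implementation: the function name `turnbull_intervals` is hypothetical, every input interval is assumed to satisfy $L_i < R_i$ (exact observations would first need the $\epsilon$ adjustment), and the tie-breaking rule (an $R$ label sorts before an $L$ label at the same value, because $(a, t]$ and $(t, b]$ do not overlap) is an assumption of this sketch.

```python
def turnbull_intervals(intervals):
    """Return the Turnbull intervals (q_j, p_j] for observed intervals (L_i, R_i]."""
    # Attach a label to every boundary value. The middle element is a sort key:
    # at tied values an R endpoint (0) is placed before an L endpoint (1).
    points = []
    for left, right in intervals:
        points.append((left, 1, "L"))
        points.append((right, 0, "R"))
    points.sort()

    # A Turnbull interval is an L value immediately followed by an R value.
    result = []
    for (v1, _, lab1), (v2, _, lab2) in zip(points, points[1:]):
        if lab1 == "L" and lab2 == "R":
            result.append((v1, v2))
    return result


print(turnbull_intervals([(1, 3), (2, 4), (5, 6)]))  # [(2, 3), (5, 6)]
```

The call reproduces the three-interval example from the text.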

For the exact observation $L_i = R_i = t$, Ng (2002) suggests that it be represented by the interval $(t - \epsilon, t)$ for a positive small value $\epsilon$. If $R_j = t$ for an observation $(L_j, R_j]$ ($L_j < R_j$), then the observation is represented by $(L_j + \epsilon, R_j - \epsilon)$.

Define $\theta_j = P(T \in I_j)$, $j = 1, \ldots, m$. Given the data, the survival function, $S(t)$, can be determined only up to equivalence classes $t$, which are complements of the Turnbull intervals. $S(t)$ is undefined if $t$ is within some $I_j$. The likelihood function for $\boldsymbol{\theta} = \{\theta_j,\ j = 1, \ldots, m\}$ is then

$$L(\boldsymbol{\theta}) = \prod_{i=1}^{n} \left( \sum_{j=1}^{m} \alpha_{ij} \theta_j \right)$$

where $\alpha_{ij}$ is 1 if $I_j$ is contained in $(L_i, R_i]$ and 0 otherwise.

Denote the maximum likelihood estimate for $\boldsymbol{\theta}$ as $\hat{\boldsymbol{\theta}} = \{\hat{\theta}_j,\ j = 1, \ldots, m\}$. The survival function can then be estimated as

$$\hat{S}(t) = \sum_{k:\, p_k > t} \hat{\theta}_k, \quad t \notin \text{any } I_j,\ j = 1, \ldots, m$$

Estimation Algorithms

Peto (1973) suggests maximizing this likelihood function by using a Newton-Raphson algorithm subject to the constraint $\sum_{j=1}^{m} \theta_j = 1$. This approach has been implemented in the %ICE macro. Although feasible, the optimization becomes less stable as the dimension of $\boldsymbol{\theta}$ increases.

Treating interval-censored data as missing data, Turnbull (1976) derives a self-consistent equation for estimating the $\theta_j$’s:

$$\theta_j = \frac{1}{n} \sum_{i=1}^{n} \mu_{ij}(\boldsymbol{\theta}) = \frac{1}{n} \sum_{i=1}^{n} \frac{\alpha_{ij} \theta_j}{\sum_{k=1}^{m} \alpha_{ik} \theta_k}$$

where $\mu_{ij}(\boldsymbol{\theta})$ is the expected probability that the event $T_i$ occurs within $I_j$ for the $i$th subject, given the observed data.

The algorithm is an expectation-maximization (EM) algorithm in the sense that it iteratively updates $\boldsymbol{\theta}$ and $\mu_{ij}(\boldsymbol{\theta})$. Convergence is declared if, for a chosen number $\epsilon > 0$,

$$\sum_{j=1}^{m} \left| \hat{\theta}_j^{(l)} - \hat{\theta}_j^{(l-1)} \right| < \epsilon$$

where $\hat{\theta}_j^{(l)}$ denotes the updated value for $\theta_j$ after the $l$th iteration.

An alternative criterion is to declare convergence when increments of the likelihood are small:

$$\left| L\left(\hat{\theta}_j^{(l)};\ j = 1, \ldots, m\right) - L\left(\hat{\theta}_j^{(l-1)};\ j = 1, \ldots, m\right) \right| < \epsilon$$
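The self-consistency iteration and the first convergence criterion can be sketched as follows. This is an illustrative Python sketch, not the procedure's implementation: the name `turnbull_em`, the uniform starting values, and the defaults for `tol` and `max_iter` are assumptions. The input `alpha` is the $n \times m$ indicator matrix $\alpha_{ij}$, and every row is assumed to contain at least one 1.

```python
def turnbull_em(alpha, tol=1e-8, max_iter=10000):
    """Iterate the self-consistency equation for theta_1, ..., theta_m."""
    n, m = len(alpha), len(alpha[0])
    theta = [1.0 / m] * m  # assumed starting point: equal mass on every interval
    for _ in range(max_iter):
        # Denominators sum_k alpha_ik * theta_k, one per subject.
        denom = [sum(a * t for a, t in zip(row, theta)) for row in alpha]
        # theta_j <- (1/n) * sum_i mu_ij(theta)
        new_theta = [
            sum(alpha[i][j] * theta[j] / denom[i] for i in range(n)) / n
            for j in range(m)
        ]
        change = sum(abs(a - b) for a, b in zip(new_theta, theta))
        theta = new_theta
        if change < tol:  # sum_j |theta_j^(l) - theta_j^(l-1)| < epsilon
            break
    return theta


# Observed intervals (1,3], (2,4], (5,6] with Turnbull intervals (2,3] and (5,6].
alpha = [[1, 0], [1, 0], [0, 1]]
theta_hat = turnbull_em(alpha)
print(theta_hat)
```

For this small example two of the three subjects can have their event only in $(2, 3]$, so the iteration settles at about two thirds and one third.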

There is no guarantee that the converged values constitute a maximum likelihood estimate (MLE). Gentleman and Geyer (1994) introduced the Kuhn-Tucker conditions based on constrained programming as a check of whether the algorithm converges to a legitimate MLE. These conditions state that a necessary and sufficient condition for the estimate to be an MLE is that the Lagrange multipliers $\gamma_j = n - c_j$ are nonnegative for all the $\theta_j$’s that are estimated to be zero, where $c_j$ is the derivative of the log-likelihood function with respect to $\theta_j$:

$$c_j = \frac{\partial \log(L)}{\partial \theta_j} = \sum_{i=1}^{n} \frac{\alpha_{ij}}{\sum_{k=1}^{m} \alpha_{ik} \theta_k}$$

You can use Turnbull’s method by specifying METHOD=TURNBULL in the ICLIFETEST statement. The Lagrange multipliers are displayed in the "Nonparametric Survival Estimates" table.
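The Kuhn-Tucker check can be illustrated with a short Python sketch. The function name `lagrange_multipliers` and the toy indicator matrix are hypothetical; the code only evaluates $\gamma_j = n - c_j$ for a given $\boldsymbol{\theta}$ and does not reproduce how the procedure builds its table.

```python
def lagrange_multipliers(alpha, theta):
    """Return gamma_j = n - c_j, where c_j is d log(L) / d theta_j."""
    n, m = len(alpha), len(theta)
    denom = [sum(alpha[i][k] * theta[k] for k in range(m)) for i in range(n)]
    c = [sum(alpha[i][j] / denom[i] for i in range(n)) for j in range(m)]
    return [n - cj for cj in c]


# Toy data: the first subject is compatible only with interval 1, the other
# two with both intervals, so the likelihood is maximized at theta = (1, 0).
alpha = [[1, 0], [1, 1], [1, 1]]
gamma = lagrange_multipliers(alpha, [1.0, 0.0])
print(gamma)  # [0.0, 1.0]
```

The multiplier for the component estimated to be zero is nonnegative, which is what the condition requires of an MLE.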

Groeneboom and Wellner (1992) propose using the iterative convex minorant (ICM) algorithm to estimate the underlying survival function as an alternative to Turnbull’s method. Define $\beta_j = F(p_j)$, $j = 1, \ldots, m$, as the cumulative probability at the right boundary of the $j$th Turnbull interval: $\beta_j = \sum_{k=1}^{j} \theta_k$. It follows that $\beta_m = 1$. Denote $\beta_0 = 0$ and $\boldsymbol{\beta} = (\beta_1, \ldots, \beta_{m-1})'$. You can rewrite the likelihood function as

$$L(\boldsymbol{\beta}) = \prod_{i=1}^{n} \sum_{j=1}^{m} \alpha_{ij} (\beta_j - \beta_{j-1})$$

Maximizing the likelihood with respect to the $\theta_j$’s is equivalent to maximizing it with respect to the $\beta_j$’s. Because the $\beta_j$’s are naturally ordered, the optimization is subject to the following constraint:

$$C = \{\mathbf{x} = (\beta_1, \ldots, \beta_{m-1}) : 0 \le \beta_1 \le \cdots \le \beta_{m-1} \le 1\}$$

Denote the log-likelihood function as $l(\boldsymbol{\beta})$. Suppose its maximum occurs at $\hat{\boldsymbol{\beta}}$. Mathematically, it can be proved that $\hat{\boldsymbol{\beta}}$ equals the maximizer of the following quadratic function:

$$g^{*}(\mathbf{x} \mid \mathbf{y}, \mathbf{W}) = -\frac{1}{2} (\mathbf{x} - \mathbf{y})' \mathbf{W} (\mathbf{x} - \mathbf{y})$$

where $\mathbf{y} = \hat{\boldsymbol{\beta}} + \mathbf{W}^{-1} \nabla l(\hat{\boldsymbol{\beta}})$, $\nabla l(\cdot)$ denotes the derivatives of $l(\cdot)$ with respect to $\boldsymbol{\beta}$, and $\mathbf{W}$ is a positive definite matrix of size $(m-1) \times (m-1)$ (Groeneboom and Wellner 1992).

An iterative algorithm is needed to determine $\hat{\boldsymbol{\beta}}$. For the $l$th iteration, the algorithm updates the quantity

$$\mathbf{y}^{(l)} = \hat{\boldsymbol{\beta}}^{(l-1)} + \mathbf{W}^{-1}\left(\hat{\boldsymbol{\beta}}^{(l-1)}\right) \nabla l\left(\hat{\boldsymbol{\beta}}^{(l-1)}\right)$$

where $\hat{\boldsymbol{\beta}}^{(l-1)}$ is the parameter estimate from the previous iteration and $\mathbf{W}\left(\hat{\boldsymbol{\beta}}^{(l-1)}\right) = \mathrm{diag}(w_j,\ j = 1, \ldots, m-1)$ is a positive definite diagonal matrix that depends on $\hat{\boldsymbol{\beta}}^{(l-1)}$. The plus sign matches the definition of $\mathbf{y}$ as $\hat{\boldsymbol{\beta}} + \mathbf{W}^{-1} \nabla l(\hat{\boldsymbol{\beta}})$: because $\mathbf{W}$ is positive definite, this step moves in an ascent direction of the log-likelihood.

A convenient choice for $\mathbf{W}(\boldsymbol{\beta})$ is the negative of the second-order derivative of the log-likelihood function $l(\boldsymbol{\beta})$:

$$w_j = w_j(\boldsymbol{\beta}) = -\frac{\partial^2}{\partial \beta_j^2} l(\boldsymbol{\beta})$$

Given $\mathbf{y} = \mathbf{y}^{(l)} = \left(y_1^{(l)}, \ldots, y_{m-1}^{(l)}\right)'$ and $\mathbf{W} = \mathbf{W}\left(\hat{\boldsymbol{\beta}}^{(l-1)}\right)$, the parameter estimate for the $l$th iteration $\hat{\boldsymbol{\beta}}^{(l)}$ maximizes the quadratic function $g^{*}(\mathbf{x} \mid \mathbf{y}, \mathbf{W})$.

Define the cumulative sum diagram $\{P_k,\ k = 0, \ldots, m-1\}$ as a set of $m$ points in the plane, where $P_0 = (0, 0)$ and

$$P_k = \left( \sum_{i=1}^{k} w_i,\ \sum_{i=1}^{k} w_i y_i^{(l)} \right)$$

Technically, $\hat{\boldsymbol{\beta}}^{(l)}$ equals the left derivative of the convex minorant, that is, of the largest convex function that lies below the diagram $\{P_k,\ k = 0, \ldots, m-1\}$. This optimization problem can be solved by the pool-adjacent-violators algorithm (Groeneboom and Wellner 1992).
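With a diagonal $\mathbf{W}$, maximizing $g^{*}$ over ordered vectors is a weighted isotonic regression of the $y_j$ with weights $w_j$, which is what pool-adjacent-violators computes. The following Python sketch illustrates one such step under stated assumptions: the names `pava` and `icm_projection` are hypothetical, the weights are assumed positive, and clipping the pooled values to $[0, 1]$ is this sketch's way of honoring the bounds in $C$, not a description of the procedure's internals.

```python
def pava(y, w):
    """Weighted pool-adjacent-violators: nondecreasing fit to y with weights w."""
    blocks = []  # each block holds [weighted mean, total weight, number of points]
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # Pool the last two blocks while they violate the ordering.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


def icm_projection(y, w):
    """Isotonic fit followed by clipping to [0, 1] (assumed handling of the bounds)."""
    return [min(1.0, max(0.0, v)) for v in pava(y, w)]


print(icm_projection([0.2, 0.6, 0.4, 0.9], [1.0, 1.0, 1.0, 1.0]))
```

In the example the second and third values violate the ordering, so they are pooled into their weighted mean while the other two values are left unchanged.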

Occasionally, the ICM step might not increase the likelihood. Jongbloed (1998) suggests conducting a line search to ensure that positive increments are always achieved. Alternatively, you can switch to the EM step, exploiting the fact that the EM iteration never decreases the likelihood, and then resume iterations of the ICM algorithm after the EM step. As with Turnbull’s method, convergence can be determined based on the closeness of two consecutive sets of parameter values or likelihood values. You can use the ICM algorithm by specifying METHOD=ICM in the PROC ICLIFETEST statement.

As its name suggests, the EMICM algorithm combines the self-consistent EM algorithm and the ICM algorithm by alternating the two different steps in its iterations. Wellner and Zhan (1997) show that the converged values of the EMICM algorithm always constitute an MLE if it exists and is unique. The ICLIFETEST procedure uses the EMICM algorithm as the default.

Variance Estimation of the Survival Estimator

Peto (1973) and Turnbull (1976) suggest estimating the variances of the survival estimates by inverting the Hessian matrix, which is obtained by twice differentiating the log-likelihood function. This method can become less stable when the number of $\theta_j$’s increases as $n$ increases. Simulations have shown that the confidence limits based on variances estimated with this method tend to have conservative coverage probabilities that are greater than the nominal level (Goodall, Dunn, and Babiker 2004).

Sun (2001) proposes using two resampling techniques, simple bootstrap and multiple imputation, to estimate the variance of the survival estimator. The undefined regions that the Turnbull intervals represent create a special challenge using the bootstrap method. Because each bootstrap sample could have a different set of Turnbull intervals, some time points to evaluate the variances based on the original Turnbull intervals might be located within the intervals in a bootstrap sample, with the result that their survival probabilities become unknown. A simple ad hoc solution is to shrink the Turnbull interval to its right boundary and modify the survival estimates into a right continuous function:

$$\hat{S}_m(t) = \sum_{j:\, p_j > t} \hat{\theta}_j$$

Let $M$ denote the number of resampling data sets. Let $A_1^k, \ldots, A_n^k$ denote the $n$ independent samples from the original data with replacement, $k = 1, \ldots, M$. Let $\hat{S}_m^k(t)$ be the modified estimate of the survival function computed from the $k$th resampling data set. Then you can estimate the variance of $\hat{S}(t)$ by the sample variance as

$$\hat{\sigma}_b^2(t) = \frac{1}{M - 1} \sum_{k=1}^{M} \left[ \hat{S}_m^k(t) - \bar{S}_m(t) \right]^2$$

where

$$\bar{S}_m(t) = \frac{\sum_{k=1}^{M} \hat{S}_m^k(t)}{M}$$
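The resampling loop behind this sample variance can be sketched generically in Python. This is an assumption-laden illustration: `bootstrap_variance` and `toy_estimator` are hypothetical names, and `toy_estimator` (the fraction of right endpoints beyond $t$) is only a stand-in for the modified estimate $\hat{S}_m(t)$, which in practice would be recomputed from each resampled data set.

```python
import random


def bootstrap_variance(data, estimator, t, n_boot=200, seed=0):
    """Sample variance of estimator(sample, t) over n_boot bootstrap samples."""
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(n_boot):
        # Draw n observations from the original data with replacement.
        sample = [data[rng.randrange(n)] for _ in range(n)]
        estimates.append(estimator(sample, t))
    mean = sum(estimates) / n_boot
    return sum((s - mean) ** 2 for s in estimates) / (n_boot - 1)


def toy_estimator(sample, t):
    """Stand-in for S_m(t): fraction of intervals whose right endpoint exceeds t."""
    return sum(1 for _, right in sample if right > t) / len(sample)


data = [(1, 3), (2, 4), (5, 6), (0, 2), (3, 7)]
print(bootstrap_variance(data, toy_estimator, t=3.5))
```

Fixing the seed keeps the illustration reproducible; any estimator with the same `(sample, t)` signature can be passed in place of the stand-in.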

The method of multiple imputations exploits the fact that interval-censored data reduce to right-censored data when all interval observations of finite length shrink to single points. Suppose that each finite interval has been converted to one of the $p_j$ values it contains. For this right-censored data set, you can estimate the variance of the survival estimates via the well-known Greenwood formula as

$$\hat{\sigma}_G^2(t) = \hat{S}_{KM}^2(t) \sum_{q_j < t} \frac{d_j}{n_j (n_j - d_j)}$$

where $d_j$ is the number of events at time $p_j$ and $n_j$ is the number of subjects at risk just prior to $p_j$, and $\hat{S}_{KM}(t)$ is the Kaplan-Meier estimator of the survival function,

$$\hat{S}_{KM}(t) = \prod_{q_j < t} \frac{n_j - d_j}{n_j}$$
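For a right-censored data set, the Kaplan-Meier estimate and the Greenwood variance can be computed together. The Python sketch below is illustrative only: the name `km_greenwood` is hypothetical, and it uses the conventional indexing over distinct event times at or before $t$, which is an assumption of this sketch rather than a transcription of the $q_j < t$ indexing in the display.

```python
def km_greenwood(times, events, t):
    """Return (Kaplan-Meier estimate, Greenwood variance) at time t.

    times:  observed time for each subject
    events: 1 if the time is an event, 0 if it is right-censored
    """
    surv, gw_sum = 1.0, 0.0
    event_times = sorted({x for x, e in zip(times, events) if e and x <= t})
    for tj in event_times:
        n_j = sum(1 for x in times if x >= tj)  # at risk just prior to tj
        d_j = sum(1 for x, e in zip(times, events) if e and x == tj)
        surv *= (n_j - d_j) / n_j
        if n_j > d_j:  # skip the term when the estimate has dropped to zero
            gw_sum += d_j / (n_j * (n_j - d_j))
    return surv, surv ** 2 * gw_sum


s, var = km_greenwood([1, 2, 3, 4], [1, 1, 0, 1], t=2.5)
print(s, var)
```

In this example no subject is censored before $t$, so the estimate is 0.5 and the variance agrees with the binomial value $0.5 \times 0.5 / 4$ up to rounding.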

Essentially, multiple imputation is used to account for the uncertainty of ranking overlapping intervals. The $k$th imputed data set is obtained by substituting every interval-censored observation of finite length with an exact event time randomly drawn from the conditional survival function:

$$\hat{S}_i(t) = \frac{\hat{S}_m(t) - \hat{S}_m(R_i+)}{\hat{S}_m(L_i) - \hat{S}_m(R_i+)}, \quad t \in (L_i, R_i]$$

Because $\hat{S}_m(t)$ only jumps at the $p_j$, this is a discrete function.
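Because the conditional distribution is discrete, drawing an imputed event time amounts to sampling one of the $p_j$ in $(L_i, R_i]$ with probability proportional to $\hat{\theta}_j$. The following Python sketch shows that draw; the name `impute_event_time` and the example values are hypothetical, and the interval is assumed to contain at least one $p_j$ with positive estimated mass.

```python
import random


def impute_event_time(left, right, p, theta, rng):
    """Draw an exact event time for the observation (left, right].

    p:     right endpoints p_j of the Turnbull intervals
    theta: estimated masses theta_j at those endpoints
    """
    support = [(pj, th) for pj, th in zip(p, theta)
               if left < pj <= right and th > 0]
    points = [pj for pj, _ in support]
    weights = [th for _, th in support]
    # random.choices normalizes the weights, which matches dividing by
    # S_m(left) - S_m(right+).
    return rng.choices(points, weights=weights, k=1)[0]


rng = random.Random(2024)
p = [3, 6, 9]
theta = [0.5, 0.3, 0.2]
print(impute_event_time(2, 7, p, theta, rng))  # either 3 or 6
```

Repeating the draw for every finite interval yields one imputed right-censored data set.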

Denote the Kaplan-Meier estimate of each imputed data set as $\hat{S}_{KM}^k(t)$. The variance of $\hat{S}(t)$ is estimated by

$$\hat{\sigma}_I^2(t) = \hat{S}^2(t) \sum_{q_j < t} \frac{d'_j}{n'_j (n'_j - d'_j)} + \frac{1}{M - 1} \sum_{k=1}^{M} \left[ \hat{S}_{KM}^k(t) - \bar{S}_{KM}(t) \right]^2$$

where

$$\bar{S}_{KM}(t) = \frac{1}{M} \sum_{k=1}^{M} \hat{S}_{KM}^k(t)$$

and

$$d'_j = \sum_{i=1}^{n} \frac{\alpha_{ij} \left[ \hat{S}(p_{j-1}) - \hat{S}(p_j) \right]}{\sum_{k=1}^{m} \alpha_{ik} \left[ \hat{S}(p_{k-1}) - \hat{S}(p_k) \right]}$$

and

$$n'_j = \sum_{k=j}^{m} d'_k$$

Note that the first term in the formula for $\hat{\sigma}_I^2(t)$ mimics the Greenwood formula but uses expected numbers of deaths and subjects. The second term is the sample variance of the Kaplan-Meier estimates of imputed data sets, which accounts for between-imputation contributions.
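The expected numbers of deaths and subjects at risk can be computed directly from the indicator matrix and the estimated masses. The Python sketch below is an illustration under one stated assumption: each difference $\hat{S}(p_{j-1}) - \hat{S}(p_j)$ is taken to equal the estimated mass $\hat{\theta}_j$, which follows from the right-continuous version of the estimate. The name `expected_counts` and the example inputs are hypothetical.

```python
def expected_counts(alpha, theta):
    """Return (d', n'): expected deaths per Turnbull interval and expected at-risk counts."""
    n, m = len(alpha), len(theta)
    denom = [sum(alpha[i][k] * theta[k] for k in range(m)) for i in range(n)]
    # d'_j = sum_i alpha_ij * theta_j / sum_k alpha_ik * theta_k
    d = [sum(alpha[i][j] * theta[j] / denom[i] for i in range(n)) for j in range(m)]
    # n'_j = sum of d'_k over k = j, ..., m
    n_risk = [sum(d[j:]) for j in range(m)]
    return d, n_risk


alpha = [[1, 0], [1, 1], [0, 1]]
d, n_risk = expected_counts(alpha, [0.5, 0.5])
print(d, n_risk)  # [1.5, 1.5] [3.0, 1.5]
```

The expected deaths sum to the number of subjects, so the first at-risk count equals $n$.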

Pointwise Confidence Limits of the Survival Function

Pointwise confidence limits can be computed for the survival function given the estimated standard errors. Let $\alpha$ be specified by the ALPHA= option. Let $z_{\alpha/2}$ be the critical value for the standard normal distribution. That is, $\Phi(-z_{\alpha/2}) = \alpha/2$, where $\Phi$ is the cumulative distribution function of the standard normal random variable.

Constructing the confidence limits for the survival function $S(t)$ as $\hat{S}(t) \pm z_{\alpha/2}\, \hat{\sigma}[\hat{S}(t)]$ might result in an estimate that exceeds the range $[0, 1]$ at extreme values of $t$. This problem can be avoided by applying a transformation to $S(t)$ so that the range is unrestricted. In addition, certain transformed confidence intervals for $S(t)$ perform better than the usual linear confidence intervals (Borgan and Liestøl 1990). You can use the CONFTYPE= option to set one of the following transformations: the log-log function (Kalbfleisch and Prentice 1980), the arcsine–square root function (Nair 1984), the logit function (Meeker and Escobar 1998), the log function, and the linear function.

Let $g$ denote the transformation that is being applied to the survival function $S(t)$. Using the delta method, you estimate the standard error of $g(\hat{S}(t))$ by

$$\tau(t) = \hat{\sigma}\left[ g(\hat{S}(t)) \right] = g'\left(\hat{S}(t)\right) \hat{\sigma}[\hat{S}(t)]$$

where $g'$ is the first derivative of the function $g$. The $100(1 - \alpha)\%$ confidence interval for $S(t)$ is given by

$$g^{-1} \left\{ g[\hat{S}(t)] \pm z_{\alpha/2}\, g'[\hat{S}(t)]\, \hat{\sigma}[\hat{S}(t)] \right\}$$

where $g^{-1}$ is the inverse function of $g$. The choices for the transformation $g$ are as follows:

  • arcsine–square root transformation: The estimated variance of sine Superscript negative 1 Baseline left-parenthesis StartRoot ModifyingAbove upper S With caret left-parenthesis t right-parenthesis EndRoot right-parenthesis is ModifyingAbove tau With caret squared left-parenthesis t right-parenthesis equals StartFraction ModifyingAbove sigma With caret squared left-bracket ModifyingAbove upper S With caret left-parenthesis t right-parenthesis right-bracket Over 4 ModifyingAbove upper S With caret left-parenthesis t right-parenthesis left-bracket 1 minus ModifyingAbove upper S With caret left-parenthesis t right-parenthesis right-bracket EndFraction period The 100(1 – alpha)% confidence interval for upper S left-parenthesis t right-parenthesis is given by

    sine squared left-brace max left-bracket 0 comma sine Superscript negative 1 Baseline left-parenthesis StartRoot ModifyingAbove upper S With caret left-parenthesis t right-parenthesis EndRoot right-parenthesis minus z Subscript StartFraction alpha Over 2 EndFraction Baseline ModifyingAbove tau With caret left-parenthesis t right-parenthesis right-bracket right-brace less-than-or-equal-to upper S left-parenthesis t right-parenthesis less-than-or-equal-to sine squared left-brace min left-bracket StartFraction pi Over 2 EndFraction comma sine Superscript negative 1 Baseline left-parenthesis StartRoot ModifyingAbove upper S With caret left-parenthesis t right-parenthesis EndRoot right-parenthesis plus z Subscript StartFraction alpha Over 2 EndFraction Baseline ModifyingAbove tau With caret left-parenthesis t right-parenthesis right-bracket right-brace
  • linear transformation: This is the same as the identity transformation. The 100(1 – alpha)% confidence interval for upper S left-parenthesis t right-parenthesis is given by

    ModifyingAbove upper S With caret left-parenthesis t right-parenthesis minus z Subscript StartFraction alpha Over 2 EndFraction Baseline ModifyingAbove sigma With caret left-bracket ModifyingAbove upper S With caret left-parenthesis t right-parenthesis right-bracket less-than-or-equal-to upper S left-parenthesis t right-parenthesis less-than-or-equal-to ModifyingAbove upper S With caret left-parenthesis t right-parenthesis plus z Subscript StartFraction alpha Over 2 EndFraction Baseline ModifyingAbove sigma With caret left-bracket ModifyingAbove upper S With caret left-parenthesis t right-parenthesis right-bracket
  • log transformation: The estimated variance of log left-parenthesis ModifyingAbove upper S With caret left-parenthesis t right-parenthesis right-parenthesis is ModifyingAbove tau With caret squared left-parenthesis t right-parenthesis equals StartFraction ModifyingAbove sigma With caret squared left-parenthesis ModifyingAbove upper S With caret left-parenthesis t right-parenthesis right-parenthesis Over ModifyingAbove upper S With caret squared left-parenthesis t right-parenthesis EndFraction period The 100(1 – alpha)% confidence interval for upper S left-parenthesis t right-parenthesis is given by

    ModifyingAbove upper S With caret left-parenthesis t right-parenthesis exp left-parenthesis minus z Subscript StartFraction alpha Over 2 EndFraction Baseline ModifyingAbove tau With caret left-parenthesis t right-parenthesis right-parenthesis less-than-or-equal-to upper S left-parenthesis t right-parenthesis less-than-or-equal-to ModifyingAbove upper S With caret left-parenthesis t right-parenthesis exp left-parenthesis z Subscript StartFraction alpha Over 2 EndFraction Baseline ModifyingAbove tau With caret left-parenthesis t right-parenthesis right-parenthesis
  • log-log transformation: The estimated variance of log left-parenthesis minus log left-parenthesis ModifyingAbove upper S With caret left-parenthesis t right-parenthesis right-parenthesis is ModifyingAbove tau With caret squared left-parenthesis t right-parenthesis equals StartFraction ModifyingAbove sigma With caret squared left-bracket ModifyingAbove upper S With caret left-parenthesis t right-parenthesis right-bracket Over left-bracket ModifyingAbove upper S With caret left-parenthesis t right-parenthesis log left-parenthesis ModifyingAbove upper S With caret left-parenthesis t right-parenthesis right-parenthesis right-bracket squared EndFraction period The 100(1 – alpha)% confidence interval for upper S left-parenthesis t right-parenthesis is given by

    left-bracket ModifyingAbove upper S With caret left-parenthesis t right-parenthesis right-bracket Superscript exp left-parenthesis z Super Subscript StartFraction alpha Over 2 EndFraction Superscript ModifyingAbove tau With caret left-parenthesis t right-parenthesis right-parenthesis Baseline less-than-or-equal-to upper S left-parenthesis t right-parenthesis less-than-or-equal-to left-bracket ModifyingAbove upper S With caret left-parenthesis t right-parenthesis right-bracket Superscript exp left-parenthesis minus z Super Subscript StartFraction alpha Over 2 EndFraction Superscript ModifyingAbove tau With caret left-parenthesis t right-parenthesis right-parenthesis
  • logit transformation: The estimated variance of $\log\left(\frac{\hat{S}(t)}{1-\hat{S}(t)}\right)$ is

    $$\hat{\tau}^2(t) = \frac{\hat{\sigma}^2(\hat{S}(t))}{\hat{S}^2(t)\,[1-\hat{S}(t)]^2}$$

    The $100(1-\alpha)\%$ confidence limits for $S(t)$ are given by

    $$\frac{\hat{S}(t)}{\hat{S}(t) + [1-\hat{S}(t)]\exp(z_{\alpha/2}\,\hat{\tau}(t))} \le S(t) \le \frac{\hat{S}(t)}{\hat{S}(t) + [1-\hat{S}(t)]\exp(-z_{\alpha/2}\,\hat{\tau}(t))}$$
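
The log-log and logit limits above can be sketched in a few lines of Python. This is an illustration only, not the procedure's implementation: the function names are hypothetical, `sigma_hat` stands for the estimated standard error of the survival estimate, and the critical value is assumed to be the upper alpha/2 point of the standard normal distribution.

```python
import math
from statistics import NormalDist


def loglog_limits(s_hat, sigma_hat, alpha=0.05):
    """Log-log transformed limits for S(t); requires 0 < s_hat < 1."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # assumed upper alpha/2 critical value
    tau = sigma_hat / abs(s_hat * math.log(s_hat))
    # A larger exponent gives a smaller value because 0 < s_hat < 1.
    return s_hat ** math.exp(z * tau), s_hat ** math.exp(-z * tau)


def logit_limits(s_hat, sigma_hat, alpha=0.05):
    """Logit transformed limits for S(t); requires 0 < s_hat < 1."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    tau = sigma_hat / (s_hat * (1.0 - s_hat))
    lower = s_hat / (s_hat + (1.0 - s_hat) * math.exp(z * tau))
    upper = s_hat / (s_hat + (1.0 - s_hat) * math.exp(-z * tau))
    return lower, upper
```

Both transformations keep the limits inside the interval (0, 1), which is the usual reason for preferring them to untransformed limits.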

Quartile Estimation

The first quartile (25th percentile) of the survival time is the time beyond which 75% of the subjects in the population under study are expected to survive. For interval-censored data, it is problematic to define point estimators of the quartiles based on the survival estimate $\hat{S}(t)$ because the estimate is undefined within the Turnbull intervals. To overcome this problem, you need to impute survival probabilities within the Turnbull intervals. The previously defined estimator $\hat{S}_m(t)$ achieves this by placing all the estimated probabilities at the right boundary of the interval. The first quartile is estimated by

$$q_{.25} = \min\{t_j \mid \hat{S}_m(t_j) < 0.75\}$$

If $\hat{S}_m(t)$ is exactly equal to 0.75 from $t_j$ to $t_{j+1}$, the first quartile is taken to be $(t_j + t_{j+1})/2$. If $\hat{S}_m(t)$ is greater than 0.75 for all values of $t$, the first quartile cannot be estimated and is represented by a missing value in the printed output.

The general formula for estimating the 100p percentile point is

$$q_p = \min\{t_j \mid \hat{S}_m(t_j) < 1 - p\}$$

The second quartile (the median) and the third quartile of survival times correspond to p = 0.5 and p = 0.75, respectively.
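
The percentile rule can be sketched as follows. The function name and the list-based representation of the step function are assumptions made for illustration: `times` holds the ordered jump points and `surv` holds the value of the imputed survival estimate at each of them. The midpoint rule is applied when the estimate sits exactly at the target level on the preceding step.

```python
import math


def percentile(times, surv, p):
    """Return the 100p percentile, or None when it cannot be estimated."""
    target = 1.0 - p
    for j, (t, s) in enumerate(zip(times, surv)):
        if s < target and not math.isclose(s, target):
            # If the estimate equals the target on the previous step,
            # take the midpoint of that step.
            if j > 0 and math.isclose(surv[j - 1], target):
                return (times[j - 1] + t) / 2.0
            return t
    return None  # the estimate never drops below the target
```

For example, with jump points 2, 4, 6, 8 and survival values 0.9, 0.7, 0.4, 0.2, the sketch returns 4 for the first quartile and 6 for the median.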

Brookmeyer and Crowley (1982) constructed the confidence interval for the median survival time based on the confidence interval for the survival function $S(t)$. The methodology is generalized to construct the confidence interval for the 100p percentile based on a g-transformed confidence interval for $S(t)$ (Klein and Moeschberger 1997). You can use the CONFTYPE= option to specify the g-transformation. The $100(1-\alpha)\%$ confidence interval for the first quartile of the survival time is the set of all points $t$ that satisfy

$$\left|\frac{g(\hat{S}_m(t)) - g(1 - 0.25)}{g'(\hat{S}_m(t))\,\hat{\sigma}(\hat{S}(t))}\right| \le z_{1-\alpha/2}$$

where $g'(x)$ is the first derivative of $g(x)$ and $z_{1-\alpha/2}$ is the $100(1-\alpha/2)$ percentile of the standard normal distribution.

Kernel-Smoothed Estimation

After you obtain the survival estimate $\hat{S}(t)$, you can construct a discrete estimator for the cumulative hazard function. First, you compute the jumps of the discrete function as

$$\hat{\lambda}_j = \frac{c_j\hat{\theta}_j}{\sum_{k=j}^{m} c_k\hat{\theta}_k}, \quad j = 1,\ldots,m$$

where the $c_j$'s have been defined previously for calculating the Lagrange multiplier statistic.

Essentially, the numerator and denominator estimate the number of failures and the number at risk that are associated with the Turnbull intervals. Thus these quantities estimate the increments of the cumulative hazard function over the Turnbull intervals.

The estimator of the cumulative hazard function is

$$\hat{\lambda}(t) = \sum_{k:\,p_k < t} \hat{\lambda}_k, \quad t \notin \text{any } I_j$$

Like $\hat{S}(t)$, $\hat{\lambda}(t)$ is undefined if $t$ is located within some Turnbull interval $I_j$. To facilitate applying the kernel-smoothed methods, you need to reformulate the estimator so that it has only point masses. An ad hoc approach would be to place all the mass for a Turnbull interval at the right boundary. The kernel-based estimate of the hazard function is computed as

$$\tilde{h}(t, b) = \frac{1}{b}\sum_{j=1}^{m} K\left(\frac{t - p_j}{b}\right)\hat{\lambda}_j$$

where $K(\cdot)$ is a kernel function and $b > 0$ is the bandwidth. You can estimate the cumulative hazard function by integrating $\tilde{h}(t, b)$ with respect to $t$.

Practically, an upper limit $t_D$ is usually imposed so that the kernel-smoothed estimate is defined on $(0, t_D)$. The ICLIFETEST procedure sets the value depending on whether the right boundary of the last Turnbull interval is finite or not: $t_D = p_m$ if $p_m < \infty$ and $t_D = 1.2\,q_m$ otherwise.

Typical choices of kernel function are as follows:

  • uniform kernel:

    $$K_U(x) = \frac{1}{2}, \quad -1 \le x \le 1$$
  • Epanechnikov kernel:

    $$K_E(x) = \frac{3}{4}(1 - x^2), \quad -1 \le x \le 1$$
  • biweight kernel:

    $$K_{BW}(x) = \frac{15}{16}(1 - x^2)^2, \quad -1 \le x \le 1$$
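
A minimal sketch of the hazard jumps and the kernel-smoothed hazard with a symmetric kernel follows. It is not the procedure's code. The function names are hypothetical, the constants `c` are taken as given inputs, all mass is placed at the right boundaries `p`, and the sum is taken with a positive sign so that the estimate is nonnegative for nonnegative kernels. Boundary corrections are ignored here.

```python
def hazard_jumps(c, theta):
    """Jumps c_j*theta_j / sum_{k>=j} c_k*theta_k of the discrete hazard."""
    products = [cj * tj for cj, tj in zip(c, theta)]
    jumps = []
    for j in range(len(products)):
        jumps.append(products[j] / sum(products[j:]))
    return jumps


def epanechnikov(x):
    """Symmetric Epanechnikov kernel on [-1, 1]."""
    return 0.75 * (1.0 - x * x) if -1.0 <= x <= 1.0 else 0.0


def kernel_hazard(t, b, p, jumps, kernel=epanechnikov):
    """Kernel-smoothed hazard at time t with bandwidth b > 0."""
    return sum(kernel((t - pj) / b) * lj for pj, lj in zip(p, jumps)) / b
```

A larger bandwidth spreads each jump over a wider window, which gives a smoother but flatter curve.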

For $t < b$, the symmetric kernels $K(\cdot)$ are replaced by the corresponding asymmetric kernels of Gasser and Müller (1979). Let $q = t/b$. The modified kernels are as follows:

  • uniform kernel:

    $$K_{U,q}(x) = \frac{4(1 + q^3)}{(1 + q)^4} + \frac{6(1 - q)}{(1 + q)^3}\,x, \quad -1 \le x \le q$$
  • Epanechnikov kernel:

    $$K_{E,q}(x) = K_E(x)\,\frac{64(2 - 4q + 6q^2 - 3q^3) + 240(1 - q)^2 x}{(1 + q)^4(19 - 18q + 3q^2)}, \quad -1 \le x \le q$$
  • biweight kernel:

    $$K_{BW,q}(x) = K_{BW}(x)\,\frac{64(8 - 24q + 48q^2 - 45q^3 + 15q^4) + 1120(1 - q)^3 x}{(1 + q)^5(81 - 168q + 126q^2 - 40q^3 + 5q^4)}, \quad -1 \le x \le q$$

For $t_D - b \le t \le t_D$, let $q = (t_D - t)/b$. The asymmetric kernels for $t < b$ are used, with $x$ replaced by $-x$.
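
The boundary kernels can be checked numerically. The sketch below codes the uniform and Epanechnikov versions and confirms that each integrates to 1 over [-1, q] and reduces to the symmetric kernel when q = 1. The function names and the midpoint-rule integrator are illustrative choices, not part of the procedure.

```python
def boundary_uniform(x, q):
    """Asymmetric uniform kernel on [-1, q]."""
    if not -1.0 <= x <= q:
        return 0.0
    return 4.0 * (1 + q**3) / (1 + q) ** 4 + 6.0 * (1 - q) / (1 + q) ** 3 * x


def boundary_epanechnikov(x, q):
    """Asymmetric Epanechnikov kernel on [-1, q]."""
    if not -1.0 <= x <= q:
        return 0.0
    base = 0.75 * (1.0 - x * x)
    num = 64.0 * (2 - 4 * q + 6 * q**2 - 3 * q**3) + 240.0 * (1 - q) ** 2 * x
    den = (1 + q) ** 4 * (19 - 18 * q + 3 * q**2)
    return base * num / den


def integrate(f, a, b, n=20000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))
```

For example, `integrate(lambda x: boundary_uniform(x, 0.3), -1.0, 0.3)` is close to 1, so the truncated window still carries unit mass.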

The bandwidth parameter $b$ controls how much "smoothness" you want to have in the kernel-smoothed estimate. For right-censored data, a commonly accepted method of choosing an optimal bandwidth is to use the mean integrated squared error (MISE) as an objective criterion. This measure is difficult to adapt to interval-censored data because it no longer has a closed-form mathematical formula.

Pan (2000) proposes using a V-fold cross validation likelihood as a criterion for choosing the optimal bandwidth for the kernel-smoothed estimate of the survival function. The ICLIFETEST procedure implements this approach for smoothing the hazard function. Computing such a criterion entails a cross validation type procedure. First, the original data $\mathcal{D}$ are partitioned into $V$ almost balanced subsets $\mathcal{D}^{(v)}$, $v = 1,\ldots,V$. Denote the kernel-smoothed estimate of the leave-one-subset-out data $\mathcal{D} - \mathcal{D}^{(v)}$ as $\hat{h}^{*(-v)}(t; b)$. The optimal bandwidth is defined as the one that maximizes the cross validation likelihood:

$$b_0 = \underset{b}{\operatorname{argmax}} \sum_{v=1}^{V} L\left(\hat{h}^{*(-v)}(t; b) \mid \mathcal{D}^{(v)}\right)$$

Comparison of Survival between Groups

If the TEST statement is specified, the ICLIFETEST procedure compares the $K$ groups formed by the levels of the TEST variable using a generalized log-rank test. Let $S_k(t)$ be the underlying survival function of the $k$th group, $k = 1,\ldots,K$. The null and alternative hypotheses to be tested are

$$H_0: S_1(t) = S_2(t) = \cdots = S_K(t) \text{ for all } t$$

versus

$$H_1: \text{at least one of the } S_k(t)\text{'s is different for some } t$$

Let $N_k$ denote the number of subjects in group $k$, and let $n$ denote the total number of subjects ($n = N_1 + \cdots + N_K$).

Generalized Log-Rank Statistic

For the $i$th subject, let $\mathbf{z}_i = (z_{i1},\ldots,z_{iK})'$ be a vector of $K$ indicators that represent whether or not the subject belongs to the $k$th group. Denote $\boldsymbol{\beta} = (\beta_1,\ldots,\beta_K)'$, where $\beta_k$ represents the treatment effect for the $k$th group. Suppose that a model is specified and the survival function for the $i$th subject can be written as

$$S(t \mid \mathbf{z}_i) = S(t \mid \mathbf{z}_i'\boldsymbol{\beta}, \boldsymbol{\gamma})$$

where $\boldsymbol{\gamma}$ denotes the nuisance parameters.

It follows that the likelihood function is

$$L = \prod_{i=1}^{n}\left[S(L_i \mid \mathbf{z}_i'\boldsymbol{\beta}, \boldsymbol{\gamma}) - S(R_i \mid \mathbf{z}_i'\boldsymbol{\beta}, \boldsymbol{\gamma})\right]$$

where $(L_i, R_i]$ denotes the interval observation for the $i$th subject.

Testing whether or not the survival functions are equal across the $K$ groups is equivalent to testing whether all the $\beta_k$'s are zero. It is natural to consider a score test based on the specified model (Finkelstein 1986).

The score statistics for $\boldsymbol{\beta}$ are derived as the first-order derivatives of the log-likelihood function evaluated at $\boldsymbol{\beta} = \mathbf{0}$ and $\hat{\boldsymbol{\gamma}}$:

$$\mathbf{U} = (U_1,\ldots,U_K)' = \left.\frac{\partial \log(L)}{\partial \boldsymbol{\beta}}\right|_{\boldsymbol{\beta} = \mathbf{0},\,\hat{\boldsymbol{\gamma}}}$$

where $\hat{\boldsymbol{\gamma}}$ denotes the maximum likelihood estimate of $\boldsymbol{\gamma}$, given that $\boldsymbol{\beta} = \mathbf{0}$.

Under the null hypothesis that $\boldsymbol{\beta} = \mathbf{0}$, all $K$ groups share the same survival function $S(t)$. It is typical to leave $S(t)$ unspecified and obtain a nonparametric maximum likelihood estimate $\hat{S}(t)$ using, for instance, Turnbull's method. In this case, $\boldsymbol{\gamma}$ represents all the parameters to be estimated in order to determine $\hat{S}(t)$.

Suppose the given data generate $m$ Turnbull intervals as $\{I_j = (q_j, p_j],\ j = 1,\ldots,m\}$. Denote the probability estimate at the right endpoint of the $j$th interval by $\hat{\theta}_j$. The nonparametric survival estimate is $\hat{S}(t) = \sum_{k:\,p_k > t}\hat{\theta}_k$ for $t \notin$ any $I_j$.

Under the null hypothesis, Fay (1999) showed that the score statistics can be written in the form of a weighted log-rank test as

$$U_k = \sum_{j=1}^{m} U_{kj} = \sum_{j=1}^{m} v_j\left(d'_{kj} - \frac{n'_{kj}}{n'_j}\,d'_j\right)$$

where

$$v_j = \frac{\hat{S}(p_j)\,\hat{S}'(p_{j-1}) - \hat{S}(p_{j-1})\,\hat{S}'(p_j)}{\hat{S}(p_j)\left[\hat{S}(p_{j-1}) - \hat{S}(p_j)\right]}$$

and $S'(t)$ denotes the derivative of $S(t)$ with respect to $\boldsymbol{\beta}$.

$d'_{kj}$ estimates the expected number of events within $I_j$ for the $k$th group, and it is computed as

$$d'_{kj} = \sum_{i=1}^{n} z_{ik}\,\frac{\alpha_{ij}\hat{\theta}_j}{\sum_{l=1}^{m}\alpha_{il}\hat{\theta}_l}$$

$d'_j$ is an estimate for the expected number of events within $I_j$ for the whole sample, and it is computed as

$$d'_j = \sum_{k=1}^{K} d'_{kj}$$

Similarly, $n'_{kj}$ estimates the expected number of subjects at risk before entering $I_j$ for the $k$th group, and can be estimated by $n'_{kj} = \sum_{l=j}^{m} d'_{kl}$. $n'_j$ is an estimate of the expected number of subjects at risk before entering $I_j$ for all the groups: $n'_j = \sum_{k=1}^{K} n'_{kj}$.
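
The expected counts and the resulting score statistics can be sketched as below. This is an illustration under stated assumptions, not the procedure's code: the function name is hypothetical, `alpha[i][j]` is assumed to be 1 when the jth Turnbull interval lies inside the observed interval of subject i and 0 otherwise, `group[i]` is a zero-based group index, and the weights `v` and probabilities `theta` are supplied by the caller.

```python
def generalized_logrank(alpha, theta, group, n_groups, v):
    """Return the list of score statistics U_k, one per group."""
    m = len(theta)
    # d[k][j]: expected number of events in interval j for group k.
    d = [[0.0] * m for _ in range(n_groups)]
    for a_i, g in zip(alpha, group):
        denom = sum(a * th for a, th in zip(a_i, theta))
        for j in range(m):
            d[g][j] += a_i[j] * theta[j] / denom
    # n_risk[k][j]: expected number at risk before interval j for group k.
    n_risk = [[sum(row[j:]) for j in range(m)] for row in d]
    d_all = [sum(d[k][j] for k in range(n_groups)) for j in range(m)]
    n_all = [sum(n_risk[k][j] for k in range(n_groups)) for j in range(m)]
    return [
        sum(v[j] * (d[k][j] - n_risk[k][j] / n_all[j] * d_all[j]) for j in range(m))
        for k in range(n_groups)
    ]
```

With four subjects in two groups, `alpha = [[1, 0], [1, 1], [0, 1], [1, 1]]`, `theta = [0.4, 0.6]`, and unit weights, the sketch gives statistics of 0.5 and -0.5. The statistics sum to zero across groups, which is why the covariance matrix of the full vector is singular.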

Assuming different survival models gives rise to different weight functions $v_j$ (Fay 1999). For example, Finkelstein's score test (1986) is derived assuming a proportional hazards model; Fay's test (1996) is based on a proportional odds model.

The choices of weight function are given in Table 3.

Table 3: Weight Functions for Various Tests

Test                         $v_j$
Sun (1996)                   $1.0$
Fay (1999)                   $\hat{S}(p_{j-1})$
Finkelstein (1986)           $\frac{\hat{S}(p_{j-1})\left[\log\hat{S}(p_{j-1}) - \log\hat{S}(p_j)\right]}{\hat{S}(p_{j-1}) - \hat{S}(p_j)}$
Harrington-Fleming $(p,q)$   $[\hat{S}(p_{j-1})]^p\,[1 - \hat{S}(p_{j-1})]^q,\ p \ge 0,\ q \ge 0$
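
Three of the weight functions in Table 3 can be sketched as follows. The function names are hypothetical. The sketch assumes that the survival estimate just before the jth interval equals the total probability assigned to intervals j through m, so that it is 1 before the first interval. The Finkelstein weight is left out because the estimate at the last right endpoint is 0, which this sketch does not handle.

```python
def survival_before(theta):
    """Survival estimate at p_{j-1} for j = 1..m (sum of theta_k, k >= j)."""
    s_prev = [0.0] * len(theta)
    running = 0.0
    for j in range(len(theta) - 1, -1, -1):
        running += theta[j]
        s_prev[j] = running
    return s_prev


def weights(theta, test="sun", p=0.0, q=0.0):
    """Weights v_j for the Sun, Fay, or Harrington-Fleming (p, q) tests."""
    s_prev = survival_before(theta)
    if test == "sun":
        return [1.0] * len(theta)
    if test == "fay":
        return list(s_prev)
    if test == "harrington-fleming":
        # max() guards against a tiny negative value from rounding.
        return [s**p * max(0.0, 1.0 - s) ** q for s in s_prev]
    raise ValueError("unknown test: " + test)
```

Setting (p, q) = (0, 0) reproduces the Sun weights and (1, 0) reproduces the Fay weights, so the Harrington-Fleming family covers both as special cases.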


Variance Estimation of the Generalized Log-Rank Statistic

Sun (1996) proposed the use of multiple imputation to estimate the variance-covariance matrix of the generalized log-rank statistic bold upper U. This approach is similar to the multiple imputation method as presented in Variance Estimation of the Survival Estimator. Both methods impute right-censored data from interval-censored data and analyze the imputed data sets by using standard statistical techniques. Huang, Lee, and Yu (2008) suggested improving the performance of the generalized log-rank test by slightly modifying the variance calculation.

Suppose the given data generate $m$ Turnbull intervals as $\{I_j = (q_j, p_j],\ j = 1,\ldots,m\}$. Denote the probability estimate for the $j$th interval as $\hat{\theta}_j$, and denote the nonparametric survival estimate as $\hat{S}(t) = \sum_{k:\,p_k > t}\hat{\theta}_k$ for $t \notin$ any $I_j$.

To generate an imputed data set, you need to generate a survival time for every subject in the sample. For the $i$th subject, a random time $T_i^*$ is drawn from the following discrete survival function:

$$\hat{S}_i(T_i^* = p_j) = \frac{\hat{S}(q_j) - \hat{S}(R_i+)}{\hat{S}(L_i) - \hat{S}(R_i+)}, \quad p_j \in (L_i, R_i],\ j = 1,\ldots,m$$

where $(L_i, R_i]$ denotes the interval observation for the subject.
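
One way to read the discrete survival function above is that the imputed time takes the value of a right endpoint inside the observed interval with probability proportional to the estimated mass of that Turnbull interval. The sketch below draws imputed times under that reading. The function name is hypothetical, and the use of Python's `random.Random.choices` is an illustrative choice rather than the procedure's generator.

```python
import random


def impute_time(left, right, p, theta, rng):
    """Draw one imputed time from the endpoints p_j that lie in (left, right]."""
    candidates = [
        (pj, tj) for pj, tj in zip(p, theta) if left < pj <= right and tj > 0
    ]
    if not candidates:
        raise ValueError("no Turnbull mass inside the observed interval")
    values = [pj for pj, _ in candidates]
    masses = [tj for _, tj in candidates]
    # choices() normalizes the weights, giving the conditional distribution.
    return rng.choices(values, weights=masses, k=1)[0]
```

For example, `impute_time(1.0, 4.0, [3.0, 6.0], [0.4, 0.6], random.Random(1))` always returns 3.0 because only one right endpoint falls inside (1, 4]. A right-censored observation can be passed with `right=float("inf")`.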

For the $h$th imputed data set ($h = 1,\ldots,H$), let $d_{kj}^h$ and $n_{kj}^h$ denote the numbers of failures and subjects at risk by counting the imputed $T_i^*$'s for group $k$. Let $d_j^h$ and $n_j^h$ denote the corresponding pooled numbers.

You can perform the standard weighted log-rank test for right-censored data on each of the imputed data sets (Huang, Lee, and Yu 2008). The test statistic is

$$\mathbf{U}^h = (U_1^h,\ldots,U_K^h)'$$

where

$$U_k^h = \sum_{j=1}^{m} v_j\left(d_{kj}^h - \frac{n_{kj}^h}{n_j^h}\,d_j^h\right)$$

Its variance-covariance matrix is estimated by the Greenwood formula as

$$\mathbf{V}^h = \mathbf{V}_1^h + \cdots + \mathbf{V}_m^h$$

where

$$(\mathbf{V}_j^h)_{l_1 l_2} = \begin{cases} v_j^2\, n_{l_1 j}^h\,(n_j^h - n_{l_1 j}^h)\, d_j^h\,(n_j^h - d_j^h)\,(n_j^h)^{-2}\,(n_j^h - 1)^{-1} & \text{when } l_1 = l_2 \\ -v_j^2\, n_{l_1 j}^h\, n_{l_2 j}^h\, d_j^h\,(n_j^h - d_j^h)\,(n_j^h)^{-2}\,(n_j^h - 1)^{-1} & \text{when } l_1 \ne l_2 \end{cases}$$
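
The contribution of one Turnbull interval to the covariance matrix can be sketched as below. The sketch assumes the usual hypergeometric form, in which the diagonal entry contains the product of the group count at risk and the count at risk in all other groups. The function name is hypothetical, and intervals with fewer than two subjects at risk are assumed to contribute nothing.

```python
def variance_term(v_j, n_kj, d_j):
    """Covariance contribution of interval j; n_kj lists at-risk counts by group."""
    k = len(n_kj)
    n_j = sum(n_kj)
    if n_j <= 1:
        return [[0.0] * k for _ in range(k)]
    common = v_j**2 * d_j * (n_j - d_j) / (n_j**2 * (n_j - 1))
    matrix = [[0.0] * k for _ in range(k)]
    for a in range(k):
        for b in range(k):
            if a == b:
                matrix[a][b] = common * n_kj[a] * (n_j - n_kj[a])
            else:
                matrix[a][b] = -common * n_kj[a] * n_kj[b]
    return matrix
```

For example, `variance_term(1.0, [3, 2], 2)` has diagonal entries of 0.36 and off-diagonal entries of -0.36. Each row sums to zero, which reflects the constraint that the group statistics sum to zero.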

After analyzing each imputed data set, you can estimate the variance-covariance matrix of bold upper U by pooling the results as

$$\hat{\mathbf{V}} = \frac{1}{H}\sum_{h=1}^{H}\mathbf{V}^h - \frac{1}{H-1}\sum_{h=1}^{H}\left[\mathbf{U}^h - \bar{\mathbf{U}}\right]\left[\mathbf{U}^h - \bar{\mathbf{U}}\right]'$$

where

$$\bar{\mathbf{U}} = \frac{1}{H}\sum_{h=1}^{H}\mathbf{U}^h$$
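
The pooling step can be sketched as follows: the average within-imputation covariance minus the between-imputation covariance of the statistics. The function name and the nested-list matrix representation are illustrative assumptions, and at least two imputations are required.

```python
def pool_variance(u_list, v_list):
    """Pool H imputed statistics u_list[h] and covariance matrices v_list[h]."""
    h_count = len(u_list)
    k = len(u_list[0])
    u_bar = [sum(u[a] for u in u_list) / h_count for a in range(k)]
    pooled = [[0.0] * k for _ in range(k)]
    for a in range(k):
        for b in range(k):
            within = sum(v[a][b] for v in v_list) / h_count
            between = sum(
                (u[a] - u_bar[a]) * (u[b] - u_bar[b]) for u in u_list
            ) / (h_count - 1)
            pooled[a][b] = within - between
    return pooled
```

For example, with two imputations that give statistics (1, -1) and (3, -3) and a common within-imputation matrix with entries 4 and -4, the pooled matrix has entries 2 and -2.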

The overall test statistic is formed as $\mathbf{U}'\mathbf{V}^{-}\mathbf{U}$, where $\mathbf{V}^{-}$ is the generalized inverse of $\mathbf{V}$. Under the null hypothesis, the statistic has a chi-squared distribution with degrees of freedom equal to the rank of $\mathbf{V}$. By default, the ICLIFETEST procedure performs 1000 imputations. You can change the number of imputations by using the IMPUTE option in the PROC ICLIFETEST statement.

Stratified Tests

Suppose the generalized log-rank test is to be stratified on the $M$ levels that are formed from the variables that you specify in the STRATA statement. Based only on the data of the $s$th stratum ($s = 1,\ldots,M$), let $\mathbf{U}_{(s)}$ be the test statistic for the $s$th stratum and let $\mathbf{V}_{(s)}$ be the corresponding covariance matrix as constructed in the section Variance Estimation of the Generalized Log-Rank Statistic. First, sum over the stratum-specific estimates as follows:

$$\mathbf{U}_{\cdot} = \sum_{s=1}^{M}\mathbf{U}_{(s)}$$
$$\mathbf{V}_{\cdot} = \sum_{s=1}^{M}\mathbf{V}_{(s)}$$

Then construct the global test statistic as

$$\mathbf{U}_{\cdot}'\,\mathbf{V}_{\cdot}^{-}\,\mathbf{U}_{\cdot}$$

Under the null hypothesis, the test statistic has a chi-squared distribution with degrees of freedom equal to the rank of $\mathbf{V}_{\cdot}$. The ICLIFETEST procedure performs the stratified test only when the groups to be compared are balanced across all the strata.

Multiple-Comparison Adjustments

When you have more than two groups, a generalized log-rank test tells you whether the survival curves are significantly different from each other, but it does not identify which pairs of curves are different. Pairwise comparisons can be performed based on the generalized log-rank statistic and the corresponding variance-covariance matrix. However, reporting all pairwise comparisons is problematic because the overall Type I error rate would be inflated. A multiple-comparison adjustment of the p-values for the paired comparisons retains the same overall probability of a Type I error as the K-sample test.

The ICLIFETEST procedure supports two types of paired comparisons: comparisons between all pairs of curves and comparisons between a control curve and all other curves. You use the DIFF= option to specify the comparison type, and you use the ADJUST= option to select a method of multiple-comparison adjustments.

Let chi Subscript r Superscript 2 denote a chi-square random variable with r degrees of freedom. Denote phi and normal upper Phi as the density function and the cumulative distribution function of a standard normal distribution, respectively. Let m be the number of comparisons; that is,

StartLayout 1st Row  m equals StartLayout Enlarged left-brace 1st Row 1st Column StartFraction k left-parenthesis k minus 1 right-parenthesis Over 2 EndFraction 2nd Column normal upper D normal upper I normal upper F normal upper F equals normal upper A normal upper L normal upper L 2nd Row 1st Column k minus 1 2nd Column normal upper D normal upper I normal upper F normal upper F equals normal upper C normal upper O normal upper N normal upper T normal upper R normal upper O normal upper L EndLayout EndLayout

For a two-sided test that compares the survival of the $j$th group with that of the $l$th group, $1 \le j \ne l \le r$, the test statistic is

$$z_{jl}^2 = \frac{(U_j - U_l)^2}{V_{jj} + V_{ll} - 2V_{jl}}$$

and the raw p-value is

$$p = \Pr(\chi^2_1 > z_{jl}^2)$$
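As a minimal sketch of the two formulas above (not the procedure's own implementation), the following Python snippet computes $z_{jl}^2$ and its raw p-value. The values in `U` and `V` are made up for illustration, the function names are hypothetical, and the chi-square tail relies on the identity $\Pr(\chi^2_1 > x) = \operatorname{erfc}(\sqrt{x/2})$.

```python
import math

def pairwise_statistic(U, V, j, l):
    """Return z_jl^2 = (U_j - U_l)^2 / (V_jj + V_ll - 2 V_jl), 0-based indices."""
    diff = U[j] - U[l]
    var = V[j][j] + V[l][l] - 2.0 * V[j][l]
    return diff * diff / var

def raw_p_value(z2):
    """Pr(chi^2_1 > z2), using Pr(chi^2_1 > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(z2 / 2.0))

# Hypothetical statistic vector and covariance matrix for three groups
# (illustrative values only, not output from PROC ICLIFETEST).
U = [4.0, -1.0, -3.0]
V = [[6.0, -3.0, -3.0],
     [-3.0, 5.0, -2.0],
     [-3.0, -2.0, 5.0]]

z2 = pairwise_statistic(U, V, 0, 1)  # compare group 1 with group 2
p = raw_p_value(z2)
print(f"z^2 = {z2:.4f}, raw p = {p:.4f}")
```

The indices are 0-based here, so `pairwise_statistic(U, V, 0, 1)` corresponds to $j = 1$, $l = 2$ in the notation of the text.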

For multiple comparisons of more than two groups ($r > 2$), adjusted p-values are computed as follows:

  • Bonferroni adjustment:

    $$p = \min\left\{1,\; m \Pr(\chi^2_1 > z_{jl}^2)\right\}$$
  • Dunnett-Hsu adjustment: With the first group defined as the control, there are $r-1$ comparisons to be made. Let $\mathbf{C} = (c_{ij})$ be the $(r-1) \times r$ matrix of contrasts that represents the $r-1$ comparisons; that is,

    $$c_{ij} = \begin{cases} 1 & j = 1,\ i = 1, \ldots, r-1 \\ -1 & j = i+1,\ i = 1, \ldots, r-1 \\ 0 & \text{otherwise} \end{cases}$$

    Let $\boldsymbol{\Sigma} \equiv (\sigma_{ij})$ and $\mathbf{R} \equiv (r_{ij})$ be the covariance and correlation matrices of $\mathbf{C}\mathbf{U}$, respectively; that is,

    $$\boldsymbol{\Sigma} = \mathbf{C}\mathbf{V}\mathbf{C}'$$

    and

    $$r_{ij} = \frac{\sigma_{ij}}{\sqrt{\sigma_{ii}\sigma_{jj}}}$$

    The factor-analytic covariance approximation of Hsu (1992) is to find $\lambda_1, \ldots, \lambda_{r-1}$ such that

    $$\mathbf{R} = \mathbf{D} + \boldsymbol{\lambda}\boldsymbol{\lambda}'$$

    where $\mathbf{D}$ is a diagonal matrix whose $j$th diagonal element is $1 - \lambda_j^2$ (so that $\mathbf{R}$ has unit diagonal) and $\boldsymbol{\lambda} = (\lambda_1, \ldots, \lambda_{r-1})'$. The adjusted p-value is

    $$p = 1 - \int_{-\infty}^{\infty} \phi(y) \prod_{i=1}^{r-1} \left[ \Phi\!\left( \frac{\lambda_i y + z_{jl}}{\sqrt{1 - \lambda_i^2}} \right) - \Phi\!\left( \frac{\lambda_i y - z_{jl}}{\sqrt{1 - \lambda_i^2}} \right) \right] dy$$

    This value can be obtained in a DATA step as

    $$p = 1 - \text{PROBMC}(\text{"DUNNETT2"}, z_{jl}, ., ., r-1, \lambda_1, \ldots, \lambda_{r-1}).$$
  • Scheffé adjustment:

    $$p = \Pr(\chi^2_{r-1} > z_{jl}^2)$$
  • Šidák adjustment:

    $$p = 1 - \left\{ 1 - \Pr(\chi^2_1 > z_{jl}^2) \right\}^m$$
  • SMM adjustment:

    $$p = 1 - \left[ 2\Phi(z_{jl}) - 1 \right]^m$$

    This can also be evaluated in a DATA step as

    $$p = 1 - \text{PROBMC}(\text{"MAXMOD"}, z_{jl}, ., ., m).$$
  • Tukey adjustment:

    $$p = 1 - \int_{-\infty}^{\infty} r\,\phi(y) \left[ \Phi(y) - \Phi\!\left(y - \sqrt{2}\, z_{jl}\right) \right]^{r-1} dy$$

    This can be evaluated in a DATA step as

    $$p = 1 - \text{PROBMC}(\text{"RANGE"}, \sqrt{2}\, z_{jl}, ., ., r).$$
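The adjustments listed above can be sketched in Python using only the standard library. This is an illustration under stated assumptions, not how PROC ICLIFETEST or the PROBMC function computes these values: all function names are hypothetical, `z` is taken to be the nonnegative square root of $z_{jl}^2$, the integrals are approximated by a composite Simpson rule on $[-10, 10]$, and the Dunnett-Hsu factors $\lambda_i$ are supplied by the caller (each with $|\lambda_i| < 1$) rather than estimated.

```python
import math

def norm_pdf(y):
    """Standard normal density phi(y)."""
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)

def norm_cdf(y):
    """Standard normal cumulative distribution function Phi(y)."""
    return 0.5 * math.erfc(-y / math.sqrt(2.0))

def chi2_sf(x, df):
    """Pr(chi^2_df > x) for a positive integer df, via closed-form series."""
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return total

def simpson(f, lo=-10.0, hi=10.0, n=4000):
    """Composite Simpson rule on [lo, hi]; n must be even."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(lo + i * h)
    return total * h / 3.0

def bonferroni(z2, m):
    return min(1.0, m * chi2_sf(z2, 1))

def sidak(z2, m):
    return 1.0 - (1.0 - chi2_sf(z2, 1)) ** m

def scheffe(z2, r):
    return chi2_sf(z2, r - 1)

def smm(z, m):
    return 1.0 - (2.0 * norm_cdf(z) - 1.0) ** m

def tukey(z, r):
    """1 - integral of r * phi(y) * [Phi(y) - Phi(y - sqrt(2) z)]^(r-1)."""
    q = math.sqrt(2.0) * z
    return 1.0 - simpson(
        lambda y: r * norm_pdf(y) * (norm_cdf(y) - norm_cdf(y - q)) ** (r - 1))

def dunnett_hsu(z, lam):
    """Dunnett-Hsu integral for caller-supplied factors lam (assumed |lam_i| < 1)."""
    def integrand(y):
        prod = norm_pdf(y)
        for li in lam:
            s = math.sqrt(1.0 - li * li)
            prod *= norm_cdf((li * y + z) / s) - norm_cdf((li * y - z) / s)
        return prod
    return 1.0 - simpson(integrand)

# Illustrative call: r = 3 groups, all pairs compared (m = 3), z_jl = 2.
z, r, m = 2.0, 3, 3
for name, value in [("Bonferroni", bonferroni(z * z, m)), ("Sidak", sidak(z * z, m)),
                    ("Scheffe", scheffe(z * z, r)), ("SMM", smm(z, m)),
                    ("Tukey", tukey(z, r))]:
    print(f"{name}: {value:.4f}")
```

Two properties are handy sanity checks. With the formulas exactly as written, the SMM and Šidák values coincide, because $2\Phi(z) - 1 = 1 - \Pr(\chi^2_1 > z^2)$ for $z \ge 0$. With a single comparison ($r = 2$ for Tukey, or one factor for Dunnett-Hsu), both integrals reduce to the raw p-value.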
Trend Tests

Trend tests for right-censored data (Klein and Moeschberger 1997, Section 7.4) can be extended to interval-censored data in a straightforward way. Such tests are specifically designed to detect ordered alternatives such as

$$H_1: S_1(t) \ge S_2(t) \ge \cdots \ge S_K(t), \quad t \le \tau, \text{ with at least one strict inequality}$$

or

$$H_2: S_1(t) \le S_2(t) \le \cdots \le S_K(t), \quad t \le \tau, \text{ with at least one strict inequality}$$

Let $a_1 < a_2 < \cdots < a_K$ be a sequence of scores associated with the $K$ samples. Let $\mathbf{U} = (U_1, \ldots, U_K)$ be the generalized log-rank statistic and $\mathbf{V} = (V_{jl})$ be the corresponding covariance matrix of size $K \times K$ as constructed in the section Variance Estimation of the Generalized Log-Rank Statistic. The trend test statistic and its variance are given by $\sum_{j=1}^{K} a_j U_j$ and $\sum_{j=1}^{K} \sum_{l=1}^{K} a_j a_l V_{jl}$, respectively. Under the null hypothesis that there is no trend, the following z-score has, asymptotically, a standard normal distribution:

$$Z = \frac{\sum_{j=1}^{K} a_j U_j}{\sqrt{\sum_{j=1}^{K} \sum_{l=1}^{K} a_j a_l V_{jl}}}$$

The ICLIFETEST procedure provides both one-tail and two-tail p-values for the test.
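The trend statistic can be sketched in a few lines of Python. This is an illustration only: the function name, the scores, and the values of `U` and `V` are hypothetical, and the sketch reports both tails without assuming which one corresponds to $H_1$ or $H_2$, since that depends on the sign convention of $\mathbf{U}$.

```python
import math

def trend_test(scores, U, V):
    """Return (Z, lower-tail p, upper-tail p, two-tail p) for the trend test."""
    K = len(scores)
    numerator = sum(scores[j] * U[j] for j in range(K))
    variance = sum(scores[j] * scores[l] * V[j][l]
                   for j in range(K) for l in range(K))
    z = numerator / math.sqrt(variance)
    p_lower = 0.5 * math.erfc(-z / math.sqrt(2.0))   # Pr(N(0,1) < z)
    p_upper = 0.5 * math.erfc(z / math.sqrt(2.0))    # Pr(N(0,1) > z)
    p_two = math.erfc(abs(z) / math.sqrt(2.0))       # Pr(|N(0,1)| > |z|)
    return z, p_lower, p_upper, p_two

# Hypothetical inputs for K = 3 groups with scores 1 < 2 < 3.
scores = [1.0, 2.0, 3.0]
U = [4.0, -1.0, -3.0]
V = [[6.0, -3.0, -3.0],
     [-3.0, 5.0, -2.0],
     [-3.0, -2.0, 5.0]]

z, p_lower, p_upper, p_two = trend_test(scores, U, V)
print(f"Z = {z:.4f}, one-tail p = {min(p_lower, p_upper):.4f}, two-tail p = {p_two:.4f}")
```

For these inputs the numerator is $-7$ and the variance is $17$, so $Z = -7/\sqrt{17}$.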

Scores for Permutation Tests

The weighted log-rank statistic can also be expressed as

$$\mathbf{U} = \sum_{i=1}^{n} \mathbf{z}_i c_i$$

where $c_i$ is the score from the $i$th subject and follows the form

$$c_i = \frac{\hat{S}'(L_i) - \hat{S}'(R_i)}{\hat{S}(L_i) - \hat{S}(R_i)}$$

where $S'(\cdot)$ denotes the derivative of $S(\cdot)$ with respect to $\boldsymbol{\beta}$, evaluated at $\boldsymbol{\beta} = \mathbf{0}$.

As presented in Table 4, Fay (1999) derives the forms of scores for three weight functions. Under the assumption that censoring is independent of the grouping of subjects, these derived scores can be used by permutation tests.

Table 4: Scores for Different Weight Functions

Test | Weight $v_j$
Sun (1996) | $-\dfrac{\hat{S}(L_i)\hat{\lambda}(L_i) - \hat{S}(R_i)\hat{\lambda}(R_i)}{\hat{S}(L_i) - \hat{S}(R_i)}$
Fay (1999) | $\hat{S}(L_i) + \hat{S}(R_i) - 1$
Finkelstein (1986) | $\dfrac{\hat{S}(L_i)\log[\hat{S}(L_i)] - \hat{S}(R_i)\log[\hat{S}(R_i)]}{\hat{S}(L_i) - \hat{S}(R_i)}$


You can output scores to a designated SAS data set by specifying the OUTSCORE= option in the TEST statement.
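As a rough illustration of the Fay (1999) and Finkelstein (1986) rows of Table 4, the following Python sketch evaluates both expressions from the estimated survival probabilities at a subject's interval endpoints. The function names are hypothetical, the inputs `s_left` and `s_right` stand for $\hat{S}(L_i)$ and $\hat{S}(R_i)$ and are assumed to satisfy `s_left > s_right`, and the convention $0 \cdot \log 0 = 0$ is an assumption made here to handle $\hat{S}(R_i) = 0$. The Sun (1996) row is omitted because it also needs $\hat{\lambda}$.

```python
import math

def _s_log_s(s):
    """Return s * log(s), using the assumed convention 0 * log(0) = 0."""
    return 0.0 if s == 0.0 else s * math.log(s)

def fay_score(s_left, s_right):
    """Fay (1999) row of Table 4: S(L_i) + S(R_i) - 1."""
    return s_left + s_right - 1.0

def finkelstein_score(s_left, s_right):
    """Finkelstein (1986) row of Table 4; requires s_left > s_right."""
    if s_left <= s_right:
        raise ValueError("expected s_left > s_right")
    return (_s_log_s(s_left) - _s_log_s(s_right)) / (s_left - s_right)

# Hypothetical subject whose interval has S(L_i) = 1.0 and S(R_i) = 0.5.
print(fay_score(1.0, 0.5))
print(finkelstein_score(1.0, 0.5))
```

When $\hat{S}(R_i) = 0$, the Finkelstein expression reduces to $\log \hat{S}(L_i)$ and the Fay expression to $\hat{S}(L_i) - 1$.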

Last updated: December 09, 2022