The ICLIFETEST Procedure

Statistical Methods

Nonparametric Estimation of the Survival Function

Suppose the event times for a total of n subjects, $T_1$, $T_2$, …, $T_n$, are independent random variables with an underlying cumulative distribution function $F(t)$. Denote the corresponding survival function as $S(t)$. Interval-censoring occurs when some or all $T_i$’s cannot be observed directly but are known to be within the interval $(L_i, R_i]$.

The observed intervals might or might not overlap. If they do not overlap, then you can usually use conventional methods for right-censored data, with minor modifications. On the other hand, if some intervals overlap, you need special algorithms to compute an unbiased estimate of the underlying survival function.

To characterize the nonparametric estimate of the survival function, Peto (1973) and Turnbull (1976) show that the estimate can jump only at the right endpoint of a set of nonoverlapping intervals (also known as Turnbull intervals), $\{I_j = (q_j, p_j],\ j = 1, \ldots, m\}$. A simple algorithm for finding these intervals is to order all the boundary values with labels of L and R attached and then pick out each pair of adjacent values in which an L is immediately followed by an R. For example, suppose that the data set contains only three intervals, $(1, 3]$, $(2, 4]$, and $(5, 6]$. The ordered values are $1(L), 2(L), 3(R), 4(R), 5(L), 6(R)$. Then the Turnbull intervals are $(2, 3]$ and $(5, 6]$.
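The endpoint-ordering rule can be sketched in a few lines of Python. This is only an illustration, not the procedure's implementation: the function name `turnbull_intervals` and the sample observations are hypothetical, every observation is assumed to satisfy L < R, and ties are resolved by placing an R before an L of the same value (because $(a, t]$ contains $t$ while $(t, b]$ does not).

```python
def turnbull_intervals(observations):
    """Return Turnbull intervals (q, p] from (L, R] observations with L < R."""
    tagged = []
    for left, right in observations:
        # Second tuple element breaks ties: R (0) sorts before L (1).
        tagged.append((left, 1, "L"))
        tagged.append((right, 0, "R"))
    tagged.sort()

    # Keep every L endpoint that is immediately followed by an R endpoint.
    intervals = []
    for (v1, _, lab1), (v2, _, lab2) in zip(tagged, tagged[1:]):
        if lab1 == "L" and lab2 == "R":
            intervals.append((v1, v2))
    return intervals


# Hypothetical data: three observed intervals (1, 3], (2, 4], (5, 6].
print(turnbull_intervals([(1, 3), (2, 4), (5, 6)]))
```

For these three observations the sketch returns the pairs (2, 3) and (5, 6), each read as a half-open interval $(q, p]$.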

For the exact observation , Ng (2002) suggests that it be represented by the interval for a positive small value . If for an observation (), then the observation is represented by .

Define $\theta_j = P(T \in I_j)$, $j = 1, \ldots, m$. Given the data, the survival function, $S(t)$, can be determined only up to equivalence classes t, which are complements of the Turnbull intervals. $S(t)$ is undefined if t is within some $I_j$. The likelihood function for $\boldsymbol{\theta} = \{\theta_j,\ j = 1, \ldots, m\}$ is then

$$ L(\boldsymbol{\theta}) = \prod_{i=1}^{n} \left( \sum_{j=1}^{m} \alpha_{ij} \theta_j \right) $$

where $\alpha_{ij}$ is 1 if $I_j$ is contained in $(L_i, R_i]$ and 0 otherwise.

Denote the maximum likelihood estimate for $\boldsymbol{\theta}$ as $\hat{\boldsymbol{\theta}}$. The survival function can then be estimated as

$$ \hat{S}(t) = \sum_{k:\, p_k > t} \hat{\theta}_k, \quad t \notin \text{any } I_j,\ j = 1, \ldots, m $$

Estimation Algorithms

Peto (1973) suggests maximizing this likelihood function by using a Newton-Raphson algorithm subject to the constraint $\sum_{j=1}^{m} \theta_j = 1$. This approach has been implemented in the ICE macro. Although feasible, the optimization becomes less stable as the dimension of $\boldsymbol{\theta}$ increases.

Treating interval-censored data as missing data, Turnbull (1976) derives a self-consistent equation for estimating the $\theta_j$’s:

$$ \theta_j = \frac{1}{n} \sum_{i=1}^{n} \mu_{ij}(\boldsymbol{\theta}) = \frac{1}{n} \sum_{i=1}^{n} \frac{\alpha_{ij} \theta_j}{\sum_{j=1}^{m} \alpha_{ij} \theta_j} $$

where $\mu_{ij}(\boldsymbol{\theta})$ is the expected probability that the event occurs within $I_j$ for the ith subject, given the observed data.

The algorithm is an expectation-maximization (EM) algorithm in the sense that it iteratively updates $\hat{\boldsymbol{\theta}}$ and $\mu_{ij}(\hat{\boldsymbol{\theta}})$. Convergence is declared if, for a chosen number $\epsilon > 0$,

$$ \sum_{j=1}^{m} \left| \hat{\theta}_j^{(l)} - \hat{\theta}_j^{(l-1)} \right| < \epsilon $$

where $\hat{\theta}_j^{(l)}$ denotes the updated value for $\theta_j$ after the lth iteration.

An alternative criterion is to declare convergence when increments of the likelihood are small:

$$ \left| L\left(\hat{\theta}_j^{(l)};\ j = 1, \ldots, m\right) - L\left(\hat{\theta}_j^{(l-1)};\ j = 1, \ldots, m\right) \right| < \epsilon $$
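A minimal Python sketch of the self-consistency iteration follows. It assumes the 0/1 containment indicators are supplied as a list of rows (one row per subject, one column per Turnbull interval) and uses the sum-of-absolute-changes criterion. The function name `em_turnbull`, the uniform starting values, and the default tolerance are illustrative choices, not the procedure's actual settings.

```python
def em_turnbull(alpha, tol=1e-8, max_iter=10000):
    """Iterate the self-consistency equation for the interval masses.

    alpha[i][j] is 1 if Turnbull interval j lies inside subject i's
    observed interval, and 0 otherwise.
    """
    n, m = len(alpha), len(alpha[0])
    theta = [1.0 / m] * m  # assumed starting values: equal mass everywhere
    for _ in range(max_iter):
        updated = [0.0] * m
        for row in alpha:
            # Probability that subject i's event falls in its own interval.
            denom = sum(a * t for a, t in zip(row, theta))
            for j in range(m):
                # Expected share of subject i's event in interval j.
                updated[j] += row[j] * theta[j] / denom / n
        change = sum(abs(u - t) for u, t in zip(updated, theta))
        theta = updated
        if change < tol:
            break
    return theta


# Hypothetical indicators: subjects 1 and 2 cover interval 1, subject 3 covers interval 2.
print(em_turnbull([[1, 0], [1, 0], [0, 1]]))
```

Because every update is an average of conditional probabilities, the masses stay nonnegative and sum to 1 at each iteration.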

There is no guarantee that the converged values constitute a maximum likelihood estimate (MLE). Gentleman and Geyer (1994) introduced the Kuhn-Tucker conditions based on constrained programming as a check of whether the algorithm converges to a legitimate MLE. These conditions state that a necessary and sufficient condition for the estimate to be an MLE is that the Lagrange multipliers, $n - c_j$, are nonnegative for all the $\theta_j$’s that are estimated to be zero, where $c_j$ is the derivative of the log-likelihood function with respect to $\theta_j$:

$$ c_j = \frac{\partial \log(L)}{\partial \theta_j} = \sum_{i=1}^{n} \frac{\alpha_{ij}}{\sum_{j=1}^{m} \alpha_{ij} \theta_j} $$
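The check can be sketched in Python as follows. The sketch assumes that the Lagrange multiplier for the jth mass takes the form $n - c_j$, which follows from maximizing the log-likelihood under the constraint that the masses sum to 1; treat that form, the function names, and the tolerance as assumptions of this illustration. It also assumes that the supplied estimate already satisfies the self-consistency equation.

```python
def lagrange_multipliers(alpha, theta):
    """Return n - c_j for each Turnbull interval (assumed multiplier form)."""
    n, m = len(alpha), len(theta)
    denoms = [sum(a * t for a, t in zip(row, theta)) for row in alpha]
    # c_j: derivative of the log-likelihood with respect to theta_j.
    c = [sum(row[j] / d for row, d in zip(alpha, denoms)) for j in range(m)]
    return [n - cj for cj in c]


def passes_kuhn_tucker(alpha, theta, tol=1e-8):
    """True if every mass estimated as zero has a nonnegative multiplier."""
    multipliers = lagrange_multipliers(alpha, theta)
    return all(mu >= -tol for mu, t in zip(multipliers, theta) if t <= tol)


# Hypothetical indicators in which the middle interval is shared by both subjects.
alpha = [[1, 1, 0], [0, 1, 1]]
print(lagrange_multipliers(alpha, [0.5, 0.0, 0.5]))
print(passes_kuhn_tucker(alpha, [0.5, 0.0, 0.5]))
print(passes_kuhn_tucker(alpha, [0.0, 1.0, 0.0]))
```

In this example the estimate (0.5, 0, 0.5) is a fixed point of the self-consistency equation, yet it fails the check because the multiplier of the zero mass is negative. The estimate (0, 1, 0) passes.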

You can use Turnbull’s method by specifying METHOD=TURNBULL in the PROC ICLIFETEST statement. The Lagrange multipliers are displayed in the "Nonparametric Survival Estimates" table.

Groeneboom and Wellner (1992) propose using the iterative convex minorant (ICM) algorithm to estimate the underlying survival function as an alternative to Turnbull’s method. Define $\beta_j$, $j = 1, \ldots, m$, as the cumulative probability at the right boundary of the jth Turnbull interval: $\beta_j = F(p_j)$. It follows that $\beta_m = 1$. Denote $\beta_0 = 0$ and $\boldsymbol{\beta} = (\beta_1, \ldots, \beta_{m-1})'$. You can rewrite the likelihood function as

$$ L(\boldsymbol{\beta}) = \prod_{i=1}^{n} \sum_{j=1}^{m} \alpha_{ij} (\beta_j - \beta_{j-1}) $$

Maximizing the likelihood with respect to the $\theta_j$’s is equivalent to maximizing it with respect to the $\beta_j$’s. Because the $\beta_j$’s are naturally ordered, the optimization is subject to the following constraint:

$$ C = \{ \mathbf{x} = (\beta_1, \ldots, \beta_{m-1}) : 0 \le \beta_1 \le \cdots \le \beta_{m-1} \le 1 \} $$

Denote the log-likelihood function as $l(\boldsymbol{\beta})$. Suppose its maximum occurs at $\hat{\boldsymbol{\beta}}$. Mathematically, it can be proved that $\hat{\boldsymbol{\beta}}$ equals the maximizer of the following quadratic function:

$$ g^{*}(\mathbf{x} \mid \mathbf{y}, \mathbf{W}) = -\frac{1}{2} (\mathbf{x} - \mathbf{y})' \mathbf{W} (\mathbf{x} - \mathbf{y}) $$

where $\mathbf{y} = \hat{\boldsymbol{\beta}} - \mathbf{W}^{-1} \nabla l(\hat{\boldsymbol{\beta}})$, $\nabla l(\cdot)$ denotes the derivatives of $l(\cdot)$ with respect to $\boldsymbol{\beta}$, and $\mathbf{W}$ is a positive definite matrix of size $(m-1) \times (m-1)$ (Groeneboom and Wellner 1992).

An iterative algorithm is needed to determine $\hat{\boldsymbol{\beta}}$. For the lth iteration, the algorithm updates the quantity

$$ \mathbf{y}^{(l)} = \hat{\boldsymbol{\beta}}^{(l-1)} - \mathbf{W}^{-1}\left(\hat{\boldsymbol{\beta}}^{(l-1)}\right) \nabla l\left(\hat{\boldsymbol{\beta}}^{(l-1)}\right) $$

where $\hat{\boldsymbol{\beta}}^{(l-1)}$ is the parameter estimate from the previous iteration and $\mathbf{W}(\hat{\boldsymbol{\beta}}^{(l-1)})$ is a positive definite diagonal matrix that depends on $\hat{\boldsymbol{\beta}}^{(l-1)}$.

A convenient choice for $\mathbf{W}(\boldsymbol{\beta})$ is the negative of the second-order derivative of the log-likelihood function $l(\boldsymbol{\beta})$:

$$ w_j = w_j(\boldsymbol{\beta}) = -\frac{\partial^2}{\partial \beta_j^2} l(\boldsymbol{\beta}) $$

Given $\mathbf{y} = \mathbf{y}^{(l)}$ and $\mathbf{W} = \mathbf{W}(\hat{\boldsymbol{\beta}}^{(l-1)})$, the parameter estimate for the lth iteration, $\hat{\boldsymbol{\beta}}^{(l)}$, maximizes the quadratic function $g^{*}(\mathbf{x} \mid \mathbf{y}, \mathbf{W})$.

Define the cumulative sum diagram $\{P_k\}$ as a set of m points in the plane, where $P_0 = (0, 0)$ and

$$ P_k = \left( \sum_{i=1}^{k} w_i,\ \sum_{i=1}^{k} w_i y_i^{(l)} \right) $$

Technically, $\hat{\boldsymbol{\beta}}^{(l)}$ equals the left derivative of the convex minorant, or in other words, the largest convex function below the diagram $\{P_k\}$. This optimization problem can be solved by the pool-adjacent-violators algorithm (Groeneboom and Wellner 1992).
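The pool-adjacent-violators step can be illustrated with a short Python sketch. It computes the weighted nondecreasing fit to a vector y with positive weights w, which coincides with the left derivative of the greatest convex minorant of the cumulative sum diagram built from those weights and values. The function name `pava` is hypothetical, and the sketch does not enforce the bounds 0 and 1 on the fitted values.

```python
def pava(y, w):
    """Weighted nondecreasing regression by pooling adjacent violators."""
    blocks = []  # each block holds [weighted mean, total weight, item count]
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # Pool backwards while a block mean exceeds the mean that follows it.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])

    # Expand the pooled blocks back to one fitted value per input.
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


# Hypothetical inputs: the second and third values violate the ordering.
print(pava([1.0, 3.0, 2.0, 4.0], [1.0, 1.0, 1.0, 1.0]))
```

In the example the two violating values are pooled into their weighted mean, 2.5, and the other values are left unchanged.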

Occasionally, the ICM step might not increase the likelihood. Jongbloed (1998) suggests conducting a line search to ensure that positive increments are always achieved. Alternatively, you can switch to the EM step, exploiting the fact that the EM iteration never decreases the likelihood, and then resume iterations of the ICM algorithm after the EM step. As with Turnbull’s method, convergence can be determined based on the closeness of two consecutive sets of parameter values or likelihood values. You can use the ICM algorithm by specifying METHOD=ICM in the PROC ICLIFETEST statement.

As its name suggests, the EMICM algorithm combines the self-consistent EM algorithm and the ICM algorithm by alternating the two different steps in its iterations. Wellner and Zhan (1997) show that the converged values of the EMICM algorithm always constitute an MLE if it exists and is unique. The ICLIFETEST procedure uses the EMICM algorithm as the default.

Variance Estimation of the Survival Estimator

Peto (1973) and Turnbull (1976) suggest estimating the variances of the survival estimates by inverting the Hessian matrix, which is obtained by twice differentiating the log-likelihood function. This method can become less stable when the number of $\theta_j$’s increases as n increases. Simulations have shown that the confidence limits based on variances estimated with this method tend to have conservative coverage probabilities that are greater than the nominal level (Goodall, Dunn, and Babiker 2004).

Sun (2001) proposes using two resampling techniques, simple bootstrap and multiple imputation, to estimate the variance of the survival estimator. The undefined regions that the Turnbull intervals represent create a special challenge for the bootstrap method. Because each bootstrap sample could have a different set of Turnbull intervals, some of the time points at which the variances are evaluated (which are based on the original Turnbull intervals) might be located within the Turnbull intervals of a bootstrap sample, with the result that their survival probabilities become unknown. A simple ad hoc solution is to shrink each Turnbull interval to its right boundary and modify the survival estimate into a right-continuous function:

$$ \hat{S}_m(t) = \sum_{j:\, p_j > t} \hat{\theta}_j $$

Let M denote the number of resampling data sets. For each $k = 1, \ldots, M$, draw n independent samples from the original data with replacement. Let $\hat{S}_m^{k}(t)$ be the modified estimate of the survival function computed from the kth resampling data set. Then you can estimate the variance of $\hat{S}(t)$ by the sample variance as

$$ \hat{\sigma}_b^2(t) = \frac{1}{M-1} \sum_{k=1}^{M} \left[ \hat{S}_m^{k}(t) - \bar{S}_m(t) \right]^2 $$

where

$$ \bar{S}_m(t) = \frac{\sum_{k=1}^{M} \hat{S}_m^{k}(t)}{M} $$

The method of multiple imputation exploits the fact that interval-censored data reduce to right-censored data when all interval observations of finite length shrink to single points. Suppose that each finite interval has been converted to one of the values it contains. For this right-censored data set, you can estimate the variance of the survival estimates via the well-known Greenwood formula as

$$ \hat{\sigma}_G^2(t) = \hat{S}_{KM}^2(t) \sum_{q_j < t} \frac{d_j}{n_j (n_j - d_j)} $$

where $d_j$ is the number of events at time $q_j$, $n_j$ is the number of subjects at risk just prior to $q_j$, and $\hat{S}_{KM}(t)$ is the Kaplan-Meier estimator of the survival function,

$$ \hat{S}_{KM}(t) = \prod_{q_j < t} \frac{n_j - d_j}{n_j} $$
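As an illustration, the two formulas can be evaluated for a small right-censored sample with the Python sketch below. The function name `km_greenwood` and the toy data are hypothetical. The sketch follows the strict inequality used above (only event times before t contribute) and skips the variance term at a time where every subject at risk fails, which is a simplification.

```python
def km_greenwood(times, events, t):
    """Kaplan-Meier estimate and Greenwood variance at time t.

    events[i] is 1 if times[i] is an event time and 0 if it is right-censored.
    """
    event_times = sorted({x for x, e in zip(times, events) if e == 1})
    surv = 1.0
    greenwood_sum = 0.0
    for q in event_times:
        if not q < t:
            break
        n_j = sum(1 for x in times if x >= q)  # at risk just prior to q
        d_j = sum(1 for x, e in zip(times, events) if x == q and e == 1)
        surv *= (n_j - d_j) / n_j
        if n_j > d_j:  # avoid dividing by zero when everyone at risk fails
            greenwood_sum += d_j / (n_j * (n_j - d_j))
    return surv, surv ** 2 * greenwood_sum


# Hypothetical sample: three subjects with events at times 1, 2, and 3.
estimate, variance = km_greenwood([1, 2, 3], [1, 1, 1], t=2.5)
print(estimate, variance)
```

With no censoring, the Greenwood variance reduces to the binomial variance of the survival proportion, which is a convenient sanity check for this sketch.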

Essentially, multiple imputation is used to account for the uncertainty of ranking overlapping intervals. The kth imputed data set is obtained by substituting every interval-censored observation of finite length with an exact event time randomly drawn from the conditional survival function:

$$ \hat{S}_i(t) = \frac{\hat{S}_m(t) - \hat{S}_m(R_i+)}{\hat{S}_m(L_i) - \hat{S}_m(R_i+)}, \quad t \in (L_i, R_i] $$

Because $\hat{S}_m(t)$ jumps only at the $p_j$, this is a discrete function.

Denote the Kaplan-Meier estimate of each imputed data set as $\hat{S}_{KM}^{k}(t)$. The variance of $\hat{S}(t)$ is estimated by

$$ \hat{\sigma}_I^2(t) = \hat{S}^2(t) \sum_{q_j < t} \frac{d'_j}{n'_j (n'_j - d'_j)} + \frac{1}{M-1} \sum_{k=1}^{M} \left[ \hat{S}_{KM}^{k}(t) - \bar{S}_{KM}(t) \right]^2 $$

where

$$ \bar{S}_{KM}(t) = \frac{1}{M} \sum_{k=1}^{M} \hat{S}_{KM}^{k}(t) $$

and

$$ d'_j = \sum_{i=1}^{n} \frac{\alpha_{ij} \left[ \hat{S}(p_{j-1}) - \hat{S}(p_j) \right]}{\sum_{j=1}^{m} \alpha_{ij} \left[ \hat{S}(p_{j-1}) - \hat{S}(p_j) \right]} $$

and

$$ n'_j = \sum_{k=j}^{m} d'_k $$

Note that the first term in the formula for $\hat{\sigma}_I^2(t)$ mimics the Greenwood formula but uses expected numbers of deaths and subjects. The second term is the sample variance of the Kaplan-Meier estimates of imputed data sets, which accounts for between-imputation contributions.

Pointwise Confidence Limits of the Survival Function

Pointwise confidence limits can be computed for the survival function given the estimated standard errors. Let $\alpha$ be specified by the ALPHA= option. Let $z_{\alpha/2}$ be the critical value for the standard normal distribution. That is, $\Phi(-z_{\alpha/2}) = \alpha/2$, where $\Phi$ is the cumulative distribution function of the standard normal random variable.

Constructing the confidence limits for the survival function $S(t)$ as $\hat{S}(t) \pm z_{\alpha/2} \hat{\sigma}[\hat{S}(t)]$ might result in an estimate that exceeds the range [0,1] at extreme values of t. This problem can be avoided by applying a transformation to $S(t)$ so that the range is unrestricted. In addition, certain transformed confidence intervals for $S(t)$ perform better than the usual linear confidence intervals (Borgan and Liestøl 1990). You can use the CONFTYPE= option to set one of the following transformations: the log-log function (Kalbfleisch and Prentice 1980), the arcsine–square root function (Nair 1984), the logit function (Meeker and Escobar 1998), the log function, and the linear function.

Let g denote the transformation that is being applied to the survival function $S(t)$. Using the delta method, you estimate the standard error of $g(\hat{S}(t))$ by

$$ \tau(t) = \hat{\sigma}\left[ g(\hat{S}(t)) \right] = g'\left( \hat{S}(t) \right) \hat{\sigma}\left[ \hat{S}(t) \right] $$

where $g'$ is the first derivative of the function g. The 100(1 – $\alpha$)% confidence interval for $S(t)$ is given by

$$ g^{-1} \left\{ g\left[ \hat{S}(t) \right] \pm z_{\alpha/2}\, g'\left[ \hat{S}(t) \right] \hat{\sigma}\left[ \hat{S}(t) \right] \right\} $$

where $g^{-1}$ is the inverse function of g. The choices for the transformation g are as follows:

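arcsine–square root transformation: $g(x) = \sin^{-1}(\sqrt{x})$. The estimated variance of $\sin^{-1}\left(\sqrt{\hat{S}(t)}\right)$ is $\tau^2(t) = \frac{\hat{\sigma}^2[\hat{S}(t)]}{4 \hat{S}(t) [1 - \hat{S}(t)]}$. The 100(1 – $\alpha$)% confidence interval for $S(t)$ is given by $\sin^2\left\{ \max\left[ 0,\ \sin^{-1}\left(\sqrt{\hat{S}(t)}\right) - z_{\alpha/2} \tau(t) \right] \right\} \le S(t) \le \sin^2\left\{ \min\left[ \frac{\pi}{2},\ \sin^{-1}\left(\sqrt{\hat{S}(t)}\right) + z_{\alpha/2} \tau(t) \right] \right\}$
linear transformation: $g(x) = x$. This is the same as the identity transformation. The 100(1 – $\alpha$)% confidence interval for $S(t)$ is given by $\hat{S}(t) - z_{\alpha/2} \hat{\sigma}[\hat{S}(t)] \le S(t) \le \hat{S}(t) + z_{\alpha/2} \hat{\sigma}[\hat{S}(t)]$
log transformation: $g(x) = \log(x)$. The estimated variance of $\log(\hat{S}(t))$ is $\tau^2(t) = \frac{\hat{\sigma}^2[\hat{S}(t)]}{\hat{S}^2(t)}$. The 100(1 – $\alpha$)% confidence interval for $S(t)$ is given by $\hat{S}(t) \exp\left( -z_{\alpha/2} \tau(t) \right) \le S(t) \le \hat{S}(t) \exp\left( z_{\alpha/2} \tau(t) \right)$
log-log transformation: $g(x) = \log(-\log(x))$. The estimated variance of $\log(-\log(\hat{S}(t)))$ is $\tau^2(t) = \frac{\hat{\sigma}^2[\hat{S}(t)]}{[\hat{S}(t) \log(\hat{S}(t))]^2}$. The 100(1 – $\alpha$)% confidence interval for $S(t)$ is given by $\left[ \hat{S}(t) \right]^{\exp(z_{\alpha/2} \tau(t))} \le S(t) \le \left[ \hat{S}(t) \right]^{\exp(-z_{\alpha/2} \tau(t))}$
logit transformation: $g(x) = \log\left( \frac{x}{1-x} \right)$. The estimated variance of $\log\left( \frac{\hat{S}(t)}{1 - \hat{S}(t)} \right)$ is $\tau^2(t) = \frac{\hat{\sigma}^2[\hat{S}(t)]}{\hat{S}^2(t) [1 - \hat{S}(t)]^2}$. The 100(1 – $\alpha$)% confidence limits for $S(t)$ are given by $\frac{\hat{S}(t)}{\hat{S}(t) + [1 - \hat{S}(t)] \exp(z_{\alpha/2} \tau(t))} \le S(t) \le \frac{\hat{S}(t)}{\hat{S}(t) + [1 - \hat{S}(t)] \exp(-z_{\alpha/2} \tau(t))}$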
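The effect of the transformation can be explored with the following Python sketch, which applies the delta-method interval for each of the five named transformations to a single survival estimate and standard error. The function name `confidence_limits` and the lowercase type labels are hypothetical, the closed forms are derived here from the general delta-method formula rather than taken from the procedure, and the estimate is assumed to lie strictly between 0 and 1.

```python
import math
from statistics import NormalDist


def confidence_limits(s, se, alpha=0.05, conftype="loglog"):
    """Pointwise limits for a survival estimate s with standard error se."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # standard normal critical value
    if conftype == "linear":
        return s - z * se, s + z * se
    if conftype == "log":
        tau = se / s
        return s * math.exp(-z * tau), s * math.exp(z * tau)
    if conftype == "loglog":
        tau = se / abs(s * math.log(s))
        return s ** math.exp(z * tau), s ** math.exp(-z * tau)
    if conftype == "logit":
        tau = se / (s * (1 - s))
        lower = s / (s + (1 - s) * math.exp(z * tau))
        upper = s / (s + (1 - s) * math.exp(-z * tau))
        return lower, upper
    if conftype == "asinsqrt":
        tau = se / (2 * math.sqrt(s * (1 - s)))
        center = math.asin(math.sqrt(s))
        lower = math.sin(max(0.0, center - z * tau)) ** 2
        upper = math.sin(min(math.pi / 2, center + z * tau)) ** 2
        return lower, upper
    raise ValueError(f"unknown transformation: {conftype}")


# Hypothetical estimate near the upper boundary of [0, 1].
for kind in ("linear", "log", "loglog", "logit", "asinsqrt"):
    print(kind, confidence_limits(0.9, 0.1, conftype=kind))
```

For an estimate this close to 1, the linear and log limits extend above 1, whereas the log-log, logit, and arcsine–square root limits stay inside [0, 1].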

Quartile Estimation

The first quartile (25th percentile) of the survival time is the time beyond which 75% of the subjects in the population under study are expected to survive. For interval-censored data, it is problematic to define point estimators of the quartiles based on the survival estimate because the estimate is undefined within the Turnbull intervals. To overcome this problem, you need to impute survival probabilities within the Turnbull intervals. The previously defined estimator $\hat{S}_m(t)$ achieves this by placing all the estimated probabilities at the right boundary of the interval. The first quartile is estimated by

$$ q_{.25} = \min\{ t_j \mid \hat{S}_m(t_j) < 0.75 \} $$

If $\hat{S}_m(t)$ is exactly equal to 0.75 from $t_j$ to $t_{j+1}$, the first quartile is taken to be $(t_j + t_{j+1})/2$. If $\hat{S}_m(t)$ is greater than 0.75 for all values of t, the first quartile cannot be estimated and is represented by a missing value in the printed output.

The general formula for estimating the 100pth percentile point is

$$ q_p = \min\{ t_j \mid \hat{S}_m(t_j) < 1 - p \} $$

The second quartile (the median) and the third quartile of survival times correspond to p = 0.5 and p = 0.75, respectively.
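The percentile rule can be sketched in Python as follows. The sketch assumes that the estimated masses have already been placed at the right boundaries of the Turnbull intervals and that those boundaries are supplied in increasing order. The function name `percentile` and the toy masses are hypothetical, and the special handling of a survival estimate that is exactly equal to the target level is omitted.

```python
import math


def percentile(points, theta, p):
    """Smallest jump point t_j whose survival estimate is below 1 - p.

    points[j] is the right boundary that carries mass theta[j].
    Returns None when the percentile cannot be estimated.
    """
    for j, t in enumerate(points):
        if math.isinf(t):
            break  # remaining mass lies at infinity
        surv = sum(theta[j + 1:])  # mass located strictly beyond t
        if surv < 1 - p:
            return t
    return None


# Hypothetical masses at the right boundaries 2, 3, and 6.
points, theta = [2, 3, 6], [0.25, 0.25, 0.5]
print(percentile(points, theta, 0.25))
print(percentile(points, theta, 0.50))
print(percentile(points, theta, 0.75))
```

Returning `None` plays the role of the missing value that is reported when the survival estimate never drops below the target level.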

Brookmeyer and Crowley (1982) constructed the confidence interval for the median survival time based on the confidence interval for the survival function $S(t)$. The methodology is generalized to construct the confidence interval for the 100pth percentile based on a g-transformed confidence interval for $S(t)$ (Klein and Moeschberger 1997). You can use the CONFTYPE= option to specify the g-transformation. The 100(1 – $\alpha$)% confidence interval for the first quartile survival time is the set of all points t that satisfy

$$ \left| \frac{g\left( \hat{S}_m(t) \right) - g(1 - 0.25)}{g'\left( \hat{S}_m(t) \right) \hat{\sigma}\left( \hat{S}(t) \right)} \right| \le z_{1 - \frac{\alpha}{2}} $$

where $g'(\cdot)$ is the first derivative of $g(\cdot)$ and $z_{1 - \frac{\alpha}{2}}$ is the $100(1 - \alpha/2)$th percentile of the standard normal distribution.

Kernel-Smoothed Estimation

After you obtain the survival estimate $\hat{S}(t)$, you can construct a discrete estimator for the cumulative hazard function. First, you compute the jumps of the discrete function as

$$ \hat{\lambda}_j = \frac{c_j \hat{\theta}_j}{\sum_{k=j}^{m} c_k \hat{\theta}_k}, \quad j = 1, \ldots, m $$

where the $c_j$’s have been defined previously for calculating the Lagrange multiplier statistic.

Essentially, the numerator and denominator estimate the number of failures and the number at risk that are associated with the Turnbull intervals. Thus these quantities estimate the increments of the cumulative hazard function over the Turnbull intervals.

The estimator of the cumulative hazard function is

$$ \hat{\lambda}(t) = \sum_{k:\, p_k < t} \hat{\lambda}_k, \quad t \notin \text{any } I_j $$

Like $\hat{S}(t)$, $\hat{\lambda}(t)$ is undefined if t is located within some Turnbull interval $I_j$. To facilitate applying the kernel-smoothed methods, you need to reformulate the estimator so that it has only point masses. An ad hoc approach would be to place all the mass for a Turnbull interval at the right boundary. The kernel-based estimate of the hazard function is computed as

$$ \tilde{h}(t, b) = -\frac{1}{b} \sum_{j=1}^{m} K\left( \frac{t - p_j}{b} \right) \hat{\lambda}_j $$

where $K(\cdot)$ is a kernel function and $b > 0$ is the bandwidth. You can estimate the cumulative hazard function by integrating $\tilde{h}(t, b)$ with respect to t.

Practically, an upper limit is usually imposed so that the kernel-smoothed estimate is defined on . The ICLIFETEST procedure sets the value depending on whether the right boundary of the last Turnbull interval is finite or not: if and otherwise.

Typical choices of kernel function are as follows:

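uniform kernel: $K(x) = \frac{1}{2}, \quad -1 \le x \le 1$
Epanechnikov kernel: $K(x) = \frac{3}{4}(1 - x^2), \quad -1 \le x \le 1$
biweight kernel: $K(x) = \frac{15}{16}(1 - x^2)^2, \quad -1 \le x \le 1$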

For t < b, the symmetric kernels are replaced by the corresponding asymmetric kernels of Gasser and Müller (1979). Let . The modified kernels are as follows:

uniform kernel:
Epanechnikov kernel:
biweight kernel:

For , let . The asymmetric kernels for are used, with x replaced by –x.

The bandwidth parameter b controls how much “smoothness” you want to have in the kernel-smoothed estimate. For right-censored data, a commonly accepted method of choosing an optimal bandwidth is to use the mean integrated squared error (MISE) as an objective criterion. This measure becomes difficult to adapt to interval-censored data because it no longer has a closed-form mathematical formula.

Pan (2000) proposes using a V-fold cross validation likelihood as a criterion for choosing the optimal bandwidth for the kernel-smoothed estimate of the survival function. The ICLIFETEST procedure implements this approach for smoothing the hazard function. Computing such a criterion entails a cross validation type procedure. First, the original data are partitioned into V almost balanced subsets $\mathcal{D}^{(v)}$, $v = 1, \ldots, V$. Denote the kernel-smoothed estimate of the leave-one-subset-out data as $\hat{h}^{*(-v)}(t; b)$. The optimal bandwidth is defined as the one that maximizes the cross validation likelihood:

$$ b_0 = \underset{b}{\operatorname{argmax}} \sum_{v=1}^{V} L\left( \hat{h}^{*(-v)}(t; b) \mid \mathcal{D}^{(v)} \right) $$

Comparison of Survival between Groups

If the TEST statement is specified, the ICLIFETEST procedure compares the K groups formed by the levels of the TEST variable using a generalized log-rank test. Let $S_k(t)$ be the underlying survival function of the kth group, $k = 1, \ldots, K$. The null and alternative hypotheses to be tested are

$H_0\colon S_1(t) = S_2(t) = \cdots = S_K(t)$ for all t

versus

$H_1\colon$ at least one of the $S_k(t)$’s is different for some t

Let $n_k$ denote the number of subjects in group k, and let n denote the total number of subjects ($n = n_1 + \cdots + n_K$).

Generalized Log-Rank Statistic

For the ith subject, let $\mathbf{z}_i = (z_{i1}, \ldots, z_{iK})'$ be a vector of K indicators that represent whether or not the subject belongs to the kth group. Denote $\boldsymbol{\beta} = (\beta_1, \ldots, \beta_K)'$, where $\beta_k$ represents the treatment effect for the kth group. Suppose that a model is specified and the survival function for the ith subject can be written as

$$ S(t \mid \mathbf{z}_i) = S(t \mid \mathbf{z}'_i \boldsymbol{\beta}, \boldsymbol{\gamma}) $$

where $\boldsymbol{\gamma}$ denotes the nuisance parameters.

It follows that the likelihood function is

$$ L = \prod_{i=1}^{n} \left[ S(L_i \mid \mathbf{z}'_i \boldsymbol{\beta}, \boldsymbol{\gamma}) - S(R_i \mid \mathbf{z}'_i \boldsymbol{\beta}, \boldsymbol{\gamma}) \right] $$

where $(L_i, R_i]$ denotes the interval observation for the ith subject.

Testing whether or not the survival functions are equal across the K groups is equivalent to testing whether all the $\beta_k$’s are zero. It is natural to consider a score test based on the specified model (Finkelstein 1986).

The score statistics for $\boldsymbol{\beta}$ are derived as the first-order derivatives of the log-likelihood function evaluated at $\boldsymbol{\beta} = \mathbf{0}$ and $\hat{\boldsymbol{\gamma}}$.

$$ \mathbf{U} = (U_1, \ldots, U_K)' = \left. \frac{\partial \log(L)}{\partial \boldsymbol{\beta}} \right|_{\boldsymbol{\beta} = \mathbf{0},\, \hat{\boldsymbol{\gamma}}} $$

where $\hat{\boldsymbol{\gamma}}$ denotes the maximum likelihood estimate for $\boldsymbol{\gamma}$, given that $\boldsymbol{\beta} = \mathbf{0}$.

Under the null hypothesis that $\boldsymbol{\beta} = \mathbf{0}$, all K groups share the same survival function $S(t)$. It is typical to leave $S(t)$ unspecified and obtain a nonparametric maximum likelihood estimate $\hat{S}(t)$ using, for instance, Turnbull’s method. In this case, $\boldsymbol{\gamma}$ represents all the parameters to be estimated in order to determine $S(t)$.

Suppose the given data generate m Turnbull intervals as $\{I_j = (q_j, p_j],\ j = 1, \ldots, m\}$. Denote the probability estimate at the right end point of the jth interval by $\hat{\theta}_j$. The nonparametric survival estimate is $\hat{S}(t) = \sum_{k:\, p_k > t} \hat{\theta}_k$ for any $t \notin \text{any } I_j$.

Under the null hypothesis, Fay (1999) showed that the score statistics can be written in the form of a weighted log-rank test as

$$ U_k = \sum_{j=1}^{m} U_{kj} = \sum_{j=1}^{m} v_j \left( d'_{kj} - \frac{n'_{kj}}{n'_j} d'_j \right) $$

where

$$ v_j = \frac{\left[ \hat{S}(p_j) - \hat{S}'(p_{j-1}) \right] \left[ \hat{S}(p_{j-1}) - \hat{S}'(p_j) \right]}{\hat{S}(p_j) \left[ \hat{S}(p_{j-1}) - \hat{S}(p_j) \right]} $$

and denotes the derivative of with respect to .

estimates the expected number of events within for the kth group, and it is computed as

d prime Subscript k j Baseline equals sigma-summation Underscript i equals 1 Overscript n Endscripts z Subscript i k Baseline StartFraction alpha Subscript i j Baseline ModifyingAbove theta With caret Subscript j Baseline Over sigma-summation Underscript l equals 1 Overscript m Endscripts alpha Subscript i l Baseline ModifyingAbove theta Subscript l Baseline With caret EndFraction

$d'_j$ is an estimate of the expected number of events within the jth Turnbull interval for the whole sample, and it is computed as

$$
d'_j = \sum_{k=1}^{K} d'_{kj}
$$

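Similarly, $n'_{kj}$ estimates the expected number of subjects at risk before entering the jth Turnbull interval for the kth group, and can be estimated by $n'_{kj} = \sum_{l=j}^{m} d'_{kl}$. $n'_j$ is an estimate of the expected number of subjects at risk before entering the jth Turnbull interval for all the groups: $n'_j = \sum_{k=1}^{K} n'_{kj}$.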
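The quantities above can be traced numerically with a short Python sketch. It is an illustration only, not the procedure's implementation. The function name `generalized_logrank` and the toy inputs are hypothetical, the at-risk count for an interval is assumed to be the expected events in that interval and all later ones, and the weights are all set to 1.0 (the Sun (1996) row of Table 3).

```python
def generalized_logrank(alpha, theta, z, v):
    """Return (U, d, n) for the weighted log-rank form of the score statistic.

    alpha[i][j] : 1 if Turnbull interval j lies inside subject i's interval
    theta[j]    : estimated probability mass of Turnbull interval j
    z[i][k]     : 1 if subject i belongs to group k
    v[j]        : weight for interval j
    """
    n_subj, m, K = len(alpha), len(theta), len(z[0])

    # d[k][j]: expected number of events in interval j for group k
    d = [[0.0] * m for _ in range(K)]
    for i in range(n_subj):
        denom = sum(alpha[i][l] * theta[l] for l in range(m))
        for j in range(m):
            share = alpha[i][j] * theta[j] / denom
            for k in range(K):
                d[k][j] += z[i][k] * share

    # n[k][j]: expected number at risk before entering interval j
    # (assumption: events in interval j or any later interval)
    n = [[sum(d[k][j:]) for j in range(m)] for k in range(K)]

    d_pool = [sum(d[k][j] for k in range(K)) for j in range(m)]
    n_pool = [sum(n[k][j] for k in range(K)) for j in range(m)]

    U = [sum(v[j] * (d[k][j] - n[k][j] / n_pool[j] * d_pool[j])
             for j in range(m))
         for k in range(K)]
    return U, d, n


# Hypothetical toy data: 4 subjects, 3 Turnbull intervals, 2 groups.
alpha = [[1, 0, 0], [1, 1, 0], [0, 1, 1], [0, 0, 1]]
theta = [0.25, 0.5, 0.25]
z = [[1, 0], [1, 0], [0, 1], [0, 1]]
U, d, n = generalized_logrank(alpha, theta, z, v=[1.0, 1.0, 1.0])
print(U)
```

Because the group-level expected counts add up to the pooled counts, the components of U sum to zero, which is a convenient sanity check on any implementation.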

Assuming different survival models gives rise to different weight functions (Fay 1999). For example, Finkelstein’s score test (1986) is derived assuming a proportional hazards model; Fay’s test (1996) is based on a proportional odds model.

The choices of weight function are given in Table 3.

Table 3: Weight Functions for Various Tests

Test	Weight
Sun (1996)	1.0
Fay (1999)
Finkelstein (1986)
Harrington-Fleming (p,q)

Variance Estimation of the Generalized Log-Rank Statistic

Sun (1996) proposed the use of multiple imputation to estimate the variance-covariance matrix of the generalized log-rank statistic $\mathbf{U}$. This approach is similar to the multiple imputation method presented in the section Variance Estimation of the Survival Estimator. Both methods impute right-censored data from interval-censored data and analyze the imputed data sets by using standard statistical techniques. Huang, Lee, and Yu (2008) suggested improving the performance of the generalized log-rank test by slightly modifying the variance calculation.

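Suppose the given data generate m Turnbull intervals $(q_j, p_j]$, $j = 1, \ldots, m$. Denote the probability estimate for the jth interval as $\hat{\theta}_j$, and denote the nonparametric survival estimate as $\hat{S}(t) = \sum_{k: p_k > t} \hat{\theta}_k$ for any $t$ that lies outside the Turnbull intervals.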

In order to generate an imputed data set, you need to randomly generate a survival time for every subject of the sample. For the ith subject, a random time $T_i^*$ is generated from the following discrete survival function:

$$
\hat{S}_i(T_i^* = p_j) = \frac{\hat{S}(q_j) - \hat{S}(R_i+)}{\hat{S}(L_i) - \hat{S}(R_i+)}, \quad p_j \in (L_i, R_i], \quad j = 1, \ldots, m
$$

where $(L_i, R_i]$ denotes the interval observation for the ith subject.
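A minimal Python sketch of this imputation step follows. It rests on two assumptions that are not stated in the text: the survival estimate at t is taken to be the total mass of the Turnbull intervals whose right endpoint exceeds t, and the displayed function is read as the conditional probability that the imputed time is at least $p_j$, so that point masses are its successive drops. All function names and the toy numbers are hypothetical.

```python
import random


def surv(t, p, theta):
    """Step survival estimate: mass of intervals whose right endpoint exceeds t."""
    return sum(th for pj, th in zip(p, theta) if pj > t)


def conditional_survival(L, R, q, p, theta):
    """Map each support point p_j in (L, R] to the displayed ratio."""
    s_R = surv(R, p, theta)                  # S-hat(R_i+): mass strictly beyond R
    denom = surv(L, p, theta) - s_R
    return {pj: (surv(qj, p, theta) - s_R) / denom
            for qj, pj in zip(q, p) if L < pj <= R}


def impute_time(L, R, q, p, theta, rng):
    """Draw one imputed time T* from the support points in (L, R]."""
    cs = conditional_survival(L, R, q, p, theta)
    points = sorted(cs)
    # Point masses are the successive drops of the conditional survival values.
    tail = [cs[x] for x in points] + [0.0]
    mass = [tail[i] - tail[i + 1] for i in range(len(points))]
    return rng.choices(points, weights=mass, k=1)[0]


# Hypothetical Turnbull intervals (1,2], (3,4], (5,6] with masses 0.2, 0.5, 0.3.
q, p, theta = [1, 3, 5], [2, 4, 6], [0.2, 0.5, 0.3]
cs = conditional_survival(2, 6, q, p, theta)   # subject observed in (2, 6]
draw = impute_time(2, 6, q, p, theta, random.Random(1))
print(cs, draw)
```

Under these assumptions the mass assigned to $p_j$ equals $\alpha_{ij}\hat{\theta}_j / \sum_l \alpha_{il}\hat{\theta}_l$, the same share that appears in the expected event counts.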

For the hth imputed data set ($h = 1, \ldots, H$), let $d_{kj}^h$ and $n_{kj}^h$ denote the numbers of failures and subjects at risk by counting the imputed $T_i^*$'s for group k. Let $d_j^h$ and $n_j^h$ denote the corresponding pooled numbers.

You can perform the standard weighted log-rank test for right-censored data on each of the imputed data sets (Huang, Lee, and Yu 2008). The test statistic is

$$
\mathbf{U}^h = (U_1^h, \ldots, U_K^h)'
$$

where

$$
U_k^h = \sum_{j=1}^{m} v_j \left( d_{kj}^h - \frac{n_{kj}^h}{n_j^h} d_j^h \right)
$$

Its variance-covariance matrix is estimated by the Greenwood formula as

$$
\mathbf{V}^h = \mathbf{V}_1^h + \cdots + \mathbf{V}_m^h
$$

where

$$
(\mathbf{V}_j^h)_{l_1 l_2} =
\begin{cases}
v_j^2 \, n_{l_1 j}^h (n_j^h - n_{l_1 j}^h) \, d_j^h (n_j^h - d_j^h) (n_j^h)^{-2} (n_j^h - 1)^{-1} & \text{when } l_1 = l_2 \\
- v_j^2 \, n_{l_1 j}^h n_{l_2 j}^h \, d_j^h (n_j^h - d_j^h) (n_j^h)^{-2} (n_j^h - 1)^{-1} & \text{when } l_1 \neq l_2
\end{cases}
$$
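The per-time-point covariance contribution can be sketched in Python as follows. The sketch reads the diagonal entry as $n_{l_1 j}^h (n_j^h - n_{l_1 j}^h)$ times the common factor $v_j^2 d_j^h (n_j^h - d_j^h) / [(n_j^h)^2 (n_j^h - 1)]$. The function name `variance_term` is hypothetical, and the guard for fewer than two subjects at risk is an added assumption to avoid division by zero.

```python
def variance_term(v_j, n_groups, d_j):
    """Covariance contribution of one time point for one imputed data set.

    v_j      : weight at time point j
    n_groups : at-risk counts per group at time point j
    d_j      : pooled number of failures at time point j
    """
    n_j = sum(n_groups)
    K = len(n_groups)
    if n_j < 2:
        # Assumption: a time point with fewer than two at risk contributes nothing.
        return [[0.0] * K for _ in range(K)]
    common = v_j ** 2 * d_j * (n_j - d_j) / (n_j ** 2 * (n_j - 1))
    return [[common * (n_groups[a] * (n_j - n_groups[a]) if a == b
                       else -n_groups[a] * n_groups[b])
             for b in range(K)]
            for a in range(K)]


# Hypothetical time point: 3 and 2 subjects at risk, 2 pooled failures.
V_j = variance_term(1.0, [3, 2], 2)
print(V_j)
```

Each row of the resulting matrix sums to zero, which reflects the fact that the group statistics themselves sum to zero.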

After analyzing each imputed data set, you can estimate the variance-covariance matrix of $\mathbf{U}$ by pooling the results as

$$
\hat{\mathbf{V}} = \frac{1}{H} \sum_{h=1}^{H} \mathbf{V}^h - \frac{1}{H-1} \sum_{h=1}^{H} \left[ \mathbf{U}^h - \bar{\mathbf{U}} \right] \left[ \mathbf{U}^h - \bar{\mathbf{U}} \right]'
$$

where

$$
\bar{\mathbf{U}} = \frac{1}{H} \sum_{h=1}^{H} \mathbf{U}^h
$$

The overall test statistic is formed as $\mathbf{U}' \hat{\mathbf{V}}^{-} \mathbf{U}$, where $\hat{\mathbf{V}}^{-}$ is the generalized inverse of $\hat{\mathbf{V}}$. Under the null hypothesis, the statistic has a chi-squared distribution with degrees of freedom equal to the rank of $\hat{\mathbf{V}}$. By default, the ICLIFETEST procedure performs 1000 imputations. You can change the number of imputations by specifying the IMPUTE option in the PROC ICLIFETEST statement.
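The pooling step and the resulting test can be sketched in Python for the two-group case. This is an illustration under stated assumptions rather than the procedure's code: the function names and inputs are hypothetical, and the generalized-inverse quadratic form is replaced by its two-group reduction, which holds because the two components of the statistic are negatives of each other and the covariance matrix has rank 1.

```python
import math


def pool_variance(U_list, V_list):
    """Average the within-imputation covariances and subtract the
    sample covariance of the imputed statistics."""
    H, K = len(U_list), len(U_list[0])
    U_bar = [sum(U[k] for U in U_list) / H for k in range(K)]
    V_hat = [[0.0] * K for _ in range(K)]
    for U, V in zip(U_list, V_list):
        for a in range(K):
            for b in range(K):
                V_hat[a][b] += V[a][b] / H
                V_hat[a][b] -= (U[a] - U_bar[a]) * (U[b] - U_bar[b]) / (H - 1)
    return V_hat


def two_group_test(U, V_hat):
    """Chi-square statistic and p-value when there are exactly two groups.

    With U[1] == -U[0] and a rank-1 covariance matrix, the quadratic form
    with a generalized inverse reduces to U[0]**2 / V_hat[0][0].
    """
    stat = U[0] ** 2 / V_hat[0][0]
    p_value = math.erfc(math.sqrt(stat / 2.0))  # upper tail of chi-square, 1 df
    return stat, p_value


# Hypothetical results from H = 3 imputed data sets.
U_list = [[1.0, -1.0], [2.0, -2.0], [3.0, -3.0]]
V_list = [[[4.0, -4.0], [-4.0, 4.0]] for _ in range(3)]
V_hat = pool_variance(U_list, V_list)
stat, p_value = two_group_test([3.0, -3.0], V_hat)
print(V_hat, stat, p_value)
```

For more than two groups a general pseudo-inverse routine would be needed; the standard library does not provide one, so that case is left out of the sketch.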

Stratified Tests

Suppose the generalized log-rank test is to be stratified on the M levels that are formed from the variables that you specify in the STRATA statement. Based only on the data of the sth stratum ($s = 1, \ldots, M$), let $\mathbf{U}_{(s)}$ be the test statistic for the sth stratum and let $\mathbf{V}_{(s)}$ be the corresponding covariance matrix as constructed in the section Variance Estimation of the Generalized Log-Rank Statistic. First, sum over the stratum-specific estimates as follows:

$$
\mathbf{U}_{.} = \sum_{s=1}^{M} \mathbf{U}_{(s)}
$$

$$
\mathbf{V}_{.} = \sum_{s=1}^{M} \mathbf{V}_{(s)}
$$

Then construct the global test statistic as

$$
\mathbf{U}_{.}' \, \mathbf{V}_{.}^{-} \, \mathbf{U}_{.}
$$

Under the null hypothesis, the test statistic has a chi-squared distribution with degrees of freedom equal to the rank of $\mathbf{V}_{.}$. The ICLIFETEST procedure performs the stratified test only when the groups to be compared are balanced across all the strata.

Multiple-Comparison Adjustments

When you have more than two groups, a generalized log-rank test tells you whether the survival curves are significantly different from each other, but it does not identify which pairs of curves are different. Pairwise comparisons can be performed based on the generalized log-rank statistic and the corresponding variance-covariance matrix. However, reporting all pairwise comparisons is problematic because the overall Type I error rate would be inflated. A multiple-comparison adjustment of the p-values for the paired comparisons retains the same overall probability of a Type I error as the K-sample test.

The ICLIFETEST procedure supports two types of paired comparisons: comparisons between all pairs of curves and comparisons between a control curve and all other curves. You use the DIFF= option to specify the comparison type, and you use the ADJUST= option to select a method of multiple-comparison adjustments.

Let $\chi_r^2$ denote a chi-square random variable with r degrees of freedom. Denote $\phi$ and $\Phi$ as the density function and the cumulative distribution function of a standard normal distribution, respectively. Let m be the number of comparisons; that is,

$$
m =
\begin{cases}
\dfrac{k(k-1)}{2} & \text{DIFF=ALL} \\
k - 1 & \text{DIFF=CONTROL}
\end{cases}
$$

For a two-sided test that compares the survival of the jth group with that of the lth group ($j \neq l$), the test statistic is

$$
z_{jl}^2 = \frac{(U_j - U_l)^2}{V_{jj} + V_{ll} - 2 V_{jl}}
$$

and the raw p-value is

$$
p = \Pr(\chi_1^2 > z_{jl}^2)
$$
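As a rough illustration, the pairwise statistic and its raw p-value can be computed in Python with the standard library alone. The function name `pairwise_test` and the example inputs are hypothetical. The upper tail of a chi-square distribution with 1 degree of freedom is obtained from the complementary error function.

```python
import math


def pairwise_test(U, V, j, l):
    """Return (z2, p) for comparing group j with group l (0-based indices)."""
    z2 = (U[j] - U[l]) ** 2 / (V[j][j] + V[l][l] - 2.0 * V[j][l])
    p = math.erfc(math.sqrt(z2 / 2.0))   # Pr(chi-square with 1 df > z2)
    return z2, p


# Hypothetical statistic and covariance matrix for three groups.
U = [2.0, -1.0, -1.0]
V = [[2.0, -1.0, -1.0],
     [-1.0, 2.0, -1.0],
     [-1.0, -1.0, 2.0]]
z2, p = pairwise_test(U, V, 0, 1)
print(z2, p)
```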

For multiple comparisons of more than two groups, adjusted p-values are computed as follows:

Bonferroni adjustment:
Dunnett-Hsu adjustment: With the first group defined as the control, there are comparisons to be made. Let be the matrix of contrasts that represents the comparisons; that is,

Let and be covariance and correlation matrices of , respectively; that is,

and

The factor-analytic covariance approximation of Hsu (1992) is to find such that

where is a diagonal matrix whose jth diagonal element is and . The adjusted p-value is

This value can be obtained in a DATA step as
Scheffé adjustment:
Šidák adjustment:
SMM adjustment:

This can also be evaluated in a DATA step as
Tukey adjustment:

This can be evaluated in a DATA step as
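Outside of SAS, the two simplest adjustments in the list can be sketched in Python. The sketch assumes the standard textbook forms of the Bonferroni adjustment (multiply the raw p-value by the number of comparisons, capped at 1) and the Šidák adjustment (one minus the mth power of one minus the raw p-value); it is not taken from the procedure. The function names are hypothetical, and the count of comparisons follows the definition of m given earlier in this section.

```python
def n_comparisons(k, diff="ALL"):
    """Number of paired comparisons among k groups."""
    if diff == "ALL":
        return k * (k - 1) // 2
    if diff == "CONTROL":
        return k - 1
    raise ValueError("diff must be 'ALL' or 'CONTROL'")


def bonferroni(p_raw, m):
    """Textbook Bonferroni adjustment (assumed form)."""
    return min(1.0, m * p_raw)


def sidak(p_raw, m):
    """Textbook Sidak adjustment (assumed form)."""
    return 1.0 - (1.0 - p_raw) ** m


m_all = n_comparisons(3, "ALL")
print(m_all, bonferroni(0.02, m_all), sidak(0.02, m_all))
```

The Šidák value is never larger than the Bonferroni value for the same raw p-value, so it is the less conservative of the two.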

Trend Tests

Trend tests for right-censored data (Klein and Moeschberger 1997, Section 7.4) can be extended to interval-censored data in a straightforward way. Such tests are specifically designed to detect ordered alternatives as

with at least one inequality

Let $a_1, \ldots, a_K$ be a sequence of scores associated with the K samples. Let $\mathbf{U} = (U_1, \ldots, U_K)'$ be the generalized log-rank statistic and $\mathbf{V} = (V_{jl})$ be the corresponding covariance matrix of size $K \times K$ as constructed in the section Variance Estimation of the Generalized Log-Rank Statistic. The trend test statistic and its variance are given by $\sum_{j=1}^{K} a_j U_j$ and $\sum_{j=1}^{K} \sum_{l=1}^{K} a_j a_l V_{jl}$, respectively. Under the null hypothesis that there is no trend, the following z-score has, asymptotically, a standard normal distribution:

$$
Z = \frac{\sum_{j=1}^{K} a_j U_j}{\sqrt{\sum_{j=1}^{K} \sum_{l=1}^{K} a_j a_l V_{jl}}}
$$

The ICLIFETEST procedure provides both one-tail and two-tail p-values for the test.
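The z-score and its tail probabilities can be sketched in Python as follows. The function name `trend_test` and the inputs are hypothetical, and the one-tail value shown is the upper-tail probability, which is one possible convention and may not match the direction that the procedure reports.

```python
import math


def trend_test(a, U, V):
    """Return (z, upper_tail_p, two_tail_p) for scores a, statistic U, covariance V."""
    K = len(a)
    num = sum(a[j] * U[j] for j in range(K))
    var = sum(a[j] * a[l] * V[j][l] for j in range(K) for l in range(K))
    z = num / math.sqrt(var)
    upper = 0.5 * math.erfc(z / math.sqrt(2.0))     # Pr(Z > z)
    two_tail = math.erfc(abs(z) / math.sqrt(2.0))   # Pr(|Z| > |z|)
    return z, upper, two_tail


# Hypothetical scores 1, 2, 3 for three ordered groups.
a = [1.0, 2.0, 3.0]
U = [-2.0, 0.0, 2.0]
V = [[2.0, -1.0, -1.0],
     [-1.0, 2.0, -1.0],
     [-1.0, -1.0, 2.0]]
z, upper, two_tail = trend_test(a, U, V)
print(z, upper, two_tail)
```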

Scores for Permutation Tests

The weighted log-rank statistic can also be expressed as

$$
\mathbf{U} = \sum_{i=1}^{n} \mathbf{z}_i c_i
$$

where $c_i$ is the score from the ith subject and follows the form

$$
c_i = \frac{\hat{S}'(L_i) - \hat{S}'(R_i)}{\hat{S}(L_i) - \hat{S}(R_i)}
$$

where $\hat{S}'(t)$ denotes the derivative of $\hat{S}(t)$ with respect to $\beta$, which is evaluated at $\beta = 0$.

As presented in Table 4, Fay (1999) derives the forms of scores for three weight functions. Under the assumption that censoring is independent of the grouping of subjects, these derived scores can be used by permutation tests.
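Given per-subject scores, a Monte Carlo permutation test can be sketched in Python as follows. The scores here are hypothetical numbers standing in for values that you might read from an output data set of scores. The choice of the absolute first-group sum as the test statistic and the add-one p-value estimate are assumptions made for illustration, restricted to two groups.

```python
import random


def group_sums(scores, groups, K):
    """U_k is the sum of the scores of the subjects in group k (0-based labels)."""
    U = [0.0] * K
    for c, g in zip(scores, groups):
        U[g] += c
    return U


def permutation_pvalue(scores, groups, n_perm=2000, seed=0):
    """Two-group Monte Carlo permutation p-value based on |U_1|."""
    rng = random.Random(seed)
    observed = abs(group_sums(scores, groups, 2)[0])
    labels = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)               # reassign group labels at random
        if abs(group_sums(scores, labels, 2)[0]) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)      # add-one estimate keeps p above zero


# Hypothetical scores for 6 subjects in 2 groups.
scores = [1.0, 0.5, 0.25, -0.25, -0.5, -1.0]
groups = [0, 0, 0, 1, 1, 1]
U_obs = group_sums(scores, groups, 2)
p_perm = permutation_pvalue(scores, groups)
print(U_obs, p_perm)
```

Shuffling the labels is valid only under the stated assumption that censoring is independent of the grouping of subjects.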

Table 4: Scores for Different Weight Functions

Test Weight
Sun (1996)
Fay (1999)
Finkelstein (1986)

You can output scores to a designated SAS data set by specifying the OUTSCORE= option in the TEST statement.

Last updated: December 09, 2022