The LIFEREG Procedure

Probability Plotting

Probability plots are useful tools for the display and analysis of lifetime data. Probability plots use an inverse distribution scale so that a cumulative distribution function (CDF) plots as a straight line. If the assumed distribution is appropriate, a nonparametric estimate of the CDF of the lifetime data plots approximately as a straight line, which provides a visual assessment of goodness of fit.

You can use the PROBPLOT statement in PROC LIFEREG to create probability plots of data that are complete, right censored, interval censored, or a combination of censoring types (arbitrarily censored). A line representing the maximum likelihood fit from the MODEL statement and pointwise parametric confidence bands for the cumulative probabilities are also included in the plot.

A random variable Y belongs to a location-scale family of distributions if its CDF F is of the form

\[ \Pr\{Y \le y\} = F(y) = G\left(\frac{y - \mu}{\sigma}\right) \]

where $\mu$ is the location parameter and $\sigma$ is the scale parameter. Here, $G$ is a CDF that cannot depend on any unknown parameters, and $G$ is the CDF of $Y$ if $\mu = 0$ and $\sigma = 1$. For example, if $Y$ is a normal random variable with mean $\mu$ and standard deviation $\sigma$,

\[ G(u) = \Phi(u) = \int_{-\infty}^{u} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{u^2}{2}\right) \, du \]

and

\[ F(y) = \Phi\left(\frac{y - \mu}{\sigma}\right) \]

The normal, extreme-value, and logistic distributions are location-scale models. The three-parameter gamma distribution is a location-scale model if the shape parameter $\delta$ is fixed. If $T$ has a lognormal, Weibull, or log-logistic distribution, then $\log(T)$ has a distribution that is a location-scale model. These distributions are said to be of type log-location-scale. Probability plots are constructed for lognormal, Weibull, and log-logistic distributions by using $\log(T)$ instead of $T$ in the plots.

Let $y_{(1)} \le y_{(2)} \le \ldots \le y_{(n)}$ be ordered observations of a random sample with distribution function $F(y)$. A probability plot is a plot of the points $y_{(i)}$ against $m_i = G^{-1}(a_i)$, where $a_i = \hat{F}(y_{(i)})$ is an estimate of the CDF $F(y_{(i)}) = G\left(\frac{y_{(i)} - \mu}{\sigma}\right)$. The nonparametric CDF estimates $a_i$ are sometimes called plotting positions. The axis on which the points $m_i$ are plotted is usually labeled with a probability scale (the scale of $a_i$).

If F is one of the location-scale distributions, then y is the lifetime; otherwise, the log of the lifetime is used to transform the distribution to a location-scale model.

If the data actually have the stated distribution, then $\hat{F} \approx F$,

\[ m_i = G^{-1}\left(\hat{F}(y_{(i)})\right) \approx G^{-1}\left(G\left(\frac{y_{(i)} - \mu}{\sigma}\right)\right) = \frac{y_{(i)} - \mu}{\sigma} \]

and the points $(y_{(i)}, m_i)$ should fall approximately on a straight line.
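The linearity argument above can be checked numerically. The following Python sketch is illustrative only and is not part of PROC LIFEREG: it simulates a complete normal sample, uses the plotting positions $(i - 0.5)/n$, and fits a least squares line through the points $(y_{(i)}, m_i)$. The function names, the seed, and the sample size are assumptions chosen for the illustration. If the sample is normal, the slope should be near $1/\sigma$ and the intercept near $-\mu/\sigma$.

```python
import random
from statistics import NormalDist, fmean


def normal_probability_points(sample):
    """Return (y_(i), m_i) pairs with m_i = Phi^{-1}(a_i), a_i = (i - 0.5)/n."""
    ys = sorted(sample)
    n = len(ys)
    std_normal = NormalDist()  # baseline CDF G for the normal family
    return [(y, std_normal.inv_cdf((i - 0.5) / n))
            for i, y in enumerate(ys, start=1)]


def least_squares(xs, ys):
    """Ordinary least squares fit ys ~ intercept + slope * xs."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


random.seed(1)                      # arbitrary seed for reproducibility
mu, sigma = 10.0, 2.0               # assumed "true" parameters
sample = [random.gauss(mu, sigma) for _ in range(500)]

points = normal_probability_points(sample)
slope, intercept = least_squares([p[0] for p in points],
                                 [p[1] for p in points])
print(f"slope={slope:.3f} (1/sigma={1 / sigma:.3f}), "
      f"intercept={intercept:.3f} (-mu/sigma={-mu / sigma:.3f})")
```

A sample drawn from a different family (for example, a strongly skewed one) would show visible curvature in these points instead of a straight line.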

There are several ways to compute the nonparametric CDF estimates used in probability plots from lifetime data. These are discussed in the next two sections.

Complete and Right-Censored Data

The censoring times must be taken into account when you compute plotting positions for right-censored data. The modified Kaplan-Meier method described in the following section is the default method for computing nonparametric CDF estimates for display on probability plots. See Abernethy (1996), Meeker and Escobar (1998), and Nelson (1982) for discussions of the methods described in the following sections.

Expected Ranks, Kaplan-Meier, and Modified Kaplan-Meier Methods

Let $y_{(1)} \le y_{(2)} \le \ldots \le y_{(n)}$ be the observations of a random sample, including failure times and censoring times, sorted in increasing order. Label all the data with reverse ranks $r_i$, with $r_1 = n, \ldots, r_n = 1$. For the lifetime (not censoring time) corresponding to reverse rank $r_i$, compute the survival function estimate

\[ S_i = \left[\frac{r_i}{r_i + 1}\right] S_{i-1} \]

with $S_0 = 1$. The expected rank plotting position is computed as $a_i = 1 - S_i$. The option PPOS=EXPRANK specifies the expected rank plotting position.

For the Kaplan-Meier method,

\[ S_i = \left[\frac{r_i - 1}{r_i}\right] S_{i-1} \]

The Kaplan-Meier plotting position is then computed as $a'_i = 1 - S_i$. The option PPOS=KM specifies the Kaplan-Meier plotting position.

For the modified Kaplan-Meier method, use

\[ S'_i = \frac{S_i + S_{i-1}}{2} \]

where $S_i$ is computed from the Kaplan-Meier formula with $S_0 = 1$. The plotting position is then computed as $a''_i = 1 - S'_i$. The option PPOS=MKM specifies the modified Kaplan-Meier plotting position. If the PPOS option is not specified, the modified Kaplan-Meier plotting position is used as the default method.

For complete samples, $a_i = i/(n + 1)$ for the expected rank method, $a'_i = i/n$ for the Kaplan-Meier method, and $a''_i = (i - 0.5)/n$ for the modified Kaplan-Meier method. If the largest observation is a failure for the Kaplan-Meier estimator, then $F_n = 1$ and the point is not plotted.
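The three recursions above can be sketched in a few lines of Python. This is an illustration of the formulas, not the procedure's implementation. The function name `plotting_positions` and its arguments are hypothetical, $S_{i-1}$ is taken to be the estimate at the preceding failure time, and tied observations receive no special treatment.

```python
def plotting_positions(times, censored, method="MKM"):
    """Plotting positions for complete or right-censored data.

    times    -- observed times (failures and censoring times)
    censored -- booleans, True where the observation is right-censored
    method   -- "EXPRANK", "KM", or "MKM" (named after the PPOS= values)
    Returns a list of (time, position) pairs for the failure times only.
    """
    if method not in ("EXPRANK", "KM", "MKM"):
        raise ValueError(f"unknown method: {method}")
    n = len(times)
    order = sorted(range(n), key=lambda k: times[k])
    s_prev = 1.0                      # S_0 = 1
    result = []
    for rank, k in enumerate(order):
        r = n - rank                  # reverse rank: n, n-1, ..., 1
        if censored[k]:
            continue                  # censoring times only use up a rank
        if method == "EXPRANK":
            s = s_prev * r / (r + 1)
            pos = 1.0 - s
        else:
            s = s_prev * (r - 1) / r  # Kaplan-Meier step
            if method == "KM":
                pos = 1.0 - s         # equals 1 if the largest time is a failure
            else:
                pos = 1.0 - (s + s_prev) / 2.0
        result.append((times[k], pos))
        s_prev = s
    return result


# Complete sample of size 4, then the same times with the second one censored.
complete = plotting_positions([1, 2, 3, 4], [False] * 4, method="MKM")
with_censoring = plotting_positions([1, 2, 3, 4],
                                    [False, True, False, False], method="KM")
print(complete)
print(with_censoring)
```

For the complete sample the sketch reproduces the closed forms $i/(n+1)$, $i/n$, and $(i - 0.5)/n$. A plotting routine would still need to drop a Kaplan-Meier position equal to 1, as described above.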

Median Ranks

Let $y_{(1)} \le y_{(2)} \le \ldots \le y_{(n)}$ be ordered observations of a random sample including failure times and censor times. A failure order number $j_i$ is assigned to the $i$th failure: $j_i = j_{i-1} + \Delta$, where $j_0 = 0$. The increment $\Delta$ is initially 1 and is modified when a censoring time is encountered in the ordered sample. The new increment is computed as

\[ \Delta = \frac{(n + 1) - \text{previous failure order number}}{1 + \text{number of items beyond previous censored item}} \]

The plotting position is computed for the $i$th failure time as

\[ a_i = \frac{j_i - 0.3}{n + 0.4} \]

For complete samples, the failure order number $j_i$ is equal to $i$, the order of the failure in the sample. In this case, the preceding equation for $a_i$ is an approximation of the median plotting position computed as the median of the $i$th-order statistic from the uniform distribution on (0, 1). In the censored case, $j_i$ is not necessarily an integer, but the preceding equation still provides an approximation to the median plotting position. The PPOS=MEDRANK option specifies the median rank plotting position.
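A short Python sketch of the failure order number recursion follows. It is an illustration under stated assumptions, not the procedure's code: the function name is hypothetical, ties are not treated specially, and "number of items beyond previous censored item" is interpreted as the count of observations that follow the most recent censoring time in the ordered sample.

```python
def median_rank_positions(times, censored):
    """Median rank plotting positions (j_i - 0.3)/(n + 0.4) for failures.

    times    -- observed times (failures and censoring times)
    censored -- booleans, True where the observation is right-censored
    Returns a list of (time, failure order number, position) tuples.
    """
    n = len(times)
    order = sorted(range(n), key=lambda k: times[k])
    j_prev = 0.0            # j_0 = 0
    delta = 1.0             # initial increment
    censor_seen = False     # a censoring time occurred since the last failure
    result = []
    for idx, k in enumerate(order):
        if censored[k]:
            censor_seen = True
            continue
        if censor_seen:
            # Observations after the last censored item, including this one.
            items_beyond = n - idx
            delta = (n + 1 - j_prev) / (1 + items_beyond)
            censor_seen = False
        j = j_prev + delta
        result.append((times[k], j, (j - 0.3) / (n + 0.4)))
        j_prev = j
    return result


# Four units: failure, censored, failure, failure.
ranks = median_rank_positions([10, 20, 30, 40], [False, True, False, False])
for t, j, a in ranks:
    print(t, round(j, 4), round(a, 4))
```

In this small example the increment changes from 1 to $(5 - 1)/(1 + 2) = 4/3$ after the censoring time, so the failure order numbers are 1, 7/3, and 11/3.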

Arbitrarily Censored Data

The LIFEREG procedure can create probability plots for data that consist of combinations of exact, left-censored, right-censored, and interval-censored lifetimes, that is, arbitrarily censored data. The LIFEREG procedure uses an iterative algorithm developed by Turnbull (1976) to compute a nonparametric maximum likelihood estimate of the cumulative distribution function for the data. Because the technique is maximum likelihood, standard errors of the cumulative probability estimates are computed from the inverse of the associated Fisher information matrix. This algorithm is an example of the expectation-maximization (EM) algorithm.

The default initial estimate assigns equal probabilities to each interval. You can specify different initial values with the PROBLIST= option. Convergence is declared when the change in the log likelihood between two successive iterations is less than $\delta$, where the default value of $\delta$ is $10^{-8}$. You can specify a different value for $\delta$ with the TOLLIKE= option. Iterations are terminated if the algorithm does not converge within a fixed number of iterations. The default maximum number of iterations is 1000, and some data might require more iterations for convergence. You can specify the maximum allowed number of iterations with the MAXITEM= option in the PROBPLOT statement.

The iteration history of the log likelihood is displayed if you specify the ITPRINTEM option. The iteration history of the estimated interval probabilities is also displayed if you specify both the ITPRINTEM and PRINTPROBS options.

If an interval probability is smaller than a tolerance ($10^{-6}$ by default) after convergence, the probability is set to zero, the interval probabilities are renormalized so that they add to one, and iterations are restarted. Usually the algorithm converges in just a few more iterations. You can change the default value of the tolerance with the TOLPROB= option. You can specify the NOPOLISH option to avoid setting small probabilities to zero and restarting the algorithm.

If you specify the ITPRINTEM option, a table summarizing the Turnbull estimate of the interval probabilities is displayed. The columns labeled "Reduced Gradient" and "Lagrange Multiplier" are used in checking final convergence of the maximum likelihood estimate. The Lagrange multipliers must all be greater than or equal to zero, or the solution is not maximum likelihood. See Gentleman and Geyer (1994) for more details of the convergence checking. Also see Meeker and Escobar (1998, Chapter 3) for more information.

See Example 76.6 for an illustration.
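The EM iteration described in this section can be sketched as a self-consistency update on the interval probabilities. The Python code below is a simplified illustration, not the LIFEREG implementation. It assumes the candidate intervals have already been constructed and are supplied as a 0/1 indicator matrix `alpha` (a hypothetical input), and it omits the polishing step, the Lagrange multiplier checks, and the standard errors. The defaults for the tolerance and the iteration limit mirror the values quoted above.

```python
import math


def turnbull_em(alpha, tol_like=1e-8, max_iter=1000):
    """Self-consistency (EM) estimate of interval probabilities.

    alpha[i][j] is 1 if observation i could have failed in interval j,
    and 0 otherwise. Every row must contain at least one 1.
    Returns (probabilities, log likelihood, iterations used).
    """
    n, m = len(alpha), len(alpha[0])
    p = [1.0 / m] * m               # equal initial probabilities

    def log_likelihood(probs):
        return sum(math.log(sum(a * q for a, q in zip(row, probs)))
                   for row in alpha)

    ll = log_likelihood(p)
    for iteration in range(1, max_iter + 1):
        totals = [0.0] * m
        for row in alpha:
            denom = sum(a * q for a, q in zip(row, p))
            for j in range(m):
                # Expected share of observation i that falls in interval j.
                totals[j] += row[j] * p[j] / denom
        p = [t / n for t in totals]
        new_ll = log_likelihood(p)
        if abs(new_ll - ll) < tol_like:
            return p, new_ll, iteration
        ll = new_ll
    raise RuntimeError("EM did not converge within max_iter iterations")


# Two intervals: two observations in the first, one in the second,
# and one that could lie in either.
probs, ll, iters = turnbull_em([[1, 0], [1, 0], [0, 1], [1, 1]])
print(probs, ll, iters)
```

For this small data set the likelihood is proportional to $p_1^2 p_2$, so the maximum likelihood estimate is $p_1 = 2/3$, which the iteration approaches from the uniform starting values.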

Nonparametric Confidence Intervals

You can use the PPOUT option in the PROBPLOT statement to create a table containing the nonparametric CDF estimates computed by the selected method, Kaplan-Meier CDF estimates, standard errors of the Kaplan-Meier estimator, and nonparametric confidence limits for the CDF. The confidence limits are either pointwise or simultaneous, depending on the value of the NPINTERVALS= option in the PROBPLOT statement. The method used in the LIFEREG procedure for computation of approximate pointwise and simultaneous confidence intervals for cumulative failure probabilities relies on the Kaplan-Meier estimator of the cumulative distribution function of failure time and approximate standard deviation of the Kaplan-Meier estimator. For the case of arbitrarily censored data, the Turnbull algorithm, discussed previously, provides an extension of the Kaplan-Meier estimator. Both the Kaplan-Meier and the Turnbull estimators provide an estimate of the standard error of the CDF estimator, $\mathrm{se}_{\hat{F}}$, that is used in computing confidence intervals.

Pointwise Confidence Intervals

Approximate $(1 - \alpha)100\%$ pointwise confidence intervals are computed as in Meeker and Escobar (1998, Section 3.6) as

\[ [F_L, F_U] = \left[ \frac{\hat{F}}{\hat{F} + (1 - \hat{F})\,w}, \; \frac{\hat{F}}{\hat{F} + (1 - \hat{F})/w} \right] \]

where

\[ w = \exp\left[ \frac{z_{1-\alpha/2}\, \mathrm{se}_{\hat{F}}}{\hat{F}(1 - \hat{F})} \right] \]

where $z_p$ is the $p$th quantile of the standard normal distribution.
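As a worked illustration of the pointwise formula, the following Python sketch evaluates the limits for a single CDF estimate and its standard error. The function name and the numeric inputs are hypothetical. The limits are undefined when $\hat{F}$ is 0 or 1, so the sketch rejects those values.

```python
import math
from statistics import NormalDist


def pointwise_cdf_limits(f_hat, se, alpha=0.05):
    """Approximate (1 - alpha)100% pointwise limits for a CDF estimate."""
    if not 0.0 < f_hat < 1.0:
        raise ValueError("f_hat must lie strictly between 0 and 1")
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # z_{1 - alpha/2}
    w = math.exp(z * se / (f_hat * (1.0 - f_hat)))
    lower = f_hat / (f_hat + (1.0 - f_hat) * w)
    upper = f_hat / (f_hat + (1.0 - f_hat) / w)
    return lower, upper


lower, upper = pointwise_cdf_limits(0.5, 0.1)
print(round(lower, 4), round(upper, 4))
```

Because the interval is symmetric on the logit scale, the limits always stay inside (0, 1), unlike limits of the form $\hat{F} \pm z\,\mathrm{se}_{\hat{F}}$.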

Simultaneous Confidence Intervals

Approximate $(1 - \alpha)100\%$ simultaneous confidence bands valid over the lifetime interval $(t_a, t_b)$ are computed as the "Equal Precision" case of Nair (1984) and Meeker and Escobar (1998, Section 3.8) as

\[ [F_L, F_U] = \left[ \frac{\hat{F}}{\hat{F} + (1 - \hat{F})\,w}, \; \frac{\hat{F}}{\hat{F} + (1 - \hat{F})/w} \right] \]

where

\[ w = \exp\left[ \frac{e_{a,b,1-\alpha/2}\, \mathrm{se}_{\hat{F}}}{\hat{F}(1 - \hat{F})} \right] \]

where the factor $x = e_{a,b,1-\alpha/2}$ is the solution of

\[ x \exp(-x^2/2) \, \log\left[ \frac{(1 - a)\,b}{(1 - b)\,a} \right] \Big/ \sqrt{8\pi} = \alpha/2 \]

The time interval $(t_a, t_b)$ over which the bands are valid depends in a complicated way on the constants $a$ and $b$ defined in Nair (1984), $0 < a < b < 1$. The constants $a$ and $b$ are chosen by default so that the confidence bands are valid between the lowest and highest times corresponding to failures in the case of multiply censored data, or to the lowest and highest intervals for which probabilities are computed for arbitrarily censored data. You can optionally specify $a$ and $b$ directly with the NPINTERVALS=SIMULTANEOUS(a, b) option in the PROBPLOT statement.
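The equation for the factor $e_{a,b,1-\alpha/2}$ has no closed-form solution, but it is easy to solve numerically. The Python sketch below uses bisection and is only an illustration: the function name and the bracketing interval are assumptions. Because $x \exp(-x^2/2)$ rises up to $x = 1$ and falls afterward, the equation can have two roots; the sketch assumes that the larger root is the one wanted, since it plays the role of a critical value larger than the pointwise $z_{1-\alpha/2}$.

```python
import math


def equal_precision_factor(a, b, alpha=0.05, tol=1e-12):
    """Solve x*exp(-x^2/2)*log[(1-a)b/((1-b)a)]/sqrt(8*pi) = alpha/2 for x > 1."""
    if not 0.0 < a < b < 1.0:
        raise ValueError("require 0 < a < b < 1")
    k = math.log((1.0 - a) * b / ((1.0 - b) * a)) / math.sqrt(8.0 * math.pi)

    def f(x):
        return x * math.exp(-x * x / 2.0) * k - alpha / 2.0

    lo, hi = 1.0, 10.0              # assumed bracket on the decreasing branch
    if f(lo) <= 0.0 or f(hi) >= 0.0:
        raise ValueError("no root bracketed in [1, 10] for these inputs")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if f(mid) > 0.0:
            lo = mid                # root lies to the right of mid
        else:
            hi = mid
    return (lo + hi) / 2.0


factor = equal_precision_factor(0.1, 0.9, alpha=0.05)
print(round(factor, 3))
```

The resulting factor replaces $z_{1-\alpha/2}$ in the expression for $w$, which is why simultaneous bands are wider than pointwise intervals at the same confidence level.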

Parametric Confidence Intervals

Pointwise parametric confidence bands are displayed in a probability plot, unless you specify the NOCONF option in the PROBPLOT statement. Two kinds of confidence intervals are available for display in a probability plot: confidence limits for the estimated cumulative distribution function (CDF) and confidence limits for estimated distribution percentiles.

Confidence Limits for the Estimated CDF

If the distribution is of type log-location-scale, let $y = \log(t)$ where $t$ is the value of time at which the confidence limits are to be computed. If the distribution is of type location-scale, let $y$ be the value at which you want to evaluate confidence limits for the estimated CDF $\hat{F}(y)$. Let

\[ \hat{u} = \frac{y - \mathbf{x}'\hat{\boldsymbol{\beta}}}{\hat{\sigma}} \]

where the column vector $\mathbf{x}$ of covariate values is determined by the rules summarized in the section XDATA= Data Set. If an offset variable is specified, the mean of the offset variable values is included in $\mathbf{x}'\boldsymbol{\beta}$.

The CDF estimate is given by

\[ \hat{F}(y) = G(\hat{u}) \]

where $G$ is the baseline distribution. The approximate standard error of $\hat{F}(y)$ is computed as in Meeker and Escobar (1998, Section 8.4.3) as

\[ \mathrm{SE}_{\hat{F}} = \frac{g(\hat{u})}{\hat{\sigma}} \left[ \mathrm{Var}(\mathbf{x}'\hat{\boldsymbol{\beta}}) + 2\hat{u}\, \mathrm{Cov}(\mathbf{x}'\hat{\boldsymbol{\beta}}, \hat{\sigma}) + \hat{u}^2\, \mathrm{Var}(\hat{\sigma}) \right]^{1/2} \]

where $g$ is the probability density function corresponding to $G$. Two-sided $(1 - \alpha) \times 100\%$ confidence limits are given by

\[ [F_L, F_U] = \left[ \frac{\hat{F}}{\hat{F} + (1 - \hat{F}) \times w}, \; \frac{\hat{F}}{\hat{F} + (1 - \hat{F})/w} \right] \]

where

\[ w = \exp\left[ \frac{z_{1-\alpha/2}\, \mathrm{SE}_{\hat{F}}}{\hat{F}(1 - \hat{F})} \right] \]

and $z_p$ is the $p \times 100$ percentile of the standard normal distribution. The quantities $\mathrm{Var}(\mathbf{x}'\hat{\boldsymbol{\beta}})$, $\mathrm{Cov}(\mathbf{x}'\hat{\boldsymbol{\beta}}, \hat{\sigma})$, and $\mathrm{Var}(\hat{\sigma})$ are computed based on the covariance matrix of the estimated parameter vector $(\hat{\boldsymbol{\beta}}, \hat{\sigma})$.
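The steps above can be sketched for the normal case, where $G = \Phi$ and $g$ is the standard normal density. The Python function below is an illustration only; its name, argument names, and numeric inputs are hypothetical, and the variance and covariance terms are assumed to have been extracted already from the covariance matrix of the parameter estimates.

```python
import math
from statistics import NormalDist


def parametric_cdf_limits(y, xb, sigma, var_xb, cov_xb_sigma, var_sigma,
                          alpha=0.05):
    """CDF estimate and limits at y for a normal location-scale fit.

    xb           -- estimated linear predictor x'beta
    sigma        -- estimated scale parameter
    var_xb       -- Var(x'beta_hat)
    cov_xb_sigma -- Cov(x'beta_hat, sigma_hat)
    var_sigma    -- Var(sigma_hat)
    """
    std = NormalDist()
    u = (y - xb) / sigma                       # standardized value u_hat
    f_hat = std.cdf(u)                         # F_hat(y) = G(u_hat)
    quad = var_xb + 2.0 * u * cov_xb_sigma + u * u * var_sigma
    se = std.pdf(u) / sigma * math.sqrt(quad)  # SE of F_hat
    z = std.inv_cdf(1.0 - alpha / 2.0)
    w = math.exp(z * se / (f_hat * (1.0 - f_hat)))
    lower = f_hat / (f_hat + (1.0 - f_hat) * w)
    upper = f_hat / (f_hat + (1.0 - f_hat) / w)
    return f_hat, lower, upper


f_hat, f_lower, f_upper = parametric_cdf_limits(
    y=10.0, xb=10.0, sigma=2.0,
    var_xb=0.04, cov_xb_sigma=0.0, var_sigma=0.01)
print(round(f_hat, 4), round(f_lower, 4), round(f_upper, 4))
```

For a log-location-scale model such as the lognormal, the same function would be called with `y` set to the log of the time of interest.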

Confidence Limits for Percentiles

If the HCL option is specified in the PROBPLOT statement, confidence limits based on estimated distribution percentiles instead of the default CDF limits are displayed in the probability plot.

For location-scale distributions, the estimated $p \times 100$ percentile of the distribution $F$ is given by

\[ y_p = \mathbf{x}'\hat{\boldsymbol{\beta}} + G^{-1}(p)\, \hat{\sigma} \]

where $G$ is the baseline distribution and the column vector $\mathbf{x}$ of covariate values is determined by the rules summarized in the section XDATA= Data Set. The standard error of $y_p$ is estimated by $\mathrm{SE}_y = \sqrt{\mathbf{z}' \Sigma \mathbf{z}}$, where $\mathbf{z} = (\mathbf{x}', G^{-1}(p))'$ and $\Sigma$ is the covariance matrix of the parameter estimates $(\hat{\boldsymbol{\beta}}', \hat{\sigma})'$. Two-sided $(1 - \alpha) \times 100\%$ confidence limits for $y_p$ are given by

\[ [y_L, y_U] = \left[ y_p - z_{1-\alpha/2}\, \mathrm{SE}_y, \; y_p + z_{1-\alpha/2}\, \mathrm{SE}_y \right] \]

For distributions of type log-location-scale, the confidence limits are computed as

\[ [t_L = \exp(y_L), \; t_U = \exp(y_U)] \]

For example, if $T$ has the Weibull distribution, $G$ is the standardized extreme value distribution, $[y_L, y_U]$ are confidence limits for the $p \times 100$ percentile of the extreme value distribution for $\log(T)$, and $[t_L = \exp(y_L), t_U = \exp(y_U)]$ are confidence limits for the $p \times 100$ percentile of the Weibull distribution for $T$.
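The Weibull example can be sketched in Python as follows. This is an illustration under stated assumptions rather than the procedure's code. It takes the standardized extreme value CDF to be $G(u) = 1 - \exp(-\exp(u))$, so that $G^{-1}(p) = \log(-\log(1 - p))$, and it takes the standard error to be the square root of the quadratic form $\mathbf{z}' \Sigma \mathbf{z}$. The function name and the numeric inputs are hypothetical.

```python
import math
from statistics import NormalDist


def weibull_percentile_limits(p, x, beta, sigma, cov, alpha=0.05):
    """Percentile estimate and limits on the time scale for a Weibull fit.

    p     -- probability level, 0 < p < 1
    x     -- covariate values (include 1.0 for the intercept)
    beta  -- estimated regression coefficients, same length as x
    sigma -- estimated scale parameter
    cov   -- covariance matrix of (beta, sigma) as nested lists
    Returns (t_L, t_p, t_U).
    """
    g_inv = math.log(-math.log(1.0 - p))      # extreme value quantile
    y_p = sum(xi * bi for xi, bi in zip(x, beta)) + g_inv * sigma
    z_vec = list(x) + [g_inv]                 # z = (x', G^{-1}(p))'
    dim = len(z_vec)
    quad = sum(z_vec[i] * cov[i][j] * z_vec[j]
               for i in range(dim) for j in range(dim))
    se_y = math.sqrt(quad)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    y_lower, y_upper = y_p - z * se_y, y_p + z * se_y
    # Transform from the log(T) scale back to the time scale.
    return math.exp(y_lower), math.exp(y_p), math.exp(y_upper)


t_lower, t_median, t_upper = weibull_percentile_limits(
    p=0.5, x=[1.0], beta=[2.0], sigma=0.5,
    cov=[[0.01, 0.0], [0.0, 0.0025]])
print(round(t_lower, 3), round(t_median, 3), round(t_upper, 3))
```

Because the limits are computed on the log scale and then exponentiated, the interval for $T$ is not symmetric about the percentile estimate, and its lower limit is always positive.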

Last updated: December 09, 2022