The PHREG Procedure

PROC PHREG Statement

PROC PHREG <options>;

The PROC PHREG statement invokes the PHREG procedure. Table 1 summarizes the options available in the PROC PHREG statement.

Table 1: PROC PHREG Statement Options

Option	Description
ALPHA=	Specifies the level of significance
ATRISK	Displays a table that contains the number of units and the corresponding number of events in the risk sets
CONCORDANCE	Computes concordance statistics
COVM	Uses the model-based covariance matrix in the analysis
COVOUT	Adds the estimated covariance matrix to the OUTEST= data set
COVSANDWICH	Requests the robust sandwich estimate for the covariance matrix
DATA=	Names the SAS data set to be analyzed
EV	Requests the Schemper-Henderson predictive measures
FAST	Uses a fast algorithm for large data with start/stop input
INEST=	Names the SAS data set that contains initial estimates
MULTIPASS	Recompiles the risk sets
NAMELEN=	Specifies the length of effect names
NOPRINT	Suppresses all displayed output
NOSUMMARY	Suppresses the summary display of the event and censored observation frequencies
OUTEST=	Creates an output SAS data set containing estimates of the regression coefficients
PLOTS=	Controls the plots that are produced through ODS Graphics
ROCOPTIONS	Specifies options for receiver operating characteristic analysis
SIMPLE	Displays simple descriptive statistics
TAU=	Specifies upper time limit for Uno’s concordance statistic and the integrated area under the curve
ZPH	Requests diagnostics based on weighted residuals for checking the proportional hazards assumption

You can specify the following options in the PROC PHREG statement.

ALPHA=number

specifies the level of significance $\alpha$ for $100(1-\alpha)\%$ confidence intervals. The value number must be between 0 and 1; the default value is 0.05, which results in 95% intervals. This value is used as the default confidence level for limits computed by the BASELINE, BAYES, CONTRAST, HAZARDRATIO, and MODEL statements. You can override this default by specifying the ALPHA= option in the individual statements.
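The following sketch illustrates the override behavior described above. The data set MyData and the variables Time, Status (0 = censored), X1, and X2 are hypothetical names chosen for illustration.

```sas
/* Hypothetical data set and variable names */
proc phreg data=MyData alpha=0.1;
   /* RISKLIMITS requests hazard ratio limits; with no ALPHA= here,
      they use the procedure-level ALPHA=0.1 (90% limits) */
   model Time*Status(0)=X1 X2 / risklimits;
   /* A statement-level ALPHA= overrides the default (95% limits) */
   hazardratio X1 / alpha=0.05;
run;
```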

ATRISK

displays a table that contains the number of units at risk at each event time and the corresponding number of events in the risk sets. For example, the following risk set information is displayed if the ATRISK option is specified in the example in the section Getting Started: PHREG Procedure.

Risk Set Information
	Number of Units
Days	At Risk	Event
142	40	1
143	39	1
156	38	1

296	5	2
304	3	1
323	2	1
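A table like this one could be requested as sketched below. The sketch assumes that the Getting Started example analyzes a data set named Rats with the variables Days, Status (0 = censored), and Group; those names are not shown in this section and are an assumption.

```sas
/* Assumed data set and variable names from the Getting Started example */
proc phreg data=Rats atrisk;
   model Days*Status(0)=Group;
run;
```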

CONCORDANCE <=method<(options)>>

computes concordance statistics for the model specified in the MODEL statement (unless the NOFIT option is specified) and for each model specified in an ROC statement. For more information, see the section Concordance Statistics. You can specify the following methods:

HARRELL <(SE)>

requests Harrell’s concordance statistic (Harrell et al. 1984). The SE suboption, if specified, produces the standard error of the concordance statistic by the method of Kang et al. (2015).

UNO <(uno-options)>

computes the concordance statistic of Uno et al. (2011). You can specify the following uno-options:

SE: computes the standard error for the concordance statistic by using the perturbation resampling approach as described in Uno et al. (2011).
SEED=n: specifies an integer seed for the random number generator to generate the perturbation samples.
ITER=n: specifies the number of perturbation samples to use.
ALPHA=number: specifies the significance level of the confidence interval for the concordance probability.
DIFF: calculates paired differences in the estimated concordance statistics among all identified ROC models. If the SE suboption is also specified, standard errors and confidence limits are also computed.

Specifying the CONCORDANCE option without any suboptions is equivalent to specifying CONCORDANCE=HARRELL.
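As a minimal sketch of how these suboptions combine, the following step requests Uno's concordance statistic with a perturbation-based standard error. The data set MyData, its variables, and the seed, iteration count, and TAU= value are hypothetical and chosen only for illustration.

```sas
/* Hypothetical data set, variables, and option values */
proc phreg data=MyData
           concordance=uno(se seed=8976 iter=100 alpha=0.05)
           tau=365;   /* upper time limit for Uno's statistic */
   model Time*Status(0)=X1 X2;
run;
```

Fixing SEED= makes the perturbation samples, and therefore the reported standard error, reproducible from run to run.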

COVOUT

adds the estimated covariance matrix of the parameter estimates to the OUTEST= data set. The COVOUT option has no effect unless the OUTEST= option is specified.

COVM

requests that the model-based covariance matrix (which is the inverse of the observed information matrix) be used in the analysis if the COVS option is also specified. The COVM option has no effect if the COVS option is not specified.

COVSANDWICH <(AGGREGATE)>
COVS <(AGGREGATE)>

requests the robust sandwich estimate of Lin and Wei (1989) for the covariance matrix. When this option is specified, this robust sandwich estimate is used in the Wald tests for testing the global null hypothesis, null hypotheses of individual parameters, and the hypotheses in the CONTRAST and TEST statements. In addition, a modified score test is computed in the testing of the global null hypothesis, and the parameter estimates table has an additional StdErrRatio column, which contains the ratios of the robust estimate of the standard error relative to the corresponding model-based estimate. Optionally, you can specify the keyword AGGREGATE enclosed in parentheses after the COVSANDWICH (or COVS) option, which requests a summing up of the score residuals for each distinct ID pattern in the computation of the robust sandwich covariance estimate. This AGGREGATE option has no effect if the ID statement is not specified.
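The sketch below shows the AGGREGATE keyword used together with an ID statement, so that score residuals are summed within each subject. The data set MyData and the variables Time, Status, X1, X2, and Subject are hypothetical.

```sas
/* Hypothetical clustered data: several rows can share one Subject value */
proc phreg data=MyData covs(aggregate);
   model Time*Status(0)=X1 X2;
   id Subject;   /* without an ID statement, AGGREGATE has no effect */
run;
```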

DATA=SAS-data-set

names the SAS data set that contains the data to be analyzed. If you omit the DATA= option, the procedure uses the most recently created SAS data set.

EV

requests the Schemper-Henderson measure (Schemper and Henderson 2000) of the proportion of variation that is explained by a Cox regression. This measure of explained variation (EV) is the ratio of distance measures between the 1/0 survival processes and the fitted survival curves with and without covariate information. The distance measure is referred to as the predictive inaccuracy, because the smaller the predictive inaccuracy, the better the prediction. When you specify this option, PROC PHREG creates a table that has three columns: one presents the predictive inaccuracy without covariates ($D$); one presents the predictive inaccuracy with covariates ($D_z$); and one presents the EV measure, computed according to $100\,(D - D_z)/D\,\%$.

FAST

uses an alternative algorithm to speed up the fitting of the Cox regression for a large data set that has the counting process style of input. Simonsen (2014) has demonstrated the efficiency of this algorithm when the data set contains a large number of observations and many distinct event times. The algorithm requires only one pass through the data to compute the Breslow or Efron partial log-likelihood function and the corresponding gradient and Hessian. PROC PHREG ignores the FAST option if you specify a TIES= option value other than BRESLOW or EFRON, or if you specify programming statements for time-varying covariates. You might not see much improvement in the optimization time if your data set has only a moderate number of observations.
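A minimal sketch of a FAST request with counting process input follows. The data set MyData and the variables Start, Stop, Status, X1, and X2 are hypothetical.

```sas
/* Hypothetical counting-process data: one row per (Start, Stop] interval */
proc phreg data=MyData fast;
   /* FAST is honored only for TIES=BRESLOW or TIES=EFRON and only
      when no programming statements define time-varying covariates */
   model (Start, Stop)*Status(0)=X1 X2 / ties=efron;
run;
```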

INEST=SAS-data-set

names the SAS data set that contains initial estimates for all the parameters in the model. BY-group processing is allowed in setting up the INEST= data set. For more information, see the section INEST= Input Data Set.

MULTIPASS

requests that, for each Newton-Raphson iteration, PROC PHREG recompile the risk sets that correspond to the event times for the (start,stop) style of response and recompute the values of the time-dependent variables defined by the programming statements for each observation in the risk sets. If the MULTIPASS option is not specified, PROC PHREG computes all risk sets and all the variable values and saves them in a utility file. The MULTIPASS option decreases required disk space at the expense of increased execution time; however, for very large data, it might actually save time, because it is time-consuming to write and read large utility files. This option has an effect only when the (start,stop) style of response is used or when there are time-dependent explanatory variables.
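The following sketch shows one situation in which MULTIPASS has an effect: a time-dependent covariate defined by a programming statement. The data set MyData and the variables Time, Status, X1, and X1t are hypothetical.

```sas
/* Hypothetical data set; Time is assumed to be positive so log(Time) is defined */
proc phreg data=MyData multipass;
   model Time*Status(0)=X1 X1t;
   /* Programming statement: X1t is re-evaluated for each observation
      in each risk set instead of being stored in a utility file */
   X1t = X1*log(Time);
run;
```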

NAMELEN=n

specifies the length of effect names in tables and output data sets to be n characters, where n is a value between 20 and 200. The default length is 20 characters.

NOPRINT

suppresses all displayed output. Note that this option temporarily disables the Output Delivery System (ODS); for more information about ODS, see Chapter 23, Using the Output Delivery System.

NOSUMMARY

suppresses the summary display of the event and censored observation frequencies.

OUTEST=SAS-data-set

creates an output SAS data set that contains estimates of the regression coefficients. The data set also contains the convergence status and the log likelihood. If you use the COVOUT option, the data set also contains the estimated covariance matrix of the parameter estimators. For more information, see the section OUTEST= Output Data Set.
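As a short sketch, the OUTEST= and COVOUT options can be combined as follows. The data set names MyData and Est and the variable names are hypothetical.

```sas
/* Hypothetical data set and variable names */
proc phreg data=MyData outest=Est covout;
   model Time*Status(0)=X1 X2;
run;

/* Inspect the estimates and the covariance matrix rows added by COVOUT */
proc print data=Est;
run;
```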

PLOTS<(global-plot-options)> = plot-request
PLOTS<(global-plot-options)> = (plot-request <…<plot-request>>)

controls the plots that are produced through ODS Graphics. Two types of plots can be produced: plots related to the receiver operating characteristic curves (such as AUC, AUCDIFF, and ROC) and the baseline function plots (such as CIF, CUMHAZ, MCF, and SURVIVAL).

For the baseline function plots, each observation in the COVARIATES= data set in the BASELINE statement represents a set of covariates for which a curve is produced for each plot-request and for each stratum. You can use the ROWID= option in the BASELINE statement to specify a variable in the COVARIATES= data set for identifying the curves that are produced for the covariate sets. If the ROWID= option is not specified, the produced curves are identified by the covariate values if there is only a single covariate or by the observation numbers of the COVARIATES= data set if the model has two or more covariates. If the COVARIATES= data set is not specified, a reference set of covariates that consists of the reference levels for the CLASS variables and the average values for the continuous variables is used. For plotting more than one curve, you can use the OVERLAY= global-plot-option to group the curves in separate plots.

When you specify one plot-request, you can omit the parentheses around the plot request. Here are some examples:

plots=survival
plots=(survival auc)

ODS Graphics must be enabled before plots can be requested. For example:

ods graphics on;
proc phreg plots(cl)=survival;
   model Time*Status(0)=X1-X5;
   baseline covariates=One;
run;

For more information about enabling and disabling ODS Graphics, see the section Enabling and Disabling ODS Graphics in Chapter 24, Statistical Graphics Using ODS.

The global-plot-options include the following:

CL<=EQTAIL | HPD>

displays the pointwise interval limits for the specified curves. For the classical analysis, CL displays the confidence limits. For the Bayesian analysis, CL=EQTAIL displays the equal-tail credible limits and CL=HPD displays the HPD limits. Specifying just CL in a Bayesian analysis defaults to CL=HPD.

OVERLAY <=overlay-option>

specifies how the curves for the various strata and covariate sets are overlaid. If the STRATA statement is not specified, specifying OVERLAY without any option will overlay the curves for all the covariate sets. The available overlay-options are as follows:

BYGROUP | GROUP: overlays, for each stratum, all curves for the covariate sets that have the same GROUP= value in the COVARIATES= data set in the same plot.
INDIVIDUAL | IND: displays, for each stratum, a separate plot for each covariate set.
BYROW | ROW: displays, for each covariate set, a separate plot containing the curves for all the strata.
BYSTRATUM | STRATUM: displays, for each stratum, a separate plot containing the curves for all sets of covariates.

The default is OVERLAY=BYGROUP if the GROUP= option is specified in the BASELINE statement or if the COVARIATES= data set contains the _GROUP_ variable; otherwise the default is OVERLAY=INDIVIDUAL.

TIMERANGE=(<min> <,max>)
TIMERANGE=<min> <,max>
RANGE=(<min> <,max>)
RANGE=<min> <,max>

specifies the range of values on the time axis to clip the display. The min and max values are the lower and upper bounds of the range. By default, min is 0 and max is the largest event time.
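The sketch below combines the OVERLAY= and TIMERANGE= global-plot-options with a COVARIATES= data set. All data set names (MyData, Cov), variable names (Time, Status, X1, X2, Center, Id), covariate values, and the time range are hypothetical.

```sas
/* Hypothetical covariate sets; Id labels the curves through ROWID= */
data Cov;
   input Id $ X1 X2;
   datalines;
Low 0 1
High 1 1
;

ods graphics on;
proc phreg data=MyData
           plots(overlay=bystratum timerange=(0,500))=survival;
   model Time*Status(0)=X1 X2;
   strata Center;                        /* one plot per stratum */
   baseline covariates=Cov rowid=Id;     /* one curve per row of Cov */
run;
```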

You can specify the following plot-requests:

AUC: plots the time-dependent area under the curve (AUC) for each identified model. For a particular time t, the AUC is the area under the receiver operating characteristic (ROC) curve at t. This option is ignored if time points are specified in the AT= suboption in the ROCOPTIONS option.
AUCDIFF: plots the difference between two AUC curves for each pair of identified models. This option is ignored if time points are specified in the AT= suboption in the ROCOPTIONS option.
CIF: plots the estimated cumulative incidence function (CIF) for each set of covariates in the COVARIATES= data set in the BASELINE statement. If the COVARIATES= data set is not specified, the estimated CIF is plotted for the reference set of covariates, which consists of reference levels for the CLASS variables and average values for the continuous variables.
CUMHAZ: plots the estimated cumulative hazard function for each set of covariates in the COVARIATES= data set in the BASELINE statement. If the COVARIATES= data set is not specified, the estimated cumulative hazard function is plotted for the reference set of covariates, which consists of reference levels for the CLASS variables and average values for the continuous variables.
MCF: plots the estimated mean cumulative function for each set of covariates in the COVARIATES= data set in the BASELINE statement. If the COVARIATES= data set is not specified, the estimated mean cumulative function is plotted for the reference set of covariates, which consists of reference levels for the CLASS variables and average values for the continuous variables.
NONE: suppresses all the plots in the procedure. Specifying this option is equivalent to disabling ODS Graphics for the entire procedure.
ROC<(TICK)>: plots the time-dependent receiver operating characteristic (ROC) curves. You must specify the time points at which the ROC curves are calculated by using the AT= suboption in the ROCOPTIONS option. A plot is produced for each time point. Grid lines are displayed for both axes at 0.0, 0.2, 0.4, 0.6, 0.8, and 1.0. By default, the plots are displayed in panels, with each panel containing up to six plots. Tick values are not shown in the panel plots unless you specify the keyword TICK to show the tick values at the grid lines. You can specify the OVERLAY=INDIVIDUAL global-plot-option to display each plot individually.
SURVIVAL: plots the estimated survivor function for each set of covariates in the COVARIATES= data set in the BASELINE statement. If the COVARIATES= data set is not specified, the estimated survivor function is plotted for the reference set of covariates, which consists of reference levels for the CLASS variables and average values for the continuous variables.

ROCOPTIONS (options)

specifies options that apply to the analysis of receiver operating characteristic (ROC) curves for the model that is specified in the MODEL statement (unless the NOFIT option is specified) and for each model specified in an ROC statement. You can specify the following options:

AT=number-list

specifies the list of time points at which the ROC curves are calculated. The number-list can be a list of numbers separated by blanks, or of the form 10 to 30 by 5, or a combination of both.

AUC

displays the area under the curve at time points specified in the AT= suboption or at the event times if the AT= suboption is not specified.

AUCDIFF

displays the differences between the AUC functions for each pair of identified models at the time points specified in the AT= suboption or at the event times if no time points are specified.

IAUC

displays the integrated area under the curve (IAUC), computed as a weighted average of the AUC values at all the event times if the TAU= option is not specified or at the event times less than or equal to the value specified in the TAU= option. The weights that are used are jumps of the Kaplan-Meier estimate of the survivor function.

METHOD=method <(options)>

specifies the method to calculate ROC curves and AUC statistics. For more information, see the section Time-Dependent ROC Curves. You can specify the following methods along with any applicable options:

IPCW<(ipcw-options)>
UNO<(ipcw-options)>

uses the inverse probability of censoring weighting (IPCW) technique of Uno et al. (2007). Event observations are weighted according to their probabilities of being censored. The following ipcw-options apply only to AUC calculations:

ALPHA=value: specifies the significance level of the confidence interval for the AUC.
CL: computes the pointwise confidence limits for the AUC based on perturbation resampling.
ITER=n: specifies the number of perturbation samples.
SEED=n: specifies an integer seed for the random number generator to generate perturbation samples.

KM

uses the conditional Kaplan-Meier method of Heagerty, Lumley, and Pepe (2000).

NNE <(nne-options)>

uses the nearest-neighbors technique of Heagerty, Lumley, and Pepe (2000). You can specify the following nne-options:

ASYM: uses asymmetric kernels in the estimation.
SPAN=value: specifies the proportion of observations to use in deriving the bandwidth for the kernel-based estimation. By default, SPAN=0.05.

RECURSIVE
CHAMBLESS

uses the method of Chambless and Diao (2006). The calculation is recursive, and it is performed on the ordered event times sequentially from the smallest to the largest.

WKKM <(wkkm-options)>

uses the weighted kernel Kaplan-Meier method of Li, Greene, and Hu (2018). You can specify the following wkkm-options:

ASYM: uses asymmetric kernels in the estimation.
SPAN=value: specifies the proportion of observations to use in deriving the bandwidth for the kernel-based estimation. By default, SPAN=0.05.

By default, METHOD=NNE.

OUTAUC=SAS-data-set

names the output data set to contain the data necessary to produce the AUC plot. Each curve is identified by the corresponding model label. For the list of variables in this data set, see the section OUTAUC= Output Data Set in the ROCOPTIONS Option.

OUTROC=SAS-data-set

names the output data set to contain the data necessary to produce the ROC plots. Each ROC curve corresponds to a block of observations that are identified by a time point and a model label. For the list of variables in this data set, see the section OUTROC= Output Data Set in the ROCOPTIONS Option.
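A minimal sketch of a time-dependent ROC request follows. The data set names MyData and RocData, the variable names, the time points, and the seed are hypothetical.

```sas
/* Hypothetical data set, variables, time points, and seed */
ods graphics on;
proc phreg data=MyData plots=roc
           rocoptions(method=ipcw(cl seed=1234)   /* IPCW with AUC limits */
                      auc                         /* AUC at the AT= times */
                      outroc=RocData              /* data behind the ROC plots */
                      at=10 to 30 by 5);          /* times for the ROC curves */
   model Time*Status(0)=X1 X2;
run;
```

PLOTS=ROC is used here rather than PLOTS=AUC, because the AUC plot-request is ignored when time points are specified in the AT= suboption.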

SIMPLE

displays simple descriptive statistics (mean, standard deviation, minimum, and maximum) for each explanatory variable in the MODEL statement.

TAU=value

specifies the upper time limit for computing the integrated area under the curve (IAUC) statistic and Uno’s concordance statistic. Only event times that do not exceed the specified value are used in the calculation. The default value is the largest event time.

ZPH<(zph-options)>

requests diagnostics based on the weighted Schoenfeld residuals for checking the proportional hazards assumption (for more information, see the section ZPH Diagnostics). For each predictor, PROC PHREG presents a plot of the time-varying coefficients in addition to a correlation test between the weighted residuals and failure times in a given scale. You can specify the following zph-options:

FIT=NONE | LOESS | SPLINE

displays a fitted smooth curve in a plot of time-varying coefficients. FIT=LOESS displays a loess curve. FIT=SPLINE fits a penalized B-spline curve. If you do not want to display a fitted curve, specify FIT=NONE. By default, FIT=SPLINE.

GLOBAL

computes the global correlation test.

NOPLOT

suppresses the plots of the time-varying coefficients $\beta_j(t)$.

NOTEST

suppresses the correlation tests.

OUT=SAS-data-set

names the output data set that contains the time-varying coefficients $\beta_j(t)$, one row per event time. The variables that contain $\beta_j(t)$ have the same names as the predictors. The data set also contains the transformed event times $g(t)$.

TRANSFORM=IDENTITY | KM | LOG | RANK

specifies how the failure times should be transformed in the diagnostic plots and correlation tests. You can choose from the following transformations:

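IDENTITY: specifies the identity transformation, $g(t) = t$.
KM: specifies the complement of the Kaplan-Meier estimate transformation, $g(t) = 1 - \hat{S}(t)$, where $\hat{S}(t)$ is the Kaplan-Meier estimate of the survivor function.
LOG: specifies the log transformation, $g(t) = \log(t)$.
RANK: specifies the rank transformation, $g(t) = \mathrm{rank}(t)$.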

By default, TRANSFORM=RANK.
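The following sketch shows several zph-options used together. The data set names MyData and ZphOut and the variable names are hypothetical.

```sas
/* Hypothetical data set and variable names */
ods graphics on;
proc phreg data=MyData
           zph(global transform=log fit=loess out=ZphOut);
   model Time*Status(0)=X1 X2;
run;
```

Here GLOBAL adds the global correlation test to the per-predictor tests, TRANSFORM=LOG replaces the default rank scale, FIT=LOESS replaces the default penalized B-spline smoother, and OUT= saves the time-varying coefficients for further inspection.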

Last updated: December 09, 2022