The PHREG Procedure

OUTPUT Statement

  • OUTPUT <OUT=SAS-data-set> <keyword=name …keyword=name> </ option>;

The OUTPUT statement creates a new SAS data set that contains all the variables in the input data set and optionally contains variables for the estimated linear predictor and its standard error estimate, survival estimates, residuals, and influence statistics.

The estimated linear predictor and its standard error estimate are computed for all observations in which the explanatory variables have no missing values, even if the observed time is missing; if the observed time is not missing, the predicted probability is computed at the observed time. By adding observations that have a missing censoring indicator to the input data set, you can compute predicted probabilities for new observations or for settings of explanatory variables and observed times that are not present in the data without affecting the model. Alternatively, you can use the BASELINE statement to compute predicted survival probabilities for new observations.
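The approach of adding observations that have a missing censoring indicator can be sketched as follows. This is a minimal illustration, not an excerpt from the procedure documentation: the data sets Study, NewObs, Combined, and Pred, the variables Time, Status, and X, and all data values are hypothetical.

```sas
/* Hypothetical data: Time = follow-up time, Status = 1 (event) or 0 (censored) */
data Study;
   input Time Status X;
   datalines;
 5 1 2.1
 8 0 1.4
12 1 3.0
15 1 0.7
20 0 1.9
23 1 2.6
30 0 0.5
;

/* Hypothetical new settings of X and Time. Status is left missing so that
   these rows are not used in fitting the model. */
data NewObs;
   input Time X;
   Status = .;
   datalines;
10 1.0
10 2.5
;

data Combined;
   set Study NewObs;
run;

proc phreg data=Combined;
   model Time*Status(0) = X;
   /* The variable names on the right of each keyword are arbitrary choices */
   output out=Pred xbeta=LinPred stdxbeta=SELinPred survival=SurvProb;
run;

proc print data=Pred;
run;
```

In the Pred data set, the two appended rows should carry a linear predictor, its standard error, and a predicted survival probability at Time=10, while the fit itself is based only on the seven rows of Study.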

No data set is created for the OUTPUT statement if the model contains any time-dependent variables that are defined by programming statements.

Table 10 summarizes the options available in the OUTPUT statement. The statistic and diagnostic keywords specify the statistics to be included in the output data set and name the new variables that contain the statistics.

Table 10: OUTPUT Statement Options

Option       Description
METHOD=      Specifies the method to use to estimate the survival probabilities
OUT=         Names the output data set

Statistic Keywords
ATRISK=      Names the variable that contains the number of subjects at risk
CIF=         Names the variable that contains the cumulative incidence probability
LOGLOGS=     Names the variable that contains the log of the negative log of the survival probability
LOGSURV=     Names the variable that contains the log of the survival probability
XBETA=       Names the variable that contains the linear predictor
STDXBETA=    Names the variable that contains the standard error of the linear predictor
SURVIVAL=    Names the variable that contains the survival probability

Diagnostic Keywords
DFBETA=      Requests the standardized deletion parameter differences
LD=          Names the variable that contains the likelihood displacement diagnostic
LMAX=        Names the variable that contains the relative influence diagnostic
RESDEV=      Names the variable that contains the deviance residuals
RESMART=     Names the variable that contains the martingale residuals
RESSCH=      Requests the Schoenfeld residuals
RESSCO=      Requests the score residuals
WTRESSCH=    Requests the weighted Schoenfeld residuals


OUT=SAS-data-set

names the output data set. If you omit the OUT= option, the OUTPUT data set is created and given a default name by using the DATAn convention. For more information, see the section OUT= Output Data Set in the OUTPUT Statement.

METHOD=method

specifies the method used to compute the survivor function estimates. For more information, see the section Survivor Function Estimators. This option appears in the OUTPUT statement after a slash (/). You can specify the following methods:

BRESLOW | CH | EMP

computes the cumulative hazard function estimate of the survivor function; that is, the survivor function is estimated by exponentiating the negative cumulative hazard function.

FH

computes the Fleming-Harrington (FH) estimates of the survivor function. The FH estimator is a tie-breaking modification of the Breslow estimator. If there are no tied event times, this estimator is the same as the Breslow estimator.

PL

computes the product-limit estimates of the survivor function. This estimator is not available if you use the model syntax that allows two time variables for the counting process style of input; in such a case, the Breslow estimator (METHOD=BRESLOW) is used instead.

By default, METHOD=BRESLOW.
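As a brief sketch of where the METHOD= option goes, the following steps request survival estimates twice, once with the default method and once with METHOD=PL after the slash. The data set Study, its variables, and the output names are hypothetical and serve only to make the example self-contained.

```sas
/* Hypothetical data: Time = follow-up time, Status = 1 (event) or 0 (censored) */
data Study;
   input Time Status X;
   datalines;
 5 1 2.1
 8 0 1.4
12 1 3.0
15 1 0.7
20 0 1.9
23 1 2.6
30 0 0.5
;

/* Default method (METHOD=BRESLOW) */
proc phreg data=Study;
   model Time*Status(0) = X;
   output out=SurvCH survival=S_CH;
run;

/* Product-limit estimates: the option follows a slash */
proc phreg data=Study;
   model Time*Status(0) = X;
   output out=SurvPL survival=S_PL / method=pl;
run;

proc print data=SurvCH;
run;

proc print data=SurvPL;
run;
```

The two sets of estimates are usually close but not identical, so it is worth stating the method when you report predicted survival probabilities.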

The following list describes the statistic and diagnostic keywords:

ATRISK=name

names the variable that contains the number of subjects at risk at the observed time (or at the right endpoint of the at-risk interval when a counting-process specification is used in the MODEL statement, as described in the section Counting Process Style of Input).

CIF=name

names the variable that contains the cumulative incidence probabilities at the observed times. For more information, see the section Cumulative Incidence Prediction.

DFBETA=_ALL_ | name-list

requests the approximate changes in the parameter estimates $(\hat{\boldsymbol{\beta}} - \hat{\boldsymbol{\beta}}_{(j)})$ when the $j$th observation is omitted. These variables are a weighted transform of the score residual variables and are useful in assessing local influence and in computing robust variance estimates. You can specify this option in one of the following ways:

name-list

specifies up to s variable names, where s is the number of regression parameters of the model that is specified in the MODEL statement. The first variable contains the changes in the first regression parameter, the second variable contains the changes for the second regression parameter, and so on.

_ALL_

requests the changes for all parameters and names them DFBETA_xxx, where xxx is the name of the model regression parameter that is formed from the input variable names (concatenated with the appropriate categories if a classification variable is used). For example, suppose that the model contains a continuous variable X and a CLASS variable Gender with two levels ("Female" and "Male") and that Gender has a GLM parameterization. Three statistics are produced: DFBETA_X, DFBETA_GenderFemale, and DFBETA_GenderMale.

If an effect that is specified in the MODEL statement is not included in the final model, the corresponding statistics are set to missing. For more information, see the section Diagnostics Based on Weighted Residuals.
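The _ALL_ naming rule can be tried out with a small sketch like the following, which mirrors the X and Gender example above. The data set Trial, the output data set Infl, and all data values are hypothetical.

```sas
/* Hypothetical data with a continuous covariate X and a two-level Gender */
data Trial;
   input Time Status X Gender $;
   datalines;
 4 1 2.3 Female
 6 1 1.1 Male
 9 0 0.8 Female
11 1 2.9 Male
14 1 1.7 Female
17 0 2.0 Male
21 1 0.4 Female
25 1 1.5 Male
28 0 2.2 Female
33 1 1.0 Male
;

proc phreg data=Trial;
   class Gender / param=glm;      /* GLM parameterization, as in the text */
   model Time*Status(0) = X Gender;
   output out=Infl dfbeta=_all_;  /* one DFBETA_xxx variable per parameter */
run;

/* List the variables that were added to the output data set */
proc contents data=Infl;
run;
```

If the naming rule applies as described, the PROC CONTENTS listing should include DFBETA_X, DFBETA_GenderFemale, and DFBETA_GenderMale alongside the original input variables.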

LD=name

names the variable that contains the approximate likelihood displacement when the observation is left out. This diagnostic can be used to assess the impact of each observation on the overall fit of the model. For more information, see the section Influence of Observations on Overall Fit of the Model.

LMAX=name

names the variable that contains the relative influence of observations on the overall fit of the model. This diagnostic is useful in assessing the sensitivity of the model’s fit to each observation. For more information, see the section Influence of Observations on Overall Fit of the Model.

LOGLOGS=name

names the variable that contains the log of the negative log of the variable named in the SURVIVAL= option.

LOGSURV=name

names the variable that contains the log of the variable named in the SURVIVAL= option.
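The relationship among the SURVIVAL=, LOGSURV=, and LOGLOGS= variables can be checked with a short sketch such as the one below. The data set Study, the output names, and the Check data set are hypothetical. The recomputed columns are there only for comparison.

```sas
/* Hypothetical data: Time = follow-up time, Status = 1 (event) or 0 (censored) */
data Study;
   input Time Status X;
   datalines;
 5 1 2.1
 8 0 1.4
12 1 3.0
15 1 0.7
20 0 1.9
23 1 2.6
30 0 0.5
;

proc phreg data=Study;
   model Time*Status(0) = X;
   output out=Surv survival=S logsurv=LogS loglogs=LogLogS;
run;

/* Recompute the transforms from S; guard against S=0 and S=1,
   where one of the logarithms is undefined */
data Check;
   set Surv;
   if 0 < S < 1 then do;
      LogS_check    = log(S);
      LogLogS_check = log(-log(S));
   end;
run;

proc print data=Check;
   var Time S LogS LogS_check LogLogS LogLogS_check;
run;
```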

RESDEV=name

names the variable that contains the deviance residuals. This variable is a transform of the variable named in the RESMART= option and can achieve a more symmetric distribution. For more information, see the section Residuals.

RESMART=name

names the variable that contains the martingale residuals. The martingale residual at the observed time t can be interpreted as the difference over $[0, t]$ in the observed number of events minus the expected number of events. For more information, see the section Residuals.
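For orientation, the two residuals can be written out in the special case of right-censored data with one record per subject and time-fixed covariates. These are the standard textbook forms and are given here as an assumption about the simple case, not as a transcription of the Residuals section. The symbols $\delta_i$ (event indicator), $t_i$ (observed time), $\mathbf{x}_i$ (covariate vector), and $\hat{\Lambda}_0$ (estimated cumulative baseline hazard) are notation introduced for this sketch.

```latex
% Martingale residual: observed events minus expected events over [0, t_i]
\hat{M}_i = \delta_i - \hat{\Lambda}_0(t_i)\,\exp\!\left(\mathbf{x}_i'\hat{\boldsymbol{\beta}}\right)

% Deviance residual: a transform of the martingale residual
% (with the convention 0 \log 0 = 0 for censored observations)
\hat{D}_i = \operatorname{sign}(\hat{M}_i)\,
            \sqrt{-2\left[\hat{M}_i + \delta_i \log\!\left(\delta_i - \hat{M}_i\right)\right]}
```

In this form the martingale residual cannot exceed 1 but is unbounded below, which is why its distribution is skewed. The square-root transform pulls in the long left tail, giving the more symmetric distribution mentioned for RESDEV=.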

RESSCH=_ALL_ | name-list

requests Schoenfeld residuals, which are useful in assessing the proportional hazards assumption. Schoenfeld residuals are computed only at uncensored times and are missing for censored times. You can specify this option in one of the following ways:

name-list

specifies up to s variable names, where s is the number of regression parameters of the model that is specified in the MODEL statement. The first variable contains the Schoenfeld residuals for the first regression parameter, the second variable contains the Schoenfeld residuals for the second regression parameter, and so on.

_ALL_

requests Schoenfeld residuals for all regression parameters and names them RESSCH_xxx, where xxx is the name of the model regression parameter that is formed from the input variable names (concatenated with the appropriate categories if a classification variable is used). For example, suppose that the model contains a continuous variable X and a CLASS variable Gender with two levels ("Female" and "Male") and that Gender has a GLM parameterization. Three statistics are produced: RESSCH_X, RESSCH_GenderFemale, and RESSCH_GenderMale.

If an effect in the MODEL statement is not included in the final model, the corresponding Schoenfeld residuals are set to missing. For more information, see the section Residuals.
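A common use of these residuals is to plot them against time and look for a trend. The following sketch uses the name-list form with a single covariate. The data set Study, the variable name Sch_X, and the data values are hypothetical, and the plot is one possible display rather than a prescribed diagnostic.

```sas
/* Hypothetical data: Time = follow-up time, Status = 1 (event) or 0 (censored) */
data Study;
   input Time Status X;
   datalines;
 5 1 2.1
 8 0 1.4
12 1 3.0
15 1 0.7
20 0 1.9
23 1 2.6
30 0 0.5
;

proc phreg data=Study;
   model Time*Status(0) = X;
   output out=Sch ressch=Sch_X;   /* name-list form: one name for one parameter */
run;

/* Schoenfeld residuals are missing for censored times, so plot events only */
proc sgplot data=Sch;
   where Status = 1;
   scatter x=Time y=Sch_X;
   refline 0 / axis=y;
run;
```

A roughly horizontal scatter around zero is consistent with proportional hazards for X. A systematic drift over time suggests looking more closely, for example with the weighted Schoenfeld residuals.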

RESSCO=_ALL_ | name-list

requests the score residuals, which are a decomposition of the first partial derivative of the log likelihood. These residuals can be used to assess the leverage that is exerted by each subject in the parameter estimates. They are also useful in constructing robust sandwich variance estimates. You can specify this option in one of the following ways:

name-list

specifies up to s variable names, where s is the number of regression parameters of the model that is specified in the MODEL statement. The first variable contains the score residuals for the first regression parameter, the second variable contains the score residuals for the second parameter, and so on.

_ALL_

requests score residuals for all regression parameters and names them RESSCO_xxx, where xxx is the name of the model regression parameter that is formed from the input variable names (concatenated with the appropriate categories if a classification variable is used). For example, suppose that the model contains a continuous variable X and a CLASS variable Gender with two levels ("Female" and "Male") and that Gender has a GLM parameterization. Three statistics are produced: RESSCO_X, RESSCO_GenderFemale, and RESSCO_GenderMale.

If an effect in the MODEL statement is not included in the final model, the corresponding score residuals are set to missing. For more information, see the section Residuals.

STDXBETA=name

names the variable that contains the standard error estimates of the linear predictor that is specified in the XBETA= option.

SURVIVAL=name

names the variable that contains the predicted survival probabilities at the observed times. For more information, see the section Survivor Function Estimators.

WTRESSCH=_ALL_ | name-list

requests the weighted Schoenfeld residuals, which are useful in investigating the nature of nonproportionality if the proportional hazards assumption does not hold. You can specify this option in one of the following ways:

name-list

specifies up to s variable names, where s is the number of regression parameters of the model that is specified in the MODEL statement. The first variable contains the weighted Schoenfeld residuals for the first regression parameter, the second variable contains the weighted Schoenfeld residuals for the second regression parameter, and so on.

_ALL_

requests weighted Schoenfeld residuals for all regression parameters and names them WTRESSCH_xxx, where xxx is the name of the model regression parameter that is formed from the input variable names (concatenated with the appropriate categories if a classification variable is used). For example, suppose that the model contains a continuous variable X and a CLASS variable Gender with two levels ("Female" and "Male") and that Gender has a GLM parameterization. Three statistics are produced: WTRESSCH_X, WTRESSCH_GenderFemale, and WTRESSCH_GenderMale.

If an effect in the MODEL statement is not included in the final model, the corresponding weighted Schoenfeld residuals are set to missing. For more information, see the section Diagnostics Based on Weighted Residuals.

XBETA=name

names the variable that contains the estimates of the linear predictor.

Last updated: December 09, 2022