The SURVEYPHREG Procedure

PROC SURVEYPHREG Statement

PROC SURVEYPHREG <options>;

The PROC SURVEYPHREG statement invokes the SURVEYPHREG procedure. It also identifies the data set to be analyzed. Table 1 summarizes the options available in the PROC SURVEYPHREG statement.

Table 1: PROC SURVEYPHREG Statement Options

Option	Description
ATRISK	Displays a table that contains the sum of weights for the number of units and the sum of weights for the corresponding number of events in the risk sets
DATA=	Names the input SAS data set
MISSING	Treats missing values as a valid category
NAMELEN=	Specifies the length of effect names
NOMCAR	Treats missing values of the analysis variables as not missing completely at random
NOPRINT	Suppresses all displayed output
ORDER=	Specifies the sort order of CLASS variables
RATE=	Specifies the sampling rate
TOTAL=	Specifies the total number of primary sampling units
VARMETHOD=	Specifies the variance estimation method

You can specify the following options in the PROC SURVEYPHREG statement:

ATRISK

displays a table that contains the sum of weights for the number of units at risk at each distinct event time and the sum of weights for the corresponding number of events in the risk sets. For example, the risk set information in Figure 3 is displayed if the ATRISK option is specified in the example in the section Getting Started: SURVEYPHREG Procedure.

Figure 3: Risk Set Information

The SURVEYPHREG Procedure

Risk Set Sum of Weights
lenBorrow	At Risk	Event
1	11616.79	5440.11
2	6176.68	1177.71
3	4998.97	926.55
4	4072.42	1411.07
5	2661.35	461.89
6	2199.46	565.01
7	1634.45	236.58
8	1397.87	230.3
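As a minimal sketch of how the option is requested, the following step adds ATRISK to a basic analysis. The data set name work.LibrarySurvey and the variables Returned, Age, and SamplingWeight are assumptions made for illustration; only lenBorrow is taken from Figure 3.

```sas
/* Hypothetical data set and variable names, apart from lenBorrow */
proc surveyphreg data=work.LibrarySurvey atrisk;
   weight SamplingWeight;               /* sampling weights (assumed name)    */
   model lenBorrow*Returned(0) = Age;   /* Returned=0 is assumed to mark
                                           censored observations              */
run;
```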

DATA=SAS-data-set

names the SAS data set that contains the data to be analyzed. If you omit the DATA= option, the procedure uses the most recently created SAS data set.

MISSING

treats missing values as a valid (nonmissing) category for all categorical variables, which include CLASS, STRATA, CLUSTER, and DOMAIN variables. By default, if you do not specify the MISSING option, an observation is excluded from the analysis if it has a missing value for any of these categorical variables. For more information, see the section Missing Values.

NAMELEN=n

specifies the length of effect names in tables and output data sets to be n characters, where n is a value between 20 and 200, inclusive. By default, NAMELEN=20.

NOMCAR

treats missing values of the analysis variables that are specified in the MODEL statement as not missing completely at random (NOMCAR) for Taylor series variance estimation. When you specify the NOMCAR option, PROC SURVEYPHREG computes variance estimates by analyzing the nonmissing values as a domain (subpopulation), where the entire population includes both nonmissing and missing domains. See the section Missing Values for details.

By default, PROC SURVEYPHREG excludes an observation from analyses (and the corresponding variance computations) if that observation has a missing value for any of the variables in the MODEL statement. Note that if you specify the MISSING option for classification variables, then the procedure treats the missing values as a valid nonmissing level.

The NOMCAR option applies only to Taylor series variance estimation. Replication methods do not use the NOMCAR option.
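The two missing-value options can be contrasted with a short sketch. It assumes a data set work.MySurvey with the hypothetical variables Time, Status, Age, Region, Stratum, PSU, and SamplingWeight; none of these names come from the procedure itself.

```sas
/* MISSING: a missing level of Region, Stratum, or PSU is kept as a
   valid category instead of causing the observation to be dropped.  */
proc surveyphreg data=work.MySurvey missing;
   class Region;
   strata Stratum;
   cluster PSU;
   weight SamplingWeight;
   model Time*Status(0) = Age Region;
run;

/* NOMCAR: observations with missing MODEL variables are handled as a
   separate domain for Taylor series variance estimation.             */
proc surveyphreg data=work.MySurvey nomcar varmethod=taylor;
   class Region;
   strata Stratum;
   cluster PSU;
   weight SamplingWeight;
   model Time*Status(0) = Age Region;
run;
```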

NOPRINT

suppresses all displayed output. Note that this option temporarily disables the Output Delivery System (ODS); see Chapter 23, Using the Output Delivery System, for more information.

ORDER=DATA | FORMATTED | FREQ | INTERNAL

specifies the sort order for the levels of the classification variables (which are specified in the CLASS statement).

This option applies to the levels for all classification variables, except when you use the (default) ORDER=FORMATTED option with numeric classification variables that have no explicit format. In that case, the levels of such variables are ordered by their internal value.

The ORDER= option can take the following values:

Value of ORDER=	Levels Sorted By
DATA	Order of appearance in the input data set
FORMATTED	External formatted value, except for numeric variables with no explicit format, which are sorted by their unformatted (internal) value
FREQ	Descending frequency count; levels with the most observations come first in the order
INTERNAL	Unformatted value

By default, ORDER=FORMATTED. For ORDER=FORMATTED and ORDER=INTERNAL, the sort order is machine-dependent.
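For illustration only, the following sketch requests frequency ordering, so that the most common level of a classification variable is listed first. The data set and variable names are hypothetical.

```sas
/* ORDER=FREQ sorts the levels of Region by descending frequency count */
proc surveyphreg data=work.MySurvey order=freq;
   class Region;
   weight SamplingWeight;
   model Time*Status(0) = Age Region;
run;
```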

For more information about sort order, see the chapter on the SORT procedure in the Base SAS Procedures Guide and the discussion of BY-group processing in the "Grouping Data" section of SAS Programmer's Guide: Essentials.

RATE=value | SAS-data-set
R=value | SAS-data-set

specifies the sampling rate, which PROC SURVEYPHREG uses to compute a finite population correction for Taylor series or bootstrap variance estimation. This option is ignored for BRR and jackknife variance estimation.

If your sample design has multiple stages, you should specify the first-stage sampling rate, which is the ratio of the number of primary sampling units (PSUs) that are selected to the total number of PSUs in the population.

You can specify the sampling rate in either of the following ways:

value: specifies a nonnegative number to use for a nonstratified design or for a stratified design that has the same sampling rate in each stratum.
SAS-data-set: specifies a SAS-data-set that contains the stratification variables and the sampling rates for a stratified design that has different sampling rates in the strata. You must provide the sampling rates in the data set variable named _RATE_.

The sampling rates must be nonnegative numbers. You can specify value as a proportion between 0 and 1, or in percentage form as a number between 1 and 100, which PROC SURVEYPHREG converts to a proportion. The procedure treats the value 1 as 100% instead of 1%.
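The two ways of writing a single sampling rate can be sketched as follows. Both steps are meant to request the same 5% first-stage rate; the data set and variable names are assumptions for illustration.

```sas
/* Proportion form: 0.05 is read as a 5% sampling rate */
proc surveyphreg data=work.MySurvey rate=0.05;
   cluster PSU;
   weight SamplingWeight;
   model Time*Status(0) = Age;
run;

/* Percentage form: 5 is converted to the proportion 0.05.
   Note that RATE=1 would mean 100%, not 1%.               */
proc surveyphreg data=work.MySurvey rate=5;
   cluster PSU;
   weight SamplingWeight;
   model Time*Status(0) = Age;
run;
```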

For more information, see the section Population Totals and Sampling Rates.

If you do not specify the RATE= or TOTAL= option, then the Taylor series or bootstrap variance estimation does not include a finite population correction. You cannot specify both the TOTAL= option and the RATE= option in the same PROC SURVEYPHREG statement.

TOTAL=value | SAS-data-set
N=value | SAS-data-set

specifies the total number of primary sampling units (PSUs) in the population. PROC SURVEYPHREG uses the value to compute a finite population correction for Taylor series or bootstrap variance estimation. This option is ignored for BRR and jackknife variance estimation.

You can specify the total number of PSUs in either of the following ways:

value: specifies a positive number to use for a nonstratified design or for a stratified design that has the same population total in each stratum.
SAS-data-set: specifies a SAS-data-set that contains the stratification variables and the population totals for a stratified design that has different population totals in the strata. You must provide the stratum totals in the data set variable named _TOTAL_.

The stratum totals must be positive numbers.
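A secondary data set of stratum totals might look like the following sketch. The stratification variable Stratum, the totals themselves, and the analysis data set work.MySurvey are hypothetical; only the variable name _TOTAL_ is required by the option.

```sas
/* One row per stratum: the stratification variable plus _TOTAL_ */
data work.StratumTotals;
   input Stratum _TOTAL_;
   datalines;
1 120
2 85
3 240
;
run;

proc surveyphreg data=work.MySurvey total=work.StratumTotals;
   strata Stratum;            /* must match the variable in StratumTotals */
   cluster PSU;
   weight SamplingWeight;
   model Time*Status(0) = Age;
run;
```

A data set for the RATE= option would have the same layout, with the rates stored in a variable named _RATE_.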

For more information, see the section Population Totals and Sampling Rates.

If you do not specify the TOTAL= or RATE= option, then the Taylor series or bootstrap variance estimation does not include a finite population correction. You cannot specify both the TOTAL= option and the RATE= option in the same PROC SURVEYPHREG statement.

VARMETHOD=method <(method-options)>

specifies the variance estimation method. PROC SURVEYPHREG provides the Taylor series linearization method and three replication (resampling) methods: balanced repeated replication (BRR), jackknife, and bootstrap.

Table 2 summarizes the available methods and method-options.

Table 2: Variance Estimation Options

method	Variance Estimation Method	method-options
BOOTSTRAP	Bootstrap	CENTER=FULLSAMPLE | REPLICATES
		DETAILS
		MH=number | SAS-data-set
		OUTWEIGHTS=SAS-data-set
		REPS=number
		SEED=number
BRR	Balanced repeated replication	CENTER=FULLSAMPLE | REPLICATES
		DETAILS
		FAY <=value>
		HADAMARD=SAS-data-set
		OUTWEIGHTS=SAS-data-set
		PRINTH
		REPS=number
JACKKNIFE	Jackknife	CENTER=FULLSAMPLE | REPLICATES
		DETAILS
		OUTJKCOEFS=SAS-data-set
		OUTWEIGHTS=SAS-data-set
TAYLOR	Taylor series linearization	None

By default, VARMETHOD=JACKKNIFE if you also specify a REPWEIGHTS statement; otherwise, VARMETHOD=TAYLOR by default.
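The effect of this default can be sketched with a step that supplies replicate weights and omits the VARMETHOD= option. The replicate weight variables RepWt1 through RepWt40 and the other names are hypothetical.

```sas
/* No VARMETHOD= option: because a REPWEIGHTS statement is present,
   the procedure uses the jackknife method by default.              */
proc surveyphreg data=work.MySurvey;
   weight SamplingWeight;
   repweights RepWt1-RepWt40;      /* assumed replicate weight variables */
   model Time*Status(0) = Age;
run;
```

Removing the REPWEIGHTS statement from this step would make Taylor series linearization the default instead.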

You can specify the following methods:

BOOTSTRAP < (method-options) >

requests variance estimation by the bootstrap method. The bootstrap method requires at least two primary sampling units (PSUs) in each stratum for stratified designs unless you provide replicate weights by using a REPWEIGHTS statement. For more information, see the section Bootstrap Method.

You can specify the following method-options:

CENTER=FULLSAMPLE | REPLICATES

defines how to compute the deviations for the bootstrap method. You can specify the following values:

FULLSAMPLE: computes the deviations of the replicate estimates from the full sample estimate.
REPLICATES: computes the deviations of the replicate estimates from the average of the replicate estimates.

For more information, see the section Bootstrap Method. By default, CENTER=FULLSAMPLE.

DETAILS

displays the maximum likelihood estimates of model parameters for replicate samples when the replicate parameter estimates are available. A replicate sample might not provide useful parameter estimates (replicate estimates) for reasons such as nonconvergence of the optimization or inestimability of some parameters in that replicate sample.

MH=value | (values) | SAS-data-set

specifies the number of PSUs to select for the bootstrap replicate samples. You can provide bootstrap stratum sample sizes by specifying a list of values or a SAS-data-set. Alternatively, you can provide a single bootstrap sample size value to use for all strata or for a nonstratified design. For more information, see the section Bootstrap Method.

Each bootstrap sample size must be a positive integer and must be less than n_h, which is the total number of PSUs in stratum h. By default, m_h = n_h – 1 for a stratified design. For a nonstratified design, the bootstrap sample size value must be less than n (the total number of PSUs in the sample). By default, m = n – 1 for a nonstratified design.

You can provide the bootstrap sample size by specifying one of the following forms:

MH=value

specifies a single bootstrap sample size value to use for all strata or for a nonstratified design.

MH=(values)

specifies a list of stratum bootstrap sample size values. You can separate the values with blanks or commas, and you must enclose the list of values in parentheses. The number of values must not be less than the number of strata in the DATA= input data set.

The order of the stratum sample size values must match the order of the stratum levels in the DATA= input data set. Each stratum sample size value must be a positive integer and must be less than the total number of PSUs in the corresponding stratum.

MH=SAS-data-set

names a SAS-data-set that contains the stratum bootstrap sample sizes. You must provide the sample sizes in a data set variable named _NSIZE_ or SampleSize.

The SAS-data-set must contain all stratification variables that you specify in the STRATA statement. It must also contain all stratum levels that appear in the DATA= input data set. The order of the stratum levels in the SAS-data-set must match the order of the levels in the DATA= data set. If formats are associated with the STRATA variables, the formats must be consistent in the two data sets.

Each value of the _NSIZE_ or SampleSize variable must be a positive integer and must be less than the total number of PSUs in the corresponding stratum.
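The data set form of the MH= method-option can be sketched as follows. The stratum sizes are invented, and the sketch assumes that strata 1, 2, and 3 of the hypothetical data set work.MySurvey contain more than 4, 3, and 5 PSUs, respectively.

```sas
/* One row per stratum, in the same order as in the DATA= data set */
data work.BootSizes;
   input Stratum _NSIZE_;
   datalines;
1 4
2 3
3 5
;
run;

proc surveyphreg data=work.MySurvey
                 varmethod=bootstrap(mh=work.BootSizes);
   strata Stratum;
   cluster PSU;
   weight SamplingWeight;
   model Time*Status(0) = Age;
run;

/* Under the same assumptions, the list form would be:
   varmethod=bootstrap(mh=(4 3 5))                      */
```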

OUTWEIGHTS=SAS-data-set

names a SAS-data-set in which to store the replicate weights that PROC SURVEYPHREG creates for bootstrap variance estimation. For information about replicate weights, see the section Bootstrap Method. For information about the contents of the OUTWEIGHTS= data set, see the section Replicate Weights Output Data Set.

This method-option is not available when you provide replicate weights in a REPWEIGHTS statement.

REPS=number

specifies the number of replicates for bootstrap variance estimation, where number must be an integer greater than 1. Increasing the number of replicates improves the estimation precision but also increases the computation time. By default, REPS=250.

SEED=number

specifies the initial seed for random number generation, where number must be a positive integer.

If you do not specify this option or if you specify a number that is negative or 0, PROC SURVEYPHREG uses the time of day from the computer’s clock to obtain an initial seed.

The seed that is used is displayed in the "Variance Estimation" table.

To reproduce the same bootstrap replicate weights and the same analysis in a subsequent execution of PROC SURVEYPHREG, you can specify the same initial seed that was used in the original analysis.
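A reproducible bootstrap analysis might be requested as in the following sketch. The seed value, the number of replicates, and all data set and variable names are arbitrary choices for illustration.

```sas
/* Fixed seed so that a rerun generates the same replicate weights;
   the weights are also saved for later inspection or reuse.        */
proc surveyphreg data=work.MySurvey
                 varmethod=bootstrap(reps=500
                                     seed=20221209
                                     outweights=work.BootWeights);
   strata Stratum;
   cluster PSU;
   weight SamplingWeight;
   model Time*Status(0) = Age;
run;
```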

BRR < (method-options) >

requests variance estimation by balanced repeated replication (BRR). The BRR method requires a stratified sample design with two primary sampling units (PSUs) in each stratum. If you specify the VARMETHOD=BRR option, you must also specify a STRATA statement unless you provide replicate weights with a REPWEIGHTS statement. See the section Balanced Repeated Replication (BRR) Method for details.

You can specify the following method-options in parentheses after the VARMETHOD=BRR option:

CENTER=FULLSAMPLE | REPLICATES

defines how to compute the deviations for the BRR method. CENTER=FULLSAMPLE is the default, which computes the deviations of the replicate estimates from the full sample estimate. Alternatively, you can specify CENTER=REPLICATES to compute the deviations of the replicate estimates from the average of the replicate estimates. See the section Balanced Repeated Replication (BRR) Method for details.

DETAILS

displays the maximum likelihood estimates of model parameters for replicate samples when the replicate parameter estimates are available. A replicate sample might not provide useful parameter estimates (replicate estimates), for reasons such as nonconvergence of the optimization or inestimability of some parameters in that replicate sample.

FAY <=value>

requests Fay’s method, which is a modification of the BRR method. See the section Fay’s BRR Method for details.

You can specify the value of the Fay coefficient, which is used in converting the original sampling weights to replicate weights. The Fay coefficient must be a nonnegative number less than 1. By default, the value of the Fay coefficient equals 0.5.
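As a brief sketch, Fay's method with a coefficient of 0.3 could be requested as follows. The design is assumed to have two PSUs in each stratum, and the data set and variable names are hypothetical.

```sas
/* FAY=0.3 sets the Fay coefficient explicitly;
   specifying FAY alone would use the default of 0.5 */
proc surveyphreg data=work.MySurvey varmethod=brr(fay=0.3);
   strata Stratum;
   cluster PSU;
   weight SamplingWeight;
   model Time*Status(0) = Age;
run;
```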

HADAMARD=SAS-data-set
H=SAS-data-set

names a SAS data set that contains the Hadamard matrix for BRR replicate construction. If you do not provide a Hadamard matrix with the HADAMARD= method-option, PROC SURVEYPHREG generates an appropriate Hadamard matrix for replicate construction. See the sections Balanced Repeated Replication (BRR) Method and Hadamard Matrix for details.

If a Hadamard matrix of a given dimension exists, it is not necessarily unique. Therefore, if you want to use a specific Hadamard matrix, you must provide the matrix as a SAS data set in the HADAMARD= method-option.

In the HADAMARD= input data set, each variable corresponds to a column of the Hadamard matrix, and each observation corresponds to a row of the matrix. You can use any variable names in the HADAMARD= data set. All values in the data set must equal either 1 or –1. You must ensure that the matrix you provide is indeed a Hadamard matrix; that is, A'A = RI, where A is the Hadamard matrix of dimension R and I is an identity matrix. PROC SURVEYPHREG does not check the validity of the Hadamard matrix that you provide.

The HADAMARD= input data set must contain at least H variables, where H denotes the number of first-stage strata in your design. If the data set contains more than H variables, PROC SURVEYPHREG uses only the first H variables. Similarly, the HADAMARD= input data set must contain at least H observations.

If you do not specify the REPS= method-option, then the number of replicates is equal to the number of observations in the HADAMARD= input data set. If you specify the number of replicates—for example, REPS=nreps—then the first nreps observations in the HADAMARD= data set are used to construct the replicates.

You can specify the PRINTH method-option to display the Hadamard matrix that PROC SURVEYPHREG uses to construct replicates for BRR variance estimation.
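The following sketch supplies a 4 x 4 Hadamard matrix as a data set and displays the rows and columns that are used. The matrix satisfies A'A = 4I. The variable names c1 through c4 are arbitrary, and the analysis data set is assumed to describe a design with three strata of two PSUs each.

```sas
/* Each observation is a row and each variable is a column of the matrix */
data work.Had4;
   input c1-c4;
   datalines;
1  1  1  1
1 -1  1 -1
1  1 -1 -1
1 -1 -1  1
;
run;

proc surveyphreg data=work.MySurvey
                 varmethod=brr(hadamard=work.Had4 printh);
   strata Stratum;            /* assumed: 3 strata, 2 PSUs per stratum */
   cluster PSU;
   weight SamplingWeight;
   model Time*Status(0) = Age;
run;
```

Because REPS= is not specified here, the number of replicates equals the number of observations in work.Had4, which is four.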

OUTWEIGHTS=SAS-data-set

names an output SAS data set to store the replicate weights that PROC SURVEYPHREG creates for BRR variance estimation. For more information about replicate weights, see the section Balanced Repeated Replication (BRR) Method. For more information about the contents of the OUTWEIGHTS= data set, see the section Replicate Weights Output Data Set.

The OUTWEIGHTS= method-option is not available when you provide replicate weights by using a REPWEIGHTS statement.

PRINTH

displays the Hadamard matrix that is used to construct replicates for BRR variance estimation. When you provide the Hadamard matrix in the HADAMARD= method-option, PROC SURVEYPHREG displays only the rows and columns that are actually used to construct replicates. For more information, see the sections Balanced Repeated Replication (BRR) Method and Hadamard Matrix.

The PRINTH method-option is not available when you provide replicate weights by using a REPWEIGHTS statement, because PROC SURVEYPHREG does not use a Hadamard matrix in this case.

REPS=number

specifies the number of replicates for BRR variance estimation. The value of number must be an integer greater than 1.

If you do not provide a Hadamard matrix by using the HADAMARD= method-option, the number of replicates should be greater than the number of strata and should be a multiple of 4. For more information, see the section Balanced Repeated Replication (BRR) Method. If a Hadamard matrix cannot be constructed for the REPS= value that you specify, the value is increased until a Hadamard matrix of that dimension can be constructed. Therefore, it is possible for the actual number of replicates to be larger than the REPS= value that you specify.

If you provide a Hadamard matrix by using the HADAMARD= method-option, the value of REPS= must not be greater than the number of rows in the Hadamard matrix. If you provide a Hadamard matrix and do not specify the REPS= method-option, the number of replicates equals the number of rows in the Hadamard matrix.

If you do not specify the REPS= or HADAMARD= method-option and do not include a REPWEIGHTS statement, the number of replicates equals the smallest multiple of 4 that is greater than the number of strata.

If you provide replicate weights with a REPWEIGHTS statement, the procedure does not use the REPS= method-option. With a REPWEIGHTS statement, the number of replicates equals the number of REPWEIGHTS variables.
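The default number of BRR replicates described above, the smallest multiple of 4 that is greater than the number of strata, can be worked out with a short DATA step. This is only an arithmetic illustration, not a call into the procedure; the variable names H and DefaultReps are made up.

```sas
/* Smallest multiple of 4 strictly greater than the number of strata H */
data _null_;
   do H = 10, 12, 13;
      DefaultReps = 4 * (floor(H / 4) + 1);
      put H= DefaultReps=;    /* writes one line per H to the SAS log */
   end;
run;
```

For 10 strata the formula gives 12 replicates; for 12 or 13 strata it gives 16, because the number of replicates must exceed the number of strata.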

JACKKNIFE | JK <(method-options)>

requests variance estimation by the delete-1 jackknife method. See the section Jackknife Method for details. If you provide replicate weights with a REPWEIGHTS statement, VARMETHOD=JACKKNIFE is the default variance estimation method. The JACKKNIFE method requires at least two primary sampling units (PSUs) in each stratum for stratified designs unless you provide replicate weights with a REPWEIGHTS statement.

You can specify the following method-options in parentheses following VARMETHOD=JACKKNIFE:

CENTER=FULLSAMPLE | REPLICATES

defines how to compute the deviations for the jackknife method. CENTER=FULLSAMPLE is the default, which computes the deviations of the replicate estimates from the full sample estimate. Alternatively, you can specify CENTER=REPLICATES to compute the deviations of the replicate estimates from the average of the replicate estimates. See the section Jackknife Method for details.

DETAILS

displays the maximum likelihood estimates of model parameters for replicate samples when the replicate parameter estimates are available. A replicate sample might not provide useful parameter estimates (replicate estimates), for reasons such as nonconvergence of the optimization or inestimability of some parameters in that replicate sample.

OUTJKCOEFS=SAS-data-set

names an output SAS data set that contains jackknife coefficients. See the section Jackknife Coefficients Output Data Set for more details about the contents of the OUTJKCOEFS= data set.

OUTWEIGHTS=SAS-data-set

names an output SAS data set that contains replicate weights. See the section Jackknife Method for more information about replicate weights. See the section Replicate Weights Output Data Set for more details about the contents of the OUTWEIGHTS= data set.

The OUTWEIGHTS= method-option is not available when you provide replicate weights with a REPWEIGHTS statement.
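A jackknife analysis that saves both the replicate weights and the jackknife coefficients might look like the following sketch. The output data set names and all input names are hypothetical.

```sas
/* JK is the alias for JACKKNIFE; both output data sets are optional */
proc surveyphreg data=work.MySurvey
                 varmethod=jk(outweights=work.JkWeights
                              outjkcoefs=work.JkCoefs);
   strata Stratum;            /* assumed: at least 2 PSUs per stratum */
   cluster PSU;
   weight SamplingWeight;
   model Time*Status(0) = Age;
run;
```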

TAYLOR

requests Taylor series variance estimation. This is the default method if you do not specify the VARMETHOD= option or a REPWEIGHTS statement. See the section Taylor Series Linearization for more information.

Last updated: December 09, 2022