The SURVEYPHREG Procedure

Bootstrap Method

The naive bootstrap variance estimator, which is suitable for an infinite population, is not consistent when applied to complex surveys. Bootstrap replicate samples for complex surveys are created by selecting a simple random sample with replacement of primary sampling units (PSUs) within each stratum. PSUs in different strata are sampled independently. The original sampling weights are then adjusted in each replicate to reflect the full sample. These adjusted weights are also called bootstrap replicate weights. McCarthy and Snowden (1985), Rao and Wu (1988), Sitter (1992b), and Sitter (1992a) provide several adjusted bootstrap variance estimators that are consistent for complex surveys. For more information about bootstrap variance estimation for complex surveys, see Mashreghi, Haziza, and Léger (2016), Beaumont and Patak (2012), Lohr (2010, Section 9.3.3), Fuller (2009, Section 4.5), Wolter (2007, Chapter 5), and Shao and Tu (1995, Section 6.2.4).

If you do not provide replicate weights by using the REPWEIGHTS statement, then the BOOTSTRAP option in the PROC SURVEYPHREG statement creates bootstrap replicate weights for you. This bootstrap method is similar to the method of Rao, Wu, and Yue (1992) and is also known as the bootstrap weights method (Mashreghi, Haziza, and Léger 2016).

Each replicate is obtained by selecting a simple random sample with replacement of $m_h$ PSUs from stratum $h$. The $r$th bootstrap replicate weight for observation unit $j$ in PSU $i$ and stratum $h$ is given by

$$w_{hij}^{(r)} = w_{hij}\left\{1 - \sqrt{\frac{(1-f_h)\,m_h}{n_h-1}} + \sqrt{\frac{(1-f_h)\,m_h}{n_h-1}}\,\frac{n_h}{m_h}\,k_{hi}^{(r)}\right\}$$

where $k_{hi}^{(r)}$ is the number of times PSU $i$ is selected in replicate sample $r$, $n_h$ is the number of PSUs in stratum $h$, and $f_h$ is the sampling fraction in stratum $h$.
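The following Python sketch illustrates the replicate weight formula for a single stratum; because PSUs in different strata are sampled independently, it would be run once per stratum. The function names `replicate_factor` and `replicate_weights`, and the list-of-lists layout (one inner list of $w_{hij}$ values per PSU), are hypothetical and are not part of PROC SURVEYPHREG. The sketch draws $m_h$ PSUs with replacement, counts the draws $k_{hi}^{(r)}$, and rescales each weight by the bracketed factor in the formula.

```python
import math
import random


def replicate_factor(k_hi, n_h, m_h, f_h):
    """Bracketed factor that multiplies w_hij for a PSU drawn k_hi times.

    Requires n_h >= 2 and m_h >= 1.
    """
    a = math.sqrt((1.0 - f_h) * m_h / (n_h - 1))
    return 1.0 - a + a * (n_h / m_h) * k_hi


def replicate_weights(weights_by_psu, m_h, f_h, rng):
    """One bootstrap replicate of the weights in a single stratum.

    weights_by_psu[i] lists w_hij for the units j in PSU i
    (a hypothetical data layout used only for this sketch).
    """
    n_h = len(weights_by_psu)
    # Simple random sample with replacement of m_h PSUs: count the draws per PSU.
    counts = [0] * n_h
    for _ in range(m_h):
        counts[rng.randrange(n_h)] += 1
    return [[w * replicate_factor(counts[i], n_h, m_h, f_h) for w in psu]
            for i, psu in enumerate(weights_by_psu)]


# Made-up stratum with n_h = 4 PSUs; draw m_h = 3 PSUs with sampling fraction 0.
stratum = [[10.0, 10.0], [20.0], [5.0, 5.0, 10.0], [20.0]]
rep = replicate_weights(stratum, m_h=3, f_h=0.0, rng=random.Random(2022))
print(rep)
```

Because the counts $k_{hi}^{(r)}$ in a stratum sum to $m_h$, the $n_h$ factors always sum to $n_h$. Consequently, a stratum whose PSUs carry equal total weight (as in the made-up stratum above) keeps its total weight in every replicate, which is a convenient check on an implementation.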

Let $\hat{\boldsymbol{\beta}}$ be the estimated proportional hazards regression coefficients from the full sample, and let $\hat{\boldsymbol{\beta}}_r$ be the estimated proportional hazards regression coefficients from the $r$th replicate, computed by using the replicate weights. PROC SURVEYPHREG estimates the covariance matrix of $\hat{\boldsymbol{\beta}}$ by

$$\hat{\mathbf{V}}(\hat{\boldsymbol{\beta}}) = \frac{1}{R}\sum_{r=1}^{R}\left(\hat{\boldsymbol{\beta}}_r - \hat{\boldsymbol{\beta}}\right)\left(\hat{\boldsymbol{\beta}}_r - \hat{\boldsymbol{\beta}}\right)'$$

with $\sum_{h=1}^{H} n_h - H$ degrees of freedom, where $R$ is the number of bootstrap replicates, $n_h$ is the number of PSUs in stratum $h$, and $H$ is the number of strata.
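A minimal Python sketch of this estimator, assuming that the full-sample and replicate coefficient vectors have already been computed. The coefficient values are made up for illustration, and `bootstrap_cov` and `bootstrap_df` are hypothetical helper names.

```python
def bootstrap_cov(beta_full, beta_reps):
    """Covariance centered at the full-sample estimate:
    V = (1/R) * sum_r (beta_r - beta)(beta_r - beta)'."""
    R = len(beta_reps)
    p = len(beta_full)
    V = [[0.0] * p for _ in range(p)]
    for beta_r in beta_reps:
        d = [beta_r[i] - beta_full[i] for i in range(p)]
        for i in range(p):
            for j in range(p):
                V[i][j] += d[i] * d[j] / R
    return V


def bootstrap_df(psus_per_stratum):
    """Degrees of freedom sum_h n_h - H, given n_h for each of the H strata."""
    return sum(psus_per_stratum) - len(psus_per_stratum)


beta_hat = [0.5, -1.0]                                            # made-up full-sample estimate
beta_reps = [[0.6, -1.1], [0.4, -0.9], [0.5, -1.2], [0.5, -0.8]]  # made-up replicates, R = 4
V = bootstrap_cov(beta_hat, beta_reps)
df = bootstrap_df([3, 4, 2])                                      # n_h for H = 3 strata
```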

If you specify the CENTER=REPLICATES method-option, then PROC SURVEYPHREG computes the covariance matrix of $\hat{\boldsymbol{\beta}}$ by

$$\hat{\mathbf{V}}(\hat{\boldsymbol{\beta}}) = \frac{1}{R}\sum_{r=1}^{R}\left(\hat{\boldsymbol{\beta}}_r - \overline{\hat{\boldsymbol{\beta}}_r}\right)\left(\hat{\boldsymbol{\beta}}_r - \overline{\hat{\boldsymbol{\beta}}_r}\right)'$$

where $\overline{\hat{\boldsymbol{\beta}}_r}$ is the average of the replicate estimates, as follows:

$$\overline{\hat{\boldsymbol{\beta}}_r} = \frac{1}{R}\sum_{r=1}^{R}\hat{\boldsymbol{\beta}}_r$$
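The Python sketch below, again with made-up estimates and hypothetical helper names (`cov_about`, `replicate_mean`), computes the covariance about an arbitrary center so that the two centerings can be compared. Algebraically, the estimator centered at $\hat{\boldsymbol{\beta}}$ equals the CENTER=REPLICATES estimator plus $(\overline{\hat{\boldsymbol{\beta}}_r} - \hat{\boldsymbol{\beta}})(\overline{\hat{\boldsymbol{\beta}}_r} - \hat{\boldsymbol{\beta}})'$, because the deviations from the replicate average sum to zero; the two estimators coincide only when the replicate average equals the full-sample estimate.

```python
def cov_about(center, beta_reps):
    """(1/R) * sum_r (beta_r - c)(beta_r - c)' for an arbitrary center c."""
    R, p = len(beta_reps), len(center)
    V = [[0.0] * p for _ in range(p)]
    for beta_r in beta_reps:
        d = [beta_r[i] - center[i] for i in range(p)]
        for i in range(p):
            for j in range(p):
                V[i][j] += d[i] * d[j] / R
    return V


def replicate_mean(beta_reps):
    """Average of the replicate estimates, component by component."""
    R = len(beta_reps)
    return [sum(col) / R for col in zip(*beta_reps)]


beta_hat = [0.5, -1.0]                               # made-up full-sample estimate
beta_reps = [[0.7, -1.0], [0.5, -1.2], [0.6, -0.8]]  # made-up replicate estimates
beta_bar = replicate_mean(beta_reps)
V_full = cov_about(beta_hat, beta_reps)   # default: centered at the full-sample estimate
V_reps = cov_about(beta_bar, beta_reps)   # CENTER=REPLICATES: centered at the replicate average
```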

If one or more components of $\hat{\boldsymbol{\beta}}_r$ cannot be calculated for some replicates, then the variance estimate is computed by using only the replicates for which the proportional hazards regression coefficients can be estimated. Estimability problems and nonconvergence are the two most common reasons why $\hat{\boldsymbol{\beta}}_r$ might not be available for a replicate sample even if $\hat{\boldsymbol{\beta}}$ is defined for the full sample. Let $R_a$ be the number of replicates for which $\hat{\boldsymbol{\beta}}_r$ is available, so that $R - R_a$ is the number of replicates for which it is not available. Without loss of generality, assume that $\hat{\boldsymbol{\beta}}_r$ is available only for the first $R_a$ replicates; then the bootstrap variance estimator is

$$\hat{\mathbf{V}}(\hat{\boldsymbol{\beta}}) = \frac{1}{R_a}\sum_{r=1}^{R_a}\left(\hat{\boldsymbol{\beta}}_r - \hat{\boldsymbol{\beta}}\right)\left(\hat{\boldsymbol{\beta}}_r - \hat{\boldsymbol{\beta}}\right)'$$

with degrees of freedom equal to the minimum of $\sum_{h=1}^{H} n_h - H$ and $R_a$, where $n_h$ is the number of PSUs in stratum $h$ and $H$ is the number of strata.
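In the following Python sketch, a replicate whose coefficients could not be estimated is marked with `None`, a hypothetical convention chosen only for illustration; the function name `bootstrap_cov_available` and the data are likewise made up. The sketch drops those replicates, divides by $R_a$ rather than $R$, and caps the degrees of freedom at $R_a$.

```python
def bootstrap_cov_available(beta_full, beta_reps, psus_per_stratum):
    """Covariance and degrees of freedom from the available replicates only.

    A replicate whose coefficients could not be computed is represented
    by None (a hypothetical convention for this sketch).
    """
    available = [b for b in beta_reps if b is not None]
    R_a = len(available)
    if R_a == 0:
        raise ValueError("no replicate produced an estimate")
    p = len(beta_full)
    V = [[0.0] * p for _ in range(p)]
    for beta_r in available:
        d = [beta_r[i] - beta_full[i] for i in range(p)]
        for i in range(p):
            for j in range(p):
                V[i][j] += d[i] * d[j] / R_a   # divide by R_a, not R
    design_df = sum(psus_per_stratum) - len(psus_per_stratum)  # sum_h n_h - H
    return V, min(design_df, R_a), R_a


beta_hat = [0.5]                          # made-up full-sample estimate
beta_reps = [[0.6], None, [0.4], None]    # made-up: replicates 2 and 4 failed
V, df, R_a = bootstrap_cov_available(beta_hat, beta_reps, [10, 10])
```

Here the design-based degrees of freedom are $20 - 2 = 18$, but only two replicates are available, so the degrees of freedom drop to $R_a = 2$.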

Although PROC SURVEYPHREG creates bootstrap weights only by the bootstrap weights method (Rao, Wu, and Yue 1992), bootstrap weights that are generated by any bootstrap method can be used in the REPWEIGHTS statement. If bootstrap replicate weights are available to you for a survey, then you can use the REPWEIGHTS statement to name the variables that contain the bootstrap replicate weights and specify the VARMETHOD=BOOTSTRAP option in the PROC SURVEYPHREG statement. When you specify the VARMETHOD=BOOTSTRAP option, the SURVEYPHREG procedure uses $1/R$ as the default bootstrap replicate coefficient, where $R$ is the total number of replicates. Alternatively, you can specify different replicate coefficients by using the REPCOEFS= option in the REPWEIGHTS statement.
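The role of the replicate coefficients can be sketched in Python for a scalar estimate. The sketch assumes, as in the replicate weights method, that the variance is $\sum_{r} \alpha_r (\hat{\theta}_r - \hat{\theta})^2$, where $\alpha_r$ is the coefficient for replicate $r$; with $\alpha_r = 1/R$ it reproduces the default bootstrap estimator. The function `replicate_variance` is hypothetical, its `coefs` argument merely stands in for values that you would supply through the REPCOEFS= option, and all numbers are made up.

```python
def replicate_variance(theta_full, theta_reps, coefs=None):
    """Variance of a scalar estimate from replicate estimates.

    Computes sum_r alpha_r * (theta_r - theta)^2 (an assumed form). With
    coefs=None every alpha_r is 1/R, the default bootstrap coefficient.
    """
    R = len(theta_reps)
    if coefs is None:
        coefs = [1.0 / R] * R
    if len(coefs) != R:
        raise ValueError("need one coefficient per replicate")
    return sum(a * (t - theta_full) ** 2 for a, t in zip(coefs, theta_reps))


theta_hat = 1.0                      # made-up full-sample estimate
theta_reps = [1.2, 0.8, 1.0, 1.4]    # made-up replicate estimates, R = 4
v_default = replicate_variance(theta_hat, theta_reps)
# Hypothetical user-supplied coefficients, standing in for REPCOEFS= values.
v_custom = replicate_variance(theta_hat, theta_reps, coefs=[0.5] * 4)
```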

For more information, see the section Replicate Weights Method.

Last updated: December 09, 2022