The naive bootstrap variance estimator, which is suitable for an infinite population, is not consistent when applied to complex surveys. Bootstrap replicate samples for complex surveys are created by selecting a simple random sample with replacement of primary sampling units (PSUs) within each stratum; PSUs in different strata are sampled independently. The original sampling weights are then adjusted in each replicate to reflect the full sample. These adjusted weights are called bootstrap replicate weights. McCarthy and Snowden (1985), Rao and Wu (1988), Sitter (1992b), and Sitter (1992a) provide several adjusted bootstrap variance estimators that are consistent for complex surveys. For more information about bootstrap variance estimation for complex surveys, see Mashreghi, Haziza, and Léger (2016), Beaumont and Patak (2012), Lohr (2010, Section 9.3.3), Fuller (2009, Section 4.5), Wolter (2007, Chapter 5), and Shao and Tu (1995, Section 6.2.4).
If you do not provide replicate weights by using the REPWEIGHTS statement, then the VARMETHOD=BOOTSTRAP option in the PROC SURVEYPHREG statement creates bootstrap replicate weights for you. This bootstrap method is similar to the method of Rao, Wu, and Yue (1992) and is also known as the bootstrap weights method (Mashreghi, Haziza, and Léger 2016).
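Each replicate is obtained by selecting a simple random sample with replacement of $n_h - 1$ PSUs from stratum h. The rth bootstrap replicate weight for observation unit j in PSU i and stratum h is given by

$$w_{hij}^{(r)} = w_{hij}\left(1 - \sqrt{1 - f_h} + \sqrt{1 - f_h}\,\frac{n_h}{n_h - 1}\,m_{hi}^{(r)}\right)$$

where $w_{hij}$ is the original sampling weight, $n_h$ is the number of PSUs in stratum h, $m_{hi}^{(r)}$ is the number of times PSU i is selected in replicate sample r, and $f_h$ is the sampling fraction in stratum h.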
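The following Python sketch generates one set of bootstrap replicate weights: in each stratum it draws $n_h - 1$ PSUs with replacement, counts how often each PSU is drawn ($m_{hi}^{(r)}$), and multiplies each original weight by $1 - \sqrt{1-f_h} + \sqrt{1-f_h}\,n_h m_{hi}^{(r)}/(n_h-1)$. The data layout (a list of `(stratum, psu, weight)` tuples), the function name `bootstrap_replicate_weights`, and the use of Python's `random.Random` are illustrative assumptions, not how PROC SURVEYPHREG is implemented; the sketch also assumes every stratum has at least two PSUs.

```python
import random
from collections import Counter


def bootstrap_replicate_weights(units, f, rng):
    """Create one bootstrap replicate of the sampling weights (hypothetical helper).

    units: list of (stratum, psu, weight) tuples, one per observation unit.
    f:     dict mapping stratum -> sampling fraction f_h.
    rng:   a random.Random instance.
    Returns the replicate weights, in the same order as `units`.
    """
    # Collect the distinct PSUs of each stratum
    psus = {}
    for h, i, _ in units:
        psus.setdefault(h, [])
        if i not in psus[h]:
            psus[h].append(i)

    # m[(h, i)]: times PSU i is drawn when n_h - 1 PSUs are sampled
    # with replacement, independently in each stratum
    m = Counter()
    for h, ids in psus.items():
        for i in rng.choices(ids, k=len(ids) - 1):
            m[(h, i)] += 1

    rep = []
    for h, i, w in units:
        n_h = len(psus[h])
        s = (1 - f[h]) ** 0.5  # sqrt(1 - f_h)
        rep.append(w * (1 - s + s * n_h / (n_h - 1) * m[(h, i)]))
    return rep
```

Two quick sanity checks follow from the formula: with $f_h = 0$ and one unit per PSU of equal weight, the replicate weights in a stratum sum to the full-sample stratum total, because the counts $m_{hi}^{(r)}$ add up to $n_h - 1$; with $f_h = 1$ (a census stratum), every replicate weight equals the original weight.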
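Let $\hat{\beta}$ be the estimated proportional hazards regression coefficients from the full sample, and let $\hat{\beta}_r$ be the estimated proportional hazards regression coefficients from the rth replicate by using replicate weights. PROC SURVEYPHREG estimates the covariance matrix of $\hat{\beta}$ by

$$\widehat{V}(\hat{\beta}) = \frac{1}{R}\sum_{r=1}^{R}\left(\hat{\beta}_r - \hat{\beta}\right)\left(\hat{\beta}_r - \hat{\beta}\right)'$$

with $\sum_{h=1}^{H} n_h - H$ degrees of freedom, where R is the number of replicates, $n_h$ is the number of PSUs in stratum h, and H is the number of strata.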
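As an illustration of the default estimator, the sketch below averages the outer products of each replicate's deviation from the full-sample coefficients and reports $\sum_h n_h - H$ degrees of freedom. It assumes the replicate coefficient vectors have already been estimated; the function name and plain-list representation are hypothetical.

```python
def bootstrap_cov(beta_full, beta_reps, psus_per_stratum):
    """Bootstrap covariance of the full-sample coefficients (hypothetical helper).

    beta_full:        full-sample coefficient vector (list of floats).
    beta_reps:        list of R replicate coefficient vectors.
    psus_per_stratum: list of n_h values, one per stratum.
    Returns (covariance matrix as a list of lists, degrees of freedom).
    """
    R = len(beta_reps)
    p = len(beta_full)
    cov = [[0.0] * p for _ in range(p)]
    for b in beta_reps:
        # Deviation of this replicate from the full-sample estimate
        d = [b[k] - beta_full[k] for k in range(p)]
        for a in range(p):
            for c in range(p):
                cov[a][c] += d[a] * d[c] / R
    # Degrees of freedom: total number of PSUs minus number of strata
    df = sum(psus_per_stratum) - len(psus_per_stratum)
    return cov, df
```

For example, `bootstrap_cov([1.0, 2.0], [[2.0, 2.0], [0.0, 1.0], [1.0, 3.0]], [3, 2])` gives a covariance matrix of approximately `[[2/3, 1/3], [1/3, 2/3]]` with 3 degrees of freedom.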
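If you specify the CENTER=REPLICATES method-option, then PROC SURVEYPHREG computes the covariance matrix of $\hat{\beta}$ by

$$\widehat{V}(\hat{\beta}) = \frac{1}{R}\sum_{r=1}^{R}\left(\hat{\beta}_r - \bar{\beta}\right)\left(\hat{\beta}_r - \bar{\beta}\right)'$$

where $\bar{\beta}$ is the average of the replicate estimates:

$$\bar{\beta} = \frac{1}{R}\sum_{r=1}^{R}\hat{\beta}_r$$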
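The only change under CENTER=REPLICATES is the centering point. The sketch below (hypothetical function name, plain Python lists) first averages the replicate vectors componentwise and then accumulates the outer products of deviations from that average rather than from the full-sample estimate.

```python
def bootstrap_cov_centered(beta_reps):
    """Covariance centered at the replicate mean, as with CENTER=REPLICATES.

    beta_reps: list of R replicate coefficient vectors (lists of floats).
    Returns (replicate average beta_bar, covariance matrix).
    """
    R = len(beta_reps)
    p = len(beta_reps[0])
    # Average of the replicate estimates, computed componentwise
    beta_bar = [sum(b[k] for b in beta_reps) / R for k in range(p)]
    cov = [[0.0] * p for _ in range(p)]
    for b in beta_reps:
        # Deviation from the replicate average, not from the full-sample fit
        d = [b[k] - beta_bar[k] for k in range(p)]
        for a in range(p):
            for c in range(p):
                cov[a][c] += d[a] * d[c] / R
    return beta_bar, cov
```

With replicates `[[1.0, 0.0], [3.0, 2.0]]`, the average is `[2.0, 1.0]` and every covariance entry is `1.0`.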
If one or more components of $\hat{\beta}_r$ cannot be calculated for some replicates, then the variance estimate is computed by using only the replicates for which the proportional hazards regression coefficients can be estimated. Estimability and nonconvergence are the two most common reasons why $\hat{\beta}_r$ might not be available for a replicate sample even if $\hat{\beta}$ is defined for the full sample. Let $R_1$ be the number of replicates where $\hat{\beta}_r$ is available, and let $R_0 = R - R_1$ be the number of replicates where $\hat{\beta}_r$ is not available. Without loss of generality, assume that $\hat{\beta}_r$ is available only for the first $R_1$ replicates; then the bootstrap variance estimator is

$$\widehat{V}(\hat{\beta}) = \frac{1}{R_1}\sum_{r=1}^{R_1}\left(\hat{\beta}_r - \hat{\beta}\right)\left(\hat{\beta}_r - \hat{\beta}\right)'$$

with degrees of freedom equal to the minimum of $R_1$ and $\sum_{h=1}^{H} n_h - H$, where $n_h$ is the number of PSUs in stratum h, and H is the number of strata.
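A minimal sketch of the adjustment for unavailable replicates follows. It assumes a hypothetical convention in which a replicate whose coefficients could not be estimated is stored as `None`, or contains `None` in some component; such replicates are dropped, the sum is divided by $R_1$ instead of $R$, and the degrees of freedom are capped at $R_1$.

```python
def bootstrap_cov_available(beta_full, beta_reps, psus_per_stratum):
    """Bootstrap covariance using only replicates with usable coefficients.

    A replicate is None, or contains None, when its coefficients could not
    be estimated (hypothetical convention for this sketch).
    Returns (covariance matrix, degrees of freedom, R1).
    """
    usable = [b for b in beta_reps
              if b is not None and all(x is not None for x in b)]
    R1 = len(usable)
    if R1 == 0:
        raise ValueError("no replicate produced usable estimates")
    p = len(beta_full)
    cov = [[0.0] * p for _ in range(p)]
    for b in usable:
        d = [b[k] - beta_full[k] for k in range(p)]
        for a in range(p):
            for c in range(p):
                cov[a][c] += d[a] * d[c] / R1  # divide by R1, not R
    # Degrees of freedom: min(R1, total PSUs minus number of strata)
    df = min(R1, sum(psus_per_stratum) - len(psus_per_stratum))
    return cov, df, R1
```

For instance, with four replicates of which two are unusable, only the two usable deviations contribute, and the degrees of freedom become $\min(2, \sum_h n_h - H)$.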
Although PROC SURVEYPHREG creates bootstrap weights only by the bootstrap weights method (Rao, Wu, and Yue 1992), bootstrap weights that are generated by any bootstrap method can be used in the REPWEIGHTS statement. If bootstrap replicate weights are available to you for a survey, then you can use the REPWEIGHTS statement to name the variables that contain the bootstrap replicate weights and specify the VARMETHOD=BOOTSTRAP option in the PROC SURVEYPHREG statement. When you specify the VARMETHOD=BOOTSTRAP option, the SURVEYPHREG procedure uses $1/R$ as the default bootstrap replicate coefficient, where R is the total number of replicates. Alternatively, you can specify different replicate coefficients by using the REPCOEFS= option in the REPWEIGHTS statement.
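To see how replicate coefficients enter, here is a sketch that assumes each coefficient $\alpha_r$ multiplies the squared deviation of replicate r in the general replicate-weights form (the Replicate Weights Method section is the authoritative statement). Setting $\alpha_r = 1/R$ for every replicate recovers the bootstrap covariance estimator of $\hat{\beta}$:

$$\widehat{V}(\hat{\beta}) = \sum_{r=1}^{R}\alpha_r\left(\hat{\beta}_r - \hat{\beta}\right)\left(\hat{\beta}_r - \hat{\beta}\right)', \qquad \alpha_r = \frac{1}{R} \;\Rightarrow\; \widehat{V}(\hat{\beta}) = \frac{1}{R}\sum_{r=1}^{R}\left(\hat{\beta}_r - \hat{\beta}\right)\left(\hat{\beta}_r - \hat{\beta}\right)'$$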
For more information, see the section Replicate Weights Method.