The bootstrap is based on the plug-in principle and is an extension of the practice of replacing unknown parameters with estimates (for example, substituting a sample mean for a population mean). The extension goes all the way to the entire population F from which the data being analyzed are a sample.
The most popular variety of bootstrap is the nonparametric bootstrap, which relies on random sampling with replacement from the data to estimate the distribution of a sample estimate (or the joint distribution of multiple sample estimates).
The bootstrap methods in PROC TTEST are all based on the nonparametric bootstrap. The other two main varieties are the parametric bootstrap (sampling from a model that has estimated parameters) and smoothed bootstrap (sampling from a continuous distribution estimate).
The heuristic for the nonparametric bootstrap is that the population is to the sample as the sample is to the bootstrap samples.
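This heuristic can be sketched in a few lines of code. The data and sample sizes below are illustrative, not from any PROC TTEST example; the sketch simply treats the observed sample as a stand-in for the population and resamples from it with replacement.

```python
import random
import statistics

random.seed(1)

data = [4.2, 5.1, 3.8, 6.0, 4.9, 5.5, 4.4, 5.2]   # illustrative sample
r = 1000                                           # number of bootstrap samples

boot_means = []
for _ in range(r):
    # One bootstrap sample: n observations drawn with replacement
    resample = random.choices(data, k=len(data))
    boot_means.append(statistics.mean(resample))

# The spread of boot_means approximates the sampling variability of the mean.
```

The collection `boot_means` is the bootstrap distribution of the sample mean; the later sections build standard errors, bias estimates, and confidence intervals from exactly this kind of collection.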
The statistics of primary interest in PROC TTEST are the sample mean and sample standard deviation.
The main purpose of bootstrapping is to assess the accuracy and precision of one or more sample estimates in terms of bias, standard error, and confidence intervals.
In typical situations, the bootstrap is not useful for estimating a population parameter or the CDF or quantiles of sample estimates. This is because the bootstrap distribution is centered around the observed statistic, not the population parameter. For example, the bootstrap cannot improve on a sample mean estimate.
Bootstrapping can also be a useful tool for inference in various situations, such as the following:
Parametric assumptions are violated. For example, intervals for a variance and F-based intervals for the ratio of variances are not robust to deviations from normality, and their coverage does not improve even with increasing sample size.
It is too difficult to derive formulas.
The data are stored in a way that makes calculating formulas impractical.
Popular and useful applications of bootstrapping include the following:
Better standard error estimates
Bias estimates
Percentile intervals, optionally with corrections for median bias or narrowness bias (or both)
t-based intervals, which are traditional t-based confidence intervals either with the bootstrap standard error in place of the traditional standard error or with bootstrap quantiles of the t statistic in place of t distribution quantiles
The two most notable shortcomings of the bootstrap are as follows:
It tends to perform poorly for small samples.
Bootstrap bias-corrected estimates are usually worse than estimates that are based on the original sample. Even though they tend to be less biased, they also tend to have much higher variance.
Hesterberg (2015) points out several educational benefits of the bootstrap:
Because the bootstrap works the same way with a wide variety of statistics, students can focus on ideas rather than formulas. They can also focus on statistics that are appropriate rather than "well-behaved."
Plots of the bootstrap distribution can help make the abstract concrete for concepts such as sampling distributions, standard errors, bias, the central limit theorem, and confidence intervals.
The action of drawing bootstrap samples reinforces the role that random sampling plays in statistics.
The relationship between the bootstrap distribution and the original sample is fundamentally the same as the relationship between the original sample and the population. Patterns that are observed in bootstrap samples (for example, excessive narrowness) usually imply similar patterns in random sampling from the population (for example, the same narrowness that is corrected for by the factor $\sqrt{n/(n-1)}$ in the traditional sample standard deviation estimate).
Politis (2016) explains how bootstrapping can help ease students into understanding the notion of resampling from an empirical distribution, instilling confidence in mean and variance estimates without relying on the (often unjustifiable) assumption of normality. He suggests a three-stage approach for guiding students through this transition:
Introduce Monte Carlo simulation as an alternative to distribution theory.
Demonstrate the parametric bootstrap as an alternative to critical value tables.
Abandon the parametric paradigm altogether by generating quantiles and percentile intervals from the resampling distribution. Show when the bootstrap works better or worse than the parametric approach.
The TTEST procedure does not support the use of the WEIGHT statement with the bootstrap because there is no consensus on weighted bootstrap methods.
The FREQ statement is supported with the bootstrap.
Most notation and formulas involved in the descriptions of bootstrap methods in subsequent sections have already been discussed in previous sections, but they are presented here for easier reference. Estimates that involve the empirical distribution derived from the data are newly presented in this section and are denoted with a hat (for example, $\hat{\sigma}$), sometimes with a subscript to distinguish among alternative assumptions.
Table 8 summarizes the basic notation for each design that is supported in bootstrap methods in PROC TTEST.
Table 8: Common Notation
| Symbol | Description |
|---|---|
| One-Sample Design | |
| $n$ | Number of observations |
| $\mu$ | Population mean |
| $\sigma^2$ | Population variance |
| $\alpha$ | Value of ALPHA= option in PROC TTEST statement, such that the confidence level for all bootstrap confidence intervals is $100(1-\alpha)\%$ |
| $y_i$ | Value of $i$th observation, $i \in \{1, \ldots, n\}$ |
| Two-Sample Design | |
| $n_1$ | Number of observations at the first class level |
| $n_2$ | Number of observations at the second class level |
| $y_{1i}$ | Value of $i$th observation at the first class level, $i \in \{1, \ldots, n_1\}$ |
| $y_{2i}$ | Value of $i$th observation at the second class level, $i \in \{1, \ldots, n_2\}$ |
| General | |
| $z_p$ | $100p$ percentile of the standard normal distribution |
| $t_p(\mathrm{df})$ | $100p$ percentile of the t distribution with $\mathrm{df}$ degrees of freedom |
The standard error of an estimator $\hat{\theta}$ is the standard deviation of its sampling distribution. The degrees of freedom discussed in this section reflect the values that would be used for t tests for the corresponding designs in PROC TTEST.
One-sample estimates for the mean ($\bar{y}$), standard deviation ($s$), standard error of the mean (SE), and degrees of freedom (df) are as follows:

$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i \qquad s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2} \qquad \mathrm{SE} = \frac{s}{\sqrt{n}} \qquad \mathrm{df} = n - 1$$
Two-sample estimates for the within-group means ($\bar{y}_1$ and $\bar{y}_2$), mean difference ($\bar{y}_1 - \bar{y}_2$), and within-group standard deviations ($s_1$ and $s_2$) are as follows:

$$\bar{y}_k = \frac{1}{n_k}\sum_{i=1}^{n_k} y_{ki} \qquad s_k = \sqrt{\frac{1}{n_k-1}\sum_{i=1}^{n_k}(y_{ki} - \bar{y}_k)^2} \qquad k \in \{1, 2\}$$
Two-sample pooled estimates for the standard deviation that is assumed to be common within groups ($s_p$), the standard error of the mean difference ($\mathrm{SE}_p$), and the degrees of freedom ($\mathrm{df}_p$) are as follows:

$$s_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}} \qquad \mathrm{SE}_p = s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \qquad \mathrm{df}_p = n_1 + n_2 - 2$$
Note that $s^2$, $s_1^2$, $s_2^2$, and $s_p^2$ are all unbiased estimators of their respective variances.
The two-sample unpooled standard error estimate of the mean difference ($\mathrm{SE}_u$) and the degrees of freedom estimate for the unpooled (Satterthwaite) t statistic ($\mathrm{df}_u$) are as follows:

$$\mathrm{SE}_u = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \qquad \mathrm{df}_u = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1-1} + \frac{(s_2^2/n_2)^2}{n_2-1}}$$
The one-sample variance of the empirical distribution ($\hat{\sigma}^2$) and the standard error of the empirical distribution of the mean ($\widehat{\mathrm{SE}}$) are as follows:

$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^2 \qquad \widehat{\mathrm{SE}} = \frac{\hat{\sigma}}{\sqrt{n}}$$
The two-sample pooled variance of the empirical distribution ($\hat{\sigma}_p^2$) and the pooled standard error estimate of the empirical distribution of the mean difference ($\widehat{\mathrm{SE}}_p$) are as follows:

$$\hat{\sigma}_p^2 = \frac{1}{n_1+n_2}\left(\sum_{i=1}^{n_1}(y_{1i} - \bar{y}_1)^2 + \sum_{i=1}^{n_2}(y_{2i} - \bar{y}_2)^2\right) \qquad \widehat{\mathrm{SE}}_p = \hat{\sigma}_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
The two-sample unpooled standard error of the empirical distribution of the mean difference is defined as

$$\widehat{\mathrm{SE}}_u = \sqrt{\frac{\hat{\sigma}_1^2}{n_1} + \frac{\hat{\sigma}_2^2}{n_2}}$$

where $\hat{\sigma}_k^2 = \frac{1}{n_k}\sum_{i=1}^{n_k}(y_{ki} - \bar{y}_k)^2$ for $k \in \{1, 2\}$.
For the nonparametric bootstrap for a one-sample design, a bootstrap sample is a random draw of $n$ observations with replacement from the original data set. Let $\hat{\theta}$ denote the statistic that is calculated from a sample of $n$ iid observations (for example, $\bar{y}$ or $s$), let $r$ denote the number of independent bootstrap samples, and let $\hat{\theta}^*_i$ denote the value of $\hat{\theta}$ for the $i$th bootstrap sample from the original data, where $i \in \{1, \ldots, r\}$.
The bootstrap for a paired design is identical to the bootstrap for a one-sample design if $y_i$ is defined as the difference between the first and second members of the $i$th pair.
In a bootstrap for a two-sample design, random draws of size $n_1$ and $n_2$ are taken with replacement from the first and second groups, respectively, and combined to produce a single bootstrap sample.
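A minimal sketch of one such two-sample draw, using made-up group data (the function name and values are illustrative, not part of PROC TTEST):

```python
import random

random.seed(2)

group1 = [10.1, 9.8, 11.2, 10.5, 9.9]
group2 = [8.7, 9.1, 8.4, 9.5, 8.9, 9.0]

def two_sample_bootstrap_draw(g1, g2):
    """One bootstrap sample for a two-sample design: resample each
    group separately, with replacement, keeping n1 and n2 fixed."""
    return (random.choices(g1, k=len(g1)), random.choices(g2, k=len(g2)))

b1, b2 = two_sample_bootstrap_draw(group1, group2)
mean_diff = sum(b1) / len(b1) - sum(b2) / len(b2)   # statistic for this draw
```

Repeating the draw r times and collecting `mean_diff` each time yields the bootstrap distribution of the mean difference.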
The sample estimates for statistics $\hat{\theta}$ that are supported in bootstrap analyses are computed as follows.
For a one-sample design, the mean is estimated by $\bar{y}$ and the standard deviation is estimated by $s$.
For a paired design, the mean of the paired difference is estimated by $\bar{y}$ and the standard deviation of the paired difference is estimated by $s$, where $y_i$ is defined as the difference between the members of the $i$th pair.
For a two-sample design, the mean of the class difference $\mu_1 - \mu_2$ is estimated by $\bar{y}_1 - \bar{y}_2$.
Under the assumption of equal variances ($\sigma_1^2 = \sigma_2^2$), the pooled estimate of the standard deviation of the class difference is $s_p$.
Under the assumption of unequal variances, the Satterthwaite estimate of the standard deviation of the class difference is $\sqrt{(s_1^2 + s_2^2)/2}$.
The bootstrap standard error is the sample standard deviation of the bootstrap distribution:

$$\mathrm{SE}_{\mathrm{boot}} = \sqrt{\frac{1}{r-1}\sum_{i=1}^{r}\left(\hat{\theta}^*_i - \bar{\theta}^*\right)^2} \qquad \text{where } \bar{\theta}^* = \frac{1}{r}\sum_{i=1}^{r}\hat{\theta}^*_i$$

The bootstrap bias estimate is

$$\widehat{\mathrm{Bias}} = \bar{\theta}^* - \hat{\theta}$$
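Both quantities fall out of the bootstrap distribution directly. A sketch for the sample mean, with illustrative data (the bias of the mean is essentially zero, which makes it a useful sanity check):

```python
import random
import statistics

random.seed(3)

data = [2.3, 3.1, 2.8, 4.0, 3.5, 2.9, 3.3]
theta_hat = statistics.mean(data)          # original estimate

r = 2000
boot = [statistics.mean(random.choices(data, k=len(data))) for _ in range(r)]

se_boot = statistics.stdev(boot)           # sample std dev, divisor r - 1
bias_boot = statistics.mean(boot) - theta_hat
```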
Several confidence intervals in the next section are based on quantiles of bootstrap samples. Following the convention in Efron and Tibshirani (1993, section 12.5), the quantile for an ambiguous case is chosen as the nearest sample value in the direction toward the center of the bootstrap distribution. This choice ensures that confidence intervals that are constructed from the quantiles satisfy the desired coverage. In particular, the $p$th quantile ($100p$ percentile) of the bootstrap distribution of $\hat{\theta}$ (or some function of $\hat{\theta}$) is computed as

$$\hat{q}_p = \hat{\theta}^*_{(k)} \qquad \text{where } k = \begin{cases}\lfloor rp \rfloor + 1 & p < 0.5 \\ \lceil rp \rceil & p \geq 0.5\end{cases}$$

and $\hat{\theta}^*_{(1)} \leq \cdots \leq \hat{\theta}^*_{(r)}$ are the ordered bootstrap values.
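One way to implement this toward-the-center convention is sketched below (a reading of the Efron-Tibshirani rule, not code from PROC TTEST); the ambiguous cases are those where the index $rp$ lands exactly between two order statistics.

```python
import math

def boot_quantile(sorted_vals, p):
    """100p percentile of a bootstrap distribution, resolving
    ambiguous cases toward the center of the distribution."""
    r = len(sorted_vals)
    if p < 0.5:
        k = math.floor(r * p) + 1      # round the index up, toward the center
    else:
        k = math.ceil(r * p)           # round the index down, toward the center
    k = min(max(k, 1), r)              # keep the 1-based index in range
    return sorted_vals[k - 1]

vals = sorted(range(1, 101))           # 1, 2, ..., 100
lo = boot_quantile(vals, 0.025)        # ambiguous case: r * p = 2.5
hi = boot_quantile(vals, 0.975)        # ambiguous case: r * p = 97.5
```

With r = 100 and p = 0.025, the index 2.5 rounds up to 3 (toward the center), and with p = 0.975 the index 97.5 rounds down to 98, so the interval built from these quantiles is slightly wider than naive rounding would give.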
The bootstrap confidence intervals that PROC TTEST implements are based primarily on recommendations from Hesterberg (2015). The recommendations are based on a combination of educational value and good performance in practice.
See Table 5 for a summary of which parameters are supported for each type of confidence interval.
For the following sections, let $\mathrm{SE}$ denote the estimate of the standard error based on unbiased variance estimates: $s/\sqrt{n}$ for a one-sample or paired design, $\mathrm{SE}_p$ for a pooled analysis for a two-sample design, or $\mathrm{SE}_u$ for an unpooled analysis for a two-sample design. Similarly, let $\widehat{\mathrm{SE}}$ denote the estimate of the standard error based on the variances of the empirical distribution: $\hat{\sigma}/\sqrt{n}$, $\widehat{\mathrm{SE}}_p$, or $\widehat{\mathrm{SE}}_u$. Finally, let $\mathrm{df}$ denote the degrees of freedom that would be used for t tests for the corresponding designs: $n-1$, $\mathrm{df}_p$, or $\mathrm{df}_u$.
Perhaps the crudest confidence interval based on the bootstrap is the normal interval with bootstrap standard error, which is simply the normal-based confidence interval with the usual standard error replaced by the bootstrap standard error:

$$\hat{\theta} \pm z_{1-\alpha/2}\,\mathrm{SE}_{\mathrm{boot}}$$
In PROC TTEST, the normal interval with bootstrap standard error is computed only for the mean or mean difference. Standard confidence intervals for standard deviations are based on the chi-square distribution rather than on the normal distribution and thus do not have a bootstrap analog of this type.
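A sketch of this interval for the mean, with illustrative data (this mirrors the construction described above rather than reproducing PROC TTEST output):

```python
import random
import statistics

random.seed(5)

data = [12.1, 11.4, 13.0, 12.6, 11.9, 12.3, 12.8, 11.7]
alpha = 0.05
theta_hat = statistics.mean(data)

# Bootstrap standard error of the mean
boot = [statistics.mean(random.choices(data, k=len(data))) for _ in range(2000)]
se_boot = statistics.stdev(boot)

# Normal interval with the bootstrap standard error in place of s / sqrt(n)
z = statistics.NormalDist().inv_cdf(1 - alpha / 2)
interval = (theta_hat - z * se_boot, theta_hat + z * se_boot)
```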
The bootstrap percentile interval is recommended by Hesterberg (2015) as one of two "quick and dirty" intervals to begin with when introducing students to the bootstrap. Depending on the sidedness, it is the middle, lower, or upper $100(1-\alpha)\%$ of the bootstrap distribution:

$$\left[\hat{q}_{\alpha/2},\ \hat{q}_{1-\alpha/2}\right] \qquad \text{(two-sided)}$$
This interval is usually the most intuitive one for students. It is robust to skewness in the data, but it performs poorly for small sample sizes. It tends to be too narrow, and it is only "first-order accurate." For a one-sample design, first-order accuracy means that the one-sided coverage probability differs from the nominal value by $O(n^{-1/2})$.
The other "quick and dirty" interval is the t interval with bootstrap standard error, which is the traditional t-based confidence interval with the usual standard error replaced by the bootstrap standard error:

$$\hat{\theta} \pm t_{1-\alpha/2}(\mathrm{df})\,\mathrm{SE}_{\mathrm{boot}}$$
This interval is also the same as the normal interval with bootstrap standard error where normal quantiles are replaced by t quantiles.
In PROC TTEST, the t interval with bootstrap standard error is computed only for the mean or mean difference. Standard confidence intervals for standard deviations are based on the chi-square distribution rather than on the t distribution and thus do not have a bootstrap analog of this type.
The t interval with bootstrap standard error can help students learn formula methods. It performs relatively well for small n but is not robust to skewness in the data.
Students can compare percentile and t intervals: if they are similar, then they are both probably acceptable.
Whereas the usual bootstrap percentile interval has coverage properties similar to the normal interval with bootstrap standard error (robustness to skewness notwithstanding), the expanded bootstrap percentile interval alleviates the narrowness bias by "upgrading" the coverage properties to be more like the t interval with bootstrap standard error. The expanded percentile interval is produced by replacing the $\alpha$ in the bootstrap percentile interval with the value $\alpha'$ that solves the equation

$$z_{1-\alpha'/d}\,\widehat{\mathrm{SE}} = t_{1-\alpha/d}(\mathrm{df})\,\mathrm{SE}$$

where $d$ is the number of sides, $z_{1-\alpha'/d}\,\widehat{\mathrm{SE}}$ is the half-width of the normal-based $100(1-\alpha')\%$ confidence interval that uses the variance of the empirical distribution, and $t_{1-\alpha/d}(\mathrm{df})\,\mathrm{SE}$ is the half-width of the t-based $100(1-\alpha)\%$ confidence interval that uses the unbiased variance estimate. The half-width of a two-sided interval is the length of the interval divided by two, and the half-width of a one-sided interval is the absolute difference between the point estimate and the finite limit.
The solution is

$$\alpha' = d\left(1 - \Phi\!\left(t_{1-\alpha/d}(\mathrm{df})\,\frac{\mathrm{SE}}{\widehat{\mathrm{SE}}}\right)\right)$$

where the ratio $\mathrm{SE}/\widehat{\mathrm{SE}}$ is $\sqrt{n/(n-1)}$ for a one-sample or paired design, $\sqrt{(n_1+n_2)/(n_1+n_2-2)}$ for a pooled two-sample analysis, and $\mathrm{SE}_u/\widehat{\mathrm{SE}}_u$ for an unpooled two-sample analysis.
The resulting expanded percentile interval for each case is

$$\left[\hat{q}_{\alpha'/2},\ \hat{q}_{1-\alpha'/2}\right] \qquad \text{(two-sided)}$$
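The adjustment itself is a small calculation. The sketch below works the one-sample, two-sided case under the assumption that $\mathrm{SE}/\widehat{\mathrm{SE}} = \sqrt{n/(n-1)}$; the t critical value is taken from a standard t table, since the Python standard library has no t quantile function.

```python
import statistics

n = 8
alpha = 0.05
d = 2                                  # two-sided interval
t_crit = 2.3646                        # t_{0.975} with n - 1 = 7 df (from a t table)

phi = statistics.NormalDist().cdf
# One-sample case: SE / SE_hat = sqrt(n / (n - 1)), so
# alpha' = d * (1 - Phi(sqrt(n / (n - 1)) * t_crit))
alpha_prime = d * (1 - phi((n / (n - 1)) ** 0.5 * t_crit))
```

Because `alpha_prime` is smaller than `alpha`, the percentile interval built from the $\alpha'/2$ and $1-\alpha'/2$ quantiles reaches further into the tails, which is exactly the intended widening.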
In PROC TTEST, the bootstrap expanded percentile interval is computed only for the mean or mean difference. Standard confidence intervals for standard deviations are based on the chi-square distribution rather than on the normal or t distributions and thus do not have a bootstrap analog of this type.
The expanded interval is better than the bootstrap percentile interval and the t interval with bootstrap standard error but not as good as the bootstrap t interval, which is described in the following section.
The bootstrap t interval eschews the assumption that the t statistic has a t distribution and instead uses quantiles of its bootstrap distribution, along with traditional standard error estimates:

$$\left[\hat{\theta} - q_{1-\alpha/2}\,\mathrm{SE},\ \hat{\theta} - q_{\alpha/2}\,\mathrm{SE}\right]$$

where $q_p$ are quantiles of the bootstrap distribution of the t statistic $t^*_i = (\hat{\theta}^*_i - \hat{\theta})/\mathrm{SE}^*_i$, $\mathrm{SE}^*_i$ is the standard error estimate computed from the $i$th bootstrap sample, and $\mathrm{SE}$ is the (non-bootstrap) standard error estimate of $\hat{\theta}$ based on unbiased variance estimates.
In PROC TTEST, the bootstrap t interval is computed only for the mean or mean difference. There is no reasonable general formula for the standard error of the sample standard deviation.
The bootstrap t interval allows for asymmetry and is "second-order accurate": for a one-sample design, the one-sided coverage probability differs from the nominal value by $O(n^{-1})$.
The bootstrap bias-corrected percentile interval (BC) is

$$\left[\hat{q}_{p_1},\ \hat{q}_{p_2}\right] \qquad p_1 = \Phi\left(2\hat{z}_0 + z_{\alpha/2}\right) \qquad p_2 = \Phi\left(2\hat{z}_0 + z_{1-\alpha/2}\right)$$

where

$$\hat{z}_0 = \Phi^{-1}\!\left(\frac{\#\{\hat{\theta}^*_i < \hat{\theta}\}}{r}\right)$$

The BC interval is the default bootstrap confidence interval in PROC TTEST (and also in the NLIN and CAUSALTRT procedures). It corrects for median bias, which occurs when the median of the sampling distribution of $\hat{\theta}$ differs from the true parameter value $\theta$. The two-sided version is given in Efron and Tibshirani (1993, equation 14.10), and the one-sided version is given in Carpenter and Bithell (2000, equation 9).
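A sketch of the BC construction for the mean, with illustrative data and a simple index rule for the quantiles (PROC TTEST uses the toward-the-center quantile convention described earlier; this sketch only illustrates the bias correction itself):

```python
import random
import statistics

random.seed(7)

data = [1.2, 2.5, 1.9, 3.4, 2.2, 2.8, 1.6, 2.0, 3.1, 2.4]
alpha = 0.05
theta_hat = statistics.mean(data)

r = 2000
boot = sorted(statistics.mean(random.choices(data, k=len(data)))
              for _ in range(r))

nd = statistics.NormalDist()
# Median-bias correction: proportion of bootstrap values below the estimate
prop = sum(b < theta_hat for b in boot) / r
prop = min(max(prop, 1 / r), 1 - 1 / r)     # guard against 0 or 1
z0 = nd.inv_cdf(prop)

# Adjusted percentile levels in place of alpha/2 and 1 - alpha/2
p1 = nd.cdf(2 * z0 + nd.inv_cdf(alpha / 2))
p2 = nd.cdf(2 * z0 + nd.inv_cdf(1 - alpha / 2))

lower = boot[max(int(r * p1) - 1, 0)]
upper = boot[min(int(r * p2), r - 1)]
```

When the bootstrap distribution has its median at the original estimate, `z0` is near zero and the BC interval collapses to the ordinary percentile interval.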