The bootstrap is based on the plug-in principle and is an extension of the practice of replacing unknown parameters with estimates (for example, substituting a sample mean for a population mean). The extension goes all the way to the entire population F from which the data being analyzed are a sample.
The most popular variety of bootstrap is the nonparametric bootstrap, which relies on random sampling with replacement from the data to estimate the distribution of a sample estimate (or the joint distribution of multiple sample estimates).
The bootstrap methods in PROC TTEST are all based on the nonparametric bootstrap. The other two main varieties are the parametric bootstrap (sampling from a model that has estimated parameters) and smoothed bootstrap (sampling from a continuous distribution estimate).
The heuristic for the nonparametric bootstrap is that the population is to the sample as the sample is to the bootstrap samples.
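This heuristic can be sketched in a few lines of code. The data and sample sizes below are illustrative, not from any PROC TTEST example; the sketch simply treats the observed sample as a stand-in for the population and resamples from it with replacement.

```python
import random
import statistics

random.seed(1)

data = [4.2, 5.1, 3.8, 6.0, 4.9, 5.5, 4.4, 5.2]   # illustrative sample
r = 1000                                           # number of bootstrap samples

boot_means = []
for _ in range(r):
    # One bootstrap sample: n observations drawn with replacement
    resample = random.choices(data, k=len(data))
    boot_means.append(statistics.mean(resample))

# The spread of boot_means approximates the sampling variability of the mean.
```

The collection `boot_means` is the bootstrap distribution of the sample mean; the later sections build standard errors, bias estimates, and confidence intervals from exactly this kind of collection.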
The statistics of primary interest in PROC TTEST are the sample mean and sample standard deviation.
The main purpose of bootstrapping is to assess the accuracy and precision of one or more sample estimates in terms of bias, standard error, and confidence intervals.
In typical situations, the bootstrap is not useful for estimating a population parameter or the CDF or quantiles of sample estimates. This is because the bootstrap distribution is centered around the observed statistic, not the population parameter. For example, the bootstrap cannot improve on a sample mean estimate.
Bootstrapping can also be a useful tool for inference in various situations, such as the following:
Parametric assumptions are violated. For example, intervals for a variance and F-based intervals for the ratio of variances are not robust to deviations from normality, and their coverage does not improve even with increasing sample size.
It is too difficult to derive formulas.
The data are stored in a way that makes calculating formulas impractical.
Popular and useful applications of bootstrapping include the following:
Better standard error estimates
Bias estimates
Percentile intervals, optionally with corrections for median bias or narrowness bias (or both)
t-based intervals, which are traditional t-based confidence intervals either with the bootstrap standard error in place of the traditional standard error or with bootstrap quantiles of the t statistic in place of t distribution quantiles
The two most notable shortcomings of the bootstrap are as follows:
It tends to perform poorly for small samples.
Bootstrap bias-corrected estimates are usually worse than estimates that are based on the original sample. Even though they tend to be less biased, they also tend to have much higher variance.
Hesterberg (2015) points out several educational benefits of the bootstrap:
Because the bootstrap works the same way with a wide variety of statistics, students can focus on ideas rather than formulas. They can also focus on statistics that are appropriate rather than "well-behaved."
Plots of the bootstrap distribution can help make the abstract concrete for concepts such as sampling distributions, standard errors, bias, the central limit theorem, and confidence intervals.
The action of drawing bootstrap samples reinforces the role that random sampling plays in statistics.
The relationship between the bootstrap distribution and the original sample is fundamentally the same as the relationship between the original sample and the population. Patterns that are observed in bootstrap samples (for example, excessive narrowness) usually imply similar patterns in random sampling from the population (for example, the same narrowness that is corrected for by the factor $\sqrt{n/(n-1)}$ in the traditional sample standard deviation estimate).
Politis (2016) explains how bootstrapping can help ease students into understanding the notion of resampling from an empirical distribution, instilling confidence in mean and variance estimates without relying on the (often unjustifiable) assumption of normality. He suggests a three-stage approach for guiding students through this transition:
Introduce Monte Carlo simulation as an alternative to distribution theory.
Demonstrate the parametric bootstrap as an alternative to critical value tables.
Abandon the parametric paradigm altogether by generating quantiles and percentile intervals from the resampling distribution. Show when the bootstrap works better or worse than the parametric approach.
The TTEST procedure does not support the use of the WEIGHT statement with the bootstrap because there is no consensus on weighted bootstrap methods.
The FREQ statement is supported with the bootstrap.
Most notation and formulas involved in the descriptions of bootstrap methods in subsequent sections have already been discussed in previous sections, but they are presented here for easier reference. Estimates that involve the empirical distribution derived from the data are newly presented in this section and are denoted with a hat (for example, $\hat{\sigma}$), sometimes with a subscript to distinguish among alternative assumptions.
Table 8 summarizes the basic notation for each design that is supported in bootstrap methods in PROC TTEST.
Table 8: Common Notation
| Symbol | Description |
|---|---|
| One-Sample Design | |
| $n$ | Number of observations |
| $\mu$ | Population mean |
| $\sigma^2$ | Population variance |
| $\alpha$ | Value of ALPHA= option in PROC TTEST statement, such that the confidence level for all bootstrap confidence intervals is $100(1-\alpha)\%$ |
| $y_i$ | Value of $i$th observation, $i \in \{1, \ldots, n\}$ |
| Two-Sample Design | |
| $n_1$ | Number of observations at the first class level |
| $n_2$ | Number of observations at the second class level |
| $y_{1i}$ | Value of $i$th observation at the first class level, $i \in \{1, \ldots, n_1\}$ |
| $y_{2i}$ | Value of $i$th observation at the second class level, $i \in \{1, \ldots, n_2\}$ |
| General | |
| $z_p$ | $100p$ percentile of the standard normal distribution |
| $t_p(\mathrm{df})$ | $100p$ percentile of the t distribution with $\mathrm{df}$ degrees of freedom |
The standard error of an estimator $\hat{\theta}$ is the standard deviation of its sampling distribution. The degrees of freedom discussed in this section reflect the values that would be used for t tests for the corresponding designs in PROC TTEST.
One-sample estimates for the mean ($\bar{y}$), standard deviation ($s$), standard error of the mean (SE), and degrees of freedom (df) are as follows:

$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i \qquad s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2} \qquad \mathrm{SE} = \frac{s}{\sqrt{n}} \qquad \mathrm{df} = n - 1$$
Two-sample estimates for the within-group means ($\bar{y}_1$ and $\bar{y}_2$), mean difference ($\bar{y}_1 - \bar{y}_2$), and within-group standard deviations ($s_1$ and $s_2$) are as follows:

$$\bar{y}_k = \frac{1}{n_k}\sum_{i=1}^{n_k} y_{ki} \qquad s_k = \sqrt{\frac{1}{n_k-1}\sum_{i=1}^{n_k}(y_{ki} - \bar{y}_k)^2} \qquad k \in \{1, 2\}$$
Two-sample pooled estimates for the standard deviation that is assumed to be common within groups ($s_p$), the standard error of the mean difference ($\mathrm{SE}_p$), and the degrees of freedom ($\mathrm{df}_p$) are as follows:

$$s_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}} \qquad \mathrm{SE}_p = s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \qquad \mathrm{df}_p = n_1 + n_2 - 2$$
Note that $s^2$, $s_1^2$, $s_2^2$, and $s_p^2$ are all unbiased estimators of their respective variances.
The two-sample unpooled standard error estimate of the mean difference ($\mathrm{SE}_u$) and the degrees of freedom estimate for the unpooled (Satterthwaite) t statistic ($\mathrm{df}_u$) are as follows:

$$\mathrm{SE}_u = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \qquad \mathrm{df}_u = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1-1} + \frac{(s_2^2/n_2)^2}{n_2-1}}$$
The one-sample variance of the empirical distribution ($\hat{\sigma}^2$) and the standard error of the empirical distribution of the mean ($\widehat{\mathrm{SE}}$) are as follows:

$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^2 \qquad \widehat{\mathrm{SE}} = \frac{\hat{\sigma}}{\sqrt{n}}$$
The two-sample pooled variance of the empirical distribution ($\hat{\sigma}_p^2$) and the pooled standard error estimate of the empirical distribution of the mean difference ($\widehat{\mathrm{SE}}_p$) are as follows:

$$\hat{\sigma}_p^2 = \frac{1}{n_1+n_2}\left(\sum_{i=1}^{n_1}(y_{1i} - \bar{y}_1)^2 + \sum_{i=1}^{n_2}(y_{2i} - \bar{y}_2)^2\right) \qquad \widehat{\mathrm{SE}}_p = \hat{\sigma}_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
The two-sample unpooled standard error of the empirical distribution of the mean difference is defined as

$$\widehat{\mathrm{SE}}_u = \sqrt{\frac{\hat{\sigma}_1^2}{n_1} + \frac{\hat{\sigma}_2^2}{n_2}}$$

where $\hat{\sigma}_k^2 = \frac{1}{n_k}\sum_{i=1}^{n_k}(y_{ki} - \bar{y}_k)^2$ for $k \in \{1, 2\}$.
For the nonparametric bootstrap for a one-sample design, a bootstrap sample is a random draw of $n$ observations with replacement from the original data set. Let $\hat{\theta}$ denote the statistic that is calculated from a sample of $n$ iid observations (for example, $\bar{y}$ or $s$), let $r$ denote the number of independent bootstrap samples, and let $\hat{\theta}^*_i$ denote the value of $\hat{\theta}$ for the $i$th bootstrap sample from the original data, where $i \in \{1, \ldots, r\}$.
The bootstrap for a paired design is identical to the bootstrap for a one-sample design if $y_i$ is defined as the difference between the first and second members of the $i$th pair.
In a bootstrap for a two-sample design, random draws of size $n_1$ and $n_2$ are taken with replacement from the first and second groups, respectively, and combined to produce a single bootstrap sample.
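A minimal sketch of one such two-sample draw, using made-up group data (the function name and values are illustrative, not part of PROC TTEST):

```python
import random

random.seed(2)

group1 = [10.1, 9.8, 11.2, 10.5, 9.9]
group2 = [8.7, 9.1, 8.4, 9.5, 8.9, 9.0]

def two_sample_bootstrap_draw(g1, g2):
    """One bootstrap sample for a two-sample design: resample each
    group separately, with replacement, keeping n1 and n2 fixed."""
    return (random.choices(g1, k=len(g1)), random.choices(g2, k=len(g2)))

b1, b2 = two_sample_bootstrap_draw(group1, group2)
mean_diff = sum(b1) / len(b1) - sum(b2) / len(b2)   # statistic for this draw
```

Repeating the draw r times and collecting `mean_diff` each time yields the bootstrap distribution of the mean difference.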
The sample estimates for statistics $\hat{\theta}$ that are supported in bootstrap analyses are computed as follows.
For a one-sample design, the mean is estimated by $\bar{y}$ and the standard deviation is estimated by $s$.
For a paired design, the mean of the paired difference is estimated by $\bar{y}$ and the standard deviation of the paired difference is estimated by $s$, where $y_i$ is defined as the difference between the members of the $i$th pair.
For a two-sample design, the mean of the class difference $\mu_1 - \mu_2$ is estimated by $\bar{y}_1 - \bar{y}_2$.
Under the assumption of equal variances ($\sigma_1^2 = \sigma_2^2$), the pooled estimate of the standard deviation of the class difference is $s_p$.
Under the assumption of unequal variances, the Satterthwaite estimate of the standard deviation of the class difference is $\sqrt{(s_1^2 + s_2^2)/2}$.
The bootstrap standard error is the sample standard deviation of the bootstrap distribution:

$$\mathrm{SE}_{\mathrm{boot}} = \sqrt{\frac{1}{r-1}\sum_{i=1}^{r}\left(\hat{\theta}^*_i - \bar{\theta}^*\right)^2} \qquad \text{where } \bar{\theta}^* = \frac{1}{r}\sum_{i=1}^{r}\hat{\theta}^*_i$$

The bootstrap bias estimate is

$$\widehat{\mathrm{Bias}} = \bar{\theta}^* - \hat{\theta}$$
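Both quantities fall out of the bootstrap distribution directly. A sketch for the sample mean, with illustrative data (the bias of the mean is essentially zero, which makes it a useful sanity check):

```python
import random
import statistics

random.seed(3)

data = [2.3, 3.1, 2.8, 4.0, 3.5, 2.9, 3.3]
theta_hat = statistics.mean(data)          # original estimate

r = 2000
boot = [statistics.mean(random.choices(data, k=len(data))) for _ in range(r)]

se_boot = statistics.stdev(boot)           # sample std dev, divisor r - 1
bias_boot = statistics.mean(boot) - theta_hat
```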
Several confidence intervals in the next section are based on quantiles of bootstrap samples. Following the convention in Efron and Tibshirani (1993, section 12.5), the quantile for an ambiguous case is chosen as the nearest sample value in the direction toward the center of the bootstrap distribution. This choice ensures that confidence intervals that are constructed from the quantiles satisfy the desired coverage. In particular, the $p$th quantile ($100p$ percentile) of the bootstrap distribution of $\hat{\theta}$ (or some function of $\hat{\theta}$) is computed as

$$\hat{q}_p = \hat{\theta}^*_{(k)} \qquad \text{where } k = \begin{cases}\lfloor rp \rfloor + 1 & p < 0.5 \\ \lceil rp \rceil & p \geq 0.5\end{cases}$$

and $\hat{\theta}^*_{(1)} \leq \cdots \leq \hat{\theta}^*_{(r)}$ are the ordered bootstrap values.
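One way to implement this toward-the-center convention is sketched below (a reading of the Efron-Tibshirani rule, not code from PROC TTEST); the ambiguous cases are those where the index $rp$ lands exactly between two order statistics.

```python
import math

def boot_quantile(sorted_vals, p):
    """100p percentile of a bootstrap distribution, resolving
    ambiguous cases toward the center of the distribution."""
    r = len(sorted_vals)
    if p < 0.5:
        k = math.floor(r * p) + 1      # round the index up, toward the center
    else:
        k = math.ceil(r * p)           # round the index down, toward the center
    k = min(max(k, 1), r)              # keep the 1-based index in range
    return sorted_vals[k - 1]

vals = sorted(range(1, 101))           # 1, 2, ..., 100
lo = boot_quantile(vals, 0.025)        # ambiguous case: r * p = 2.5
hi = boot_quantile(vals, 0.975)        # ambiguous case: r * p = 97.5
```

With r = 100 and p = 0.025, the index 2.5 rounds up to 3 (toward the center), and with p = 0.975 the index 97.5 rounds down to 98, so the interval built from these quantiles is slightly wider than naive rounding would give.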
The bootstrap confidence intervals that PROC TTEST implements are based primarily on recommendations from Hesterberg (2015). The recommendations are based on a combination of educational value and good performance in practice.
See Table 5 for a summary of which parameters are supported for each type of confidence interval.
For the following sections, let $\mathrm{SE}$ denote the estimate of the standard error based on unbiased variance estimates: $s/\sqrt{n}$ for a one-sample or paired design, $\mathrm{SE}_p$ for a pooled analysis for a two-sample design, or $\mathrm{SE}_u$ for an unpooled analysis for a two-sample design. Similarly, let $\widehat{\mathrm{SE}}$ denote the estimate of the standard error based on the variances of the empirical distribution: $\hat{\sigma}/\sqrt{n}$, $\widehat{\mathrm{SE}}_p$, or $\widehat{\mathrm{SE}}_u$. Finally, let $\mathrm{df}$ denote the degrees of freedom that would be used for t tests for the corresponding designs: $n-1$, $\mathrm{df}_p$, or $\mathrm{df}_u$.
Perhaps the crudest confidence interval based on the bootstrap is the normal interval with bootstrap standard error, which is simply the normal-based confidence interval with the usual standard error replaced by the bootstrap standard error:

$$\hat{\theta} \pm z_{1-\alpha/2}\,\mathrm{SE}_{\mathrm{boot}}$$
In PROC TTEST, the normal interval with bootstrap standard error is computed only for the mean or mean difference. Standard confidence intervals for standard deviations are based on the chi-square distribution rather than on the normal distribution and thus do not have a bootstrap analog of this type.
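A sketch of this interval for the mean, with illustrative data (this mirrors the construction described above rather than reproducing PROC TTEST output):

```python
import random
import statistics

random.seed(5)

data = [12.1, 11.4, 13.0, 12.6, 11.9, 12.3, 12.8, 11.7]
alpha = 0.05
theta_hat = statistics.mean(data)

# Bootstrap standard error of the mean
boot = [statistics.mean(random.choices(data, k=len(data))) for _ in range(2000)]
se_boot = statistics.stdev(boot)

# Normal interval with the bootstrap standard error in place of s / sqrt(n)
z = statistics.NormalDist().inv_cdf(1 - alpha / 2)
interval = (theta_hat - z * se_boot, theta_hat + z * se_boot)
```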
The bootstrap percentile interval is recommended by Hesterberg (2015) as one of two "quick and dirty" intervals to begin with when introducing students to the bootstrap. Depending on the sidedness, it is the middle, lower, or upper $100(1-\alpha)\%$ of the bootstrap distribution:

$$\left[\hat{q}_{\alpha/2},\ \hat{q}_{1-\alpha/2}\right] \qquad \text{(two-sided)}$$
This interval is usually the most intuitive one for students. It is robust to skewness in the data, but it performs poorly for small sample sizes. It tends to be too narrow, and it is only "first-order accurate." For a one-sample design, first-order accuracy means that the one-sided coverage probability differs from the nominal value by $O(n^{-1/2})$.
The other "quick and dirty" interval is the t interval with bootstrap standard error, which is the traditional t-based confidence interval with the usual standard error replaced by the bootstrap standard error:

$$\hat{\theta} \pm t_{1-\alpha/2}(\mathrm{df})\,\mathrm{SE}_{\mathrm{boot}}$$
This interval is also the same as the normal interval with bootstrap standard error where normal quantiles are replaced by t quantiles.
In PROC TTEST, the t interval with bootstrap standard error is computed only for the mean or mean difference. Standard confidence intervals for standard deviations are based on the chi-square distribution rather than on the t distribution and thus do not have a bootstrap analog of this type.
The t interval with bootstrap standard error can help students learn formula methods. It performs relatively well for small n but is not robust to skewness in the data.
Students can compare percentile and t intervals: if they are similar, then they are both probably acceptable.
Whereas the usual bootstrap percentile interval has coverage properties similar to the normal interval with bootstrap standard error (robustness to skewness notwithstanding), the expanded bootstrap percentile interval alleviates the narrowness bias by "upgrading" the coverage properties to be more like the t interval with bootstrap standard error. The expanded percentile interval is produced by replacing the $\alpha$ in the bootstrap percentile interval with the value $\alpha'$ that solves the equation

$$z_{1-\alpha'/d}\,\widehat{\mathrm{SE}} = t_{1-\alpha/d}(\mathrm{df})\,\mathrm{SE}$$

where $d$ is the number of sides, $z_{1-\alpha'/d}\,\widehat{\mathrm{SE}}$ is the half-width of the normal-based $100(1-\alpha')\%$ confidence interval that uses the variance of the empirical distribution, and $t_{1-\alpha/d}(\mathrm{df})\,\mathrm{SE}$ is the half-width of the t-based $100(1-\alpha)\%$ confidence interval that uses the unbiased variance estimate. The half-width of a two-sided interval is the length of the interval divided by two, and the half-width of a one-sided interval is the absolute difference between the point estimate and the finite limit.
The solution is

$$\alpha' = d\left(1 - \Phi\!\left(t_{1-\alpha/d}(\mathrm{df})\,\frac{\mathrm{SE}}{\widehat{\mathrm{SE}}}\right)\right)$$

where the ratio $\mathrm{SE}/\widehat{\mathrm{SE}}$ is $\sqrt{n/(n-1)}$ for a one-sample or paired design, $\sqrt{(n_1+n_2)/(n_1+n_2-2)}$ for a pooled two-sample analysis, and $\mathrm{SE}_u/\widehat{\mathrm{SE}}_u$ for an unpooled two-sample analysis.
The resulting expanded percentile interval for each case is

$$\left[\hat{q}_{\alpha'/2},\ \hat{q}_{1-\alpha'/2}\right] \qquad \text{(two-sided)}$$
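The adjustment itself is a small calculation. The sketch below works the one-sample, two-sided case under the assumption that $\mathrm{SE}/\widehat{\mathrm{SE}} = \sqrt{n/(n-1)}$; the t critical value is taken from a standard t table, since the Python standard library has no t quantile function.

```python
import statistics

n = 8
alpha = 0.05
d = 2                                  # two-sided interval
t_crit = 2.3646                        # t_{0.975} with n - 1 = 7 df (from a t table)

phi = statistics.NormalDist().cdf
# One-sample case: SE / SE_hat = sqrt(n / (n - 1)), so
# alpha' = d * (1 - Phi(sqrt(n / (n - 1)) * t_crit))
alpha_prime = d * (1 - phi((n / (n - 1)) ** 0.5 * t_crit))
```

Because `alpha_prime` is smaller than `alpha`, the percentile interval built from the $\alpha'/2$ and $1-\alpha'/2$ quantiles reaches further into the tails, which is exactly the intended widening.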
In PROC TTEST, the bootstrap expanded percentile interval is computed only for the mean or mean difference. Standard confidence intervals for standard deviations are based on the chi-square distribution rather than on the normal or t distributions and thus do not have a bootstrap analog of this type.
The expanded interval is better than the bootstrap percentile interval and the t interval with bootstrap standard error but not as good as the bootstrap t interval, which is described in the following section.
The bootstrap t interval eschews the assumption that the t statistic has a t distribution and instead uses quantiles of its bootstrap distribution, along with traditional standard error estimates:

$$\left[\hat{\theta} - q_{1-\alpha/2}\,\mathrm{SE},\ \hat{\theta} - q_{\alpha/2}\,\mathrm{SE}\right]$$

where $q_p$ are quantiles of the bootstrap distribution of the t statistic $t^*_i = (\hat{\theta}^*_i - \hat{\theta})/\mathrm{SE}^*_i$, $\mathrm{SE}^*_i$ is the standard error estimate computed from the $i$th bootstrap sample, and $\mathrm{SE}$ is the (non-bootstrap) standard error estimate of $\hat{\theta}$ based on unbiased variance estimates.
In PROC TTEST, the bootstrap t interval is computed only for the mean or mean difference. There is no reasonable general formula for the standard error of the sample standard deviation.
The bootstrap t interval allows for asymmetry and is "second-order accurate": for a one-sample design, the one-sided coverage probability differs from the nominal value by $O(n^{-1})$.
The bootstrap bias-corrected percentile interval (BC) is

$$\left[\hat{q}_{p_1},\ \hat{q}_{p_2}\right] \qquad p_1 = \Phi\left(2\hat{z}_0 + z_{\alpha/2}\right) \qquad p_2 = \Phi\left(2\hat{z}_0 + z_{1-\alpha/2}\right)$$

where

$$\hat{z}_0 = \Phi^{-1}\!\left(\frac{\#\{\hat{\theta}^*_i < \hat{\theta}\}}{r}\right)$$

The BC interval is the default bootstrap confidence interval in PROC TTEST (and also in the NLIN and CAUSALTRT procedures). It corrects for median bias, which occurs when the median of the sampling distribution of $\hat{\theta}$ differs from the true parameter value $\theta$. The two-sided version is given in Efron and Tibshirani (1993, equation 14.10), and the one-sided version is given in Carpenter and Bithell (2000, equation 9).
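A sketch of the BC construction for the mean, with illustrative data and a simple index rule for the quantiles (PROC TTEST uses the toward-the-center quantile convention described earlier; this sketch only illustrates the bias correction itself):

```python
import random
import statistics

random.seed(7)

data = [1.2, 2.5, 1.9, 3.4, 2.2, 2.8, 1.6, 2.0, 3.1, 2.4]
alpha = 0.05
theta_hat = statistics.mean(data)

r = 2000
boot = sorted(statistics.mean(random.choices(data, k=len(data)))
              for _ in range(r))

nd = statistics.NormalDist()
# Median-bias correction: proportion of bootstrap values below the estimate
prop = sum(b < theta_hat for b in boot) / r
prop = min(max(prop, 1 / r), 1 - 1 / r)     # guard against 0 or 1
z0 = nd.inv_cdf(prop)

# Adjusted percentile levels in place of alpha/2 and 1 - alpha/2
p1 = nd.cdf(2 * z0 + nd.inv_cdf(alpha / 2))
p2 = nd.cdf(2 * z0 + nd.inv_cdf(1 - alpha / 2))

lower = boot[max(int(r * p1) - 1, 0)]
upper = boot[min(int(r * p2), r - 1)]
```

When the bootstrap distribution has its median at the original estimate, `z0` is near zero and the BC interval collapses to the ordinary percentile interval.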