The SEQDESIGN Procedure

Applicable Two-Sample Tests and Sample Size Computation

The SEQDESIGN procedure provides sample size computation for two-sample tests: the test for the difference between two normal means, tests for binomial proportions, and the log-rank test for two survival distributions. The tests for binomial proportions include the test for the difference between two binomial proportions, the log odds ratio test for binomial proportions, and the log relative risk test for binomial proportions.

For a test of difference between two sample means, the required sample size depends on the assumed sample variances. Similarly, for a test of two-sample proportions, the required sample size depends on the assumed sample proportions. For a log-rank test of two survival distributions, the required sample size depends on the assumed sample hazard rates, accrual rate, and accrual time.

If the REF=NULLPROP or REF=NULLHAZARD option is specified, the proportions or hazard rates under the null hypothesis are used to derive the required sample size or number of events. Otherwise, the REF=PROP option (which is the default in the MODEL=TWOSAMPLEFREQ option) or the REF=HAZARD option (which is the default in the MODEL=TWOSAMPLESURVIVAL option) uses proportions or hazard rates under the alternative hypothesis to derive the required sample size or number of events.

Test for the Difference between Two Normal Means

The MODEL=TWOSAMPLEMEAN option in the SAMPLESIZE statement derives the sample size required to test the difference between the means of two normal populations, $\mu_a$ and $\mu_b$, by using the null hypothesis $H_0: \theta = 0$, where $\theta = \mu_a - \mu_b$.

At stage k, the MLE for $\theta$ is computed as

$$\hat{\theta}_k = \bar{y}_{ak} - \bar{y}_{bk} = \frac{1}{N_{ak}} \sum_{j=1}^{N_{ak}} y_{akj} - \frac{1}{N_{bk}} \sum_{j=1}^{N_{bk}} y_{bkj}$$

where $y_{akj}$ and $y_{bkj}$ are the values of the jth observation available in the kth stage for groups A and B, respectively, and $N_{ak}$ and $N_{bk}$ are the cumulative sample sizes at stage k for these two groups.

The statistic $\hat{\theta}_k$ has a normal distribution

$$\hat{\theta}_k \sim N\left(\theta, I_k^{-1}\right)$$

where the information $I_k$ is the inverse of the variance $\mathrm{Var}(\hat{\theta}_k) = \sigma_a^2 / N_{ak} + \sigma_b^2 / N_{bk}$.

Then the standardized statistic

$$Z_k = \hat{\theta}_k \sqrt{I_k} \sim N\left(\theta \sqrt{I_k}, 1\right)$$

Thus, to test the hypothesis $H_0: \theta = 0$ against an upper alternative $H_1: \theta = \theta_1$, $\theta_1 > 0$, $H_0$ is rejected at stage k if the statistic $Z_k \geq a_k$, where $a_k$ is the upper $\alpha$ boundary for the standardized Z statistic at stage k.
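As a rough illustration of the stage-k computation described above, the following Python sketch computes $\hat{\theta}_k$, $I_k$, and $Z_k$ from two small samples with known standard deviations. The function name, the data values, and the boundary value `a_k` are hypothetical and are not output from PROC SEQDESIGN.

```python
import math

def z_statistic_means(y_a, y_b, sigma_a, sigma_b):
    """Return (theta_hat, info, z) for the cumulative samples y_a and y_b.

    sigma_a and sigma_b are the (assumed known) standard deviations.
    """
    n_a, n_b = len(y_a), len(y_b)
    # MLE of theta: difference between the two sample means
    theta_hat = sum(y_a) / n_a - sum(y_b) / n_b
    # Information is the inverse of Var(theta_hat)
    variance = sigma_a**2 / n_a + sigma_b**2 / n_b
    info = 1.0 / variance
    # Standardized statistic Z_k = theta_hat * sqrt(I_k)
    z = theta_hat * math.sqrt(info)
    return theta_hat, info, z

# Hypothetical cumulative data at one stage
y_a = [5.1, 6.3, 5.8, 6.0]
y_b = [4.9, 5.2, 5.0, 5.3]
theta_hat, info, z = z_statistic_means(y_a, y_b, sigma_a=1.0, sigma_b=1.0)

a_k = 2.5  # hypothetical upper alpha boundary on the standardized Z scale
print(f"theta_hat={theta_hat:.3f}  I_k={info:.3f}  Z_k={z:.3f}  reject H0: {z >= a_k}")
```

In an actual group sequential trial, the boundary values come from the design rather than from a constant chosen by hand.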

If the variances $\sigma_a^2$ and $\sigma_b^2$ are unknown, the sample variances can be used to derive the information $I_k$ if it is assumed that each sample variance is computed from a large sample such that the test statistic has an approximately normal distribution.

The maximum information is needed to derive the required sample size. If the maximum information is not specified or derived in the procedure, the alternative reference $\theta_1^*$ specified in the MEANDIFF= option is used to derive the maximum information.

Note that in order to derive the sample sizes $N_{ak}$ and $N_{bk}$ uniquely from the information, $N_{ak} = R\, N_{bk}$ is assumed for $k = 1, 2, \ldots, K$, where $R = w_a / w_b$ is the constant allocation ratio computed from the WEIGHT=$w_a$ $w_b$ option in the SAMPLESIZE statement.

In PROC SEQDESIGN, the computed total sample sizes for the two groups are

$$N_{aK} = \left(\sigma_a^2 + R\,\sigma_b^2\right) I_X = R \left(\frac{\sigma_a^2}{R} + \sigma_b^2\right) I_X$$

$$N_{bK} = \left(\frac{\sigma_a^2}{R} + \sigma_b^2\right) I_X$$

where $I_X$ is the maximum information derived in the SEQDESIGN procedure, R is the constant allocation ratio, and $\sigma_a$ and $\sigma_b$ are the specified standard deviations.

For $R = 1$, the two sample sizes are equal:

$$N_{aK} = N_{bK} = \frac{N_X}{2} = \left(\sigma_a^2 + \sigma_b^2\right) I_X$$

where $N_X = N_{aK} + N_{bK}$ is the total sample size.

If the variances from the two groups are equal, $\sigma_a^2 = \sigma_b^2 = \sigma^2$, then the total sample sizes for the two groups are

$$N_{aK} = (1 + R)\, \sigma^2 I_X$$

$$N_{bK} = \left(1 + \frac{1}{R}\right) \sigma^2 I_X$$

and the total sample size is

$$N_X = N_{aK} + N_{bK} = \frac{(R + 1)^2}{R}\, \sigma^2 I_X$$

Furthermore, for $R = 1$, the two sample sizes are equal:

$$N_{aK} = N_{bK} = \frac{N_X}{2} = 2 \sigma^2 I_X$$

With an available maximum information, you can specify the MODEL=TWOSAMPLEMEAN( WEIGHT=R STDDEV=$\sigma_a$ $\sigma_b$ ) option in the SAMPLESIZE statement to compute the required total sample size and individual sample size at each stage. A procedure such as PROC GLM can be used to derive the two-sample Z test for the mean difference.
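The sample size formulas for the two-sample mean test can be checked numerically. The following Python sketch is only an illustration of the algebra, not a reproduction of PROC SEQDESIGN output. The function name and the input values (maximum information, standard deviations, allocation ratio) are assumptions chosen for the example, and the results are left unrounded.

```python
def two_sample_mean_sizes(max_info, sigma_a, sigma_b, ratio=1.0):
    """Return (N_aK, N_bK) given maximum information I_X and R = N_a / N_b."""
    # N_bK = (sigma_a^2 / R + sigma_b^2) * I_X
    n_b = (sigma_a**2 / ratio + sigma_b**2) * max_info
    # N_aK = R * N_bK
    n_a = ratio * n_b
    return n_a, n_b

# Hypothetical inputs: I_X = 10, sigma_a = sigma_b = 2
for ratio in (1.0, 2.0):
    n_a, n_b = two_sample_mean_sizes(10.0, 2.0, 2.0, ratio)
    # Recompute the information from the sizes as a consistency check
    info = 1.0 / (2.0**2 / n_a + 2.0**2 / n_b)
    print(f"R={ratio}: N_a={n_a:.1f}, N_b={n_b:.1f}, total={n_a + n_b:.1f}, info={info:.3f}")
```

With equal variances and $R = 1$, each group needs $2 \sigma^2 I_X = 80$ observations for these inputs. With $R = 2$, the total rises from 160 to 180, which agrees with the factor $(R + 1)^2 / R$ in the total sample size formula.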

Test for the Difference between Two Binomial Proportions

The MODEL=TWOSAMPLEFREQ(TEST=PROP) option in the SAMPLESIZE statement derives the sample size required to test the difference between two binomial populations with $H_0: \theta = 0$, where $\theta = p_a - p_b$. At stage k, the MLE for $\theta$ is

$$\hat{\theta}_k = \hat{p}_{ak} - \hat{p}_{bk} = \frac{1}{N_{ak}} \sum_{j=1}^{N_{ak}} y_{akj} - \frac{1}{N_{bk}} \sum_{j=1}^{N_{bk}} y_{bkj}$$

where $y_{akj}$ and $y_{bkj}$ are the values of the jth observation available in the kth stage for groups A and B, respectively, and $N_{ak}$ and $N_{bk}$ are the cumulative sample sizes at stage k for these two groups.

For sufficiently large sample sizes $N_{ak}$ and $N_{bk}$, the statistic $\hat{\theta}_k$ has an approximate normal distribution

$$\hat{\theta}_k \sim N\left(\theta, I_k^{-1}\right)$$

where the information is the inverse of the variance

$$\mathrm{Var}(\hat{\theta}_k) = \frac{p_a (1 - p_a)}{N_{ak}} + \frac{p_b (1 - p_b)}{N_{bk}}$$

Thus, the standardized statistic

$$Z_k = \hat{\theta}_k \sqrt{I_k} \sim N\left(\theta \sqrt{I_k}, 1\right)$$

In practice, the estimated sample proportions $\hat{p}_a$ and $\hat{p}_b$ for groups A and B, respectively, at stage k can be substituted for $p_a$ and $p_b$ to derive the information $I_k$ and the test statistic $Z_k$. Thus, to test the hypothesis $H_0$ against an upper alternative $H_1: \theta > 0$, $H_0$ is rejected at stage k if the statistic $Z_k \geq a_k$, where $a_k$ is the upper $\alpha$ boundary for the standardized Z statistic at stage k.

The maximum information $I_X$ is needed to derive the required sample size. If the maximum information is not specified or derived with the ALTREF= option in the procedure, the PROP= option in the SAMPLESIZE statement is used to provide proportions under the alternative hypothesis for the alternative reference and then to derive the maximum information.

The proportions in the two groups are needed to derive the sample size. Also, in order to derive the sample sizes $N_{ak}$ and $N_{bk}$ uniquely from the information, $N_{ak} = R\, N_{bk}$ is assumed for $k = 1, 2, \ldots, K$, where $R = w_a / w_b$ is the constant allocation ratio computed from the WEIGHT=$w_a$ $w_b$ option in the SAMPLESIZE statement. Then

$$I_X = \left(\frac{p_a (1 - p_a)}{N_{aK}} + \frac{p_b (1 - p_b)}{N_{bK}}\right)^{-1} = \frac{N_{aK}}{p_a (1 - p_a) + R\, p_b (1 - p_b)}$$

In PROC SEQDESIGN, the total sample sizes in the two groups are computed as

$$N_{aK} = \left(p_a^* (1 - p_a^*) + R\, p_b^* (1 - p_b^*)\right) I_X$$

$$N_{bK} = \frac{1}{R} N_{aK}$$

where $R = w_a / w_b$ is the constant allocation ratio, and $p_a^*$ and $p_b^*$ are proportions specified with the REF= option:

  • REF=NULLPROP uses proportions under $H_0$: $p_a^* = p_{0a}$, $p_b^* = p_{0b}$

  • REF=AVGNULLPROP uses the average proportion under $H_0$: $p_a^* = p_b^* = (R\, p_{0a} + p_{0b}) / (R + 1)$

  • REF=PROP uses proportions under $H_1$: $p_a^* = p_{1a}$, $p_b^* = p_{1b}$

  • REF=AVGPROP uses the average proportion under $H_1$: $p_a^* = p_b^* = (R\, p_{1a} + p_{1b}) / (R + 1)$

The total sample size is given by

$$N_X = N_{aK} + N_{bK} = (R + 1) \left(\frac{1}{R}\, p_a^* (1 - p_a^*) + p_b^* (1 - p_b^*)\right) I_X$$

For $R = 1$, the two sample sizes are equal:

$$N_{aK} = N_{bK} = \frac{N_X}{2} = \left(p_a^* (1 - p_a^*) + p_b^* (1 - p_b^*)\right) I_X$$

You can specify the MODEL=TWOSAMPLEFREQ( TEST=PROP WEIGHT=R ) option in the SAMPLESIZE statement to compute the required total sample size and individual sample size at each stage. A procedure such as PROC GENMOD with the default DIST=NORMAL option in the MODEL statement can be used to derive the two-sample Z test for proportion difference.
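To make the effect of the REF= choices concrete, the following Python sketch evaluates the sample size formulas for the proportion difference test under each of the four reference settings. It is a sketch of the formulas in this section only. The function names, the maximum information value, and the proportions are assumptions for illustration, and the results are left unrounded.

```python
def reference_proportions(p0a, p0b, p1a, p1b, ratio, ref="PROP"):
    """Return (p_a*, p_b*) for the REF= settings listed in this section."""
    if ref == "NULLPROP":
        return p0a, p0b
    if ref == "AVGNULLPROP":
        avg = (ratio * p0a + p0b) / (ratio + 1)
        return avg, avg
    if ref == "PROP":
        return p1a, p1b
    if ref == "AVGPROP":
        avg = (ratio * p1a + p1b) / (ratio + 1)
        return avg, avg
    raise ValueError(f"unknown reference setting: {ref}")

def prop_diff_sizes(max_info, pa_star, pb_star, ratio=1.0):
    """Return (N_aK, N_bK) for the test of a difference in proportions."""
    n_a = (pa_star * (1 - pa_star) + ratio * pb_star * (1 - pb_star)) * max_info
    n_b = n_a / ratio
    return n_a, n_b

max_info = 200.0  # hypothetical maximum information I_X
for ref in ("NULLPROP", "AVGNULLPROP", "PROP", "AVGPROP"):
    # Hypothetical proportions: H0 p_a = p_b = 0.6; H1 p_a = 0.8, p_b = 0.6
    pa, pb = reference_proportions(0.6, 0.6, 0.8, 0.6, ratio=1.0, ref=ref)
    n_a, n_b = prop_diff_sizes(max_info, pa, pb, ratio=1.0)
    print(f"{ref:12s} p_a*={pa:.2f} p_b*={pb:.2f} N_a={n_a:.1f} N_b={n_b:.1f}")
```

For these inputs, REF=PROP gives about 80 observations per group, whereas REF=NULLPROP gives about 96, because the binomial variance $p(1 - p)$ is larger at 0.6 than at 0.8.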

Test for Two Binomial Proportions with a Log Odds Ratio Statistic

The MODEL=TWOSAMPLEFREQ(TEST=LOGOR) option in the SAMPLESIZE statement derives the sample size required to test two binomial proportions by using a log odds ratio statistic. The odds ratio is the ratio of the odds in one group to the odds in the other group, and the log odds ratio is the logarithm of the odds ratio

$$\theta = \log\left(\frac{p_a / (1 - p_a)}{p_b / (1 - p_b)}\right) = \log\left(\frac{p_a (1 - p_b)}{p_b (1 - p_a)}\right)$$

The hypothesis of no difference between two proportions, $p_a = p_b$, can be tested through the null hypothesis $H_0: \theta = 0$, where $\theta$ is the log odds ratio. For example, the hypotheses $H_0: p_a = p_b = 0.6$ and $H_1: p_a = 0.8, p_b = 0.6$ correspond to the equivalent hypotheses $H_0: \theta = 0$ and $H_1: \theta = \log\left(\frac{0.8\,(1 - 0.6)}{0.6\,(1 - 0.8)}\right) = \log(8/3) = 0.98083$.

The maximum likelihood estimate of $\theta$ is given by

$$\hat{\theta} = \log\left(\frac{\hat{p}_a (1 - \hat{p}_b)}{\hat{p}_b (1 - \hat{p}_a)}\right)$$

with an asymptotic variance

$$\mathrm{Var}(\hat{\theta}) = I^{-1} = \frac{1}{N_a\, p_a (1 - p_a)} + \frac{1}{N_b\, p_b (1 - p_b)}$$

where I is the information (Diggle et al. 2002, pp. 341–342). That is, at stage k, the standardized statistic

$$Z_k = \hat{\theta}_k \sqrt{I_k} \sim N\left(\theta \sqrt{I_k}, 1\right)$$

In practice, the estimated sample proportions $\hat{p}_a$ and $\hat{p}_b$ for groups A and B, respectively, at stage k can be substituted for $p_a$ and $p_b$ to derive the information $I_k$ and the test statistic $Z_k = \hat{\theta}_k \sqrt{I_k}$ if the two sample sizes $N_a$ and $N_b$ are sufficiently large such that the test statistic has an approximately normal distribution.

The maximum information $I_X$ is needed to derive the required sample size. If the maximum information is not specified or derived with the ALTREF= option in the procedure, the PROP= option in the SAMPLESIZE statement is used to provide proportions under the alternative hypothesis for the alternative reference and then to derive the maximum information.

In order to derive the sample sizes $N_{ak}$ and $N_{bk}$ uniquely from the information, $N_{ak} = R\, N_{bk}$ is assumed for $k = 1, 2, \ldots, K$, where $R = w_a / w_b$ is the constant allocation ratio computed from the WEIGHT=$w_a$ $w_b$ option in the SAMPLESIZE statement. Then with

$$I_X = N_{bK} \left(\frac{1}{R\, p_a (1 - p_a)} + \frac{1}{p_b (1 - p_b)}\right)^{-1}$$

the sample size can be computed.

In PROC SEQDESIGN, the total sample sizes in the two groups are computed as

$$N_{bK} = I_X \left(\frac{1}{R\, p_a^* (1 - p_a^*)} + \frac{1}{p_b^* (1 - p_b^*)}\right)$$

$$N_{aK} = R\, N_{bK}$$

where $R = w_a / w_b$ is the constant allocation ratio, and $p_a^*$ and $p_b^*$ are proportions specified with the REF= option:

  • REF=NULLPROP uses proportions under $H_0$: $p_a^* = p_{0a}$, $p_b^* = p_{0b}$

  • REF=AVGNULLPROP uses the average proportion under $H_0$: $p_a^* = p_b^* = (R\, p_{0a} + p_{0b}) / (R + 1)$

  • REF=PROP uses proportions under $H_1$: $p_a^* = p_{1a}$, $p_b^* = p_{1b}$

  • REF=AVGPROP uses the average proportion under $H_1$: $p_a^* = p_b^* = (R\, p_{1a} + p_{1b}) / (R + 1)$

You can specify the MODEL=TWOSAMPLEFREQ( TEST=LOGOR WEIGHT=R) option in the SAMPLESIZE statement to compute the required total sample size and individual sample size at each stage. A procedure such as PROC LOGISTIC can be used to derive the log odds ratio statistic.
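The log odds ratio and the corresponding sample sizes can be evaluated directly from the formulas in this section. The following Python sketch does so for the example proportions 0.8 and 0.6. The function names and the maximum information value are assumptions for illustration, and the sketch does not reproduce PROC SEQDESIGN output.

```python
import math

def log_odds_ratio(pa, pb):
    """theta = log( pa (1 - pb) / (pb (1 - pa)) )."""
    return math.log(pa * (1 - pb) / (pb * (1 - pa)))

def log_or_sizes(max_info, pa_star, pb_star, ratio=1.0):
    """Return (N_aK, N_bK) for the log odds ratio test."""
    n_b = max_info * (
        1.0 / (ratio * pa_star * (1 - pa_star)) + 1.0 / (pb_star * (1 - pb_star))
    )
    n_a = ratio * n_b
    return n_a, n_b

theta_1 = log_odds_ratio(0.8, 0.6)
print(f"theta under H1 = {theta_1:.5f}")

max_info = 10.0  # hypothetical maximum information I_X
n_a, n_b = log_or_sizes(max_info, 0.8, 0.6, ratio=1.0)
# Recover the information from the sizes as a consistency check
info = 1.0 / (1.0 / (n_a * 0.8 * 0.2) + 1.0 / (n_b * 0.6 * 0.4))
print(f"N_a={n_a:.2f}  N_b={n_b:.2f}  info={info:.4f}")
```

The alternative reference on the log odds ratio scale (about 0.98) is much larger than the proportion difference of 0.2, so a maximum information value is meaningful only relative to the scale of the statistic being used.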

Test for Two Binomial Proportions with a Log Relative Risk Statistic

The MODEL=TWOSAMPLEFREQ(TEST=LOGRR) option in the SAMPLESIZE statement derives the sample size required to test two binomial proportions by using a log relative risk statistic. The relative risk is the ratio of the proportion in one group to the proportion in the other group. The log relative risk statistic is the logarithm of the relative risk

$$\theta = \log\left(\frac{p_a}{p_b}\right)$$

The hypothesis of no difference between two proportions, $p_a = p_b$, can be tested through the null hypothesis $H_0: \theta = 0$. For example, the hypotheses $H_0: p_a = p_b = 0.6$ and $H_1: p_a = 0.8, p_b = 0.6$ correspond to the equivalent hypotheses $H_0: \theta = 0$ and $H_1: \theta = \log\left(\frac{0.8}{0.6}\right) = \log(4/3) = 0.28768$.

The maximum likelihood estimate of $\theta$ is given by

$$\hat{\theta} = \log\left(\frac{\hat{p}_a}{\hat{p}_b}\right)$$

with an asymptotic variance

$$I^{-1} = \frac{1 - p_a}{N_a\, p_a} + \frac{1 - p_b}{N_b\, p_b}$$

where I is the information (Chow and Liu 1998, p. 329).

In practice, the estimated sample proportions $\hat{p}_a$ and $\hat{p}_b$ for groups A and B, respectively, at stage k are used in place of $p_a$ and $p_b$ to derive the information $I_k$ and the test statistic $Z_k = \hat{\theta}_k \sqrt{I_k}$.

The maximum information $I_X$ and proportions $p_a$ and $p_b$ are needed to derive the required sample size. If the maximum information is not specified or derived with the ALTREF= option in the procedure, the PROP= option in the SAMPLESIZE statement is used to provide proportions under the alternative hypothesis for the alternative reference and then to derive the maximum information.

Note that in order to derive the sample sizes $N_{ak}$ and $N_{bk}$ uniquely from the information, $N_{ak} = R\, N_{bk}$ is assumed for $k = 1, 2, \ldots, K$, where $R = w_a / w_b$ is the constant allocation ratio computed from the WEIGHT=$w_a$ $w_b$ option in the SAMPLESIZE statement. Then the sample size can be computed from

$$I_X = N_{bK} \left(\frac{1 - p_a}{R\, p_a} + \frac{1 - p_b}{p_b}\right)^{-1}$$

In PROC SEQDESIGN, the computed sample sizes in the two groups are

$$N_{bK} = I_X \left(\frac{1 - p_a^*}{R\, p_a^*} + \frac{1 - p_b^*}{p_b^*}\right)$$

$$N_{aK} = R\, N_{bK}$$

where $R = w_a / w_b$ is the constant allocation ratio, and $p_a^*$ and $p_b^*$ are proportions specified with the REF= option:

  • REF=NULLPROP uses proportions under $H_0$: $p_a^* = p_{0a}$, $p_b^* = p_{0b}$

  • REF=AVGNULLPROP uses the average proportion under $H_0$: $p_a^* = p_b^* = (R\, p_{0a} + p_{0b}) / (R + 1)$

  • REF=PROP uses proportions under $H_1$: $p_a^* = p_{1a}$, $p_b^* = p_{1b}$

  • REF=AVGPROP uses the average proportion under $H_1$: $p_a^* = p_b^* = (R\, p_{1a} + p_{1b}) / (R + 1)$

You can specify the MODEL=TWOSAMPLEFREQ( TEST=LOGRR WEIGHT=R) option in the SAMPLESIZE statement to compute the required total sample size and individual sample size at each stage. A procedure such as PROC LOGISTIC can be used to derive the log relative risk statistic.
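The log relative risk formulas can be evaluated in the same way. The following Python sketch computes the alternative reference for the example proportions and the group sample sizes for an assumed maximum information. The function names and input values are hypothetical, and the results are left unrounded.

```python
import math

def log_relative_risk(pa, pb):
    """theta = log(pa / pb)."""
    return math.log(pa / pb)

def log_rr_sizes(max_info, pa_star, pb_star, ratio=1.0):
    """Return (N_aK, N_bK) for the log relative risk test."""
    n_b = max_info * (
        (1 - pa_star) / (ratio * pa_star) + (1 - pb_star) / pb_star
    )
    n_a = ratio * n_b
    return n_a, n_b

theta_1 = log_relative_risk(0.8, 0.6)
print(f"theta under H1 = {theta_1:.5f}")

max_info = 100.0  # hypothetical maximum information I_X
n_a, n_b = log_rr_sizes(max_info, 0.8, 0.6, ratio=1.0)
# Recover the information from the sizes as a consistency check
info = 1.0 / ((1 - 0.8) / (n_a * 0.8) + (1 - 0.6) / (n_b * 0.6))
print(f"N_a={n_a:.2f}  N_b={n_b:.2f}  info={info:.4f}")
```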

Test for Two Survival Distributions with a Log-Rank Test

The MODEL=TWOSAMPLESURVIVAL option in the SAMPLESIZE statement derives the number of events required for a log-rank test of two survival distributions. The analysis of survival data involves the survival times for both censored and uncensored data. A noncensored survival time is the time from treatment to an event such as remission or relapse for an individual. A censored survival time is the time from treatment to the time of analysis for an individual surviving at that time, and the status is unknown beyond that time.

PROC SEQDESIGN computes the number of events by using the approximations described in Schoenfeld (1981). It differs from PROC POWER, which uses an approach similar to that of Freedman (1982). Thus, even in the case of a one-stage design, the designs that PROC SEQDESIGN and PROC POWER produce do not match exactly. For a comparison of these approximation methods, see Hsieh (1992).

Let T be the random variable of the survival time. Then the survival function

$$S(t) = \Pr(T > t)$$

is the probability that an individual from the population has a survival time that exceeds t. The hazard function is given by

$$h(t) = \frac{f(t)}{S(t)}$$

where $f(t)$ is the density function of T.

The hazard functions can be used to test the equality of two survival distributions $S_a(t) = S_b(t)$ with the null hypothesis $H_0: h_a(t) = h_b(t)$, $t > 0$, where $S_a(t)$ and $S_b(t)$ are survival functions for groups A and B, respectively, and $h_a(t)$ and $h_b(t)$ are the corresponding hazard functions.

If the two hazards are proportional, $h_a(t) = \lambda\, h_b(t)$, where $\lambda$ is a constant, then an equivalent null hypothesis is

$$H_0: \lambda = \frac{h_a(t)}{h_b(t)} = 1$$

Alternatively, another equivalent null hypothesis is given by

$$H_0: \theta = -\log(\lambda) = 0$$

Suppose that the hazard rate h is a constant. Then with a specified median survival time $T_m$, the hazard rate can be derived from the equation

$$e^{-h T_m} = \frac{1}{2}$$
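Solving the median survival equation for the hazard rate gives $h = \log(2) / T_m$. The following Python sketch applies this to two hypothetical median survival times and then forms the hazard ratio $\lambda = h_a / h_b$ and $\theta = -\log(\lambda)$ as defined above. The function name and the median times are assumptions for illustration.

```python
import math

def hazard_from_median(median_time):
    """Constant hazard rate h that satisfies exp(-h * T_m) = 1/2."""
    if median_time <= 0:
        raise ValueError("median survival time must be positive")
    return math.log(2.0) / median_time

# Hypothetical median survival times (for example, in months)
h_a = hazard_from_median(18.0)
h_b = hazard_from_median(12.0)

hazard_ratio = h_a / h_b          # lambda = h_a / h_b
theta = -math.log(hazard_ratio)   # theta = -log(lambda)

print(f"h_a={h_a:.5f}  h_b={h_b:.5f}  lambda={hazard_ratio:.4f}  theta={theta:.5f}")
print(f"check: S_b(T_m) = {math.exp(-h_b * 12.0):.3f}")
```

Under constant hazards, the hazard ratio equals the inverse ratio of the median survival times, so a longer median in group A corresponds to $\lambda < 1$ and $\theta > 0$.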

Denote the distinct event times at stage k as tau Subscript k j Baseline comma j equals 1 comma 2 comma ellipsis comma t Subscript k Baseline, where t Subscript k is the total number of distinct event times. Then the score statistic is the log-rank statistic (Jennison and Turnbull 2000, pp. 259–261; Whitehead 1997, pp. 36–39)

upper S Subscript k Baseline equals sigma-summation Underscript j equals 1 Overscript t Subscript k Baseline Endscripts left-parenthesis d Subscript a k j Baseline minus e Subscript a k j Baseline right-parenthesis

where d Subscript a k j is the number of events from group A and e Subscript a k j is the number of expected events from A. The number of expected events from sans-serif upper A is computed as

e Subscript a k j Baseline equals d Subscript k j Baseline StartFraction r Subscript a k j Baseline Over r Subscript k j Baseline EndFraction

where $d_{kj}$ is the number of events from both groups, $r_{akj}$ is the number of individuals from group A who survived up to time $\tau_{kj}$, and $r_{kj}$ is the number of individuals from both groups who survived up to time $\tau_{kj}$.

If the number of events $d_{kj}$ is small relative to $r_{kj}$, the number of individuals who survived up to time $\tau_{kj}$, then with a sufficiently large sample size, $S_k$ has an approximately normal distribution

$$S_k \sim N\left( \theta I_k,\ I_k \right)$$

where the variance of $S_k$ is the estimated information

$$I_k = \sum_{j=1}^{t_k} \frac{r_{akj} \, r_{bkj} \, d_{kj}}{r_{kj}^{2}}$$
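As an illustration, the score statistic and the estimated information can be accumulated over the distinct event times exactly as the preceding formulas are written. The following Python sketch is not the procedure's implementation. The function name, the data layout (follow-up times with event indicators), and the convention that an individual counts as having survived up to an event time when the follow-up time is greater than or equal to it are assumptions made for this example.

```python
def logrank_score_and_information(times_a, events_a, times_b, events_b):
    """Return (S_k, I_k) from follow-up times and event indicators.

    events_* use 1 for an observed event and 0 for a censored time.
    Assumption: an individual is in the risk set at tau when time >= tau.
    """
    data = [(t, e, "a") for t, e in zip(times_a, events_a)]
    data += [(t, e, "b") for t, e in zip(times_b, events_b)]

    # Distinct event times tau_kj across both groups
    event_times = sorted({t for t, e, _ in data if e == 1})

    score = 0.0
    info = 0.0
    for tau in event_times:
        r_a = sum(1 for t, _, g in data if g == "a" and t >= tau)
        r_b = sum(1 for t, _, g in data if g == "b" and t >= tau)
        d_a = sum(1 for t, e, g in data if g == "a" and t == tau and e == 1)
        d = sum(1 for t, e, _ in data if t == tau and e == 1)
        r = r_a + r_b
        score += d_a - d * r_a / r       # d_akj - e_akj
        info += r_a * r_b * d / r ** 2   # r_akj * r_bkj * d_kj / r_kj^2
    return score, info

# Hypothetical data: one individual per group, event in A at time 1,
# censoring in B at time 2
s_k, i_k = logrank_score_and_information([1.0], [1], [2.0], [0])
print(s_k, i_k)
```

With a single event time and one individual at risk in each group, the expected number of events in group A is 1/2, so the sketch returns a score of 0.5 and information of 0.25.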

In order to derive the number of events from the information $I_k$, $N_{ak} = R \, N_{bk}$ is assumed for $k = 1, 2, \ldots, K$, where $R = w_a / w_b$ is the constant allocation ratio computed from the WEIGHT=$w_a$ $w_b$ option in the SAMPLESIZE statement.

The maximum information $I_X$ is needed to derive the required sample size. If the maximum information is specified or derived with the ALTREF= option in the procedure, the HAZARD=, MEDSURVTIME=, and HAZARDRATIO= options are not applicable. Otherwise, the HAZARD=, MEDSURVTIME=, or HAZARDRATIO= option is used to compute the alternative reference and then to derive the maximum information for the sample size calculation.

With $N_{aK} = R \, N_{bK}$, if the number of events is small relative to the number of individuals who survived, then $r_{aKj} \approx R \, r_{bKj}$, and

$$I_X \approx \sum_{j=1}^{t_K} \frac{R}{(R+1)^{2}} \, d_{Kj} = \frac{R}{(R+1)^{2}} \, D_X$$

where $D_X$ is the total number of events.

Thus, the required total number of events is

$$D_X = \frac{(R+1)^{2}}{R} \, I_X$$
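The conversion from maximum information to the required total number of events can be checked numerically. The following Python sketch applies the preceding formula for a given pair of allocation weights. The function name and the maximum information value of 25 are hypothetical and serve only as an illustration; in practice the maximum information comes from the design.

```python
import math

def required_events(max_information, w_a=1.0, w_b=1.0):
    """D_X = (R + 1)^2 / R * I_X, with allocation ratio R = w_a / w_b."""
    if max_information <= 0 or w_a <= 0 or w_b <= 0:
        raise ValueError("information and weights must be positive")
    ratio = w_a / w_b
    return (ratio + 1.0) ** 2 / ratio * max_information

# Hypothetical maximum information, for illustration only
i_x = 25.0
for w_a, w_b in [(1, 1), (2, 1)]:
    d_x = required_events(i_x, w_a, w_b)
    # Round up, because a fractional number of events cannot be observed
    print(w_a, w_b, d_x, math.ceil(d_x))
```

For equal allocation the factor is 4, so this information value corresponds to 100 events; with 2:1 allocation the factor is 4.5, which gives 112.5 events before rounding. The factor is smallest when R = 1, so unequal allocation increases the number of events needed for the same information.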

For a study group, if the hazard rate is constant, corresponding to an exponential survival distribution, and the individual accrual is uniform in the accrual time $T_a$ with a constant accrual rate $r_a$, then the required total sample size and sample size at each stage can be derived. See the section Input Number of Events for Fixed-Sample Design for a detailed description of the sample size computation that uses hazard rates, accrual rate, and accrual time.

You can specify the MODEL=TWOSAMPLESURVIVAL option in the SAMPLESIZE statement to compute the required total number of events and individual number of events at each stage. With the specifications of hazard rates, accrual rate, and accrual time, the required total sample size and individual sample size at each stage can also be derived. If the REF=NULLHAZARD option is specified, the hazard rates under the null hypothesis, $h_{0a}$ and $h_{0b}$, are used in the sample size computation. Otherwise, the hazard rates under the alternative hypothesis, $h_{1a}$ and $h_{1b}$, are used. A procedure such as PROC LIFETEST can be used to derive the log-rank statistic.

Last updated: December 09, 2022