The SEQDESIGN Procedure

Applicable Two-Sample Tests and Sample Size Computation

The SEQDESIGN procedure provides sample size computation for two-sample tests: the test for the difference between two normal means, tests for binomial proportions, and the log-rank test for two survival distributions. The tests for binomial proportions include the test for the difference between two binomial proportions, the log odds ratio test for binomial proportions, and the log relative risk test for binomial proportions.

For a test of difference between two sample means, the required sample size depends on the assumed sample variances. Similarly, for a test of two-sample proportions, the required sample size depends on the assumed sample proportions. For a log-rank test of two survival distributions, the required sample size depends on the assumed sample hazard rates, accrual rate, and accrual time.

If the REF=NULLPROP or REF=NULLHAZARD option is specified, the proportions or hazard rates under the null hypothesis are used to derive the required sample size or number of events. Otherwise, the REF=PROP option (which is the default in the MODEL=TWOSAMPLEFREQ option) or the REF=HAZARD option (which is the default in the MODEL=TWOSAMPLESURVIVAL option) uses proportions or hazard rates under the alternative hypothesis to derive the required sample size or number of events.

Test for the Difference between Two Normal Means

The MODEL=TWOSAMPLEMEAN option in the SAMPLESIZE statement derives the sample size required to test the difference between the means of two normal populations, $y_a \sim N(\mu_a, \sigma_a^2)$ and $y_b \sim N(\mu_b, \sigma_b^2)$, by using the null hypothesis $H_0: \theta = 0$, where $\theta = \mu_a - \mu_b$.

At stage k, the MLE for $\theta$ is computed as

$$\hat{\theta}_k = \bar{y}_{ak} - \bar{y}_{bk} = \frac{1}{N_{ak}} \sum_{j=1}^{N_{ak}} y_{akj} - \frac{1}{N_{bk}} \sum_{j=1}^{N_{bk}} y_{bkj}$$

where $y_{akj}$ and $y_{bkj}$ are the values of the jth observation available in the kth stage for groups A and B, respectively, and $N_{ak}$ and $N_{bk}$ are the cumulative sample sizes at stage k for these two groups.

The statistic $\hat{\theta}_k$ has a normal distribution

$$\hat{\theta}_k \sim N\left(\theta, I_k^{-1}\right)$$

where the information $I_k$ is the inverse of the variance $\mathrm{Var}(\hat{\theta}_k) = \sigma_a^2 / N_{ak} + \sigma_b^2 / N_{bk}$.

Then the standardized statistic is

$$Z_k = \hat{\theta}_k \sqrt{I_k} \sim N\left(\theta \sqrt{I_k}, 1\right)$$

Thus, to test the hypothesis $H_0: \theta = 0$ against an upper alternative $H_1: \theta > 0$, $H_0$ is rejected at stage k if the statistic $Z_k \ge a_k$, where $a_k$ is the upper boundary for the standardized Z statistic at stage k.
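As a rough illustration of the stage-k computation described above, the following Python sketch computes the estimate of the mean difference, the information, and the standardized statistic from cumulative observations. It assumes that the standard deviations are known; the function name `two_sample_mean_z` and the data values are hypothetical and are not part of PROC SEQDESIGN.

```python
import math

def two_sample_mean_z(y_a, y_b, sigma_a, sigma_b):
    """Return (theta_hat, information, z) from cumulative stage-k data."""
    n_a, n_b = len(y_a), len(y_b)
    # MLE of the mean difference: difference of the two sample means
    theta_hat = sum(y_a) / n_a - sum(y_b) / n_b
    # Information is the inverse of the variance of theta_hat
    variance = sigma_a ** 2 / n_a + sigma_b ** 2 / n_b
    information = 1.0 / variance
    # Standardized statistic Z_k = theta_hat * sqrt(I_k)
    z = theta_hat * math.sqrt(information)
    return theta_hat, information, z

# Hypothetical cumulative observations for groups A and B
theta_hat, info, z = two_sample_mean_z(
    [5.0, 7.0, 6.0, 6.0], [4.0, 5.0, 6.0, 5.0], sigma_a=1.0, sigma_b=1.0
)
print(theta_hat, info, z)
```

The resulting `z` would then be compared with the upper boundary value for that stage, which comes from the design and is not computed here.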

If the variances $\sigma_a^2$ and $\sigma_b^2$ are unknown, the sample variances can be used to derive the information, provided that each sample variance is computed from a sample large enough for the test statistic to have an approximately normal distribution.

The maximum information $I_X$ is needed to derive the required sample size. If the maximum information is not specified or derived in the procedure, the alternative reference specified in the MEANDIFF= option is used to derive the maximum information.

Note that in order to derive the sample sizes $N_{aK}$ and $N_{bK}$ uniquely from the information, $N_{ak} = R N_{bk}$ is assumed for $k = 1, 2, \ldots, K$, where $R$ is the constant allocation ratio computed from the WEIGHT= option in the SAMPLESIZE statement.

In PROC SEQDESIGN, the computed total sample sizes for the two groups are

$$N_{aK} = \left(\sigma_a^2 + R \sigma_b^2\right) I_X = R \left(\frac{\sigma_a^2}{R} + \sigma_b^2\right) I_X$$

$$N_{bK} = \left(\frac{\sigma_a^2}{R} + \sigma_b^2\right) I_X$$

where $I_X$ is the maximum information derived in the SEQDESIGN procedure, $R$ is the constant allocation ratio, and $\sigma_a$ and $\sigma_b$ are the specified standard deviations.

For $R = 1$, the two sample sizes are equal:

$$N_{aK} = N_{bK} = \frac{N_X}{2} = \left(\sigma_a^2 + \sigma_b^2\right) I_X$$

where $N_X = N_{aK} + N_{bK}$ denotes the total sample size.

If the variances from the two groups are equal, $\sigma_a^2 = \sigma_b^2 = \sigma^2$, then the total sample sizes for the two groups are

$$N_{aK} = (1 + R) \sigma^2 I_X$$

$$N_{bK} = \left(1 + \frac{1}{R}\right) \sigma^2 I_X$$

and the total sample size is

$$N_X = N_{aK} + N_{bK} = \frac{(R + 1)^2}{R} \sigma^2 I_X$$

Furthermore, for $R = 1$, the two sample sizes are equal:

$$N_{aK} = N_{bK} = \frac{N_X}{2} = 2 \sigma^2 I_X$$

When the maximum information is available, you can specify the MODEL=TWOSAMPLEMEAN( WEIGHT=$R$ STDDEV=$\sigma_a$ $\sigma_b$ ) option in the SAMPLESIZE statement to compute the required total sample size and individual sample size at each stage. A procedure such as PROC GLM can be used to derive the two-sample Z test for the mean difference.
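The sample size formulas for the two-sample mean test can be checked numerically. The following Python sketch evaluates the group and total sample sizes from a given maximum information, allocation ratio, and standard deviations. The function name `two_sample_mean_sizes` and the input values are hypothetical, and the sketch does not round the results to integers.

```python
def two_sample_mean_sizes(max_info, sigma_a, sigma_b, ratio=1.0):
    """Return (N_aK, N_bK, N_X) for allocation ratio R = N_a / N_b."""
    # N_bK = (sigma_a^2 / R + sigma_b^2) * I_X
    n_b = (sigma_a ** 2 / ratio + sigma_b ** 2) * max_info
    # N_aK = R * N_bK = (sigma_a^2 + R * sigma_b^2) * I_X
    n_a = ratio * n_b
    return n_a, n_b, n_a + n_b

# Hypothetical inputs: I_X = 10, equal standard deviations of 2, R = 2
n_a, n_b, n_total = two_sample_mean_sizes(10.0, 2.0, 2.0, ratio=2.0)
print(n_a, n_b, n_total)
```

With equal variances, the total agrees with the closed form $(R+1)^2 \sigma^2 I_X / R$, which for these inputs is $9 \times 4 \times 10 / 2 = 180$.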

Test for the Difference between Two Binomial Proportions

The MODEL=TWOSAMPLEFREQ(TEST=PROP) option in the SAMPLESIZE statement derives the sample size required to test the difference between two binomial populations with $H_0: \theta = 0$, where $\theta = p_a - p_b$. At stage k, the MLE for $\theta$ is

$$\hat{\theta}_k = \hat{p}_{ak} - \hat{p}_{bk} = \frac{1}{N_{ak}} \sum_{j=1}^{N_{ak}} y_{akj} - \frac{1}{N_{bk}} \sum_{j=1}^{N_{bk}} y_{bkj}$$

where $y_{akj}$ and $y_{bkj}$ are the values of the jth observation available in the kth stage for groups A and B, respectively, and $N_{ak}$ and $N_{bk}$ are the cumulative sample sizes at stage k for these two groups.

For sufficiently large sample sizes $N_{ak}$ and $N_{bk}$, the statistic $\hat{\theta}_k$ has an approximate normal distribution

$$\hat{\theta}_k \sim N\left(\theta, I_k^{-1}\right)$$

where the information $I_k$ is the inverse of the variance

$$\mathrm{Var}(\hat{\theta}_k) = \frac{p_a (1 - p_a)}{N_{ak}} + \frac{p_b (1 - p_b)}{N_{bk}}$$

Thus, the standardized statistic $Z_k = \hat{\theta}_k \sqrt{I_k}$ has an approximate $N\left(\theta \sqrt{I_k}, 1\right)$ distribution.

In practice, $\hat{p}_{ak}$ and $\hat{p}_{bk}$, the estimated sample proportions for groups A and B, respectively, at stage k, can be used to derive the information $I_k$ and the test statistic $Z_k$. Thus, to test the hypothesis $H_0: \theta = 0$ against an upper alternative $H_1: \theta > 0$, $H_0$ is rejected at stage k if the statistic $Z_k \ge a_k$, where $a_k$ is the upper boundary for the standardized Z statistic at stage k.

The maximum information is needed to derive the required sample size. If the maximum information is not specified or derived with the ALTREF= option in the procedure, the PROP= option in the SAMPLESIZE statement is used to provide proportions under the alternative hypothesis for the alternative reference and then to derive the maximum information.

The proportions in the two groups are needed to derive the sample size. Also, in order to derive the sample sizes $N_{aK}$ and $N_{bK}$ uniquely from the information, $N_{ak} = R N_{bk}$ is assumed for $k = 1, 2, \ldots, K$, where $R$ is the constant allocation ratio computed from the WEIGHT= option in the SAMPLESIZE statement. Then

$$I_X = \left(\frac{p_a (1 - p_a)}{N_{aK}} + \frac{p_b (1 - p_b)}{N_{bK}}\right)^{-1} = \frac{N_{aK}}{p_a (1 - p_a) + R\, p_b (1 - p_b)}$$

In PROC SEQDESIGN, the total sample sizes in the two groups are computed as

$$N_{aK} = \left(p_a^* (1 - p_a^*) + R\, p_b^* (1 - p_b^*)\right) I_X$$

$$N_{bK} = \frac{1}{R} N_{aK}$$

where $R$ is the constant allocation ratio, and $p_a^*$ and $p_b^*$ are proportions specified with the REF= option:

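REF=NULLPROP uses the proportions under $H_0$ for $p_a^*$ and $p_b^*$
REF=AVGNULLPROP uses the average proportion under $H_0$ for both $p_a^*$ and $p_b^*$
REF=PROP uses the proportions under $H_1$ for $p_a^*$ and $p_b^*$
REF=AVGPROP uses the average proportion under $H_1$ for both $p_a^*$ and $p_b^*$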

The total sample size is given by

$$N_X = N_{aK} + N_{bK} = (R + 1) \left(\frac{1}{R}\, p_a^* (1 - p_a^*) + p_b^* (1 - p_b^*)\right) I_X$$

For $R = 1$, the two sample sizes are equal:

$$N_{aK} = N_{bK} = \frac{N_X}{2} = \left(p_a^* (1 - p_a^*) + p_b^* (1 - p_b^*)\right) I_X$$

You can specify the MODEL=TWOSAMPLEFREQ( TEST=PROP WEIGHT=R ) option in the SAMPLESIZE statement to compute the required total sample size and individual sample size at each stage. A procedure such as PROC GENMOD with the default DIST=NORMAL option in the MODEL statement can be used to derive the two-sample Z test for proportion difference.
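To make the proportion-difference formulas concrete, the following Python sketch computes the group sample sizes from a maximum information and a pair of reference proportions, and then recovers the information from those sample sizes as a consistency check. The function names and input values are hypothetical; the reference proportions stand for whichever values the REF= option selects.

```python
def prop_diff_sizes(max_info, p_a_ref, p_b_ref, ratio=1.0):
    """Return (N_aK, N_bK, N_X) for the test of a proportion difference."""
    var_a = p_a_ref * (1.0 - p_a_ref)
    var_b = p_b_ref * (1.0 - p_b_ref)
    # N_aK = (p_a(1 - p_a) + R p_b(1 - p_b)) * I_X, and N_bK = N_aK / R
    n_a = (var_a + ratio * var_b) * max_info
    n_b = n_a / ratio
    return n_a, n_b, n_a + n_b

def prop_diff_information(n_a, n_b, p_a, p_b):
    """Information: inverse of the variance of the proportion difference."""
    return 1.0 / (p_a * (1.0 - p_a) / n_a + p_b * (1.0 - p_b) / n_b)

# Hypothetical inputs: I_X = 100, reference proportions 0.8 and 0.6, R = 2
n_a, n_b, n_total = prop_diff_sizes(100.0, 0.8, 0.6, ratio=2.0)
info_check = prop_diff_information(n_a, n_b, 0.8, 0.6)
print(n_a, n_b, n_total, info_check)
```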

Test for Two Binomial Proportions with a Log Odds Ratio Statistic

The MODEL=TWOSAMPLEFREQ(TEST=LOGOR) option in the SAMPLESIZE statement derives the sample size required to test two binomial proportions by using a log odds ratio statistic. The odds ratio is the ratio of the odds in one group to the odds in the other group, and the log odds ratio is the logarithm of the odds ratio

$$\theta = \log\left(\frac{p_a / (1 - p_a)}{p_b / (1 - p_b)}\right) = \log\left(\frac{p_a (1 - p_b)}{p_b (1 - p_a)}\right)$$

The hypothesis of no difference between two proportions, $p_a = p_b$, can be tested through the null hypothesis $H_0: \theta = 0$, where $\theta$ is the log odds ratio. For example, with and , it corresponds to the equivalent hypothesis and .

The maximum likelihood estimate of $\theta$ is given by

$$\hat{\theta} = \log\left(\frac{\hat{p}_a (1 - \hat{p}_b)}{\hat{p}_b (1 - \hat{p}_a)}\right)$$

with an asymptotic variance

$$\mathrm{Var}(\hat{\theta}) = I^{-1} = \frac{1}{N_a\, p_a (1 - p_a)} + \frac{1}{N_b\, p_b (1 - p_b)}$$

where $I$ is the information (Diggle et al. 2002, pp. 341–342). That is, the standardized statistic $Z = \hat{\theta} \sqrt{I}$ has an approximate $N\left(\theta \sqrt{I}, 1\right)$ distribution.

In practice, $\hat{p}_{ak}$ and $\hat{p}_{bk}$, the estimated sample proportions for groups A and B, respectively, at stage k, can be used to derive the information $I_k$ and the test statistic $Z_k$ if the two sample sizes $N_{ak}$ and $N_{bk}$ are sufficiently large such that the test statistic has an approximately normal distribution.

In order to derive the sample sizes $N_{aK}$ and $N_{bK}$ uniquely from the information, $N_{ak} = R N_{bk}$ is assumed for $k = 1, 2, \ldots, K$, where $R$ is the constant allocation ratio computed from the WEIGHT= option in the SAMPLESIZE statement. Then with

$$I_X = N_{bK} \left(\frac{1}{R\, p_a (1 - p_a)} + \frac{1}{p_b (1 - p_b)}\right)^{-1}$$

the sample size can be computed.

In PROC SEQDESIGN, the total sample sizes in the two groups are computed as

$$N_{bK} = I_X \left(\frac{1}{R\, p_a^* (1 - p_a^*)} + \frac{1}{p_b^* (1 - p_b^*)}\right)$$

$$N_{aK} = R N_{bK}$$

where $R$ is the constant allocation ratio, and $p_a^*$ and $p_b^*$ are proportions specified with the REF= option:

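REF=NULLPROP uses the proportions under $H_0$ for $p_a^*$ and $p_b^*$
REF=AVGNULLPROP uses the average proportion under $H_0$ for both $p_a^*$ and $p_b^*$
REF=PROP uses the proportions under $H_1$ for $p_a^*$ and $p_b^*$
REF=AVGPROP uses the average proportion under $H_1$ for both $p_a^*$ and $p_b^*$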

You can specify the MODEL=TWOSAMPLEFREQ( TEST=LOGOR WEIGHT=R) option in the SAMPLESIZE statement to compute the required total sample size and individual sample size at each stage. A procedure such as PROC LOGISTIC can be used to derive the log odds ratio statistic.
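For the log odds ratio test, a small Python sketch can show how the parameter and the sample sizes follow from the formulas in this section. The function names and the input values (maximum information 12, reference proportions 0.8 and 0.6, allocation ratio 2) are hypothetical and chosen only for illustration.

```python
import math

def log_odds_ratio(p_a, p_b):
    """theta = log of the odds in group A over the odds in group B."""
    return math.log((p_a * (1.0 - p_b)) / (p_b * (1.0 - p_a)))

def log_or_sizes(max_info, p_a_ref, p_b_ref, ratio=1.0):
    """Return (N_aK, N_bK) for the log odds ratio test."""
    # N_bK = I_X * (1 / (R p_a(1 - p_a)) + 1 / (p_b(1 - p_b)))
    n_b = max_info * (
        1.0 / (ratio * p_a_ref * (1.0 - p_a_ref))
        + 1.0 / (p_b_ref * (1.0 - p_b_ref))
    )
    return ratio * n_b, n_b

theta = log_odds_ratio(0.8, 0.6)
n_a, n_b = log_or_sizes(12.0, 0.8, 0.6, ratio=2.0)
# Consistency check: recover the information from the asymptotic variance
info_check = 1.0 / (1.0 / (n_a * 0.8 * 0.2) + 1.0 / (n_b * 0.6 * 0.4))
print(theta, n_a, n_b, info_check)
```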

Test for Two Binomial Proportions with a Log Relative Risk Statistic

The MODEL=TWOSAMPLEFREQ(TEST=LOGRR) option in the SAMPLESIZE statement derives the sample size required to test two binomial proportions by using a log relative risk statistic. The relative risk is the ratio of the proportion in one group to the proportion in the other group. The log relative risk statistic is the logarithm of the relative risk

$$\theta = \log\left(\frac{p_a}{p_b}\right)$$

The hypothesis of no difference between two proportions, $p_a = p_b$, can be tested through the null hypothesis $H_0: \theta = 0$. For example, with and , it corresponds to the equivalent hypothesis and .

The maximum likelihood estimate of $\theta$ is given by

$$\hat{\theta} = \log\left(\frac{\hat{p}_a}{\hat{p}_b}\right)$$

with an asymptotic variance

$$\mathrm{Var}(\hat{\theta}) = I^{-1} = \frac{1 - p_a}{N_a\, p_a} + \frac{1 - p_b}{N_b\, p_b}$$

where $I$ is the information (Chow and Liu 1998, p. 329).

In practice, $\hat{p}_{ak}$ and $\hat{p}_{bk}$, the estimated sample proportions for groups A and B, respectively, at stage k, are used to derive the information $I_k$ and the test statistic $Z_k$.

The maximum information $I_X$ and proportions $p_a$ and $p_b$ are needed to derive the required sample size. If the maximum information is not specified or derived with the ALTREF= option in the procedure, the PROP= option in the SAMPLESIZE statement is used to provide proportions under the alternative hypothesis for the alternative reference and then to derive the maximum information.

Note that in order to derive the sample sizes $N_{aK}$ and $N_{bK}$ uniquely from the information, $N_{ak} = R N_{bk}$ is assumed for $k = 1, 2, \ldots, K$, where $R$ is the constant allocation ratio computed from the WEIGHT= option in the SAMPLESIZE statement. Then the sample size can be computed from

$$I_X = N_{bK} \left(\frac{1 - p_a}{R\, p_a} + \frac{1 - p_b}{p_b}\right)^{-1}$$

In PROC SEQDESIGN, the computed sample sizes in the two groups are

$$N_{bK} = I_X \left(\frac{1 - p_a^*}{R\, p_a^*} + \frac{1 - p_b^*}{p_b^*}\right)$$

and, from the allocation assumption, $N_{aK} = R N_{bK}$,

where $R$ is the constant allocation ratio, and $p_a^*$ and $p_b^*$ are proportions specified with the REF= option:

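REF=NULLPROP uses the proportions under $H_0$ for $p_a^*$ and $p_b^*$
REF=AVGNULLPROP uses the average proportion under $H_0$ for both $p_a^*$ and $p_b^*$
REF=PROP uses the proportions under $H_1$ for $p_a^*$ and $p_b^*$
REF=AVGPROP uses the average proportion under $H_1$ for both $p_a^*$ and $p_b^*$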

You can specify the MODEL=TWOSAMPLEFREQ( TEST=LOGRR WEIGHT=R) option in the SAMPLESIZE statement to compute the required total sample size and individual sample size at each stage. A procedure such as PROC LOGISTIC can be used to derive the log relative risk statistic.
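The log relative risk computation can be sketched in the same way. The following Python example evaluates the parameter and the group sample sizes from the formulas in this section; the function names and the input values (maximum information 12, reference proportions 0.8 and 0.6, allocation ratio 2) are hypothetical.

```python
import math

def log_relative_risk(p_a, p_b):
    """theta = log of the proportion in group A over that in group B."""
    return math.log(p_a / p_b)

def log_rr_sizes(max_info, p_a_ref, p_b_ref, ratio=1.0):
    """Return (N_aK, N_bK) for the log relative risk test."""
    # N_bK = I_X * ((1 - p_a) / (R p_a) + (1 - p_b) / p_b)
    n_b = max_info * (
        (1.0 - p_a_ref) / (ratio * p_a_ref) + (1.0 - p_b_ref) / p_b_ref
    )
    return ratio * n_b, n_b

theta = log_relative_risk(0.8, 0.6)
n_a, n_b = log_rr_sizes(12.0, 0.8, 0.6, ratio=2.0)
# Consistency check: recover the information from the asymptotic variance
info_check = 1.0 / (0.2 / (n_a * 0.8) + 0.4 / (n_b * 0.6))
print(theta, n_a, n_b, info_check)
```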

Test for Two Survival Distributions with a Log-Rank Test

The MODEL=TWOSAMPLESURVIVAL option in the SAMPLESIZE statement derives the number of events required for a log-rank test of two survival distributions. The analysis of survival data involves the survival times for both censored and uncensored data. An uncensored survival time is the time from treatment to an event such as remission or relapse for an individual. A censored survival time is the time from treatment to the time of analysis for an individual who is surviving at that time; the status of that individual is unknown beyond that time.

PROC SEQDESIGN computes the number of events by using the approximations described in Schoenfeld (1981). It differs from PROC POWER, which uses an approach similar to that of Freedman (1982). Thus, even in the case of a one-stage design, the designs that PROC SEQDESIGN and PROC POWER produce do not match exactly. For a comparison of these approximation methods, see Hsieh (1992).

Let T be the random variable of the survival time. Then the survival function

$$S(t) = \Pr(T > t)$$

is the probability that an individual from the population has a survival time that exceeds t. The hazard function is given by

$$h(t) = \frac{f(t)}{S(t)}$$

where $f(t)$ is the density function of T.

The hazard functions can be used to test the equality of two survival distributions with the null hypothesis $H_0: S_a(t) = S_b(t)$, where $S_a(t)$ and $S_b(t)$ are survival functions for groups A and B, respectively, and $h_a(t)$ and $h_b(t)$ are the corresponding hazard functions.

If the two hazards are proportional, $h_a(t) = \lambda h_b(t)$, where $\lambda$ is a constant, then an equivalent null hypothesis is

$$H_0: \lambda = \frac{h_a(t)}{h_b(t)} = 1$$

Alternatively, another equivalent null hypothesis is given by

$$H_0: \theta = -\log(\lambda) = 0$$

Suppose that the hazard rate h is a constant. Then with a specified median survival time $T_m$, the hazard rate can be derived from the equation

$$e^{-h T_m} = \frac{1}{2}$$
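Solving the median survival equation for a constant hazard gives $h = \log(2) / T_m$. The following Python sketch applies that conversion; the function name `hazard_from_median` and the median survival time of 20 time units are hypothetical.

```python
import math

def hazard_from_median(median_time):
    """Constant hazard h that solves exp(-h * T_m) = 1/2."""
    if median_time <= 0:
        raise ValueError("median survival time must be positive")
    return math.log(2.0) / median_time

# Hypothetical median survival time of 20 time units
h = hazard_from_median(20.0)
# Survival probability at the median should be one half
survival_at_median = math.exp(-h * 20.0)
print(h, survival_at_median)
```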

Denote the distinct event times at stage k as $\tau_1, \tau_2, \ldots, \tau_{t_k}$, where $t_k$ is the total number of distinct event times. Then the score statistic is the log-rank statistic (Jennison and Turnbull 2000, pp. 259–261; Whitehead 1997, pp. 36–39)

$$S_k = \sum_{j=1}^{t_k} \left(d_{akj} - e_{akj}\right)$$

where $d_{akj}$ is the number of events from group A and $e_{akj}$ is the number of expected events from group A. The number of expected events from group A is computed as

$$e_{akj} = d_{kj}\, \frac{r_{akj}}{r_{kj}}$$

where $d_{kj}$ is the number of events from both groups, $r_{akj}$ is the number of individuals from group A (the treatment group) who survived up to time $\tau_j$, and $r_{kj}$ is the number of individuals from both groups who survived up to time $\tau_j$.

If the number of events $d_{kj}$ is small relative to $r_{kj}$, the number of individuals who survived up to time $\tau_j$, then with a sufficiently large sample size, $S_k$ has an approximately normal distribution

$$S_k \sim N\left(\theta I_k, I_k\right)$$

where the variance of $S_k$ is the estimated information

$$I_k = \sum_{j=1}^{t_k} \frac{r_{akj}\, r_{bkj}\, d_{kj}}{r_{kj}^2}$$
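The log-rank score statistic and its estimated information can be accumulated over the distinct event times in a single pass. The following Python sketch does this for a small table of event counts and at-risk counts; the function name `log_rank` and the three-row table are hypothetical and serve only to illustrate the two sums.

```python
def log_rank(event_table):
    """Return (score, information) from rows of (d_a, d_b, r_a, r_b).

    Each row describes one distinct event time: events in groups A and B,
    and the numbers at risk in groups A and B just before that time.
    """
    score = 0.0
    information = 0.0
    for d_a, d_b, r_a, r_b in event_table:
        d = d_a + d_b          # events from both groups
        r = r_a + r_b          # individuals at risk in both groups
        expected_a = d * r_a / r
        score += d_a - expected_a
        information += r_a * r_b * d / r ** 2
    return score, information

# Hypothetical data at three distinct event times
table = [(1, 0, 4, 4), (0, 1, 3, 4), (1, 1, 3, 3)]
score, information = log_rank(table)
print(score, information)
```

The information formula used here is the one given in this section and applies no correction for tied event times.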

In order to derive the number of events from the information $I_X$, $N_{ak} = R N_{bk}$ is assumed for $k = 1, 2, \ldots, K$, where $R$ is the constant allocation ratio computed from the WEIGHT= option in the SAMPLESIZE statement.

The maximum information is needed to derive the required sample size. If the maximum information is specified or derived with the ALTREF= option in the procedure, the HAZARD=, MEDSURVTIME=, and HAZARDRATIO= options are not applicable. Otherwise, the HAZARD=, MEDSURVTIME=, or HAZARDRATIO= option is used to compute the alternative reference and then to derive the maximum information for the sample size calculation.

With $N_{ak} = R N_{bk}$, if the number of events is small relative to the number of individuals who survived, then $r_{aKj} \approx R\, r_{bKj}$, and

$$I_X \approx \sum_{j=1}^{t_K} \frac{R}{(R + 1)^2}\, d_{Kj} = \frac{R}{(R + 1)^2}\, D_X$$

where $D_X$ is the total number of events.

Thus, the required total number of events is

$$D_X = \frac{(R + 1)^2}{R}\, I_X$$
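The relationship between the maximum information and the required number of events is simple enough to check by hand. The following Python sketch evaluates it in both directions; the function names and the maximum information value of 25 are hypothetical.

```python
def required_events(max_info, ratio=1.0):
    """Total number of events D_X = (R + 1)^2 / R * I_X."""
    return (ratio + 1.0) ** 2 / ratio * max_info

def approx_information(total_events, ratio=1.0):
    """Approximate information I_X = R / (R + 1)^2 * D_X."""
    return ratio / (ratio + 1.0) ** 2 * total_events

events_equal = required_events(25.0)             # R = 1
events_two_to_one = required_events(25.0, 2.0)   # R = 2
info_back = approx_information(events_two_to_one, 2.0)
print(events_equal, events_two_to_one, info_back)
```

For equal allocation the factor is 4, so 25 units of information correspond to 100 events; unequal allocation requires more events for the same information.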

For a study group, if the hazard rate is constant, corresponding to an exponential survival distribution, and the individual accrual is uniform in the accrual time with a constant accrual rate, then the required total sample size and sample size at each stage can be derived. See the section Input Number of Events for Fixed-Sample Design for a detailed description of the sample size computation that uses hazard rates, accrual rate, and accrual time.

You can specify the MODEL=TWOSAMPLESURVIVAL option in the SAMPLESIZE statement to compute the required total number of events and individual number of events at each stage. With the specifications of hazard rates, accrual rate, and accrual time, the required total sample size and individual sample size at each stage can also be derived. If the REF=NULLHAZARD option is specified, the hazard rates for the two groups under the null hypothesis are used in the sample size computation. Otherwise, the hazard rates for the two groups under the alternative hypothesis are used. A procedure such as PROC LIFETEST can be used to derive the log-rank statistic.

Last updated: December 09, 2022