The SEQDESIGN Procedure

Applicable Regression Parameter Tests and Sample Size Computation

The SEQDESIGN procedure provides sample size computation for tests of a regression parameter in three regression models: normal regression, logistic regression, and proportional hazards regression.

To test a parameter $\beta_1$ in a regression model, the variance of the parameter estimate $\hat{\beta}_1$ is needed for the sample size computation. In a simple regression model with one covariate X1, the variance of $\hat{\beta}_1$ is inversely related to the variance of X1, $\sigma_x^2$. That is,

$$\mathrm{Var}(\hat{\beta}_1) \propto \frac{1}{N \sigma_x^2}$$

for the normal regression and logistic regression models, where N is the sample size, and

$$\mathrm{Var}(\hat{\beta}_1) \propto \frac{1}{D \sigma_x^2}$$

for the proportional hazards regression model, where D is the number of events.

For a regression model with more than one covariate, the variance of $\hat{\beta}_1$ for the normal regression and logistic regression models is inversely related to the variance of X1 after adjusting for other covariates. That is,

$$\mathrm{Var}(\hat{\beta}_1) \propto \frac{1}{N (1 - r_x^2) \sigma_x^2}$$

where $\hat{\beta}_1$ is the estimate of the parameter $\beta_1$ in the model and $r_x^2$ is the R square from the regression of X1 on other covariates; that is, the proportion of the variance $\sigma_x^2$ explained by these covariates.

Similarly, for a proportional hazards regression model,

$$\mathrm{Var}(\hat{\beta}_1) \propto \frac{1}{D (1 - r_x^2) \sigma_x^2}$$

Thus, with the derived maximum information, the required sample size or number of events can also be computed for the testing of a parameter in a regression model with covariates.
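The proportionality relations above imply that, for a fixed maximum information, adding covariates that are correlated with X1 scales the required sample size or number of events by 1/(1 - r_x^2). The following Python sketch only illustrates that factor; the function name and the example R-square values are hypothetical and are not part of the SEQDESIGN procedure.

```python
def covariate_inflation(r2_x):
    """Factor by which N (or D) grows when other covariates explain
    a proportion r2_x of the variance of X1 (illustrative helper)."""
    if not 0.0 <= r2_x < 1.0:
        raise ValueError("r2_x must be in [0, 1)")
    return 1.0 / (1.0 - r2_x)

# Made-up R-square values, for illustration only
for r2 in (0.0, 0.1, 0.5):
    print(r2, covariate_inflation(r2))
```

With r_x^2 = 0 the factor is 1, which recovers the single-covariate case.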

Test for a Parameter in the Regression Model

The MODEL=REG option in the SAMPLESIZE statement derives the sample size required for a Z test of a normal regression parameter. For a normal linear regression model, the response variable is normally distributed with the mean equal to a linear function of the explanatory variables and the constant variance $\sigma_y^2$.

The normal linear model is

$$\mathbf{Y} \sim N\left(\mathbf{X}\boldsymbol{\beta},\ \sigma_y^2 \mathbf{I}_{(N)}\right)$$

where $\mathbf{Y}_{(N \times 1)}$ is the vector of the N observed responses, $\mathbf{X}_{(N \times p)}$ is the design matrix for these N observations, $\boldsymbol{\beta}_{(p \times 1)}$ is the parameter vector, and $\mathbf{I}_{(N)}$ is the $(N \times N)$ identity matrix.

The least squares estimate is

$$\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1} \mathbf{X}'\mathbf{Y}$$

and is normally distributed with mean $\boldsymbol{\beta}$ and variance

$$\mathrm{Var}(\hat{\boldsymbol{\beta}}) = \sigma_y^2 (\mathbf{X}'\mathbf{X})^{-1}$$

For a model with only one covariate X1,

$$\hat{\beta}_1 \sim N\left(\beta_1,\ \mathrm{Var}(\hat{\beta}_1)\right)$$

where the variance

$$\mathrm{Var}(\hat{\beta}_1) = I_{\beta_1}^{-1} = \sigma_y^2 \frac{1}{N \sigma_x^2}$$

Thus, with the derived maximum information $I_X = I_{\beta_1}$, the required sample size is given by

$$N = I_X \frac{\sigma_y^2}{\sigma_x^2}$$
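As a quick numerical check of the single-covariate variance formula, the sketch below compares the slope element of (X'X)^{-1} with 1/(N sigma_x^2). It assumes a design matrix made of an intercept column plus X1, and it takes sigma_x^2 to be the variance of X1 computed with divisor N; the covariate values and function names are made up for illustration.

```python
def slope_variance_factor(x):
    """Slope diagonal element of (X'X)^{-1} for the model y = b0 + b1*x."""
    n = len(x)
    sx = sum(x)
    sxx = sum(v * v for v in x)
    det = n * sxx - sx * sx      # determinant of the 2x2 matrix X'X
    return n / det               # inverse element that corresponds to b1

def population_variance(x):
    """Variance of x with divisor N (an assumption of this sketch)."""
    n = len(x)
    mean = sum(x) / n
    return sum((v - mean) ** 2 for v in x) / n

x = [1.0, 2.0, 4.0, 7.0, 11.0]   # made-up covariate values
lhs = slope_variance_factor(x)
rhs = 1.0 / (len(x) * population_variance(x))
print(lhs, rhs)                  # the two values agree up to rounding
```

Multiplying either value by the response variance gives the variance of the slope estimate.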

For a normal linear model with more than one covariate, the variance of the estimate of a single parameter $\beta_1$ is

$$\mathrm{Var}(\hat{\beta}_1) = \sigma_y^2 (\mathbf{X}'\mathbf{X})^{-1}_{(11)} = \sigma_y^2 \frac{1}{N \sigma_x^2 (1 - r_x^2)}$$

where $(\mathbf{X}'\mathbf{X})^{-1}_{(11)}$ is the diagonal element of the $(\mathbf{X}'\mathbf{X})^{-1}$ matrix corresponding to the parameter $\beta_1$, $\sigma_x^2$ is the variance of the variable X1, and $r_x^2$ is the proportion of variance of X1 explained by other covariates. The value $\sigma_x^2 (1 - r_x^2)$ represents the variance of X1 after adjusting for all other covariates.

Thus, with the derived maximum information $I_X$, the required sample size is

$$N = I_X \frac{\sigma_y^2}{(1 - r_x^2) \sigma_x^2}$$

In the SEQDESIGN procedure, you can specify the MODEL=REG( VARIANCE=$\sigma_y^2$ XVARIANCE=$\sigma_x^2$ XRSQUARE=$r_x^2$) option in the SAMPLESIZE statement to compute the required total sample size and individual sample size at each stage. A SAS procedure such as PROC REG can be used to compute the parameter estimate and its standard error at each stage.
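The arithmetic behind the normal regression sample size can be sketched in a few lines of Python. This is not the procedure's implementation: the function name is hypothetical, the maximum information is taken as a given input, and all numeric values are made up.

```python
import math

def reg_sample_size(max_info, var_y, var_x, r2_x=0.0):
    """N = I_X * var_y / ((1 - r2_x) * var_x) for a normal regression parameter."""
    return max_info * var_y / ((1.0 - r2_x) * var_x)

# Made-up inputs: maximum information 50, response variance 5,
# variance of X1 equal to 2, and R square of X1 on other covariates 0.10
n = reg_sample_size(max_info=50.0, var_y=5.0, var_x=2.0, r2_x=0.10)
print(n, math.ceil(n))   # fractional size and the size rounded up
```

Leaving r2_x at its default of 0 reproduces the single-covariate formula.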

Test for a Parameter in the Logistic Regression Model

The MODEL=LOGISTIC option in the SAMPLESIZE statement derives the sample size required for a Z test of a logistic regression parameter. The linear logistic model has the form

$$\mathrm{logit}(p) = \log\left(\frac{p}{1 - p}\right) = \mathbf{x}\boldsymbol{\beta}$$

where $p$ is the response probability to be modeled, $\mathbf{x}$ is the vector of explanatory variables, and $\boldsymbol{\beta}$ is a vector of parameters.

Following the derivation in the section Test for a Parameter in the Regression Model, the required sample size for testing a parameter in $\boldsymbol{\beta}$ is given by

$$N = I_X \frac{\sigma_y^2}{(1 - r_x^2) \sigma_x^2}$$

With the variance of the logit response, $\sigma_y^2 = 1 / (p (1 - p))$,

$$N = I_X \frac{1}{p (1 - p)} \frac{1}{(1 - r_x^2) \sigma_x^2}$$

where $\sigma_x^2$ is the variance of X1 and $r_x^2$ is the proportion of variance of X1 explained by other covariates.

In the SEQDESIGN procedure, you can specify the MODEL=LOGISTIC( PROP=$p$ XVARIANCE=$\sigma_x^2$ XRSQUARE=$r_x^2$) option in the SAMPLESIZE statement to compute the required total sample size and individual sample size at each stage.

A SAS procedure such as PROC LOGISTIC can be used to compute the parameter estimate and its standard error at each stage.
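For the logistic case, the same computation applies with the response variance replaced by 1/(p(1 - p)). The Python sketch below works under that substitution only; the function name and input values are hypothetical, and the maximum information is assumed to come from the design.

```python
def logistic_sample_size(max_info, prop, var_x, r2_x=0.0):
    """N = I_X / (p(1 - p)) / ((1 - r2_x) * var_x) for a logistic regression parameter."""
    if not 0.0 < prop < 1.0:
        raise ValueError("prop must be strictly between 0 and 1")
    var_y = 1.0 / (prop * (1.0 - prop))   # variance of the logit response
    return max_info * var_y / ((1.0 - r2_x) * var_x)

# Made-up inputs: maximum information 10, response probability 0.4,
# variance of X1 equal to 1, and R square of X1 on other covariates 0.2
print(logistic_sample_size(max_info=10.0, prop=0.4, var_x=1.0, r2_x=0.2))
```

Because p(1 - p) is largest at p = 0.5, response probabilities far from 0.5 increase the required sample size for the same maximum information.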

Test for a Parameter in the Proportional Hazards Regression Model

The MODEL=PHREG option in the SAMPLESIZE statement derives the number of events required for a Z test of a proportional hazards regression parameter. For analyses of survival data, Cox’s semiparametric model is often used to examine the effect of explanatory variables on hazard rates. The survival time of each observation in the population is assumed to follow its own hazard function, $h_i(t)$, expressed as

$$h_i(t) = h(t; \mathbf{X}_i) = h_0(t) \exp(\mathbf{X}_i' \boldsymbol{\beta})$$

where $h_0(t)$ is an arbitrary and unspecified baseline hazard function, $\mathbf{X}_i$ is the vector of explanatory variables for the ith individual, and $\boldsymbol{\beta}$ is the vector of regression parameters associated with the explanatory variables.

Hsieh and Lavori (2000, p. 553) show that the required number of events for testing a parameter $\beta_1$ in $\boldsymbol{\beta}$ that is associated with the variable X1 is given by

$$D_X = I_X \frac{1}{(1 - r_x^2) \sigma_x^2}$$

where $\sigma_x^2$ is the variance of X1 and $r_x^2$ is the proportion of variance of X1 explained by other covariates.

In the SEQDESIGN procedure, you can specify the MODEL=PHREG( XVARIANCE=$\sigma_x^2$ XRSQUARE=$r_x^2$) option in the SAMPLESIZE statement to compute the required number of events and individual number of events at each stage.

A SAS procedure such as PROC PHREG can be used to compute the parameter estimate and its standard error at each stage.
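The required number of events depends only on the maximum information and the adjusted variance of X1, with no response variance term. A minimal Python sketch of that formula follows; the function name and inputs are hypothetical and the values are made up.

```python
import math

def phreg_events(max_info, var_x, r2_x=0.0):
    """D_X = I_X / ((1 - r2_x) * var_x) for a proportional hazards parameter."""
    return max_info / ((1.0 - r2_x) * var_x)

# Made-up inputs: maximum information 12, variance of X1 equal to 0.5,
# and R square of X1 on other covariates 0.25
d = phreg_events(max_info=12.0, var_x=0.5, r2_x=0.25)
print(d, math.ceil(d))   # required events and the count rounded up
```

Note that the result is a number of events, not a number of subjects; converting it to a sample size needs additional assumptions about hazard rates and accrual.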

Note that for a two-sample test, X1 is an indicator variable and is the only covariate in the model. Thus, if the two sample sizes are equal, then the variance $\sigma_x^2 = 1/4$ and the required number of events for testing the parameter $\beta_1$ is given by

$$D_X = I_X \frac{1}{\sigma_x^2} = 4 I_X$$
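The value 1/4 is the variance of a 0/1 indicator when half of the observations fall in each group. The Python sketch below computes that variance from group sizes, using the standard result that an indicator with group proportion w has variance w(1 - w); extending the computation to unequal groups is an assumption of this sketch and is not stated in the text above. The function name and numbers are hypothetical.

```python
def indicator_variance(n_group0, n_group1):
    """Variance of a 0/1 group indicator, w * (1 - w), where w is the share in group 1."""
    w = n_group1 / (n_group0 + n_group1)
    return w * (1.0 - w)

max_info = 10.0                              # made-up maximum information
var_equal = indicator_variance(50, 50)       # balanced groups
events_equal = max_info / var_equal          # equals 4 * max_info
events_unequal = max_info / indicator_variance(25, 75)
print(var_equal, events_equal, events_unequal)
```

Under this sketch, unbalanced groups give a variance below 1/4 and therefore require more events for the same maximum information.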

See the section Input Number of Events for Fixed-Sample Design for a detailed description of the sample size computation that uses hazard rates, accrual rate, and accrual time.

Last updated: December 09, 2022