The framework in Lyles, Lin, and Williamson (2007) provides an effective strategy for using the CUSTOM statement to compute power or sample size for generalized linear models. The process involves the following steps:
1. Specify the SAS code that you would use to perform the data analysis.
2. Create an exemplary data set that resembles the data you expect to obtain, covering as many reasonable values of the response variable as possible.
3. Run the analysis on the exemplary data set and extract the appropriate test statistic.
4. If you are planning for a likelihood ratio test, run the analysis again with a reduced model (lacking the predictors being tested), and again extract the appropriate test statistic.
5. Run PROC POWER with the CUSTOM statement, specifying the noncentral chi-square distribution with primary noncentrality equal to the appropriate transformation of the test statistics.
Although the power computation is conditional (assuming fixed predictors), you can approximate unconditional power by using a WEIGHT statement in the data analysis and constructing the exemplary data to be representative of the conjectured distribution of the predictors.
Suppose you want to compute the power of the Wald chi-square and likelihood ratio tests for the interaction term in a logistic regression model that has two binary covariates, as in Lyles, Lin, and Williamson (2007, section 3.2). You plan to use a sample size of 100 and a significance level of α = 0.05. You have conjectured values for the regression coefficients as shown in Table 42.
Table 42: Conjectured Regression Coefficient Values

| Parameter | Value |
|---|---|
| β0 (Intercept) | 0 |
| β1 (X1) | -1.5 |
| β2 (X2) | 0.5 |
| β3 (X1*X2) | 2 |
You expect the covariates to be distributed according to a multinomial distribution with probabilities as shown in Table 43.
Table 43: Conjectured Covariate Distribution
| X1 | X2 | Probability |
|---|---|---|
| 0 | 0 | 0.2 |
| 0 | 1 | 0.25 |
| 1 | 0 | 0.25 |
| 1 | 1 | 0.3 |
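Because the exemplary data set uses 20 replicates per response value, these multinomial probabilities translate into whole-number cell allocations, which is what makes the exemplary-data construction work. The following Python snippet (an illustration outside the SAS workflow, not part of the example program) shows the mapping:

```python
# Conjectured multinomial probabilities for the (X1, X2) cells
probs = {(0, 0): 0.20, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.30}
n_cells = 20  # replicates per response value in the exemplary data set

# Each cell's allocation is its probability times the replicate count
alloc = {cell: round(p * n_cells) for cell, p in probs.items()}
print(alloc)  # {(0, 0): 4, (0, 1): 5, (1, 0): 5, (1, 1): 6}
```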
The strategy for the power computation is summarized as follows:
Create an exemplary data set to represent both the regression coefficients and the covariate distribution.
Use the LOGISTIC procedure to compute the primary noncentrality values and degrees of freedom you need for the power calculation.
Fetch the primary noncentrality values and degrees of freedom from PROC LOGISTIC output and pass them into the CUSTOM statement in PROC POWER along with the other power analysis parameters.
The following steps describe this strategy in detail:
First create a data set that has one row for each unique design profile, and compute the response probability associated with each profile according to your conjectured regression coefficient values. The following DATA step computes these probabilities as values of a variable named PY:
```sas
* Compute P(Yj|Xi) values from conjectured betas;
data Exemplary;
   _B0 = 0; _B1 = -1.5; _B2 = 0.5; _B3 = 2;
   do X1 = 0 to 1;
      do X2 = 0 to 1;
         _pi = logistic(_B0 + _B1*X1 + _B2*X2 + _B3*X1*X2);
         do Y = 0 to 1;
            PY = _pi**Y * (1-_pi)**(1-Y);
            output;
         end;
      end;
   end;
   keep X1 X2 Y PY;
run;

proc print data=Exemplary;
run;
```
The output data set is shown in Output 96.11.1.
Output 96.11.1: Response Probabilities Produced from Regression Coefficients
| Obs | X1 | X2 | Y | PY |
|---|---|---|---|---|
| 1 | 0 | 0 | 0 | 0.50000 |
| 2 | 0 | 0 | 1 | 0.50000 |
| 3 | 0 | 1 | 0 | 0.37754 |
| 4 | 0 | 1 | 1 | 0.62246 |
| 5 | 1 | 0 | 0 | 0.81757 |
| 6 | 1 | 0 | 1 | 0.18243 |
| 7 | 1 | 1 | 0 | 0.26894 |
| 8 | 1 | 1 | 1 | 0.73106 |
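As a quick independent check outside SAS, the same probabilities can be recomputed in a few lines of Python; the coefficient values come straight from the DATA step above:

```python
import math

def logistic(t):
    # Inverse logit, matching the SAS LOGISTIC function
    return 1.0 / (1.0 + math.exp(-t))

B0, B1, B2, B3 = 0.0, -1.5, 0.5, 2.0  # conjectured coefficients

rows = []
for x1 in (0, 1):
    for x2 in (0, 1):
        pi = logistic(B0 + B1*x1 + B2*x2 + B3*x1*x2)
        for y in (0, 1):
            # P(Y=y | X1=x1, X2=x2)
            rows.append((x1, x2, y, pi**y * (1 - pi)**(1 - y)))

for r in rows:
    print("X1=%d X2=%d Y=%d PY=%.5f" % r)
```

The printed values agree with Output 96.11.1 to five decimal places.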
Next expand this data set into an exemplary data set that represents your assumed covariate distribution. The following DATA step replicates each design profile the appropriate number of times according to the multinomial probabilities in Table 43.
```sas
* Expand according to relative allocations of design profiles;
data Exemplary;
   set Exemplary;
   if (X1 = 0 and X2 = 0) then _alloc = 4;
   else if (X1 = 0 and X2 = 1) then _alloc = 5;
   else if (X1 = 1 and X2 = 0) then _alloc = 5;
   else _alloc = 6;
   do _i = 1 to _alloc;
      output;
   end;
   keep X1 X2 Y PY;
run;

proc print data=Exemplary;
run;
```
Output 96.11.2 shows the resulting exemplary data set.
Output 96.11.2: Exemplary Data Set
| Obs | X1 | X2 | Y | PY |
|---|---|---|---|---|
| 1 | 0 | 0 | 0 | 0.50000 |
| 2 | 0 | 0 | 0 | 0.50000 |
| 3 | 0 | 0 | 0 | 0.50000 |
| 4 | 0 | 0 | 0 | 0.50000 |
| 5 | 0 | 0 | 1 | 0.50000 |
| 6 | 0 | 0 | 1 | 0.50000 |
| 7 | 0 | 0 | 1 | 0.50000 |
| 8 | 0 | 0 | 1 | 0.50000 |
| 9 | 0 | 1 | 0 | 0.37754 |
| 10 | 0 | 1 | 0 | 0.37754 |
| 11 | 0 | 1 | 0 | 0.37754 |
| 12 | 0 | 1 | 0 | 0.37754 |
| 13 | 0 | 1 | 0 | 0.37754 |
| 14 | 0 | 1 | 1 | 0.62246 |
| 15 | 0 | 1 | 1 | 0.62246 |
| 16 | 0 | 1 | 1 | 0.62246 |
| 17 | 0 | 1 | 1 | 0.62246 |
| 18 | 0 | 1 | 1 | 0.62246 |
| 19 | 1 | 0 | 0 | 0.81757 |
| 20 | 1 | 0 | 0 | 0.81757 |
| 21 | 1 | 0 | 0 | 0.81757 |
| 22 | 1 | 0 | 0 | 0.81757 |
| 23 | 1 | 0 | 0 | 0.81757 |
| 24 | 1 | 0 | 1 | 0.18243 |
| 25 | 1 | 0 | 1 | 0.18243 |
| 26 | 1 | 0 | 1 | 0.18243 |
| 27 | 1 | 0 | 1 | 0.18243 |
| 28 | 1 | 0 | 1 | 0.18243 |
| 29 | 1 | 1 | 0 | 0.26894 |
| 30 | 1 | 1 | 0 | 0.26894 |
| 31 | 1 | 1 | 0 | 0.26894 |
| 32 | 1 | 1 | 0 | 0.26894 |
| 33 | 1 | 1 | 0 | 0.26894 |
| 34 | 1 | 1 | 0 | 0.26894 |
| 35 | 1 | 1 | 1 | 0.73106 |
| 36 | 1 | 1 | 1 | 0.73106 |
| 37 | 1 | 1 | 1 | 0.73106 |
| 38 | 1 | 1 | 1 | 0.73106 |
| 39 | 1 | 1 | 1 | 0.73106 |
| 40 | 1 | 1 | 1 | 0.73106 |
Next use the exemplary data as the input data set in PROC LOGISTIC, assigning PY as the WEIGHT variable. First fit the full model and save both the parameter estimates and fit statistics output tables as data sets. (You will later use a parameter estimate for the Wald chi-square power analysis and a fit statistic for the likelihood ratio test.)
```sas
* Run full model;
proc logistic data=Exemplary;
   ods output parameterestimates=fitFull fitstatistics=statsFull;
   weight PY;
   model Y(event='1') = X1 X2 X1*X2;
run;
```
Output 96.11.3 and Output 96.11.4 show these tables from the PROC LOGISTIC output.
Output 96.11.3: Parameter Estimates for Full Model
Analysis of Maximum Likelihood Estimates

| Parameter | DF | Estimate | Standard Error | Wald Chi-Square | Pr > ChiSq |
|---|---|---|---|---|---|
| Intercept | 1 | 1.69E-16 | 1.0000 | 0.0000 | 1.0000 |
| X1 | 1 | -1.4999 | 1.5300 | 0.9611 | 0.3269 |
| X2 | 1 | 0.5000 | 1.3605 | 0.1351 | 0.7132 |
| X1*X2 | 1 | 1.9999 | 2.0099 | 0.9901 | 0.3197 |
Output 96.11.4: Fit Statistics for Full Model
Model Fit Statistics

| Criterion | Intercept Only | Intercept and Covariates |
|---|---|---|
| AIC | 29.692 | 31.911 |
| SC | 31.381 | 38.666 |
| -2 Log L | 27.692 | 23.911 |
The numbers you need for the power analysis of the Wald chi-square test of X1*X2 are the degrees of freedom (1) and the Wald chi-square statistic (0.9901). The number you need for the likelihood ratio test is the –2 Log L value for the full model (23.911). The following DATA steps save these values to data sets:
```sas
* Fetch Wald chi-square statistic for test of X1*X2;
data fitFull;
   set fitFull;
   where Variable = "X1*X2";
   keep DF WaldChiSq;
run;

* Fetch -2LogL for full model;
data statsFull;
   set statsFull;
   where Criterion = "-2 Log L";
   Neg2LogLFull = InterceptAndCovariates;
   keep Neg2LogLFull;
run;
```
Next run the reduced model (dropping the interaction term, the one you want to test) and save the fit statistics table.
```sas
* Run reduced model;
proc logistic data=Exemplary;
   ods output fitstatistics=statsRed;
   weight PY;
   model Y = X1 X2;
run;
```
Output 96.11.5 shows the fit statistics table from the PROC LOGISTIC output.
Output 96.11.5: Fit Statistics for Reduced Model
Model Fit Statistics

| Criterion | Intercept Only | Intercept and Covariates |
|---|---|---|
| AIC | 29.692 | 30.939 |
| SC | 31.381 | 36.005 |
| -2 Log L | 27.692 | 24.939 |
The number you need for the power analysis of the likelihood ratio test is the –2 Log L value for the reduced model (24.939). The following DATA step saves this value to a data set:
```sas
* Fetch -2LogL for reduced model;
data statsRed;
   set statsRed;
   where Criterion = "-2 Log L";
   Neg2LogLRed = InterceptAndCovariates;
   keep Neg2LogLRed;
run;
```
Now you are ready to compute the primary noncentrality values. You need to divide both the Wald chi-square statistic and the –2 Log L difference by the "effective sample size" of the exemplary data set, which is the number of rows for each response value (20). The following statements compute the primary noncentralities and store all three numbers that you need for PROC POWER in macro variables:
```sas
* Compute primary noncentralities for Wald and LR tests;
data PrimNC;
   merge fitFull statsFull statsRed;
   WaldPrimNC = WaldChiSq / 20;
   LRPrimNC = -(Neg2LogLFull - Neg2LogLRed) / 20;
   keep DF WaldPrimNC LRPrimNC;
   call symput("DF", DF);
   call symput("WaldPrimNC", WaldPrimNC);
   call symput("LRPrimNC", LRPrimNC);
run;

proc print data=PrimNC;
run;
```
The primary noncentrality values thus computed are shown in Output 96.11.6.
Output 96.11.6: Primary Noncentrality Values Derived from PROC LOGISTIC
| Obs | DF | WaldPrimNC | LRPrimNC |
|---|---|---|---|
| 1 | 1 | 0.049506 | 0.051396 |
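The transformation is simple enough to verify by hand, shown here in Python as an independent check: divide the Wald statistic, or the difference in –2 Log L values, by the effective sample size of 20. (The inputs below are the rounded values printed in the output tables, so the results agree with Output 96.11.6 only to about four decimal places.)

```python
# Values taken from the PROC LOGISTIC output tables above
wald_chisq = 0.9901      # Wald chi-square for X1*X2 (full model)
neg2logl_full = 23.911   # -2 Log L, full model
neg2logl_red = 24.939    # -2 Log L, reduced model
n_effective = 20         # rows per response value in the exemplary data

wald_primnc = wald_chisq / n_effective
lr_primnc = (neg2logl_red - neg2logl_full) / n_effective
print(wald_primnc, lr_primnc)  # ~0.0495 and ~0.0514
```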
Perform the power analysis by using the CUSTOM statement in PROC POWER as follows:
```sas
proc power;
   custom
      dist = chisquare
      primnc = &WaldPrimNC &LRPrimNC
      testdf = &DF
      ntotal = 100
      power = .;
run;
```
The computed powers for the two tests, shown in Output 96.11.7, match the results in Lyles, Lin, and Williamson (2007, section 3.2).
Output 96.11.7: Computed Power Values
| Fixed Scenario Elements | |
|---|---|
| Distribution | Chi-square |
| Method | Default |
| Test Degrees of Freedom | 1 |
| Total Sample Size | 100 |
| Alpha | 0.05 |
| Primary Noncentrality Multiplier | 1 |
| Critical Value Multiplier | 1 |
Computed Power

| Index | Primary NC | Power |
|---|---|---|
| 1 | 0.0495 | 0.605 |
| 2 | 0.0514 | 0.621 |
The following statements validate the PROC POWER results by using the QUANTILE and SDF functions in the DATA step:
```sas
data Power;
   set PrimNC;
   Crit = quantile("chisq", 1-0.05, DF);
   WaldPower = sdf("chisq", Crit, DF, 100 * WaldPrimNC);
   LRPower = sdf("chisq", Crit, DF, 100 * LRPrimNC);
   keep WaldPower LRPower;
run;

proc print data=Power;
run;
```
The computed powers for the two tests, shown in Output 96.11.8, match the results in Lyles, Lin, and Williamson (2007, section 3.2).
Output 96.11.8: Validation of Power Values
| Obs | WaldPower | LRPower |
|---|---|---|
| 1 | 0.60452 | 0.62063 |
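The same computation can also be reproduced entirely outside SAS. The sketch below, in plain Python with no external libraries, evaluates the noncentral chi-square survival function as a Poisson-weighted mixture of central chi-square CDFs, taking the primary noncentrality values from Output 96.11.6 and scaling them by the total sample size of 100:

```python
import math

def lower_gamma_reg(s, x, terms=300):
    # Regularized lower incomplete gamma P(s, x) via its power series
    if x <= 0.0:
        return 0.0
    term = 1.0 / s
    total = term
    for n in range(1, terms):
        term *= x / (s + n)
        total += term
        if term < 1e-16 * total:
            break
    return total * math.exp(s * math.log(x) - x - math.lgamma(s))

def chi2_cdf(x, df):
    # Central chi-square CDF
    return lower_gamma_reg(df / 2.0, x / 2.0)

def chi2_quantile(p, df):
    # Invert the CDF by bisection (crude but adequate here)
    lo, hi = 0.0, 100.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def ncchi2_sf(x, df, lam, terms=200):
    # Noncentral chi-square survival function: a Poisson(lam/2)-weighted
    # mixture of central chi-square CDFs with df, df+2, df+4, ...
    w = math.exp(-lam / 2.0)  # Poisson weight at j = 0
    cdf = 0.0
    for j in range(terms):
        cdf += w * chi2_cdf(x, df + 2 * j)
        w *= (lam / 2.0) / (j + 1)
    return 1.0 - cdf

df = 1
crit = chi2_quantile(0.95, df)  # critical value, ~3.8415
wald_power = ncchi2_sf(crit, df, 100 * 0.049506)
lr_power = ncchi2_sf(crit, df, 100 * 0.051396)
print(wald_power, lr_power)  # ~0.605 and ~0.621
```

This reproduces the powers in Output 96.11.8 to within rounding of the noncentrality inputs.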