The POWER Procedure

Example 96.11 Logistic Regression Using the CUSTOM Statement

(View the complete code for this example.)

The framework in Lyles, Lin, and Williamson (2007) provides an effective strategy for using the CUSTOM statement to compute power or sample size for generalized linear models. The process involves the following steps:

Specify the SAS code that you would use to perform the data analysis.
Create an exemplary data set that resembles the nature of the data you expect to obtain, covering as many reasonable values of the response variable as possible.
Run the analysis on the exemplary data set and extract the appropriate test statistic.
If you are planning for a likelihood ratio test, run the analysis again, but this time with a reduced model (lacking the predictors being tested). Again extract the appropriate test statistic.
Run PROC POWER with the CUSTOM statement, specifying the noncentral chi-square distribution with primary noncentrality equal to the appropriate transformation of the test statistics.

Although the power computation is conditional (assuming fixed predictors), you can approximate unconditional power by using a WEIGHT statement in the data analysis and constructing the exemplary data to be representative of the conjectured distribution of the predictors.

Suppose you want to compute the power of the Wald chi-square and likelihood ratio tests for the interaction term in a logistic regression model that has two binary covariates, as in Lyles, Lin, and Williamson (2007, section 3.2). You plan to use a sample size and significance level = 0.05. You have conjectured values for the regression coefficients as shown in Table 42.

Table 42: Conjectured Regression Coefficient Values

Source	Coefficient	Value
Intercept		0
`X1`		–1.5
`X2`		0.5
`X1*X2`		2

You expect the covariates to be distributed according to a multinomial distribution with probabilities as shown in Table 43.

Table 43: Conjectured Covariate Distribution

`X1`	`X2`	Probability
0	0	0.2
0	1	0.25
1	0	0.25
1	1	0.3

The strategy for the power computation is summarized as follows:

Create an exemplary data set to represent both the regression coefficients and the covariate distribution.
Use the LOGISTIC procedure to compute the primary noncentrality values and degrees of freedom you need for the power calculation.
Fetch the primary noncentrality values and degrees of freedom from PROC LOGISTIC output and pass them into the CUSTOM statement in PROC POWER along with the other power analysis parameters.

The following steps describe this strategy in detail:

First create a data set that has one row for each unique design profile and compute the response probability that is associated with each profile according to your conjectured regression coefficients values. The following DATA step computes these probabilities as values of a variable PY:

* Compute P(Yj|Xi) values from conjectured betas;
data Exemplary;
   _B0 = 0; _B1 = -1.5; _B2 = 0.5; _B3 = 2;
   do X1 = 0 to 1;
      do X2 = 0 to 1;
         _pi = logistic(_B0 + _B1*X1 + _B2*X2 + _B3*X1*X2);
         do Y = 0 to 1;
            PY = _pi**Y * (1-_pi)**(1-Y);
            output;
         end;
      end;
   end;
   keep X1 X2 Y PY;
run;

proc print data=Exemplary;
run;

The output data set is shown in Output 96.11.1.

Output 96.11.1: Response Probabilities Produced from Regression Coefficients

Obs	X1	X2	Y	PY
1	0	0	0	0.50000
2	0	0	1	0.50000
3	0	1	0	0.37754
4	0	1	1	0.62246
5	1	0	0	0.81757
6	1	0	1	0.18243
7	1	1	0	0.26894
8	1	1	1	0.73106

Next expand this data set into an exemplary data set that represents your assumed covariate distribution. The following DATA step replicates each design profile the appropriate number of times according to the multinomial probabilities in Table 43.

* Expand according to relative allocations of design profiles;
data Exemplary;
   set Exemplary;
   if      (X1 = 0 and X2 = 0) then _alloc=4;
   else if (X1 = 0 and X2 = 1) then _alloc=5;
   else if (X1 = 1 and X2 = 0) then _alloc=5;
   else                             _alloc=6;
   do _i = 1 to _alloc;
      output;
   end;
   keep X1 X2 Y PY;
run;

proc print data=Exemplary;
run;

Output 96.11.2 shows the resulting exemplary data set.

Output 96.11.2: Exemplary Data Set

Obs	X1	X2	Y	PY
1	0	0	0	0.50000
2	0	0	0	0.50000
3	0	0	0	0.50000
4	0	0	0	0.50000
5	0	0	1	0.50000
6	0	0	1	0.50000
7	0	0	1	0.50000
8	0	0	1	0.50000
9	0	1	0	0.37754
10	0	1	0	0.37754
11	0	1	0	0.37754
12	0	1	0	0.37754
13	0	1	0	0.37754
14	0	1	1	0.62246
15	0	1	1	0.62246
16	0	1	1	0.62246
17	0	1	1	0.62246
18	0	1	1	0.62246
19	1	0	0	0.81757
20	1	0	0	0.81757
21	1	0	0	0.81757
22	1	0	0	0.81757
23	1	0	0	0.81757
24	1	0	1	0.18243
25	1	0	1	0.18243
26	1	0	1	0.18243
27	1	0	1	0.18243
28	1	0	1	0.18243
29	1	1	0	0.26894
30	1	1	0	0.26894
31	1	1	0	0.26894
32	1	1	0	0.26894
33	1	1	0	0.26894
34	1	1	0	0.26894
35	1	1	1	0.73106
36	1	1	1	0.73106
37	1	1	1	0.73106
38	1	1	1	0.73106
39	1	1	1	0.73106
40	1	1	1	0.73106

Next use the exemplary data as the input data set in PROC LOGISTIC, assigning PY as the WEIGHT variable. First fit the full model and save both the parameter estimates and fit statistics output tables as data sets. (You will later use a parameter estimate for the Wald chi-square power analysis and a fit statistic for the likelihood ratio test.)

* Run full model;
proc logistic data=Exemplary;
   ods output parameterestimates=fitFull fitstatistics=statsFull;
   weight PY;
   model Y(event='1') = X1 X2 X1*X2;
run;

Output 96.11.3 and Output 96.11.4 show these tables from the PROC LOGISTIC output.

Output 96.11.3: Parameter Estimates for Full Model

The LOGISTIC Procedure

Analysis of Maximum Likelihood Estimates
Parameter	DF	Estimate	Standard Error	Wald Chi-Square	Pr > ChiSq
Intercept	1	1.69E-16	1.0000	0.0000	1.0000
X1	1	-1.4999	1.5300	0.9611	0.3269
X2	1	0.5000	1.3605	0.1351	0.7132
X1*X2	1	1.9999	2.0099	0.9901	0.3197

Output 96.11.4: Fit Statistics for Full Model

Model Fit Statistics
Criterion	Intercept Only	Intercept and Covariates
AIC	29.692	31.911
SC	31.381	38.666
-2 Log L	27.692	23.911

The numbers you need for the power analysis for the Wald chi-square test of X1*X2 are the degrees of freedom (1) and the Wald chi-square test statistic (0.9901). The number you need for the likelihood ratio test is the –2LogL value (23.911). The following DATA step saves these values to data sets:

* Fetch Wald chi-square statistic for test of X1*X2;
data fitFull;
   set fitFull;
   where Variable = "X1*X2";
   keep DF WaldChiSq;
run;

* Fetch -2LogL for full model;
data statsFull;
   set statsFull;
   where Criterion = "-2 Log L";
   Neg2LogLFull = InterceptAndCovariates;
   keep Neg2LogLFull;
run;

Next run the reduced model (dropping the interaction term, the one you want to test) and save the fit statistics table.

* Run reduced model;
proc logistic data=Exemplary;
   ods output fitstatistics=statsRed;
   weight PY;
   model Y = X1 X2;
run;

Output 96.11.5 shows the fit statistics table from the PROC LOGISTIC output.

Output 96.11.5: Fit Statistics for Reduced Model

The LOGISTIC Procedure

Model Fit Statistics
Criterion	Intercept Only	Intercept and Covariates
AIC	29.692	30.939
SC	31.381	36.005
-2 Log L	27.692	24.939

The number you need for the power analysis for the likelihood ratio test is the –2LogL value (24.939). The following DATA step saves these values to data sets:

* Fetch -2LogL for reduced model;
data statsRed;
   set statsRed;
   where Criterion = "-2 Log L";
   Neg2LogLRed = InterceptAndCovariates;
   keep Neg2LogLRed;
run;

Now you are ready to compute the primary noncentrality values. You need to divide both the Wald chi-square and –2LogL difference by the "effective sample size" of the exemplary data set—that is, the number of rows for each response value (20). The following statements compute the primary noncentrality and store all three numbers that you need for PROC POWER in macro variables:

* Compute primary noncentralities for Wald and LR tests;
data PrimNC;
   merge fitFull statsFull statsRed;
   WaldPrimNC = WaldChiSq / 20;
   LRPrimNC = -(Neg2LogLFull-Neg2LogLRed) / 20;
   keep DF WaldPrimNC LRPrimNC;
   call symput ("DF", DF);
   call symput ("WaldPrimNC", WaldPrimNC);
   call symput ("LRPrimNC", LRPrimNC);
run;

proc print data=PrimNC;
run;

The primary noncentrality values thus computed are shown in Output 96.11.6.

Output 96.11.6: Primary Noncentrality Values Derived from PROC LOGISTIC

Obs	DF	WaldPrimNC	LRPrimNC
1	1	0.049506	0.051396

Perform the power analysis by using the CUSTOM statement in PROC POWER as follows:

proc power;
   custom
      dist   = chisquare
      primnc = &WaldPrimNC &LRPrimNC
      testdf = &DF
      ntotal = 100
      power  = .;
run;

The computed powers for the two tests, shown in Output 96.11.7, match the results in Lyles, Lin, and Williamson (2007, section 3.2).

Output 96.11.7: Computed Power Values

The POWER Procedure

Custom Test

Fixed Scenario Elements
Distribution	Chi-square
Method	Default
Test Degrees of Freedom	1
Total Sample Size	100
Alpha	0.05
Primary Noncentrality Multiplier	1
Critical Value Multiplier	1

Computed Power
Index	Primary NC	Power
1	0.0495	0.605
2	0.0514	0.621

The following statements validate the PROC POWER results by using the QUANTILE and SDF functions in the DATA step:

data Power;
   set PrimNC;
   Crit = quantile("chisq", 1-0.05, DF);
   WaldPower = sdf("chisq", Crit, DF, 100 * WaldPrimNC);
   LRPower = sdf("chisq", Crit, DF, 100 * LRPrimNC);
   keep WaldPower LRPower;
run;

proc print data=Power;
run;

The computed powers for the two tests, shown in Output 96.11.8, match the results in Lyles, Lin, and Williamson (2007, section 3.2).

Output 96.11.8: Validation of Power Values

Obs	WaldPower	LRPower
1	0.60452	0.62063

Last updated: December 09, 2022