The POWER Procedure

Example 96.11 Logistic Regression Using the CUSTOM Statement

(View the complete code for this example.)

The framework in Lyles, Lin, and Williamson (2007) provides an effective strategy for using the CUSTOM statement to compute power or sample size for generalized linear models. The process involves the following steps:

  1. Specify the SAS code that you would use to perform the data analysis.

  2. Create an exemplary data set that resembles the nature of the data you expect to obtain, covering as many reasonable values of the response variable as possible.

  3. Run the analysis on the exemplary data set and extract the appropriate test statistic.

  4. If you are planning for a likelihood ratio test, run the analysis again, but this time with a reduced model (lacking the predictors being tested). Again extract the appropriate test statistic.

  5. Run PROC POWER with the CUSTOM statement, specifying the noncentral chi-square distribution with primary noncentrality equal to the appropriate transformation of the test statistics.

Although the power computation is conditional (assuming fixed predictors), you can approximate unconditional power by using a WEIGHT statement in the data analysis and constructing the exemplary data to be representative of the conjectured distribution of the predictors.

Suppose you want to compute the power of the Wald chi-square and likelihood ratio tests for the interaction term in a logistic regression model that has two binary covariates, as in Lyles, Lin, and Williamson (2007, section 3.2). You plan to use a sample size upper N equals 100 and significance level alpha = 0.05. You have conjectured values for the beta Subscript i regression coefficients as shown in Table 42.

Table 42: Conjectured Regression Coefficient Values

Source Coefficient Value
Intercept beta 0 0
X1 beta 1 –1.5
X2 beta 2 0.5
X1*X2 beta 3 2


You expect the covariates to be distributed according to a multinomial distribution with probabilities as shown in Table 43.

Table 43: Conjectured Covariate Distribution

X1 X2 Probability
0 0 0.2
0 1 0.25
1 0 0.25
1 1 0.3


The strategy for the power computation is summarized as follows:

  1. Create an exemplary data set to represent both the beta regression coefficients and the covariate distribution.

  2. Use the LOGISTIC procedure to compute the primary noncentrality values and degrees of freedom you need for the power calculation.

  3. Fetch the primary noncentrality values and degrees of freedom from PROC LOGISTIC output and pass them into the CUSTOM statement in PROC POWER along with the other power analysis parameters.

The following steps describe this strategy in detail:

  1. First create a data set that has one row for each unique design profile and compute the response probability that is associated with each profile according to your conjectured regression coefficients values. The following DATA step computes these probabilities as values of a variable PY:

    * Compute P(Yj|Xi) values from conjectured betas;
    data Exemplary;
       _B0 = 0; _B1 = -1.5; _B2 = 0.5; _B3 = 2;
       do X1 = 0 to 1;
          do X2 = 0 to 1;
             _pi = logistic(_B0 + _B1*X1 + _B2*X2 + _B3*X1*X2);
             do Y = 0 to 1;
                PY = _pi**Y * (1-_pi)**(1-Y);
                output;
             end;
          end;
       end;
       keep X1 X2 Y PY;
    run;
    
    proc print data=Exemplary;
    run;
    

    The output data set is shown in Output 96.11.1.

    Output 96.11.1: Response Probabilities Produced from Regression Coefficients

    Obs X1 X2 Y PY
    1 0 0 0 0.50000
    2 0 0 1 0.50000
    3 0 1 0 0.37754
    4 0 1 1 0.62246
    5 1 0 0 0.81757
    6 1 0 1 0.18243
    7 1 1 0 0.26894
    8 1 1 1 0.73106


    Next expand this data set into an exemplary data set that represents your assumed covariate distribution. The following DATA step replicates each design profile the appropriate number of times according to the multinomial probabilities in Table 43.

    * Expand according to relative allocations of design profiles;
    data Exemplary;
       set Exemplary;
       if      (X1 = 0 and X2 = 0) then _alloc=4;
       else if (X1 = 0 and X2 = 1) then _alloc=5;
       else if (X1 = 1 and X2 = 0) then _alloc=5;
       else                             _alloc=6;
       do _i = 1 to _alloc;
          output;
       end;
       keep X1 X2 Y PY;
    run;
    
    proc print data=Exemplary;
    run;
    

    Output 96.11.2 shows the resulting exemplary data set.

    Output 96.11.2: Exemplary Data Set

    Obs X1 X2 Y PY
    1 0 0 0 0.50000
    2 0 0 0 0.50000
    3 0 0 0 0.50000
    4 0 0 0 0.50000
    5 0 0 1 0.50000
    6 0 0 1 0.50000
    7 0 0 1 0.50000
    8 0 0 1 0.50000
    9 0 1 0 0.37754
    10 0 1 0 0.37754
    11 0 1 0 0.37754
    12 0 1 0 0.37754
    13 0 1 0 0.37754
    14 0 1 1 0.62246
    15 0 1 1 0.62246
    16 0 1 1 0.62246
    17 0 1 1 0.62246
    18 0 1 1 0.62246
    19 1 0 0 0.81757
    20 1 0 0 0.81757
    21 1 0 0 0.81757
    22 1 0 0 0.81757
    23 1 0 0 0.81757
    24 1 0 1 0.18243
    25 1 0 1 0.18243
    26 1 0 1 0.18243
    27 1 0 1 0.18243
    28 1 0 1 0.18243
    29 1 1 0 0.26894
    30 1 1 0 0.26894
    31 1 1 0 0.26894
    32 1 1 0 0.26894
    33 1 1 0 0.26894
    34 1 1 0 0.26894
    35 1 1 1 0.73106
    36 1 1 1 0.73106
    37 1 1 1 0.73106
    38 1 1 1 0.73106
    39 1 1 1 0.73106
    40 1 1 1 0.73106


  2. Next use the exemplary data as the input data set in PROC LOGISTIC, assigning PY as the WEIGHT variable. First fit the full model and save both the parameter estimates and fit statistics output tables as data sets. (You will later use a parameter estimate for the Wald chi-square power analysis and a fit statistic for the likelihood ratio test.)

    * Run full model;
    proc logistic data=Exemplary;
       ods output parameterestimates=fitFull fitstatistics=statsFull;
       weight PY;
       model Y(event='1') = X1 X2 X1*X2;
    run;
    

    Output 96.11.3 and Output 96.11.4 show these tables from the PROC LOGISTIC output.

    Output 96.11.3: Parameter Estimates for Full Model

    The LOGISTIC Procedure

    Analysis of Maximum Likelihood Estimates
    Parameter DF Estimate Standard
    Error
    Wald
    Chi-Square
    Pr > ChiSq
    Intercept 1 1.69E-16 1.0000 0.0000 1.0000
    X1 1 -1.4999 1.5300 0.9611 0.3269
    X2 1 0.5000 1.3605 0.1351 0.7132
    X1*X2 1 1.9999 2.0099 0.9901 0.3197


    Output 96.11.4: Fit Statistics for Full Model

    Model Fit Statistics
    Criterion Intercept Only Intercept and
    Covariates
    AIC 29.692 31.911
    SC 31.381 38.666
    -2 Log L 27.692 23.911


    The numbers you need for the power analysis for the Wald chi-square test of X1*X2 are the degrees of freedom (1) and the Wald chi-square test statistic (0.9901). The number you need for the likelihood ratio test is the –2LogL value (23.911). The following DATA step saves these values to data sets:

    * Fetch Wald chi-square statistic for test of X1*X2;
    data fitFull;
       set fitFull;
       where Variable = "X1*X2";
       keep DF WaldChiSq;
    run;
    
    * Fetch -2LogL for full model;
    data statsFull;
       set statsFull;
       where Criterion = "-2 Log L";
       Neg2LogLFull = InterceptAndCovariates;
       keep Neg2LogLFull;
    run;
    

    Next run the reduced model (dropping the interaction term, the one you want to test) and save the fit statistics table.

    * Run reduced model;
    proc logistic data=Exemplary;
       ods output fitstatistics=statsRed;
       weight PY;
       model Y = X1 X2;
    run;
    

    Output 96.11.5 shows the fit statistics table from the PROC LOGISTIC output.

    Output 96.11.5: Fit Statistics for Reduced Model

    The LOGISTIC Procedure

    Model Fit Statistics
    Criterion Intercept Only Intercept and
    Covariates
    AIC 29.692 30.939
    SC 31.381 36.005
    -2 Log L 27.692 24.939


    The number you need for the power analysis for the likelihood ratio test is the –2LogL value (24.939). The following DATA step saves these values to data sets:

    * Fetch -2LogL for reduced model;
    data statsRed;
       set statsRed;
       where Criterion = "-2 Log L";
       Neg2LogLRed = InterceptAndCovariates;
       keep Neg2LogLRed;
    run;
    

    Now you are ready to compute the primary noncentrality values. You need to divide both the Wald chi-square and –2LogL difference by the "effective sample size" of the exemplary data set—that is, the number of rows for each response value (20). The following statements compute the primary noncentrality and store all three numbers that you need for PROC POWER in macro variables:

    * Compute primary noncentralities for Wald and LR tests;
    data PrimNC;
       merge fitFull statsFull statsRed;
       WaldPrimNC = WaldChiSq / 20;
       LRPrimNC = -(Neg2LogLFull-Neg2LogLRed) / 20;
       keep DF WaldPrimNC LRPrimNC;
       call symput ("DF", DF);
       call symput ("WaldPrimNC", WaldPrimNC);
       call symput ("LRPrimNC", LRPrimNC);
    run;
    
    proc print data=PrimNC;
    run;
    

    The primary noncentrality values thus computed are shown in Output 96.11.6.

    Output 96.11.6: Primary Noncentrality Values Derived from PROC LOGISTIC

    Obs DF WaldPrimNC LRPrimNC
    1 1 0.049506 0.051396


  3. Perform the power analysis by using the CUSTOM statement in PROC POWER as follows:

    proc power;
       custom
          dist   = chisquare
          primnc = &WaldPrimNC &LRPrimNC
          testdf = &DF
          ntotal = 100
          power  = .;
    run;
    

The computed powers for the two tests, shown in Output 96.11.7, match the results in Lyles, Lin, and Williamson (2007, section 3.2).

Output 96.11.7: Computed Power Values

The POWER Procedure
Custom Test

Fixed Scenario Elements
Distribution Chi-square
Method Default
Test Degrees of Freedom 1
Total Sample Size 100
Alpha 0.05
Primary Noncentrality Multiplier 1
Critical Value Multiplier 1

Computed Power
Index Primary NC Power
1 0.0495 0.605
2 0.0514 0.621


The following statements validate the PROC POWER results by using the QUANTILE and SDF functions in the DATA step:

data Power;
   set PrimNC;
   Crit = quantile("chisq", 1-0.05, DF);
   WaldPower = sdf("chisq", Crit, DF, 100 * WaldPrimNC);
   LRPower = sdf("chisq", Crit, DF, 100 * LRPrimNC);
   keep WaldPower LRPower;
run;
proc print data=Power;
run;

The computed powers for the two tests, shown in Output 96.11.8, match the results in Lyles, Lin, and Williamson (2007, section 3.2).

Output 96.11.8: Validation of Power Values

Obs WaldPower LRPower
1 0.60452 0.62063


Last updated: December 09, 2022