The LOGISTIC Procedure

Example 79.14 Complementary Log-Log Model for Infection Rates


Antibodies produced in response to an infectious disease such as malaria remain in the body after the individual has recovered from the disease. A serological test detects the presence or absence of such antibodies. An individual with such antibodies is called seropositive. In geographic areas where the disease is endemic, the inhabitants are at fairly constant risk of infection. The probability of an individual never having been infected in $Y$ years is $\exp(-\mu Y)$, where $\mu$ is the mean number of infections per year (see the appendix of Draper, Voller, and Carpenter 1972). Rather than estimating the unknown $\mu$, epidemiologists want to estimate the probability of a person living in the area being infected in one year. This infection rate $\gamma$ is given by

$$\gamma = 1 - e^{-\mu}$$
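
One way to see where these two expressions come from is sketched below. It assumes (this is not stated in the text above) that the number of infections an individual experiences in $Y$ years follows a Poisson distribution with mean $\mu Y$; the infection rate is then one minus the probability of no infection in a single year.

```latex
\begin{aligned}
\Pr(\text{no infection in } Y \text{ years})
   &= \frac{(\mu Y)^{0}\, e^{-\mu Y}}{0!} = e^{-\mu Y} \\
\gamma = 1 - \Pr(\text{no infection in 1 year})
   &= 1 - e^{-\mu}
\end{aligned}
```

For small $\mu$ the two quantities are nearly equal, since $1 - e^{-\mu} \approx \mu$.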

The following statements create the data set sero, which contains the results of a serological survey of malarial infection. Individuals of nine age groups (Group) were tested. The variable A represents the midpoint of the age range for each age group. The variable N represents the number of individuals tested in each age group, and the variable R represents the number of individuals that are seropositive.

data sero;
   input Group A N R;
   X=log(A);
   label X='Log of Midpoint of Age Range';
   datalines;
1  1.5  123  8
2  4.0  132  6
3  7.5  182 18
4 12.5  140 14
5 17.5  138 20
6 25.0  161 39
7 35.0  133 19
8 47.0   92 25
9 60.0   74 44
;

For the $i$th group with the age midpoint $A_i$, the probability of being seropositive is $p_i = 1 - \exp(-\mu A_i)$. It follows that

$$\log(-\log(1 - p_i)) = \log(\mu) + \log(A_i)$$
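
Spelled out, the identity follows in two steps from the expression for $p_i$. This is a short worked derivation that uses only the symbols already defined above.

```latex
\begin{aligned}
1 - p_i &= \exp(-\mu A_i) \\
-\log(1 - p_i) &= \mu A_i \\
\log\bigl(-\log(1 - p_i)\bigr) &= \log(\mu) + \log(A_i)
\end{aligned}
```

The left-hand side is the complementary log-log transform of $p_i$. On the right-hand side, $\log(A_i)$ enters with a fixed coefficient of 1, which is why it is treated as an offset rather than as a covariate with an estimated slope.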

By fitting a binomial model with a complementary log-log link function and by using X=log(A) as an offset term, you can estimate $\alpha = \log(\mu)$ as an intercept parameter. The following statements invoke PROC LOGISTIC to compute the maximum likelihood estimate of $\alpha$. The LINK=CLOGLOG option requests the complementary log-log link function, and the CLPARM=PL option requests the profile-likelihood confidence limits for $\alpha$. The SCALE=NONE option requests the deviance and Pearson goodness-of-fit statistics.

proc logistic data=sero;
   model R/N= / offset=X
                link=cloglog
                clparm=pl
                scale=none;
   title 'Constant Risk of Infection';
run;

Results of fitting this constant risk model are shown in Output 79.14.1.

Output 79.14.1: Modeling Constant Risk of Infection

Constant Risk of Infection

The LOGISTIC Procedure

Model Information
Data Set                      WORK.SERO
Response Variable (Events)    R
Response Variable (Trials)    N
Offset Variable               X                   Log of Midpoint of Age Range
Model                         binary cloglog
Optimization Technique        Fisher's scoring

Number of Observations Read 9
Number of Observations Used 9
Sum of Frequencies Read 1175
Sum of Frequencies Used 1175

Response Profile
Ordered                          Total
  Value    Binary Outcome    Frequency
      1    Event                   193
      2    Nonevent                982

Intercept-Only Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.


-2 Log L = 967.1158

Deviance and Pearson Goodness-of-Fit Statistics
Criterion      Value    DF    Value/DF    Pr > ChiSq
Deviance     41.5032     8      5.1879        <.0001
Pearson      50.6883     8      6.3360        <.0001

Number of events/trials observations: 9


Analysis of Maximum Likelihood Estimates
                                Standard          Wald
Parameter    DF    Estimate        Error    Chi-Square    Pr > ChiSq
Intercept     1     -4.6605       0.0725     4133.5626        <.0001
X             0      1.0000            0             .             .

Parameter Estimates and Profile-Likelihood Confidence Intervals
Parameter    Estimate    95% Confidence Limits
Intercept     -4.6605     -4.8057      -4.5219


Output 79.14.1 shows that the maximum likelihood estimate of $\alpha = \log(\mu)$ and its estimated standard error are $\hat{\alpha} = -4.6605$ and $\hat{\sigma}_{\hat{\alpha}} = 0.0725$, respectively. The infection rate is estimated as

$$\hat{\gamma} = 1 - e^{-\hat{\mu}} = 1 - e^{-e^{\hat{\alpha}}} = 1 - e^{-e^{-4.6605}} = 0.00942$$

The 95% confidence interval for $\gamma$, obtained by back-transforming the 95% confidence interval for $\alpha$, is (0.0082, 0.0108); that is, the interval of 8 to 11 infections per thousand individuals comes from a procedure that, in repeated sampling, contains the true infection rate 95% of the time.
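
The back-transformation can be checked with a short DATA step. The following is only a sketch: the data set name gamma_ci and its variable names are hypothetical, and the three input values are typed in from Output 79.14.1 rather than read from an ODS table.

```sas
/* Sketch: back-transform alpha = log(mu) to the infection rate gamma. */
/* The three inputs are copied by hand from Output 79.14.1.            */
data gamma_ci;
   alpha_hat   = -4.6605;   /* intercept estimate                      */
   alpha_lower = -4.8057;   /* lower 95% profile-likelihood limit      */
   alpha_upper = -4.5219;   /* upper 95% profile-likelihood limit      */

   /* gamma = 1 - exp(-mu), where mu = exp(alpha) */
   gamma_hat   = 1 - exp(-exp(alpha_hat));
   gamma_lower = 1 - exp(-exp(alpha_lower));
   gamma_upper = 1 - exp(-exp(alpha_upper));

   format gamma_hat gamma_lower gamma_upper 8.5;
run;

proc print data=gamma_ci noobs;
   var gamma_hat gamma_lower gamma_upper;
run;
```

Because the transformation is monotone increasing in $\alpha$, the lower and upper limits map to the lower and upper limits for $\gamma$. With the displayed (rounded) limits, the results are roughly 0.00942, 0.00815, and 0.01081, which agree with the values quoted above up to rounding.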

The goodness-of-fit statistics for the constant risk model are statistically significant ($p < 0.0001$), indicating that the constant-risk model does not fit these data. You can fit a more general model by allowing a separate risk of infection for each age group. Suppose $\mu_i$ is the mean number of infections per year for the $i$th age group. The probability of being seropositive for the $i$th group with the age midpoint $A_i$ is $p_i = 1 - \exp(-\mu_i A_i)$, so that

$$\log(-\log(1 - p_i)) = \log(\mu_i) + \log(A_i)$$

In the following statements, a complementary log-log model is fit that contains Group as an explanatory classification variable with the GLM coding (so that a dummy variable is created for each age group), no intercept term, and X=log(A) as an offset term. The ODS OUTPUT statement saves the estimates and their 95% profile-likelihood confidence limits to the ClparmPL data set. Note that $\log(\mu_i)$ is the regression parameter associated with Group=$i$.

proc logistic data=sero;
   ods output ClparmPL=ClparmPL;
   class Group / param=glm;
   model R/N=Group / noint
                     offset=X
                     link=cloglog
                     clparm=pl;
   title 'Infectious Rates and 95% Confidence Intervals';
run;

Results of fitting the model with a separate risk of infection are shown in Output 79.14.2.

Output 79.14.2: Modeling Separate Risk of Infection

Infectious Rates and 95% Confidence Intervals

The LOGISTIC Procedure

Analysis of Maximum Likelihood Estimates
                                Standard          Wald
Parameter    DF    Estimate        Error    Chi-Square    Pr > ChiSq
Group 1       1     -3.1048       0.3536       77.0877        <.0001
Group 2       1     -4.4542       0.4083      119.0164        <.0001
Group 3       1     -4.2769       0.2358      328.9593        <.0001
Group 4       1     -4.7761       0.2674      319.0600        <.0001
Group 5       1     -4.7165       0.2238      443.9920        <.0001
Group 6       1     -4.5012       0.1606      785.1350        <.0001
Group 7       1     -5.4252       0.2296      558.1114        <.0001
Group 8       1     -4.9987       0.2008      619.4666        <.0001
Group 9       0           0            .             .             .
X             0      1.0000            0             .             .

Parameter Estimates and Profile-Likelihood Confidence Intervals
Parameter    Estimate    95% Confidence Limits
Group 1       -3.1048     -3.8880      -2.4833
Group 2       -4.4542     -5.3769      -3.7478
Group 3       -4.2769     -4.7775      -3.8477
Group 4       -4.7761     -5.3501      -4.2940
Group 5       -4.7165     -5.1896      -4.3075
Group 6       -4.5012     -4.8333      -4.2019
Group 7       -5.4252     -5.9116      -5.0063
Group 8       -4.9987     -5.4195      -4.6289


For the first age group (Group=1), the point estimate of $\log(\mu_1)$ is $-3.1048$, which transforms into an infection rate of $1 - \exp(-\exp(-3.1048)) = 0.0438$. A 95% confidence interval for this infection rate is obtained by transforming the 95% confidence interval for $\log(\mu_1)$. For the first age group, the lower and upper confidence limits are $1 - \exp(-\exp(-3.8880)) = 0.0203$ and $1 - \exp(-\exp(-2.4833)) = 0.0801$, respectively; that is, the interval of 20 to 80 infections per thousand individuals comes from a procedure that, in repeated sampling, contains the true infection rate 95% of the time. The following statements perform this transformation on the estimates and confidence limits saved in the ClparmPL data set; the resulting estimated infection rates in one year’s time for each age group are displayed in Table 22. Note that the infection rate for the first age group is high compared to that of the other age groups.

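/* Convert each log(mu) estimate and confidence limit to an       */
/* infection rate per 1,000 people: 1000*(1 - exp(-exp(value))).  */
data ClparmPL;
   set ClparmPL;
   Estimate=round( 1000*( 1-exp(-exp(Estimate)) ) );
   LowerCL =round( 1000*( 1-exp(-exp(LowerCL )) ) );
   UpperCL =round( 1000*( 1-exp(-exp(UpperCL )) ) );
run;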

Table 22: Infection Rate in One Year

          Number Infected per 1,000 People
 Age       Point       95% Confidence Limits
Group     Estimate        Lower      Upper
  1          44             20         80
  2          12              5         23
  3          14              8         21
  4           8              5         14
  5           9              6         13
  6          11              8         15
  7           4              3          7
  8           7              4         10
  9          15             11         20
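
The point estimates in this table can also be cross-checked directly from the raw counts. The sketch below assumes that each group's parameter is freely estimated, so that the fitted seropositive proportion for a group equals its observed proportion R/N; solving the model equation for $\log(\mu_i)$ then gives $\log(-\log(1 - R_i/N_i)) - \log(A_i)$. The data set name check and the variable names p_hat, log_mu, and rate_1000 are hypothetical.

```sas
/* Sketch: closed-form per-group estimates from the sero data set.    */
/* Assumes one free parameter per age group (fitted p equals R/N).    */
data check;
   set sero;
   p_hat     = R / N;                       /* observed seropositive proportion */
   log_mu    = log(-log(1 - p_hat)) - X;    /* X = log(A) is the offset         */
   rate_1000 = round(1000 * (1 - exp(-exp(log_mu))));   /* per 1,000 people     */
   keep Group A p_hat log_mu rate_1000;
run;

proc print data=check noobs;
run;
```

For Group=1, for example, this computation gives a value of log_mu close to $-3.1048$ and a rate of 44 per 1,000, in line with the output and table above. The confidence limits are not reproduced this way, because they come from the profile likelihood.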


Last updated: December 09, 2022