The LOGISTIC Procedure

Example 79.15 Complementary Log-Log Model for Interval-Censored Survival Times

(View the complete code for this example.)

Often survival times are not observed more precisely than the interval (for instance, a day) within which the event occurred. Survival data of this form are known as grouped or interval-censored data. A discrete analog of the continuous proportional hazards model (Prentice and Gloeckler 1978; Allison 1982) is used to investigate the relationship between these survival times and a set of explanatory variables. For more methods of fitting these data, see Chapter 70, The ICPHREG Procedure.

Suppose upper T Subscript i is the discrete survival time variable of the ith subject with covariates bold x Subscript i. The discrete-time hazard rate lamda Subscript i t is defined as

lamda Subscript i t Baseline equals probability left-parenthesis upper T Subscript i Baseline equals t vertical-bar upper T Subscript i Baseline greater-than-or-equal-to t comma bold x Subscript i Baseline right-parenthesis comma t equals 1 comma 2 comma ellipsis

Using elementary properties of conditional probabilities, it can be shown that

probability left-parenthesis upper T Subscript i Baseline equals t right-parenthesis equals lamda Subscript i t Baseline product Underscript j equals 1 Overscript t minus 1 Endscripts left-parenthesis 1 minus lamda Subscript i j Baseline right-parenthesis and probability left-parenthesis upper T Subscript i Baseline greater-than t right-parenthesis equals product Underscript j equals 1 Overscript t Endscripts left-parenthesis 1 minus lamda Subscript i j Baseline right-parenthesis

Suppose t Subscript i is the observed survival time of the ith subject. Suppose delta Subscript i Baseline equals 1 if upper T Subscript i Baseline equals t Subscript i is an event time and 0 otherwise. The likelihood for the grouped survival data is given by

StartLayout 1st Row 1st Column upper L 2nd Column equals 3rd Column product Underscript i Endscripts left-bracket probability left-parenthesis upper T Subscript i Baseline equals t Subscript i Baseline right-parenthesis right-bracket Superscript delta Super Subscript i Baseline left-bracket probability left-parenthesis upper T Subscript i Baseline greater-than t Subscript i Baseline right-parenthesis right-bracket Superscript 1 minus delta Super Subscript i 2nd Row 1st Column Blank 2nd Column equals 3rd Column product Underscript i Endscripts left-parenthesis StartFraction lamda Subscript i t Sub Subscript i Subscript Baseline Over 1 minus lamda Subscript i t Sub Subscript i Subscript Baseline EndFraction right-parenthesis Superscript delta Super Subscript i Baseline product Underscript j equals 1 Overscript t Subscript i Endscripts left-parenthesis 1 minus lamda Subscript i j Baseline right-parenthesis 3rd Row 1st Column Blank 2nd Column equals 3rd Column product Underscript i Endscripts product Underscript j equals 1 Overscript t Subscript i Endscripts left-parenthesis StartFraction lamda Subscript i j Baseline Over 1 minus lamda Subscript i j Baseline EndFraction right-parenthesis Superscript y Super Subscript i j Baseline left-parenthesis 1 minus lamda Subscript i j Baseline right-parenthesis EndLayout

where y Subscript i j Baseline equals 1 if the ith subject experienced an event at time upper T Subscript i Baseline equals j and 0 otherwise.

Note that the likelihood L for the grouped survival data is the same as the likelihood of a binary response model with event probabilities lamda Subscript i j. If the data are generated by a continuous-time proportional hazards model, Prentice and Gloeckler (1978) have shown that

lamda Subscript i j Baseline equals 1 minus exp left-parenthesis minus exp left-parenthesis alpha Subscript j Baseline plus bold-italic beta prime bold x Subscript i Baseline right-parenthesis right-parenthesis

which can be rewritten as

log left-parenthesis minus log left-parenthesis 1 minus lamda Subscript i j Baseline right-parenthesis right-parenthesis equals alpha Subscript j Baseline plus bold-italic beta prime bold x Subscript i

where the coefficient vector bold-italic beta is identical to that of the continuous-time proportional hazards model, and alpha Subscript j is a constant related to the conditional survival probability in the interval defined by upper T Subscript i Baseline equals j at bold x Subscript i Baseline equals bold 0. The grouped data survival model is therefore equivalent to the binary response model with complementary log-log link function. To fit the grouped survival model by using PROC LOGISTIC, you must treat each discrete time unit for each subject as a separate observation. For each of these observations, the response is dichotomous, corresponding to whether or not the subject died in the time unit.

Consider a study of the effect of insecticide on flour beetles. Four different concentrations of an insecticide were sprayed on separate groups of flour beetles. The following DATA step saves the number of male and female flour beetles dying in successive intervals in the data set Beetles:

data Beetles(keep=time sex conc freq);
   input time m20 f20 m32 f32 m50 f50 m80 f80;
   conc=.20; freq= m20; sex=1; output;
             freq= f20; sex=2; output;
   conc=.32; freq= m32; sex=1; output;
             freq= f32; sex=2; output;
   conc=.50; freq= m50; sex=1; output;
             freq= f50; sex=2; output;
   conc=.80; freq= m80; sex=1; output;
             freq= f80; sex=2; output;
   datalines;
 1   3   0  7  1  5  0  4  2
 2  11   2 10  5  8  4 10  7
 3  10   4 11 11 11  6  8 15
 4   7   8 16 10 15  6 14  9
 5   4   9  3  5  4  3  8  3
 6   3   3  2  1  2  1  2  4
 7   2   0  1  0  1  1  1  1
 8   1   0  0  1  1  4  0  1
 9   0   0  1  1  0  0  0  0
10   0   0  0  0  0  0  1  1
11   0   0  0  0  1  1  0  0
12   1   0  0  0  0  1  0  0
13   1   0  0  0  0  1  0  0
14 101 126 19 47  7 17  2  4
;

The data set Beetles contains four variables: time, sex, conc, and freq. The variable time represents the interval death time; for example, time=2 is the interval between day 1 and day 2. Insects surviving the duration (13 days) of the experiment are given a time value of 14. The variable sex represents the sex of the insects (1=male, 2=female), conc represents the concentration of the insecticide (mg/cmsquared), and freq represents the frequency of the observations.

To use PROC LOGISTIC with the grouped survival data, you must expand the data so that each beetle has a separate record for each day of survival. A beetle that died in the third day (time=3) would contribute three observations to the analysis, one for each day it was alive at the beginning of the day. A beetle that survives the 13-day duration of the experiment (time=14) would contribute 13 observations.

The following DATA step creates a new data set named Days containing the beetle-day observations from the data set Beetles. In addition to the variables sex, conc, and freq, the data set contains an outcome variable y and a classification variable day. The variable y has a value of 1 if the observation corresponds to the day that the beetle died, and it has a value of 0 otherwise. An observation for the first day will have a value of 1 for day; an observation for the second day will have a value of 2 for day, and so on. For instance, Output 79.15.1 shows an observation in the Beetles data set with time=3, and Output 79.15.2 shows the corresponding beetle-day observations in the data set Days.

data Days;
   set Beetles;
   do day=1 to time;
      if (day < 14) then do;
         y= (day=time);
         output;
      end;
   end;
run;

Output 79.15.1: An Observation with Time=3 in Beetles Data Set

Obs time conc freq sex
17 3 0.2 10 1


Output 79.15.2: Corresponding Beetle-Day Observations in Days

Obs time conc freq sex day y
25 3 0.2 10 1 1 0
26 3 0.2 10 1 2 0
27 3 0.2 10 1 3 1


The following statements invoke PROC LOGISTIC to fit a complementary log-log model for binary data with the response variable Y and the explanatory variables day, sex, and Variableconc. Specifying the EVENT= option ensures that the event (y=1) probability is modeled. The GLM coding in the CLASS statement creates an indicator column in the design matrix for each level of day. The coefficients of the indicator effects for day can be used to estimate the baseline survival function. The NOINT option is specified to prevent any redundancy in estimating the coefficients of day. The Newton-Raphson algorithm is used for the maximum likelihood estimation of the parameters.

proc logistic data=Days outest=est1;
   class day / param=glm;
   model y(event='1')= day sex conc
         / noint link=cloglog technique=newton;
   freq freq;
run;

Results of the model fit are given in Output 79.15.3. Both sex and conc are statistically significant for the survival of beetles sprayed by the insecticide. Female beetles are more resilient to the chemical than male beetles, and increased concentration of the insecticide increases its effectiveness.

Output 79.15.3: Parameter Estimates for the Grouped Proportional Hazards Model

Analysis of Maximum Likelihood Estimates
Parameter   DF Estimate Standard
Error
Wald
Chi-Square
Pr > ChiSq
day 1 1 -3.9314 0.2934 179.5602 <.0001
day 2 1 -2.8751 0.2412 142.0596 <.0001
day 3 1 -2.3985 0.2299 108.8833 <.0001
day 4 1 -1.9953 0.2239 79.3960 <.0001
day 5 1 -2.4920 0.2515 98.1470 <.0001
day 6 1 -3.1060 0.3037 104.5799 <.0001
day 7 1 -3.9704 0.4230 88.1107 <.0001
day 8 1 -3.7917 0.4007 89.5233 <.0001
day 9 1 -5.1540 0.7316 49.6329 <.0001
day 10 1 -5.1350 0.7315 49.2805 <.0001
day 11 1 -5.1131 0.7313 48.8834 <.0001
day 12 1 -5.1029 0.7313 48.6920 <.0001
day 13 1 -5.0951 0.7313 48.5467 <.0001
sex   1 -0.5651 0.1141 24.5477 <.0001
conc   1 3.0918 0.2288 182.5665 <.0001


The coefficients of parameters for the day variable are the maximum likelihood estimates of alpha 1 comma ellipsis comma alpha 13, respectively. The baseline survivor function upper S 0 left-parenthesis t right-parenthesis is estimated by

ModifyingAbove upper S With caret Subscript 0 Baseline left-parenthesis t right-parenthesis equals ModifyingAbove probability With caret left-parenthesis upper T greater-than t right-parenthesis equals product Underscript j less-than-or-equal-to t Endscripts exp left-parenthesis minus exp left-parenthesis ModifyingAbove alpha With caret Subscript j Baseline right-parenthesis right-parenthesis

and the survivor function for a given covariate pattern (sex=x 1 and conc=x 2) is estimated by

ModifyingAbove upper S With caret left-parenthesis t right-parenthesis equals left-bracket ModifyingAbove upper S With caret Subscript 0 Baseline left-parenthesis t right-parenthesis right-bracket Superscript exp left-parenthesis minus 0.5651 x 1 plus 3.0918 x 2 right-parenthesis

The following statements compute the survival curves for male and female flour beetles exposed to the insecticide in concentrations of 0.20 mg/cmsquared and 0.80 mg/cmsquared:

data one (keep=day survival element s_m20 s_f20 s_m80 s_f80);
   array dd[13] day1-day13;
   array sc[4] m20 f20 m80 f80;
   array s_sc[4] s_m20 s_f20 s_m80 s_f80 (1 1 1 1);
   set est1;
   m20= exp(sex + .20 * conc);
   f20= exp(2 * sex + .20 * conc);
   m80= exp(sex + .80 * conc);
   f80= exp(2 * sex + .80 * conc);
   survival=1;
   day=0;
   output;
   do day=1 to 13;
      element= exp(-exp(dd[day]));
      survival= survival * element;
      do i=1 to 4;
         s_sc[i] = survival ** sc[i];
      end;
      output;
   end;
run;

Instead of plotting the curves as step functions, the following statements use the PBSPLINE statement in the SGPLOT procedure to smooth the curves with a penalized B-spline. For more information about the implementation of the penalized B-spline method, see Chapter 126, The TRANSREG Procedure. The SAS autocall macro %MODSTYLE is specified to change the marker symbols for the plot. For more information about the %MODSTYLE macro, see the section ODS Style Template Modification Macro in Chapter 24, Statistical Graphics Using ODS. The smoothed survival curves are displayed in Output 79.15.4.

%modstyle(name=LogiStyle,parent=htmlblue,markers=circlefilled);
ods listing style=LogiStyle;
proc sgplot data=one;
   title 'Flour Beetles Sprayed with Insecticide';
   xaxis grid integer;
   yaxis grid label='Survival Function';
   pbspline y=s_m20 x=day /
      legendlabel = "Male at 0.20 conc." name="pred1";
   pbspline y=s_m80 x=day /
      legendlabel = "Male at 0.80 conc." name="pred2";
   pbspline y=s_f20 x=day /
      legendlabel = "Female at 0.20 conc." name="pred3";
   pbspline y=s_f80 x=day /
      legendlabel = "Female at 0.80 conc." name="pred4";
   discretelegend "pred1" "pred2" "pred3" "pred4" / across=2;
run;

Output 79.15.4: Predicted Survival at Insecticide Concentrations of 0.20 and 0.80 mg/cmsquared

Predicted Survival at Insecticide Concentrations of 0.20 and 0.80 mg/cm2


The probability of survival is displayed on the vertical axis. Notice that most of the insecticide effect occurs by day 6 for both the high and low concentrations.

Last updated: December 09, 2022