In the counting process formulation, data for each subject are identified by a triple $\{N, Y, Z\}$ of counting, at-risk, and covariate processes. $N(t)$ indicates the sum of weights for all events that the subject experiences over the time interval $(0,t]$, $Y(t)$ indicates whether the subject is at risk at time $t$ (1 if at risk and 0 otherwise), and $Z(t)$ is a vector of explanatory variables for the subject at time $t$. The sample path of $N$ is a step function with jumps at the event times, and $N(0)=0$. Unless $Z(t)$ changes continuously with time, the data for each subject can be represented by multiple observations, each of which identifies a semiclosed time interval $(t_1,t_2]$, the values of the explanatory variables over that interval, and the event status at $t_2$. The subject remains at risk during the interval $(t_1,t_2]$, and an event might occur at $t_2$. Values of the explanatory variables for the subject remain unchanged in the interval. This style of data input was originated by Therneau (1994).
For example, suppose a patient (ID=1) with an analysis weight of 10 has a tumor recurrence at weeks 3, 10, and 15 and is followed up until week 23. Consider three fixed explanatory variables Trt (treatment), Number (initial tumor number), and Size (initial tumor size), one weight variable Weight (analysis weight), one patient identification variable ID, and one time-dependent covariate Z that represents a hormone level. The value of Z might change during the follow-up period. The data for this patient are represented by the following four observations:
| ID | Weight | T1 | T2 | Status | Trt | Number | Size | Z |
|---|---|---|---|---|---|---|---|---|
| 1 | 10 | 0 | 3 | 1 | 1 | 1 | 3 | 12.3 |
| 1 | 10 | 3 | 10 | 1 | 1 | 1 | 3 | 14.7 |
| 1 | 10 | 10 | 15 | 1 | 1 | 1 | 3 | 13.8 |
| 1 | 10 | 15 | 23 | 0 | 1 | 1 | 3 | 15.5 |
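As a worked illustration of the definitions, and assuming the counting process accumulates the analysis weight (10) at each recurrence time, the counting process $N(t)$ and the at-risk process $Y(t)$ for this patient would be:

$$
N(t) =
\begin{cases}
0,  & 0 \le t < 3 \\
10, & 3 \le t < 10 \\
20, & 10 \le t < 15 \\
30, & 15 \le t \le 23
\end{cases}
\qquad
Y(t) =
\begin{cases}
1, & 0 < t \le 23 \\
0, & t > 23
\end{cases}
$$

Each jump of $N(t)$ corresponds to a row with Status equal to 1, and the final row, with Status equal to 0, only extends the at-risk period to week 23.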
Here each interval (T1,T2] is a period during which the patient is at risk. The variable Status indicates whether a recurrence occurred at T2: a value of 1 indicates a tumor recurrence, and a value of 0 indicates no recurrence. Assume that the patients are selected independently. Because every patient has multiple observation rows, you should use the CLUSTER statement to identify each individual patient. With the CLUSTER statement, variability is computed between patients. The following statements fit a multiplicative hazards model with baseline covariates Trt, Number, and Size, and a time-varying covariate Z. For more information, see the section The Multiplicative Hazards Model.
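proc surveyphreg;
   weight Weight;   /* analysis weight */
   cluster ID;      /* rows that share an ID belong to one patient */
   model (T1,T2) * Status(0) = Trt Number Size Z;   /* Status=0: no event at T2 */
run;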
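The four-row layout of the tumor recurrence example can also be derived mechanically from the event times. The following Python sketch is an illustration only, not part of the procedure: the function name `counting_process_rows` and its arguments are hypothetical, and the fixed covariates (Trt, Number, Size) and the weight are omitted because they simply repeat on every row.

```python
def counting_process_rows(event_times, end_of_followup, z_values):
    """Split one subject's follow-up into semiclosed (T1, T2] rows.

    event_times:     sorted times at which events occurred
    end_of_followup: time at which follow-up stops
    z_values:        one time-dependent covariate value per interval
    """
    # Interval end points are the event times, closed off by the end of follow-up.
    stops = list(event_times)
    if not stops or stops[-1] < end_of_followup:
        stops.append(end_of_followup)
    if len(z_values) != len(stops):
        raise ValueError("need exactly one covariate value per interval")

    events = set(event_times)
    rows, start = [], 0
    for stop, z in zip(stops, z_values):
        # Status is 1 only when the interval ends at an event time.
        rows.append({"T1": start, "T2": stop,
                     "Status": 1 if stop in events else 0, "Z": z})
        start = stop
    return rows


# Patient 1 from the example: recurrences at weeks 3, 10, 15; follow-up to week 23.
rows = counting_process_rows([3, 10, 15], 23, [12.3, 14.7, 13.8, 15.5])
for row in rows:
    print(row)
```

The last interval ends at the end of follow-up rather than at an event, so it is the only row with Status equal to 0.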
Another useful application of the counting process formulation is the delayed entry of subjects into the risk set. For example, in studying the mortality of workers exposed to a carcinogen, the survival time is chosen to be the worker’s age at death by malignant neoplasm. Any worker who joins the workplace at an age later than the failure time of an event is not included in the corresponding risk set. The variables for a worker consist of Entry (age at which the worker entered the workplace), Age (age at death or age censored), Status (an indicator of whether the observation time is censored, with the value 0 identifying a censored time), and X1 and X2 (explanatory variables thought to be related to survival). The specification for such an application is as follows:
proc surveyphreg;
   model (Entry, Age) * Status(0) = X1 X2;   /* at risk from Entry until Age */
run;
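To make the effect of delayed entry on the risk sets concrete, here is a minimal Python sketch. It assumes that a worker is at risk at age t exactly when Entry < t <= Age, which mirrors the semiclosed interval convention described earlier. The function `risk_set` and the three workers are hypothetical and are not taken from any study.

```python
def risk_set(workers, t):
    """Return the IDs of workers at risk at age t, that is, Entry < t <= Age."""
    return [w["id"] for w in workers if w["Entry"] < t <= w["Age"]]


# Hypothetical workers (illustrative values only).
workers = [
    {"id": "A", "Entry": 30, "Age": 52, "Status": 1},  # died at age 52
    {"id": "B", "Entry": 45, "Age": 60, "Status": 0},  # censored at age 60
    {"id": "C", "Entry": 55, "Age": 58, "Status": 1},  # died at age 58
]

# List the risk set at each observed failure age.
for w in workers:
    if w["Status"] == 1:
        print(w["Age"], risk_set(workers, w["Age"]))
```

In this sketch, worker C entered the workplace at age 55 and is therefore excluded from the risk set for the failure at age 52, even though C is still alive at that age.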