The MODEL statement identifies the variables to be used as the failure time variables and the explanatory effects, including covariates, main effects, interactions, and nested effects. For more information, see the section Specification of Effects in Chapter 53, The GLM Procedure.
You can specify two forms of MODEL syntax: the first form allows two time variables, and the second form allows one time variable.
The first form of the MODEL statement enables you to analyze time-to-event data that have interval-censored outcomes. The MODEL syntax specifies two variables, t1 and t2, that contain values of the endpoints of the censoring interval. Only nonnegative values are accepted. If the two values are the same (and not missing), it is assumed that there is no censoring and the actual response value is observed. If the lower value is missing, then the upper value is used as a left-censored value. If the upper value is missing, then the lower value is used as a right-censored value. If both values are present and the lower value is less than the upper value, it is assumed that the values specify a censoring interval. If the lower value is greater than the upper value or both values are missing, then the observation is not used in the analysis.
The following table summarizes the ways of specifying censoring.
| Lower Value | Upper Value | Comparison | Interpretation |
|---|---|---|---|
| Not missing | Not missing | Equal | No censoring |
| Not missing | Not missing | Lower < upper | Censoring interval |
| Missing | Not missing | | Upper used as left-censoring value |
| Not missing | Missing | | Lower used as right-censoring value |
| Not missing | Not missing | Lower > upper | Observation not used |
| Missing | Missing | | Observation not used |
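As a minimal sketch of how the rows of this table translate into data, the following statements build a small hypothetical data set and pass its two endpoint variables to the first form of the MODEL statement. The names Visits, Left, Right, and Trt are illustrative assumptions, not names from this documentation. Four observations are far too few for a meaningful fit; they serve only to show the coding.

```sas
/* Hypothetical data; all names are illustrative.              */
/* Obs 1: Left = Right   -> no censoring, event observed at 3  */
/* Obs 2: Left < Right   -> censoring interval from 2 to 5     */
/* Obs 3: Left missing   -> left-censored at 4                 */
/* Obs 4: Right missing  -> right-censored at 6                */
data Visits;
   input Left Right Trt;
   datalines;
3 3 1
2 5 0
. 4 1
6 . 0
;

proc icphreg data=Visits;
   model (Left, Right) = Trt;
run;
```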
The second form of the MODEL statement enables you to analyze right-censored data or time-to-event data that contain repeated assessments and possibly time-dependent covariates (for more information, see the section Semiparametric Model and Time-Dependent Covariates). The name of the failure time variable precedes the equal sign. This name can optionally be followed by an asterisk, the name of the censoring variable, and a list of censoring values (separated by blanks or commas) enclosed in parentheses. If the censoring variable takes one of these values, the corresponding failure time is considered to be censored. Following the equal sign are the explanatory effects (sometimes called independent variables or covariates) for the model.
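The second form might look like the following sketch. It assumes a hypothetical data set Trial in which Time is the failure time, Status is the censoring variable with 0 marking a censored observation, and Trt and Age are explanatory variables; none of these names come from this documentation.

```sas
/* Hypothetical data set and variable names.                  */
/* Status = 0 identifies right-censored failure times.        */
proc icphreg data=Trial;
   model Time*Status(0) = Trt Age;
run;
```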
Table 5 summarizes the options that you can specify in the MODEL statement.
Table 5: MODEL Statement Options
| Option | Description |
|---|---|
| Model Specification Options | |
| ALPHA= | Specifies the confidence level |
| BASE= | Specifies the functional form for the baseline function |
| ENTRY= | Specifies the left-truncation time of the model |
| HAZSCALE= | Requests parameterization of the hazard function in the original scale or in log scale |
| NOPOLISH | Suppresses polishing of parameter estimates of the hazard function |
| OFFSET= | Specifies an offset variable to be added to the linear predictor |
| PLVARIANCE | Computes the standard error estimates on the basis of the profile likelihood function |
| Output Options | |
| CORRB | Displays the estimated correlation matrix |
| COVB | Displays the estimated covariance matrix |
-
ALPHA=value
specifies the level for the confidence intervals for parameters. The value must be between 0 and 1. By default, ALPHA=0.05.
-
CORRB
displays the estimated correlation matrix of the parameter estimates.
-
COVB
displays the estimated covariance matrix of the parameter estimates.
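The three options above can be combined after the slash in the MODEL statement, as in the following sketch. The data set Trial and its variables are hypothetical. Under the usual SAS convention that ALPHA= requests 100(1 − ALPHA)% limits, ALPHA=0.1 would correspond to 90% confidence intervals.

```sas
/* Hypothetical data set and variable names.                  */
/* Request confidence limits at ALPHA=0.1 and display both    */
/* the covariance and correlation matrices of the estimates.  */
proc icphreg data=Trial;
   model (Left, Right) = Trt Age / alpha=0.1 covb corrb;
run;
```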
-
BASE=baseline-type
BASEHAZ=baseline-type
B=baseline-type
specifies a functional form for the baseline function.
You can specify one of the following baseline-types:
-
PCH (<NINTERVAL=number>, <INTERVALS=(numeric-list)>)
PIECEWISE (<NINTERVAL=number>, <INTERVALS=(numeric-list)>)
PIECEWISEEXPONENTIAL (<NINTERVAL=number>, <INTERVALS=(numeric-list)>)
PCBH (<NINTERVAL=number>, <INTERVALS=(numeric-list)>)
partitions the time scale into disjoint intervals and assumes that the baseline hazard function is piecewise constant within each interval. The parameters are the piecewise constant values of the baseline hazard function and are named Haz1, Haz2, and so on. If HAZSCALE=LOGHAZ is specified, the names are LogHaz1, LogHaz2, and so on.
You can specify one of the following two options to control how to partition the time axis into intervals of constant baseline hazards:
-
NINTERVAL=number
N=number
specifies the number of intervals, each of which has a constant hazard rate. PROC ICPHREG partitions the time axis into that number of intervals so that each interval contains an approximately equal number of unique boundary values and imputed middle points.
-
INTERVALS=(numeric-list)
INTERVAL=(numeric-list)
specifies a list of numbers that partition the time axis into disjoint intervals, each of which has a constant hazard rate. For example, INTERVALS=(100, 150, 200, 250, 300) specifies a model that has a constant hazard in each of the intervals [0,100), [100,150), [150,200), [200,250), [250,300), and [300,∞).
If you specify neither NINTERVAL= nor INTERVALS=, the default is NINTERVAL=5.
-
SPLINES (<DF=number>)
CUBICSPLINES (<DF=number>)
models the baseline cumulative hazard function by cubic splines (Royston and Parmar 2002). The parameters are the spline coefficients and are named Coef1, Coef2, and so on.
You can specify the degrees of freedom in the DF=number option, where number must be an integer. The number of knots equals number plus one. The actual positions of the knots are determined from an imputed data set as follows. First, PROC ICPHREG imputes a middle point for each observation in the input data set that is not right-censored. Then, it sorts these imputed times and the input boundary values in increasing order and selects only unique values. PROC ICPHREG places the terminal knots at the minimum and maximum of this sequence and chooses the interval knots by using the same method it uses to choose the break points for the piecewise constant model. For more information, see the section Choosing Break Points.
By default, DF=2.
-
UNSPECIFIED
DISCRETE
models the cumulative hazard function as a discrete function in which jumps are identified according to Turnbull’s formulation (1976). The parameters are named Eta1, Eta2, and so on.
The default fitting method for this type of model is EMICM. An alternative is the EM algorithm. For more information about these algorithms, see the section EM Algorithm and Extensions.
If you do not specify the BASE= option, the ICPHREG procedure fits a piecewise constant model as if NINTERVAL=5 were specified.
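The following sketch shows one way each baseline-type might be requested. The data set Trial and the variables Left, Right, and Trt are hypothetical. Each PROC ICPHREG step fits one model.

```sas
/* Hypothetical data set and variable names throughout.       */

/* Piecewise constant hazard with user-supplied break points  */
proc icphreg data=Trial;
   model (Left, Right) = Trt / base=pch(intervals=(100, 150, 200, 250, 300));
run;

/* Cubic spline model; DF=3 implies four knots (DF plus one)  */
proc icphreg data=Trial;
   model (Left, Right) = Trt / base=splines(df=3);
run;

/* Discrete baseline with jumps per Turnbull's formulation    */
proc icphreg data=Trial;
   model (Left, Right) = Trt / base=unspecified;
run;
```

Replacing INTERVALS= with NINTERVAL=4 in the first step would instead let the procedure choose the break points for four intervals.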
-
ENTRYTIME=variable
ENTRY=variable
specifies the name of the variable that represents the left-truncation time. For more information, see the section Left-Truncation of Failure Times.
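For example, if subjects enter the study at different times, the entry time can be supplied as in this sketch. The data set Cohort and the variable EntryAge are hypothetical names chosen for illustration.

```sas
/* Hypothetical names: EntryAge holds each subject's          */
/* left-truncation time.                                      */
proc icphreg data=Cohort;
   model (Left, Right) = Trt / entry=EntryAge;
run;
```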
-
NOPOLISH
suppresses polishing of parameter estimates of the baseline function.
Occasionally, the parameter estimates of the baseline function can reach the default optimization lower bounds. This might indicate that the model is overparameterized. By default, the ICPHREG procedure "polishes" the hazard estimates by fixing these parameters at the lower bound value and refitting the model.
The lower bound values are set to 0 if the baseline parameters are on the original scale (HAZSCALE=HAZARD). The values are set to –10.0 if they are on the log scale (HAZSCALE=LOGHAZ).
This option does not apply to the cubic spline model because its baseline parameters are unbounded.
-
OFFSET=variable
specifies a variable in the input data set to be used as an offset variable. This variable cannot be a CLASS variable, the response variable, or any of the explanatory variables.
-
HAZSCALE=hazard-type
specifies a transformation to be applied to the baseline parameters for fitting the piecewise constant model. You can choose either of the following two options:
-
LOGHAZ
LOG
LOGHAZARD
uses the log transformed baseline parameters.
-
HAZARD
HAZ
does not transform the baseline parameters. A lower bound of 0 is used for fitting the models.
This option does not apply to the cubic spline model or the semiparametric model.
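As an illustration, the following sketch fits a piecewise constant model on the log-hazard scale and turns off polishing. The data set and variable names are hypothetical.

```sas
/* Hypothetical data set and variable names.                  */
/* With HAZSCALE=LOGHAZ the baseline parameters are named     */
/* LogHaz1 through LogHaz4; NOPOLISH leaves any estimate that */
/* reaches the lower bound (-10.0) as fitted, with no refit.  */
proc icphreg data=Trial;
   model (Left, Right) = Trt / base=pch(ninterval=4) hazscale=loghaz nopolish;
run;
```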
-
PLVARIANCE
computes the standard error estimates on the basis of the profile likelihood function, as opposed to the default Louis’s method (Louis 1982). For more information, see the section Variance Estimation. This option applies only to the semiparametric model.
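A minimal sketch of requesting profile-likelihood standard errors follows. It assumes that the semiparametric model is the one requested by BASE=UNSPECIFIED. The data set and variable names are hypothetical.

```sas
/* Hypothetical data set and variable names.                  */
/* Assumes BASE=UNSPECIFIED requests the semiparametric       */
/* model, the only model to which PLVARIANCE applies.         */
proc icphreg data=Trial;
   model (Left, Right) = Trt / base=unspecified plvariance;
run;
```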