The ICPHREG Procedure

MODEL Statement

  • MODEL (t1, t2)= effects </ options>;

  • MODEL response <*censor (list)> = effects </ options>;

The MODEL statement identifies the variables to be used as the failure time variables and the explanatory effects, including covariates, main effects, interactions, and nested effects. For more information, see the section Specification of Effects in Chapter 53, The GLM Procedure.

You can specify two forms of MODEL syntax: the first form allows two time variables, and the second form allows one time variable.

The first form of the MODEL statement enables you to analyze time-to-event data that have interval-censored outcomes. The MODEL syntax specifies two variables, t1 and t2, that contain values of the endpoints of the censoring interval. Only nonnegative values are accepted. If the two values are the same (and not missing), it is assumed that there is no censoring and the actual response value is observed. If the lower value is missing, then the upper value is used as a left-censored value. If the upper value is missing, then the lower value is used as a right-censored value. If both values are present and the lower value is less than the upper value, it is assumed that the values specify a censoring interval. If the lower value is greater than the upper value or both values are missing, then the observation is not used in the analysis.

The following table summarizes the ways of specifying censoring.

Lower Value   Upper Value   Comparison      Interpretation
Not missing   Not missing   Equal           No censoring
Not missing   Not missing   Lower < upper   Censoring interval
Missing       Not missing                   Upper used as left-censoring value
Not missing   Missing                       Lower used as right-censoring value
Not missing   Not missing   Lower > upper   Observation not used
Missing       Missing                       Observation not used
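
As a hedged illustration of the encodings in the preceding table, the following sketch builds a small data set and passes it to the first form of the MODEL statement. The data set name Visits, the variables Left, Right, and Trt, and all data values are hypothetical; they serve only to show how each kind of observation is coded.

```sas
/* Hypothetical data: one row per subject; a period (.) denotes a missing value */
data Visits;
   input Left Right Trt;
   datalines;
5 5 1
2 7 0
. 4 1
6 . 0
;
run;

/* First form of MODEL: the two endpoints of the censoring interval */
proc icphreg data=Visits;
   model (Left, Right) = Trt;
run;
```

In order, the four rows represent an exact event time, an interval-censored time, a left-censored time, and a right-censored time. A data set this small is not enough to fit a model reliably; it only illustrates the coding.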

The second form of the MODEL statement enables you to analyze right-censored data or time-to-event data that contain repeated assessments and possibly time-dependent covariates (for more information, see the section Semiparametric Model and Time-Dependent Covariates). The name of the failure time variable precedes the equal sign. This name can optionally be followed by an asterisk, the name of the censoring variable, and a list of censoring values (separated by blanks or commas) enclosed in parentheses. If the censoring variable takes one of these values, the corresponding failure time is considered to be censored. Following the equal sign are the explanatory effects (sometimes called independent variables or covariates) for the model.
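
A minimal sketch of the second form follows. The data set Followup and the variables Time, Status, Trt, and Age are hypothetical, and the sketch assumes that Status=0 marks a censored observation.

```sas
/* Second form of MODEL: one time variable and an optional censoring variable. */
/* Hypothetical coding: Status=0 means that the failure time is censored.      */
proc icphreg data=Followup;
   model Time*Status(0) = Trt Age;
run;
```

If more than one value indicates censoring, list every such value inside the parentheses, for example Status(0, 2).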

Table 5 summarizes the options that you can specify in the MODEL statement.

Table 5: MODEL Statement Options

Option        Description

Model Specification Options
ALPHA=        Specifies the confidence level
BASE=         Specifies the functional form for the baseline function
ENTRY=        Specifies the left-truncation time of the model
HAZSCALE=     Requests parameterization of the hazard function in the original scale or in log scale
NOPOLISH      Suppresses polishing of parameter estimates of the hazard function
OFFSET=       Specifies an offset variable to be added to the linear predictor
PLVARIANCE    Computes the standard error estimates on the basis of the profile likelihood function

Output Options
CORRB         Displays the estimated correlation matrix
COVB          Displays the estimated covariance matrix


ALPHA=value

specifies the level for the confidence intervals for parameters. The value must be between 0 and 1. By default, ALPHA=0.05.

CORRB

displays the estimated correlation matrix of the parameter estimates.

COVB

displays the estimated covariance matrix of the parameter estimates.

BASE=baseline-type
BASEHAZ=baseline-type
B=baseline-type

specifies a functional form for the baseline function. You can specify one of the following baseline-types:

PCH (<NINTERVAL=number>, <INTERVALS=(numeric-list)>)
PIECEWISE (<NINTERVAL=number>, <INTERVALS=(numeric-list)>)
PIECEWISEEXPONENTIAL (<NINTERVAL=number>, <INTERVALS=(numeric-list)>)
PCBH (<NINTERVAL=number>, <INTERVALS=(numeric-list)>)

partitions the time scale into disjoint intervals and assumes that the baseline hazard function is constant within each interval. The parameters are the piecewise constant values of the baseline hazard function and are named Haz1, Haz2, ..., and so on. If HAZSCALE=LOGHAZ is specified, the names are LogHaz1, LogHaz2, ..., and so on.

You can specify one of the following two options to control how to partition the time axis into intervals of constant baseline hazards:

NINTERVAL=number
N=number

specifies the number of intervals, each of which has a constant hazard rate. PROC ICPHREG partitions the time axis into this number of intervals so that each interval contains an approximately equal number of unique boundary values and imputed middle points.

INTERVALS=(numeric-list)
INTERVAL=(numeric-list)

specifies a list of numbers that partition the time axis into disjoint intervals, each of which has a constant hazard rate. For example, INTERVALS=(100, 150, 200, 250, 300) specifies a model that has a constant hazard in the intervals [0,100), [100,150), [150,200), [200,250), [250,300), and [300,∞).

If you specify neither NINTERVAL= nor INTERVALS=, the default is NINTERVAL=5.
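
The two ways of partitioning the time axis might be requested as in the following sketch. The data set Visits and the variables Left, Right, and Trt are hypothetical, and the break points are the ones from the INTERVALS= example above.

```sas
/* Let the procedure choose the break points for eight intervals */
proc icphreg data=Visits;
   model (Left, Right) = Trt / base=piecewise(ninterval=8);
run;

/* Supply the break points explicitly */
proc icphreg data=Visits;
   model (Left, Right) = Trt / base=piecewise(intervals=(100, 150, 200, 250, 300));
run;
```

The first call yields eight baseline hazard parameters. Following the INTERVALS= example, the second call yields six, one for each interval from [0,100) through [300,∞).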

SPLINES (<DF=number>)
CUBICSPLINES (<DF=number>)

models the baseline cumulative hazard function by cubic splines (Royston and Parmar 2002). The parameters are the spline coefficients and are named Coef1, Coef2, ..., and so on.

You can specify the degrees of freedom in the DF=number option, where number must be an integer. The number of knots equals number plus one. The actual positions of the knots are determined from an imputed data set as follows. First, PROC ICPHREG imputes a middle point for each observation in the input data set that is not right-censored. Then, it sorts these imputed times and the input boundary values in increasing order and selects only unique values. PROC ICPHREG places the terminal knots at the minimum and maximum of this sequence and chooses the interval knots by using the same method it uses to choose the break points for the piecewise constant model. For more information, see the section Choosing Break Points.

By default, DF=2.
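
A minimal sketch of requesting a cubic spline baseline with a nondefault number of degrees of freedom follows. The data set Visits and the variables Left, Right, and Trt are hypothetical.

```sas
/* Cubic spline baseline with DF=3; by the rule above, this uses four knots */
proc icphreg data=Visits;
   model (Left, Right) = Trt / base=splines(df=3);
run;
```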

UNSPECIFIED
DISCRETE

models the cumulative hazard function as a discrete function in which jumps are identified according to Turnbull’s formulation (1976). The parameters are named Eta1, Eta2, and so on.

The default fitting method for this type of model is EMICM. An alternative is the EM algorithm. For more information about these algorithms, see the section EM Algorithm and Extensions.

If you do not specify the BASEHAZ= option, the ICPHREG procedure fits a piecewise constant model as if NINTERVAL=5.

ENTRYTIME=variable
ENTRY=variable

specifies the name of the variable that represents the left-truncation time. For more information, see the section Left-Truncation of Failure Times.
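
As an illustration only, a left-truncation time might be supplied as follows. The data set Visits and the variables Left, Right, Trt, and EnrollTime are hypothetical; EnrollTime is assumed to hold each subject's entry time on the same time scale as Left and Right.

```sas
/* EnrollTime (hypothetical) is the time at which each subject came under observation */
proc icphreg data=Visits;
   model (Left, Right) = Trt / entry=EnrollTime;
run;
```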

NOPOLISH

suppresses polishing of parameter estimates of the baseline function. Occasionally, the parameter estimates of the baseline function can reach the default optimization lower bounds. This might indicate that the model is overparameterized. By default, the ICPHREG procedure "polishes" the hazard estimates by fixing these parameters at the lower bound value and refitting the model.

The lower bound values are set to 0 if the baseline parameters are on the original scale (HAZSCALE=HAZARD). The values are set to -10.0 if they are on the log scale (HAZSCALE=LOGHAZ).

This option does not apply to the cubic spline model because its baseline parameters are unbounded.

OFFSET=variable

specifies a variable in the input data set to be used as an offset variable. This variable cannot be a CLASS variable, the response variable, or any of the explanatory variables.

HAZSCALE=hazard-type

specifies a transformation to be applied to the baseline parameters for fitting the piecewise constant model. You can choose either of the following two options:

LOGHAZ
LOG
LOGHAZARD

uses the log transformed baseline parameters.

HAZARD
HAZ

does not transform the baseline parameters. A lower bound of 0 is used for fitting the models.

This option does not apply to the cubic spline model or the semiparametric model.
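
The following sketch combines a piecewise constant baseline with the log-scale parameterization. The data set Visits and the variables Left, Right, and Trt are hypothetical.

```sas
/* Piecewise constant baseline fitted on the log scale;         */
/* the baseline parameters are then named LogHaz1, LogHaz2, ... */
proc icphreg data=Visits;
   model (Left, Right) = Trt / base=piecewise(ninterval=4) hazscale=loghaz covb;
run;
```

The COVB option is included only to show that output options can be listed alongside model specification options.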

PLVARIANCE

computes the standard error estimates on the basis of the profile likelihood function, as opposed to the default Louis’s method (Louis 1982). For more information, see the section Variance Estimation. This option applies only to the semiparametric model.
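
A sketch of requesting profile-likelihood standard errors follows. It assumes that the semiparametric model is the one requested by BASE=UNSPECIFIED. The data set Visits and the variables Left, Right, and Trt are hypothetical.

```sas
/* Semiparametric fit (assumed to correspond to BASE=UNSPECIFIED) */
/* with standard errors based on the profile likelihood           */
proc icphreg data=Visits;
   model (Left, Right) = Trt / base=unspecified plvariance;
run;
```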

Last updated: March 08, 2022