The ADAPTIVEREG Procedure

Missing Values

When fitting a model, the ADAPTIVEREG procedure excludes observations that have missing values for the response variable, weight variable, or frequency variable. It also excludes observations with invalid response, weight, or frequency values. For observations that have valid response, weight, and frequency values but missing predictor values, the ADAPTIVEREG procedure can either include them in model fitting or exclude them.

By default, observations with missing values in the predictor variables are included in the model fitting. Suppose a variable $v$ contains missing values. When $v$ is considered in the forward selection step, the ADAPTIVEREG procedure automatically forms two candidate bases, $\mathbf{B}_m$ and $\mathbf{B}_{m+1}$: $\mathbf{B}_{m+1} = I(v \text{ is missing})$, which equals 1 when $v$ is missing, and $\mathbf{B}_m = I(v \text{ is not missing})$, which equals 1 when $v$ is not missing. Here $I(\cdot)$ is a scalar-valued indicator function that returns 1 when its argument is true and 0 when its argument is false.
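As a minimal sketch of the two indicator bases, the Python code below assumes that a missing value of $v$ is stored as NaN (a stand-in for a SAS missing value). The function name `missing_indicator_bases` is hypothetical and is not part of any SAS interface.

```python
import numpy as np


def missing_indicator_bases(v):
    """Return (B_m, B_{m+1}) for a predictor column v, with NaN marking a missing value.

    B_m     = I(v is not missing)
    B_{m+1} = I(v is missing)
    """
    v = np.asarray(v, dtype=float)
    is_missing = np.isnan(v)
    b_m = (~is_missing).astype(float)      # 1 where v is observed, 0 otherwise
    b_m_plus_1 = is_missing.astype(float)  # 1 where v is missing, 0 otherwise
    return b_m, b_m_plus_1


v = np.array([2.0, np.nan, 5.0, np.nan])
b_m, b_m1 = missing_indicator_bases(v)
print(b_m)   # [1. 0. 1. 0.]
print(b_m1)  # [0. 1. 0. 1.]
```

Because exactly one of the two indicators equals 1 for each observation, $\mathbf{B}_m + \mathbf{B}_{m+1} = 1$ for every row.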

If the transformation of $v$ with a parent basis $\mathbf{B}_i$ and a knot (or a subset) $t$ turns out to be the best one during this iteration, then two more bases are added to the model:

$$
\begin{aligned}
\mathbf{B}_{m+2} &= \mathbf{B}_i \, \mathbf{B}_m \, \mathbf{T}_1(v - t) \\
\mathbf{B}_{m+3} &= \mathbf{B}_i \, \mathbf{B}_{m+1} \, \mathbf{T}_2(v - t)
\end{aligned}
$$
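The following Python sketch evaluates the first of these products, $\mathbf{B}_{m+2}$, for a continuous $v$. It assumes that $\mathbf{T}_1(v - t) = \max(v - t, 0)$, a hinge function that this section does not itself define; it also takes the intercept as the parent basis $\mathbf{B}_i$ and uses a hypothetical knot $t = 1$. The helper names `t1_hinge` and `child_basis` are illustrative only.

```python
import numpy as np


def t1_hinge(v, t):
    """Assumed T_1(v - t) = max(v - t, 0) for a continuous v.

    Missing entries (NaN) are set to 0 so that the product below stays finite.
    """
    v = np.asarray(v, dtype=float)
    return np.where(np.isnan(v), 0.0, np.maximum(v - t, 0.0))


def child_basis(parent, indicator, factor):
    """Elementwise product B_i * (missing-value indicator) * T(v - t)."""
    return (
        np.asarray(parent, dtype=float)
        * np.asarray(indicator, dtype=float)
        * np.asarray(factor, dtype=float)
    )


v = np.array([2.0, np.nan, 5.0, 0.5])
b_i = np.ones_like(v)               # parent basis: the intercept (hypothetical choice)
b_m = (~np.isnan(v)).astype(float)  # B_m = I(v is not missing)
t = 1.0                             # hypothetical knot

b_m2 = child_basis(b_i, b_m, t1_hinge(v, t))
print(b_m2)  # [1. 0. 4. 0.]
```

The hinge is set to 0 where $v$ is missing because NaN multiplied by 0 is still NaN in floating-point arithmetic. Since $\mathbf{B}_m$ is already 0 on those rows, this choice leaves the basis values unchanged everywhere else.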

The missing-value indicator factor ($\mathbf{B}_m$ or $\mathbf{B}_{m+1}$) does not count toward the interaction order of the constructed bases. This approach assumes that the pattern of missingness in the training data is representative of the missingness in future data to be predicted.
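One way to read the interaction-order rule is to count only the non-indicator factors of a basis. The sketch below uses a hypothetical representation of a basis as a list of `(kind, variable)` factors; it is an illustration of the rule, not taken from the procedure's implementation.

```python
from typing import List, Tuple

# A factor is (kind, variable): kind is "transform" for a T_1 or T_2 factor,
# or "missing_indicator" for I(v is missing) / I(v is not missing).
Factor = Tuple[str, str]


def interaction_order(factors: List[Factor]) -> int:
    """Count the factors of a basis, skipping missing-value indicators."""
    return sum(1 for kind, _ in factors if kind != "missing_indicator")


# Hypothetical parent basis B_i: a main effect of another variable x.
parent = [("transform", "x")]
# B_{m+2} = B_i * I(v is not missing) * T_1(v - t)
b_m2 = parent + [("missing_indicator", "v"), ("transform", "v")]

print(interaction_order(parent))  # 1
print(interaction_order(b_m2))    # 2
```

If the indicator were counted as a factor, $\mathbf{B}_{m+2}$ in this example would have three factors instead of two.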

Alternatively, you can specify the NOMISS option in the MODEL statement to exclude from the model fitting all observations that have missing values in the predictor variables.
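The observation-selection rules of this section can be sketched in Python as a filter. The `nomiss` flag is a hypothetical stand-in for the NOMISS option, missing values are represented as `None` or NaN, and the check for invalid response, weight, or frequency values is omitted because this section does not say which values are invalid.

```python
import math
from typing import Dict, List, Optional, Sequence

Row = Dict[str, Optional[float]]


def is_missing(x: Optional[float]) -> bool:
    """Treat None or NaN as a missing value (an assumption of this sketch)."""
    return x is None or (isinstance(x, float) and math.isnan(x))


def rows_for_fitting(
    rows: Sequence[Row],
    response: str,
    predictors: Sequence[str],
    weight: Optional[str] = None,
    freq: Optional[str] = None,
    nomiss: bool = False,
) -> List[Row]:
    """Return the observations that would be used in model fitting."""
    required = [response] + [c for c in (weight, freq) if c is not None]
    kept: List[Row] = []
    for row in rows:
        # Missing response, weight, or frequency: always excluded.
        if any(is_missing(row[c]) for c in required):
            continue
        # Missing predictor: excluded only when nomiss is True (like NOMISS).
        if nomiss and any(is_missing(row[c]) for c in predictors):
            continue
        kept.append(row)
    return kept


rows = [
    {"y": 1.0, "x1": 0.5, "x2": 2.0},
    {"y": None, "x1": 1.0, "x2": 3.0},  # missing response
    {"y": 2.0, "x1": None, "x2": 1.0},  # missing predictor
]
print(len(rows_for_fitting(rows, "y", ["x1", "x2"])))               # 2
print(len(rows_for_fitting(rows, "y", ["x1", "x2"], nomiss=True)))  # 1
```

By default the row with a missing predictor is kept and handled through the indicator bases; with `nomiss=True` it is dropped along with the row whose response is missing.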

Last updated: December 09, 2022