Introduction to Regression Procedures

Introduction

In a linear regression model, the mean of a response variable Y is a function of parameters and covariates in a statistical model. The many forms of regression models have their origin in the characteristics of the response variable (discrete or continuous, normally or nonnormally distributed), assumptions about the form of the model (linear, nonlinear, or generalized linear), assumptions about the data-generating mechanism (survey, observational, or experimental data), and estimation principles. Some models contain classification (or CLASS) variables that enter the model not through their values but through their levels. For an introduction to linear regression models, see Chapter 3, Introduction to Statistical Modeling with SAS/STAT Software. For information that is common to many of the regression procedures, see Chapter 20, Shared Concepts and Topics. The following procedures, listed in alphabetical order, perform at least one type of regression analysis.

ADAPTIVEREG

fits multivariate adaptive regression spline models. This is a nonparametric regression technique that combines regression splines with model selection methods. PROC ADAPTIVEREG is designed to produce parsimonious models that avoid overfitting the data and thus have good predictive power. It supports CLASS variables. For more information, see Chapter 28, The ADAPTIVEREG Procedure.

CATMOD

analyzes data that can be represented by a contingency table. It fits linear models to functions of response frequencies, and it can be used for linear and logistic regression. PROC CATMOD supports CLASS variables. For more information, see Chapter 9, Introduction to Categorical Data Analysis Procedures, and Chapter 36, The CATMOD Procedure.

GAM

fits generalized additive models by using the backfitting method of Hastie and Tibshirani (1990). PROC GAM uses smoothing splines to model each unknown additive term. It supports CLASS variables. For more information, see Chapter 48, The GAM Procedure.

Note: As an alternative to PROC GAM, the GAMPL procedure provides an approach that searches for models that have optimal degrees of freedom for each smoother. It is also more computationally efficient than PROC GAM, especially for large problems.

GAMPL

fits generalized additive models by using the penalized likelihood method of Wood (2006). PROC GAMPL uses low-rank regression splines to model each unknown additive term. It supports CLASS variables. For more information, see Chapter 49, The GAMPL Procedure.

GENMOD

fits generalized linear models. It is especially suited for responses that have discrete outcomes, and it performs logistic regression and Poisson regression in addition to fitting generalized estimating equations for repeated measures data. PROC GENMOD supports CLASS variables and provides Bayesian analysis capabilities. For more information, see Chapter 9, Introduction to Categorical Data Analysis Procedures, and Chapter 51, The GENMOD Procedure.

GLIMMIX

uses likelihood-based methods to fit generalized linear mixed models. It can perform simple, multiple, polynomial, and weighted regression, in addition to many other analyses. It can fit linear mixed models, which have random effects, and models that do not have random effects. PROC GLIMMIX supports CLASS variables. For more information, see Chapter 52, The GLIMMIX Procedure.

GLM

uses the method of least squares to fit general linear models. It can perform simple, multiple, polynomial, and weighted regression in addition to many other analyses. It has many of the same input/output capabilities as PROC REG, but it does not provide as many diagnostic tools or allow interactive changes in the model or data. PROC GLM supports CLASS variables. For more information, see Chapter 5, Introduction to Analysis of Variance Procedures, and Chapter 53, The GLM Procedure.
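For a model with a single regressor, the least squares criterion has a closed-form solution. The following Python sketch is an illustration only; it is not SAS code and does not represent how PROC GLM is implemented. It computes the intercept and slope that minimize the (optionally weighted) sum of squared residuals. The function name `fit_simple_wls` and the sample data are hypothetical.

```python
def fit_simple_wls(x, y, weights=None):
    """Return (intercept, slope) minimizing sum(w * (y - b0 - b1 * x) ** 2)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    w = [1.0] * n if weights is None else list(weights)
    if len(w) != n:
        raise ValueError("weights must have the same length as x")

    total_w = sum(w)
    # Weighted means of the regressor and the response.
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total_w
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total_w

    # Weighted sums of squares and cross products about the means.
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    if sxx == 0:
        raise ValueError("x must take at least two distinct values")
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))

    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data: unweighted fit.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.1, 4.9, 7.2, 8.8]
intercept, slope = fit_simple_wls(x, y)
print(intercept, slope)  # approximately 1.15 and 1.94
```

Passing a `weights` list gives a weighted regression; when the weights are omitted, every observation counts equally.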

GLMSELECT

performs variable selection in the framework of general linear models. It supports CLASS variables (like PROC GLM) and model selection (like PROC REG). A variety of model selection methods are available, including forward, backward, stepwise, LASSO, and least angle regression. PROC GLMSELECT provides a variety of selection and stopping criteria. For more information, see Chapter 56, The GLMSELECT Procedure.

HPGENSELECT

is a high-performance procedure that provides model fitting and model building for generalized linear models. It fits models for standard distributions in the exponential family, such as the normal, Poisson, and Tweedie distributions. In addition, it fits multinomial models for ordinal and nominal responses, and it fits zero-inflated Poisson and negative binomial models for count data. For all these models, PROC HPGENSELECT provides forward, backward, and stepwise variable selection. It supports CLASS variables. For more information, see Chapter 59, The HPGENSELECT Procedure.

HPLOGISTIC

is a high-performance procedure that fits logistic regression models for binary, binomial, and multinomial data. It fits logistic regression models in the broader sense; it permits several link functions and can handle ordinal and nominal data that have more than two response categories (multinomial data). PROC HPLOGISTIC supports CLASS variables. For more information, see Chapter 61, The HPLOGISTIC Procedure.

HPNLMOD

is a high-performance procedure that uses either nonlinear least squares or maximum likelihood to fit nonlinear regression models. It enables you to specify the model by using SAS programming statements, which give you greater flexibility in modeling the relationship between the response variable and independent (regressor) variables than SAS procedures that use a more structured MODEL statement. For more information, see Chapter 63, The HPNLMOD Procedure.

HPQUANTSELECT

is a high-performance procedure that fits quantile regression models and performs effect selection. Quantile regression is a systematic statistical methodology for modeling conditional quantile functions of a response variable on explanatory covariate effects. PROC HPQUANTSELECT supports CLASS variables. For more information, see Chapter 66, The HPQUANTSELECT Procedure.

HPREG

is a high-performance procedure that fits and performs model selection for ordinary linear least squares models. The supported models are standard independently and identically distributed (iid) general linear models, which can contain main effects that consist of both continuous and classification variables and the interaction effects of these variables. PROC HPREG offers extensive capabilities for customizing the model selection by using a wide variety of selection and stopping criteria, from traditional and computationally efficient significance-level-based criteria to more computationally intensive validation-based criteria. It also provides a variety of regression diagnostics that are conditional on the selected model. PROC HPREG supports CLASS variables. For more information, see Chapter 67, The HPREG Procedure.

LIFEREG

fits parametric models to failure-time data that might be right-censored. These types of models are commonly used in survival analysis. PROC LIFEREG supports CLASS variables and provides Bayesian analysis capabilities. For more information, see Chapter 14, Introduction to Survival Analysis Procedures, and Chapter 76, The LIFEREG Procedure.

LOESS

uses a local regression method to fit nonparametric models. It is suitable for modeling regression surfaces in which the underlying parametric form is unknown and for which robustness in the presence of outliers is required. For more information, see Chapter 78, The LOESS Procedure.
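The idea behind local regression can be sketched in a few lines of Python. This is an illustration under stated assumptions, not SAS code and not the algorithm that PROC LOESS implements: it fits a weighted straight line to the nearest neighbors of a single point `x0` by using tricube weights (a common choice in local regression) and returns the fitted value there. The iterative reweighting that provides robustness to outliers is not shown. The names `tricube`, `local_linear_fit`, and `span` are hypothetical.

```python
def tricube(u):
    """Tricube weight: positive for |u| < 1, zero otherwise."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def local_linear_fit(x, y, x0, span=0.5):
    """Fitted value at x0 from a tricube-weighted straight-line fit."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    # Number of neighbors that define the local window.
    k = min(n, max(3, int(round(span * n))))
    # Bandwidth: distance from x0 to its k-th nearest observation.
    h = sorted(abs(xi - x0) for xi in x)[k - 1]
    if h == 0:
        raise ValueError("bandwidth is zero; too many ties at x0")

    w = [tricube((xi - x0) / h) for xi in x]
    total_w = sum(w)
    # Weighted least squares on the centered regressor (xi - x0),
    # so the intercept of the local line is the fitted value at x0.
    d = [xi - x0 for xi in x]
    d_bar = sum(wi * di for wi, di in zip(w, d)) / total_w
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    sdd = sum(wi * (di - d_bar) ** 2 for wi, di in zip(w, d))
    if sdd == 0:
        raise ValueError("local window has no spread in x")
    sdy = sum(wi * (di - d_bar) * (yi - y_bar) for wi, di, yi in zip(w, d, y))
    slope = sdy / sdd
    return y_bar - slope * d_bar


# Hypothetical data that lie exactly on the line y = 2x + 1.
xs = [float(i) for i in range(10)]
ys = [2.0 * xi + 1.0 for xi in xs]
fitted = local_linear_fit(xs, ys, x0=4.5)
print(fitted)  # approximately 10.0, because the data are exactly linear
```

Repeating the fit over a grid of `x0` values traces out a smooth curve without specifying a global parametric form.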

LOGISTIC

fits logistic models for binomial and ordinal outcomes. It provides a wide variety of model selection methods and computes numerous regression diagnostics. PROC LOGISTIC supports CLASS variables. For more information, see Chapter 9, Introduction to Categorical Data Analysis Procedures, and Chapter 79, The LOGISTIC Procedure.

MIXED

uses likelihood-based techniques to fit linear mixed models. It can perform simple, multiple, polynomial, and weighted regression, in addition to many other analyses. It can fit linear mixed models, which have random effects, and models that do not have random effects. PROC MIXED supports CLASS variables. For more information, see Chapter 84, The MIXED Procedure.

NLIN

uses the method of nonlinear least squares to fit general nonlinear regression models. Several different iterative methods are available. For more information, see Chapter 88, The NLIN Procedure.

NLMIXED

uses the method of maximum likelihood to fit general nonlinear mixed regression models. PROC NLMIXED enables you to specify a custom objective function for parameter estimation and to fit models with or without random effects. For more information, see Chapter 89, The NLMIXED Procedure.

ORTHOREG

uses the Gentleman-Givens computational method to perform regression. For ill-conditioned data, PROC ORTHOREG can produce more accurate parameter estimates than procedures such as PROC GLM and PROC REG. PROC ORTHOREG supports CLASS variables. For more information, see Chapter 91, The ORTHOREG Procedure.

PHREG

fits Cox proportional hazards regression models to survival data. PROC PHREG supports CLASS variables and provides Bayesian analysis capabilities. For more information, see Chapter 14, Introduction to Survival Analysis Procedures, and Chapter 92, The PHREG Procedure.

PLS

performs partial least squares regression, principal component regression, and reduced rank regression, along with cross validation for the number of components. PROC PLS supports CLASS variables. For more information, see Chapter 95, The PLS Procedure.

PROBIT

performs probit regression in addition to logistic regression and ordinal logistic regression. PROC PROBIT is useful when the dependent variable is either dichotomous or polychotomous and the independent variables are continuous. PROC PROBIT supports CLASS variables. For more information, see Chapter 100, The PROBIT Procedure.

QUANTREG

uses quantile regression to model the effects of covariates on the conditional quantiles of a response variable. PROC QUANTREG supports CLASS variables. For more information, see Chapter 103, The QUANTREG Procedure.
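Quantile regression estimates are defined as minimizers of a sum of asymmetric absolute losses (often called the check loss) rather than squared losses. The following Python sketch is an illustration only, not SAS code and not the algorithm that PROC QUANTREG uses. It shows the simplest case, a model that contains only an intercept, where minimizing the check loss recovers a sample quantile. The function names and data are hypothetical.

```python
def check_loss(residual, tau):
    """Check (pinball) loss for one residual at quantile level tau."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def total_check_loss(y, constant, tau):
    """Sum of check losses when every observation is predicted by `constant`."""
    return sum(check_loss(yi - constant, tau) for yi in y)


def best_constant(y, tau):
    """Intercept-only quantile fit.

    A minimizer of the total check loss always occurs at one of the
    observed values, so searching over the data is sufficient.
    """
    return min(y, key=lambda c: total_check_loss(y, c, tau))


# Hypothetical data that contain one extreme value.
y = [1.0, 2.0, 3.0, 4.0, 100.0]
median_fit = best_constant(y, tau=0.5)
upper_fit = best_constant(y, tau=0.9)
print(median_fit, upper_fit)  # 3.0 100.0
```

With covariates in the model, the same loss is minimized over regression coefficients instead of a single constant, which generally requires a linear programming or related algorithm.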

QUANTSELECT

provides variable selection for quantile regression models. Selection methods include forward, backward, stepwise, and LASSO. The procedure provides a variety of selection and stopping criteria. PROC QUANTSELECT supports CLASS variables. For more information, see Chapter 104, The QUANTSELECT Procedure.

REG

performs linear regression with many diagnostic capabilities. PROC REG produces fit, residual, and diagnostic plots; heat maps; and many other types of graphs. PROC REG enables you to select models by using any one of nine methods, and you can interactively change both the regression model and the data that are used to fit the model. For more information, see Chapter 105, The REG Procedure.

ROBUSTREG

uses Huber M estimation and high breakdown value estimation to perform robust regression. PROC ROBUSTREG is suitable for detecting outliers and providing resistant (stable) results in the presence of outliers. PROC ROBUSTREG supports CLASS variables. For more information, see Chapter 107, The ROBUSTREG Procedure.
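The following Python sketch illustrates the idea of Huber M estimation in its simplest setting, estimating a single location parameter. It is not SAS code and does not represent the implementation in PROC ROBUSTREG. The assumptions are labeled in the comments: the scale is fixed at a rescaled median absolute deviation, the tuning constant 1.345 is a conventional choice, and the estimate is computed by iteratively reweighted averaging. The function names and data are hypothetical.

```python
import statistics


def huber_weight(scaled_residual, c=1.345):
    """Weight 1 for small residuals; weight c/|r| for residuals beyond c."""
    size = abs(scaled_residual)
    return 1.0 if size <= c else c / size


def huber_location(y, c=1.345, tol=1e-10, max_iter=200):
    """Huber M estimate of location by iteratively reweighted averaging."""
    center = statistics.median(y)
    # Assumption: the scale is held fixed at MAD / 0.6745.
    mad = statistics.median([abs(v - center) for v in y])
    scale = mad / 0.6745
    if scale == 0.0:
        return center
    for _ in range(max_iter):
        w = [huber_weight((v - center) / scale, c) for v in y]
        updated = sum(wi * v for wi, v in zip(w, y)) / sum(w)
        if abs(updated - center) < tol:
            return updated
        center = updated
    return center


# Hypothetical data that contain one outlier.
sample = [1.0, 2.0, 3.0, 4.0, 100.0]
robust_center = huber_location(sample)
plain_mean = sum(sample) / len(sample)
print(robust_center, plain_mean)  # approximately 3.0 versus 22.0
```

The outlier receives a small weight, so it pulls the estimate far less than it pulls the ordinary mean. Observations with small final weights are natural candidates for outlier inspection.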

RSREG

builds quadratic response-surface regression models. PROC RSREG analyzes the fitted response surface to determine the factor levels of optimum response and performs a ridge analysis to search for the region of optimum response. For more information, see Chapter 108, The RSREG Procedure.

SURVEYLOGISTIC

uses the method of maximum likelihood to fit logistic models for binary and ordinal outcomes to survey data. PROC SURVEYLOGISTIC supports CLASS variables. For more information, see Chapter 15, Introduction to Survey Sampling and Analysis Procedures, and Chapter 120, The SURVEYLOGISTIC Procedure.

SURVEYPHREG

fits proportional hazards models for survey data by maximizing a partial pseudo-likelihood function that incorporates the sampling weights. The SURVEYPHREG procedure provides design-based variance estimates, confidence intervals, and tests for the estimated proportional hazards regression coefficients. PROC SURVEYPHREG supports CLASS variables. For more information, see Chapter 15, Introduction to Survey Sampling and Analysis Procedures, Chapter 14, Introduction to Survival Analysis Procedures, and Chapter 122, The SURVEYPHREG Procedure.

SURVEYREG

uses elementwise regression to fit linear regression models to survey data by generalized least squares. PROC SURVEYREG supports CLASS variables. For more information, see Chapter 15, Introduction to Survey Sampling and Analysis Procedures, and Chapter 123, The SURVEYREG Procedure.

TPSPLINE

uses penalized least squares to fit nonparametric regression models. PROC TPSPLINE makes no assumptions of a parametric form for the model. For more information, see Chapter 125, The TPSPLINE Procedure.

TRANSREG

fits univariate and multivariate linear models, optionally with spline, Box-Cox, and other nonlinear transformations. Models include regression and ANOVA, conjoint analysis, preference mapping, redundancy analysis, canonical correlation, and penalized B-spline regression. PROC TRANSREG supports CLASS variables. For more information, see Chapter 126, The TRANSREG Procedure.
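Many of the preceding procedures support CLASS variables, which enter the model through their levels rather than their values. The following Python sketch illustrates one common scheme, a 0/1 indicator column for each level. It is an illustration only and not SAS code; procedures and their options can use other parameterizations. The function name `indicator_columns` and the data are hypothetical.

```python
def indicator_columns(values):
    """Return (levels, rows), where each row has one 0/1 entry per level."""
    levels = sorted(set(values))
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows


# A character-valued classification variable.
levels, rows = indicator_columns(["B", "A", "C", "A"])
print(levels)  # ['A', 'B', 'C']
print(rows)    # [[0, 1, 0], [1, 0, 0], [0, 0, 1], [1, 0, 0]]

# Numeric codes are treated the same way: only the grouping matters,
# so relabeling the levels leaves the indicator columns unchanged.
_, rows_original = indicator_columns([10, 20, 10, 30])
_, rows_relabeled = indicator_columns([1, 2, 1, 3])
print(rows_original == rows_relabeled)  # True
```

By contrast, a continuous regressor contributes a single column that holds its actual values, so relabeling the values would change the fit.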

Several SAS/ETS procedures also perform regression. The following procedures are documented in the SAS/ETS User's Guide:

ARIMA

uses autoregressive moving-average errors to perform multiple regression analysis. For more information, see Chapter 7, The ARIMA Procedure (SAS/ETS User's Guide).

AUTOREG

implements regression models that use time series data in which the errors are autocorrelated. For more information, see Chapter 8, The AUTOREG Procedure (SAS/ETS User's Guide).

COUNTREG

analyzes regression models in which the dependent variable takes nonnegative integer or count values. For more information, see Chapter 11, The COUNTREG Procedure (SAS/ETS User's Guide).

MDC

fits conditional logit, mixed logit, heteroscedastic extreme value, nested logit, and multinomial probit models to discrete choice data. For more information, see Chapter 23, The MDC Procedure (SAS/ETS User's Guide).

MODEL

handles nonlinear simultaneous systems of equations, such as econometric models. For more information, see Chapter 24, The MODEL Procedure (SAS/ETS User's Guide).

PANEL

analyzes a class of linear econometric models that commonly arise when time series and cross-sectional data are combined. For more information, see Chapter 25, The PANEL Procedure (SAS/ETS User's Guide).

PDLREG

fits polynomial distributed lag regression models. For more information, see Chapter 26, The PDLREG Procedure (SAS/ETS User's Guide).

QLIM

analyzes limited dependent variable models in which dependent variables take discrete values or are observed only in a limited range of values. For more information, see Chapter 27, The QLIM Procedure (SAS/ETS User's Guide).

SYSLIN

handles linear simultaneous systems of equations, such as econometric models. For more information, see Chapter 35, The SYSLIN Procedure (SAS/ETS User's Guide).

VARMAX

performs multiple regression analysis for multivariate time series dependent variables by using current and past vectors of dependent and independent variables as predictors, with vector autoregressive moving-average errors, and with modeling of time-varying heteroscedasticity. For more information, see Chapter 42, The VARMAX Procedure (SAS/ETS User's Guide).

Last updated: December 09, 2022