The GLM Procedure

PROC GLM Statement

  • PROC GLM <options>;

The PROC GLM statement invokes the GLM procedure. Table 4 summarizes the options available in the PROC GLM statement.

Table 4: PROC GLM Statement Options

Option      Description
ALPHA=      Specifies the level of significance for confidence intervals
DATA=       Names the SAS data set used by the GLM procedure
MANOVA      Requests the multivariate mode of eliminating observations with missing values
MULTIPASS   Requests that the input data set be reread when necessary, instead of using a utility file
NAMELEN=    Specifies the length of effect names
NOPRINT     Suppresses the normal display of results
ORDER=      Specifies the order in which to sort classification variables
OUTSTAT=    Names an output data set for information and statistics on each model effect
PLOTS       Controls the plots produced through ODS Graphics


You can specify the following options in the PROC GLM statement.

ALPHA=p

specifies the level of significance p for 100(1 − p)% confidence intervals. The value must be between 0 and 1; the default value of p = 0.05 results in 95% intervals. This value is used as the default confidence level for limits computed by the following options.

Statement   Options
LSMEANS     CL
MEANS       CLM CLDIFF
MODEL       CLI CLM CLPARM
OUTPUT      UCL= LCL= UCLM= LCLM=

You can override the default in each of these cases by specifying the ALPHA= option for each statement individually.
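
As an illustration of this override, the following sketch sets a procedure-level ALPHA= value and then replaces it in one statement. The data set work.trial, its variables, and its values are invented for the example.

```sas
/* Hypothetical data: three treatment groups with made-up responses */
data work.trial;
   input trt $ y @@;
   datalines;
A 10.1 A 11.3 A 9.8 B 12.4 B 13.0 B 12.1 C 9.0 C 8.7 C 9.5
;

proc glm data=work.trial alpha=0.10;      /* default: 90% limits */
   class trt;
   model y = trt / solution clparm;       /* limits use the procedure-level ALPHA=0.10 */
   lsmeans trt / cl alpha=0.01;           /* statement-level override: 99% limits */
run;
quit;
```

In this sketch, the parameter-estimate limits requested by CLPARM should follow the procedure-level value, whereas the LS-means limits follow the ALPHA= value given in the LSMEANS statement.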

DATA=SAS-data-set

names the SAS data set used by the GLM procedure. By default, PROC GLM uses the most recently created SAS data set.

MANOVA

requests the multivariate mode of eliminating observations with missing values. If any of the dependent variables have missing values, the procedure eliminates that observation from the analysis. The MANOVA option is useful if you use PROC GLM in interactive mode and plan to perform a multivariate analysis.
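
A minimal sketch of the MANOVA option follows. The data set work.scores and its values are hypothetical; the second observation has a missing value for y2.

```sas
/* Hypothetical data: two responses, one missing value of y2 */
data work.scores;
   input grp $ y1 y2;
   datalines;
A 5.1 3.2
A 4.8 .
A 5.0 3.1
B 6.0 3.9
B 6.3 4.1
C 5.5 3.0
C 5.9 3.4
;

proc glm data=work.scores manova;   /* drop observations with any missing response */
   class grp;
   model y1 y2 = grp;
   manova h=grp;                    /* multivariate test of the grp effect */
run;
quit;
```

With the MANOVA option in effect, the observation that has the missing y2 value is excluded from the analysis of y1 as well, so the univariate and multivariate results are based on the same observations.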

MULTIPASS

requests that PROC GLM reread the input data set when necessary, instead of writing the necessary values of dependent variables to a utility file. This option decreases disk space usage at the expense of increased execution times, and is useful only in rare situations where disk space is at an absolute premium.

NAMELEN=n

specifies the length of effect names in tables and output data sets to be n characters long, where n is a value between 20 and 200 characters. The default length is 20 characters.

NOPRINT

suppresses the normal display of results. The NOPRINT option is useful when you want only to create one or more output data sets with the procedure. Note that this option temporarily disables the Output Delivery System (ODS); see Chapter 23, Using the Output Delivery System, for more information.

ORDER=DATA | FORMATTED | FREQ | INTERNAL

specifies the sort order for the levels of the classification variables (which are specified in the CLASS statement).

This ordering determines which parameters in the model correspond to each level in the data, so the ORDER= option can be useful when you specify the CONTRAST or ESTIMATE statement.

This option applies to the levels for all classification variables, except when you use the (default) ORDER=FORMATTED option with numeric classification variables that have no explicit format. In that case, the levels of such variables are ordered by their internal value.

The ORDER= option can take the following values:

Value of ORDER=   Levels Sorted By
DATA              Order of appearance in the input data set
FORMATTED         External formatted value, except for numeric variables with no explicit format, which are sorted by their unformatted (internal) value
FREQ              Descending frequency count; levels with the most observations come first in the order
INTERNAL          Unformatted value

By default, ORDER=FORMATTED. For ORDER=FORMATTED and ORDER=INTERNAL, the sort order is machine-dependent.

For more information about sort order, see the chapter on the SORT procedure in the Base SAS Procedures Guide and the discussion of BY-group processing in the "Grouping Data" section of SAS Programmer's Guide: Essentials.
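
The following sketch shows why the level order matters for a CONTRAST statement. The data set work.dose and its values are hypothetical.

```sas
/* Hypothetical data: levels appear in the order low, medium, high */
data work.dose;
   input dose $ y @@;
   datalines;
low 4.2 low 4.5 medium 6.0 medium 6.4 high 7.9 high 8.3
;

proc glm data=work.dose order=data;
   class dose;
   model y = dose;
   /* With ORDER=DATA the levels are low, medium, high, */
   /* so these coefficients compare low with high.      */
   contrast 'low vs high' dose 1 0 -1;
run;
quit;
```

Under the default ORDER=FORMATTED, these character levels would instead sort as high, low, medium, and the same coefficients would compare high with medium. The "Class Level Information" table shows the order that is actually in effect.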

OUTSTAT=SAS-data-set

names an output data set that contains sums of squares, degrees of freedom, F statistics, and probability levels for each effect in the model, as well as for each CONTRAST that uses the overall residual or error mean square (MSE) as the denominator in constructing the F statistic. If you use the CANONICAL option in the MANOVA statement and do not use an M= specification in the MANOVA statement, the data set also contains results of the canonical analysis.

See the section Output Data Sets for more information.
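
As a sketch of a common pattern, the following step combines OUTSTAT= with NOPRINT to create the output data set without displaying results. The data set names work.trial and work.glmstats, and the data values, are hypothetical.

```sas
/* Hypothetical data: three treatment groups with made-up responses */
data work.trial;
   input trt $ y @@;
   datalines;
A 10.1 A 11.3 A 9.8 B 12.4 B 13.0 B 12.1 C 9.0 C 8.7 C 9.5
;

/* Fit the model silently and save the effect statistics */
proc glm data=work.trial noprint outstat=work.glmstats;
   class trt;
   model y = trt;
run;
quit;

/* Inspect the saved sums of squares, degrees of freedom, F statistics, and p-values */
proc print data=work.glmstats;
run;
```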

PLOTS <(global-plot-options)> <=plot-request <(options)>>
PLOTS <(global-plot-options)> <=(plot-request <(options)> <…plot-request <(options)>>)>

controls the plots produced through ODS Graphics. When you specify only one plot-request, you can omit the parentheses from around the plot-request. For example:

   PLOTS=NONE
   PLOTS=(DIAGNOSTICS RESIDUALS)
   PLOTS(UNPACK)=RESIDUALS
   PLOT=MEANPLOT(CLBAND)

ODS Graphics must be enabled before plots can be requested. For example:

ods graphics on;
proc glm data=iron;
   model loss=fe fe*fe;
run;
ods graphics off;

For more information about enabling and disabling ODS Graphics, see the section Enabling and Disabling ODS Graphics in Chapter 24, Statistical Graphics Using ODS.

If ODS Graphics is enabled but you do not specify the PLOTS= option, then PROC GLM produces a default set of plots, which might be different for different models, as described in the following list.

  • If you specify a one-way analysis of variance model that has just one CLASS variable, the GLM procedure produces a grouped box plot of the response values versus the CLASS levels. For an example of the box plot, see the section One-Way Layout with Means Comparisons in Chapter 29, The ANOVA Procedure.

  • If you specify a two-way analysis of variance model that has just two CLASS variables, the GLM procedure produces an interaction plot of the response values, where horizontal position represents one CLASS variable and marker style represents the other, and where predicted response values connected by lines represent the two-way analysis. For an example of the interaction plot, see the section PROC GLM for Unbalanced ANOVA.

  • If you specify a model that has two CLASS variables, and one variable is nested within the other, then the GLM procedure produces a nested box plot of the response values, where horizontal position represents one CLASS variable nested within the other CLASS variable.

  • If you specify a model that has a single continuous predictor, the GLM procedure produces a fit plot of the response values versus the covariate values, where a curve represents the fitted relationship and a band represents the confidence limits for individual mean values. For an example of the fit plot, see the section PROC GLM for Quadratic Least Squares Regression.

  • If you specify a model that has two continuous predictors and no CLASS variables, the GLM procedure produces a contour fit plot, overlaying a scatter plot of the data and a contour plot of the predicted surface.

  • If you specify an analysis of covariance model that has one or two CLASS variables and one continuous variable, the GLM procedure produces an analysis of covariance plot of the response values versus the covariate values, where lines represent the fitted relationship within each classification level. For an example of the analysis of covariance plot, see Example 53.4.

  • If you specify an LSMEANS statement with the PDIFF option, the GLM procedure produces a plot appropriate for the type of LS-means comparison. For PDIFF=ALL (which is the default if you specify only PDIFF), the procedure produces a diffogram, which displays all pairwise LS-means differences and their significance. The display is also known as a "mean-mean scatter plot" (Hsu 1996). For PDIFF=CONTROL, the procedure produces a display of each noncontrol LS-mean compared to the control LS-mean, with two-sided confidence intervals for the comparison. For PDIFF=CONTROLL and PDIFF=CONTROLU a similar display is produced, but with one-sided confidence intervals. Finally, for the PDIFF=ANOM option, the procedure produces an analysis-of-means plot, which compares each LS-mean to the average LS-mean.

  • If you specify a MEANS statement, the GLM procedure produces a grouped box plot of the response values versus the effect for which means are being calculated.

The global-plot-options include the following:

MAXPOINTS=NONE | number

suppresses plots that contain elements that require processing of more than number points. The default is MAXPOINTS=5000. This limit is ignored if you specify MAXPOINTS=NONE.

ONLY

suppresses the default plots. Only plots that you specifically request are displayed.

UNPACKPANEL
UNPACK

suppresses paneling. By default, multiple plots can appear in some output panels. Specify UNPACKPANEL to get each plot in a separate panel. You can specify PLOTS(UNPACKPANEL) to just unpack the default plots. You can also specify UNPACKPANEL as a suboption with DIAGNOSTICS and RESIDUALS.
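
The global-plot-options can be combined in one set of parentheses. The following sketch requests only the diagnostics and residual plots and unpacks both panels; the simulated data set work.reg and its variables are hypothetical.

```sas
/* Hypothetical data: simulated simple linear regression */
data work.reg;
   call streaminit(2022);
   do subj = 1 to 30;
      x = rand('uniform') * 10;
      y = 2 + 0.5*x + rand('normal');
      output;
   end;
run;

ods graphics on;
proc glm data=work.reg plots(only unpack)=(diagnostics residuals);
   model y = x;      /* ONLY suppresses the default fit plot */
run;
quit;
ods graphics off;
```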

The following individual plots and plot options are available. If you specify only one plot, then you can omit the parentheses.

ALL

produces all appropriate plots. You can specify other options with ALL; for example, to request all plots and unpack just the residuals, specify: PLOTS=(ALL RESIDUALS(UNPACK)).

ANCOVAPLOT<(CLM CLI LIMITS)>

modifies the analysis of covariance plot produced by default when you have an analysis of covariance model, with one or two CLASS variables and one continuous variable. By default the plot does not show confidence limits around the predicted values. The PLOTS=ANCOVAPLOT(CLM) option adds limits for the expected predicted values, and PLOTS=ANCOVAPLOT(CLI) adds limits for new predictions. Use PLOTS=ANCOVAPLOT(LIMITS) to add both kinds of limits.
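
For example, the following sketch adds limits for the expected predicted values to the analysis of covariance plot. The data set work.anc and its values are hypothetical.

```sas
/* Hypothetical data: one CLASS variable (grp) and one covariate (x) */
data work.anc;
   input grp $ x y @@;
   datalines;
A 1 2.1  A 2 2.9  A 3 4.2  A 4 4.8
B 1 3.0  B 2 4.1  B 3 4.9  B 4 6.2
;

ods graphics on;
proc glm data=work.anc plots=ancovaplot(clm);
   class grp;
   model y = grp x;   /* parallel-lines analysis of covariance model */
run;
quit;
ods graphics off;
```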

ANOMPLOT

requests an analysis-of-means display, in which least squares means are compared against an average least squares mean (Ott 1967; Nelson 1982, 1991, 1993). LS-mean ANOM plots are produced only if you also specify PDIFF=ANOM or ADJUST=NELSON in the LSMEANS statement, and in this case they are produced by default.

BOXPLOT<(NPANELPOS=n)>

modifies the plot produced by default for the model effect in a one-way analysis of variance model, or for an effect specified in the MEANS statement. Suppose the effect has m levels. By default, or if you specify PLOTS=BOXPLOT(NPANELPOS=0), all m levels of the effect are displayed in a single plot. Specifying a nonzero value of n will result in P panels, where P is the integer part of m/n + 1. If n > 0, then the levels will be approximately balanced across the P panels; whereas if n < 0, precisely |n| levels will be displayed on each panel except possibly the last.
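
The following sketch uses a negative NPANELPOS= value with an effect that has 10 levels. The simulated data set work.batches and its variables are hypothetical.

```sas
/* Hypothetical data: 10 batches, 5 simulated replicates each */
data work.batches;
   call streaminit(1);
   do batch = 1 to 10;
      do rep = 1 to 5;
         y = batch + rand('normal');
         output;
      end;
   end;
run;

ods graphics on;
proc glm data=work.batches plots=boxplot(npanelpos=-4);
   class batch;
   model y = batch;   /* one-way model, so the box plot is produced by default */
run;
quit;
ods graphics off;
```

By the description above, NPANELPOS=-4 should place exactly four batches on each panel except possibly the last, which here would hold the remaining two.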

CONTOURFIT<(OBS=obs-options)>

modifies the contour fit plot produced by default when you have a model involving only two continuous predictors. The plot displays a contour plot of the predicted surface overlaid with a scatter plot of the observed data. You can use the following obs-options to control how the observations are displayed:

GRADIENT

specifies that observations are displayed as circles colored by the observed response. The same color gradient is used to display the fitted surface and the observations. Observations where the predicted response is close to the observed response have similar colors: the greater the contrast between the color of an observation and the surface, the larger the residual is at that point.

NONE

suppresses the observations.

OUTLINE

specifies that observations are displayed as circles with a border but with a completely transparent fill.

OUTLINEGRADIENT

is the same as OBS=GRADIENT except that a border is shown around each observation. This option is useful to identify the location of observations where the residuals are small, since at these points the color of the observations and the color of the surface are indistinguishable. OBS=OUTLINEGRADIENT is the default if you do not specify any obs-options.

CONTROLPLOT

requests a display in which least squares means are compared against a reference level. LS-mean control plots are produced only when you specify PDIFF=CONTROL or ADJUST=DUNNETT in the LSMEANS statement, and in this case they are produced by default.

DIAGNOSTICS<(LABEL UNPACK)>

requests that a panel of summary diagnostics for the fit be displayed. The panel displays scatter plots of residuals, studentized residuals, and observed responses by predicted values; studentized residuals by leverage; Cook’s D by observation; a Q-Q plot of residuals; a residual histogram; and a residual-fit spread plot. The LABEL option displays labels on observations satisfying RSTUDENT > 2, LEVERAGE > 2p/n, and on the Cook’s D plot, COOKSD > 4/n, where n is the number of observations used in fitting the model, and p is the number of parameters in the model. The label is the first ID variable if the ID statement is specified; otherwise, it is the observation number. The UNPACK option unpanels the diagnostic display and produces the series of individual plots that form the paneled display.
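
The following sketch requests labeled diagnostics and supplies an ID variable for the labels. The simulated data set work.reg and the variable subj are hypothetical.

```sas
/* Hypothetical data: simulated regression with an identifier variable */
data work.reg;
   call streaminit(2022);
   do subj = 1 to 30;
      x = rand('uniform') * 10;
      y = 2 + 0.5*x + rand('normal');
      output;
   end;
run;

ods graphics on;
proc glm data=work.reg plots=diagnostics(label);
   id subj;          /* flagged observations are labeled with subj */
   model y = x;
run;
quit;
ods graphics off;
```

If the ID statement is omitted, flagged observations are labeled with their observation numbers instead.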

DIFFPLOT<(ABS NOABS CENTER NOLINES)>

modifies the plot produced by an LSMEANS statement with the PDIFF=ALL option (or just PDIFF, since ALL is the default argument). The ABS and NOABS options determine the positioning of the line segments in the plot. When the ABS option is in effect, and this is the default, all line segments are shown on the same side of the reference line. The NOABS option separates comparisons according to the sign of the difference. The CENTER option marks the center point for each comparison. This point corresponds to the intersection of two least squares means. The NOLINES option suppresses the display of the line segments that represent the confidence bounds for the differences of the least squares means. The NOLINES option implies the CENTER option. The default is to draw line segments in the upper portion of the plot area without marking the center point.
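
As a sketch, the following step requests a diffogram that marks the center points and separates comparisons by the sign of the difference. The data set work.trial and its values are hypothetical.

```sas
/* Hypothetical data: three treatment groups with made-up responses */
data work.trial;
   input trt $ y @@;
   datalines;
A 10.1 A 11.3 A 9.8 B 12.4 B 13.0 B 12.1 C 9.0 C 8.7 C 9.5
;

ods graphics on;
proc glm data=work.trial plots=diffplot(center noabs);
   class trt;
   model y = trt;
   lsmeans trt / pdiff=all;   /* all pairwise differences drive the diffogram */
run;
quit;
ods graphics off;
```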

FITPLOT<(NOCLM NOCLI NOLIMITS)>

modifies the fit plot produced by default when you have a model with a single continuous predictor. By default the plot includes confidence limits for both the expected predicted values and individual new predictions. The PLOTS=FITPLOT(NOCLM) option removes the limits on the expected values and the PLOTS=FITPLOT(NOCLI) option removes the limits on new predictions. The PLOTS=FITPLOT(NOLIMITS) option removes both kinds of confidence limits.

INTPLOT<(CLM CLI LIMITS)>

modifies the interaction plot that is produced by default when you have a two-way analysis of variance model that has just two CLASS variables. By default, the plot does not show confidence limits around the predicted values. The PLOTS=INTPLOT(CLM) option adds limits for the expected predicted values and PLOTS=INTPLOT(CLI) adds limits for new predictions. Use PLOTS=INTPLOT(LIMITS) to add both kinds of limits.

LINESPLOT<(WSCALE=wfactor HSCALE=hfactor)>

modifies the dimensions of the means comparison plot that is produced by the LINES option in the LSMEANS and MEANS statements. The default dimensions of the plot vary according to aspects such as the number of groups, the number of CLASS variables in the effect, and the number of parallel lines needed to represent comparisons. You can change the defaults by specifying the following options:

HSCALE=hfactor

scales the default height of the plot by hfactor, which must be a positive number. For example, specifying HSCALE=2 makes the plot twice as high as it would be by default.

WSCALE=wfactor

scales the default width of the plot by wfactor, which must be a positive number. For example, specifying WSCALE=2 makes the plot twice as wide as it would be by default.

MEANPLOT<(CL CLBAND CONNECT ASCENDING DESCENDING)>

modifies the grouped box plot that is produced by an LSMEANS statement. Upper and lower confidence limits are plotted when the CL option is specified. When the CLBAND option is in effect, confidence limits are shown as bands and the means are connected. By default, means are not joined by lines. You can achieve that effect by specifying the CONNECT option. Means are displayed in the same order in which they appear in the "Means" table. You can change that order for plotting by specifying the ASCENDING and DESCENDING options.

NESTPLOT<(BOXWIDTH=value CLUSTERWIDTH=value LABELOUTLIER)>

modifies the box plot that is produced by default when you have two CLASS variables and one is nested within the other. By default, the plot does not label the outliers. The PLOTS=NESTPLOT(BOXWIDTH=) option controls the width of the boxes, and the PLOTS=NESTPLOT(CLUSTERWIDTH=) option controls how tightly the boxes are clustered together. You can specify the BOXWIDTH= and CLUSTERWIDTH= values as numbers between 0 and 1; the defaults are 0.8.

NONE

suppresses all graphics.

RESIDUALS<(SMOOTH UNPACK)>

displays scatter plots of the residuals against each continuous covariate. The SMOOTH option overlays a Loess smooth on each residual plot. Note that if a WEIGHT variable is specified, then it is not used to weight the smoother. For more information, see Chapter 78, The LOESS Procedure. The UNPACK option unpanels the residual display and produces a series of individual plots that form the paneled display.
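
The following sketch requests residual plots with a loess smooth for a model that has two continuous covariates. The simulated data set work.reg2 and its variables are hypothetical.

```sas
/* Hypothetical data: simulated response with two continuous covariates */
data work.reg2;
   call streaminit(7);
   do i = 1 to 40;
      x1 = rand('uniform') * 10;
      x2 = rand('uniform') * 5;
      y  = 1 + 0.4*x1 - 0.3*x2 + rand('normal');
      output;
   end;
run;

ods graphics on;
proc glm data=work.reg2 plots=residuals(smooth);
   model y = x1 x2;   /* one residual plot per covariate, each with a smooth */
run;
quit;
ods graphics off;
```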

Last updated: December 09, 2022