Appendix C: Known Issues in Using CAS Tables with SAS/STAT Procedures

Using observation numbers to identify observations is not reliable

In regression diagnostics, you typically want to identify extreme observations in the input data. For example, the QUANTREG and ROBUSTREG procedures produce display output that shows outlying or influential observations. If you do not specify ID variables in the ID statement for these procedures, the numerical or graphical output labels these outlying or influential observations by using their observation numbers. However, these observation numbers might not correspond to the rows in the input CAS table. As a result, you cannot rely on these observation numbers to identify the observations.
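
The following statements are a minimal sketch of one way to avoid depending on observation numbers: create an identifier variable before you load the data into CAS, and then name it in the ID statement. The session name mysess, the libref mycas, the data set work.mydata, and the variables y, x1, x2, and rowid are hypothetical names chosen for illustration. The sketch also assumes that your SAS session can start a CAS session and write to the CASUSER caslib.

```sas
/* Assumptions (not from the documentation above): a CAS server is     */
/* reachable from this SAS session, and work.mydata is a hypothetical  */
/* data set that contains the response y and the regressors x1 and x2. */
cas mysess;                          /* start a CAS session             */
libname mycas cas caslib=casuser;    /* CAS engine libref               */

/* Create a stable row identifier on the client, where _N_ follows     */
/* the row order of the original SAS data set.                         */
data work.mydata_id;
   set work.mydata;
   rowid = _n_;
run;

/* Load the data, including the identifier, into a CAS table. */
data mycas.mydata_id;
   set work.mydata_id;
run;

/* Request outlier and leverage diagnostics, and name the identifier   */
/* in the ID statement so that it travels with each observation.       */
proc robustreg data=mycas.mydata_id method=m;
   model y = x1 x2 / diagnostics leverage;
   id rowid;
run;
```

Because rowid is stored as a variable in the table, its values stay attached to each row regardless of how the rows are distributed or returned. If your data already contain a natural key, such as a subject or account identifier, you can specify that variable in the ID statement instead.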

Table 14 lists the output of SAS/STAT procedures that is known to display unreliable observation numbers when you use a CAS table as input data. Unless noted otherwise, the input data are the raw data that you specify by using the DATA= option. In these cases, the procedures write a note to the log. Some procedures also recommend that you specify ID variables in the ID statement so that you can use the values of these variables to help identify observations.

Table 14: Numerical and Graphical Output Displaying Unreliable Observation Numbers

Procedures                                                 Displayed Output
ANOVA, GEE, GENMOD, GLIMMIX, GLM, ORTHOREG, PROBIT, TTEST  Box plots
CLUSTER                                                    Clustering history, cluster results, and dendrogram
LOESS                                                      Table of output statistics
LOGISTIC, MIXED, PLS, REG                                  Box plots, diagnostic plots, fit plots, and other plots
NLIN                                                       Leverage and local influence plots
PRINQUAL                                                   Multidimensional preference plot
QUANTREG, ROBUSTREG                                        Outlier and leverage diagnostics and the corresponding plots
QUANTREG                                                   Conditional estimates when you specify the SHOWOBS option and input a CAS table for the TESTDATA= data set in the CONDDIST statement
TRANSREG                                                   Preference mapping vector plot and preference mapping ideal point plot


A related issue is the display of observation numbers in the tooltips of plots such as box plots, diagnostic plots, needle plots, and scatter plots. Again, you cannot rely on these observation numbers to identify the observations or rows in the input CAS table. In most cases, however, the tooltips contain other information that you can use to identify the observations in these plots. Some procedures also display the ID variable values in tooltips when you specify the ID statement.

Last updated: December 09, 2022