In regression diagnostics, you typically want to identify extreme observations in the input data. For example, the QUANTREG and ROBUSTREG procedures produce display output that shows outlying or influential observations. If you do not specify ID variables in the ID statement for these procedures, the numerical or graphical output labels these outlying or influential observations by using their observation numbers. However, these observation numbers might not correspond to the rows in the input CAS table. As a result, you cannot rely on these observation numbers to identify the observations.
Table 14 lists the output from SAS/STAT procedures that is known to display unreliable observation numbers when you use a CAS table as input data. Unless noted otherwise, the input data are the raw data that you specify by using the DATA= option. In these cases, the procedures write a note to the log. Some procedures also recommend that you specify ID variables in the ID statement so that you can use the values of these variables to identify observations.
Table 14: Numerical and Graphical Output Displaying Unreliable Observation Numbers
| Procedures | Displayed Output |
|---|---|
| ANOVA, GEE, GENMOD, GLIMMIX, GLM, ORTHOREG, PROBIT, TTEST | Box plots |
| CLUSTER | Clustering history, cluster results, and dendrogram |
| LOESS | Table of output statistics |
| LOGISTIC, MIXED, PLS, REG | Box plots, diagnostic plots, fit plots, and other plots |
| NLIN | Leverage and local influence plots |
| PRINQUAL | Multidimensional preference plot |
| QUANTREG, ROBUSTREG | Outlier and leverage diagnostics and the corresponding plots |
| QUANTREG | Conditional estimates when you specify the SHOWOBS option and input a CAS table for the TESTDATA= data set in the CONDDIST statement |
| TRANSREG | Preference mapping vector plot and preference mapping ideal point plot |
A related issue is the display of observation numbers in the tooltips of plots, such as box plots, diagnostic plots, needle plots, and scatter plots. Again, you cannot rely on these observation numbers to identify the observations or rows in the input CAS table. Fortunately, in most cases the tooltips contain other information that you can use to label the observations in these plots. Some procedures also display the ID variable values in tooltips when you specify the ID statement.
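The following statements are a minimal sketch of the ID statement approach for one of the procedures in Table 14. The sketch assumes that a CAS session is already active and that a libref named `mycas` (a hypothetical name) is assigned to a caslib through the CAS LIBNAME engine. The table name `class` and the choice of `name` as the ID variable are for illustration only; any variable whose values uniquely identify the rows would serve the same purpose.

```sas
/* Assumption: a CAS session is active and the libref MYCAS        */
/* (hypothetical name) points to a caslib via the CAS engine.      */

/* Load a small sample data set into CAS for illustration. */
data mycas.class;
   set sashelp.class;
run;

/* Request outlier and leverage diagnostics. The ID statement      */
/* adds the values of NAME to the diagnostics output so that       */
/* flagged observations can be matched to rows in the CAS table    */
/* without relying on the displayed observation numbers.           */
proc robustreg data=mycas.class;
   model weight = height / diagnostics leverage;
   id name;
run;
```

In the resulting diagnostics output, the values of `name`, rather than the observation numbers, should be used to find the flagged rows in the CAS table.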