Shared Concepts and Topics

Observations Might Not Be Identified Reliably with Input CAS Tables

Because you cannot expect the order of observations to be persistent or definite when a SAS/STAT procedure reads data from a CAS table, the observation or case numbers, which reflect the order of observations, do not have a definite mapping to the observations in the CAS table. Thus, issues arise when a SAS/STAT procedure analyzes input data from a CAS table and uses observation or case numbers to identify or label observations in the output results. For example, identifying outlying or influential observations is an important task in regression diagnostics. With a CAS table as the input data set, you cannot rely on the observation numbers in the output to identify those observations correctly.

In general, using observation numbers to identify observations is neither a good nor a reliable practice. Using a CAS table as the input data set accentuates the identification problem. SAS/STAT procedures issue a note to the log to alert you to the identification problem when you use a CAS table as the input data set.

An effective strategy to address this problem is to specify identification variables in the ID statement, which many SAS/STAT procedures support. The output results might then label the observations by the values of the identification variables.
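The strategy above can be sketched with PROC REG, one of the SAS/STAT procedures that provides an ID statement. This is a minimal illustration under stated assumptions, not a prescribed setup: the session name mysess, the libref mycas, and the choice of the Sashelp.Class sample table are all hypothetical, and the sketch assumes that a CAS server is available and that the variable Name uniquely identifies each row.

```sas
/* Assumption: a CAS server is reachable and you are allowed to start a session. */
cas mysess;                          /* hypothetical session name */
libname mycas cas caslib=casuser;    /* hypothetical libref bound to the CASUSER caslib */

/* Load a small sample table into CAS. In Sashelp.Class, Name is unique per row. */
data mycas.class;
   set sashelp.class;
run;

/* The ID statement requests that the rows of the residual and influence
   tables (R and INFLUENCE options) show the value of Name, so a flagged
   row can be traced to a student regardless of the order in which
   the rows were read from the CAS table. */
proc reg data=mycas.class;
   id name;
   model weight = height / r influence;
run;
quit;
```

An identification variable is useful only if its values are unique, or at least distinctive enough to locate the row in the source table. A variable with many repeated values would label the output but would not resolve the identification problem.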

For more information about issues related to using observation numbers in the output results of SAS/STAT procedures, see the section "Observations Might Not Be Identified Reliably" in Appendix C, "Known Issues in Using CAS Tables with SAS/STAT Procedures."

Last updated: December 09, 2022