Shared Concepts and Topics

Statistical Results Not Reliably Reproducible with Input CAS Tables

Reading data from a CAS table does not guarantee a fixed order of observations. When a procedure analyzes CAS data, the order in which observations are loaded into memory might change each time you read the same CAS table. As a result, if a statistical analysis method depends on the order of the observations, repeated analyses of the same CAS table might not reproduce exactly the same results. This is the nonreproducibility issue.

In some situations, the nonreproducibility issue is reflected by (mostly) minor numerical differences in the results from different runs, although all results are still considered to be statistically valid. Simulation-based methods, such as the bootstrap, are a typical example here. Random samples that are drawn by using simulation-based methods depend on the order of the observations in the input data set. Hence, numerical results based on analyzing these random samples are not reliably reproducible with input CAS tables. It is important to note that the nonreproducibility issue does not go away even if you specify a starting seed in a procedure’s SEED= option (whenever available).
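The effect of observation order on a seeded simulation can be illustrated with a minimal Python sketch. This is not SAS code and does not represent how any SAS/STAT procedure draws its samples; the function name bootstrap_means, the data values, and the seed are all hypothetical. The sketch assumes only that a bootstrap resample is drawn by row position, so the same seed selects the same positions but not necessarily the same observations.

```python
import random


def bootstrap_means(values, seed, n_resamples=200):
    """Return bootstrap means of `values`, drawn with a fixed seed (illustrative only)."""
    rng = random.Random(seed)
    n = len(values)
    means = []
    for _ in range(n_resamples):
        # Each draw picks a row position, so which observation is selected
        # depends on where that observation sits in the input order.
        resample = [values[rng.randrange(n)] for _ in range(n)]
        means.append(sum(resample) / n)
    return means


# Hypothetical data: the same observations delivered in two different orders.
data = [2.0, 3.5, 5.0, 7.5, 11.0, 13.0, 17.5, 19.0]
shuffled = list(reversed(data))

run_a = bootstrap_means(data, seed=12345)
run_b = bootstrap_means(data, seed=12345)      # same order, same seed
run_c = bootstrap_means(shuffled, seed=12345)  # same seed, different order

print("same order reproduces:", run_a == run_b)
print("different order reproduces:", run_a == run_c)
```

In this sketch, the two runs on identically ordered data match exactly, whereas the run on reordered data is expected to produce different bootstrap means even though the seed and the set of observations are unchanged.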

In other situations, the nonreproducibility issue shows up as qualitatively different solutions across repeated runs of an analysis with the same CAS table. For example, clustering of observations is usually based on comparing distances among objects and clusters in a sequential manner. If there are ties in the distances between objects or clusters, the order of observations might affect how the tie-breaking algorithm resolves them. Hence, the final clustering results might not be reliably reproducible with input CAS tables.
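The tie-breaking effect can be sketched in a few lines of Python. This is a deliberately naive illustration, not the algorithm of any SAS/STAT clustering procedure; the function first_merge and the one-dimensional points are hypothetical. It assumes a single merge step that scans pairs in input order and keeps the first pair that attains the minimum distance.

```python
from itertools import combinations


def first_merge(points):
    """Return the pair that a naive closest-pair merge step would join first."""
    best_pair, best_dist = None, float("inf")
    for a, b in combinations(points, 2):
        dist = abs(a - b)
        # Strict comparison: among tied pairs, the first one encountered wins,
        # so the outcome depends on the order of the input.
        if dist < best_dist:
            best_pair, best_dist = (a, b), dist
    return tuple(sorted(best_pair))


# Hypothetical points: 0 and 1 are exactly as close as 1 and 2.
print(first_merge([0.0, 1.0, 2.0]))  # (0.0, 1.0)
print(first_merge([2.0, 1.0, 0.0]))  # (1.0, 2.0)
```

The same three observations yield two different first merges, and therefore two different cluster structures, purely because the tied pairs are encountered in a different order.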

Whether nonreproducibility severely limits the use of CAS tables should be judged case by case. Depending on the purpose of a particular statistical analysis, the degree of nonreproducibility, the trustworthiness of the statistical method, and even your tolerance for the stochastic nature of statistical results, you might or might not view nonreproducibility as a critical problem for statistical analyses or inferences. Nonetheless, SAS/STAT procedures issue a note to the log about potential nonreproducibility when you use CAS tables as input data.

For more information about the nonreproducibility issue with the use of CAS tables, see the section "Statistical Results Not Reliably Reproducible with Input CAS Tables" in Appendix C, "Known Issues in Using CAS Tables with SAS/STAT Procedures."

Last updated: December 09, 2022