When your data contains null values or when internal processing generates intermediate data sets that contain null values, you might receive different results depending on whether SAS or the DBMS does the processing. Although in many cases this does not present a problem, it is important to understand how these differences occur.
Most relational database systems have a special value called null, which means an absence of information and is analogous to a SAS missing value. SAS/ACCESS translates SAS missing values to DBMS null values when creating DBMS tables from within SAS. Conversely, SAS/ACCESS translates DBMS null values to SAS missing values when reading DBMS data into SAS.
However, there is an important difference in the behavior of DBMS null values and SAS missing values.
This means that SAS and the DBMS interpret null values differently, which has significant implications when SAS/ACCESS passes queries to a DBMS for processing. This can be an issue in these situations:
For example, create a simple data set that consists of one observation and one variable.
libname myhadlib hadoop user=myusr1 password=mypwd1;
data myhadlib.table;
x=.; /*create a missing value */
run;
Then print the data set using a WHERE clause, which SAS/ACCESS passes to the DBMS for processing.
proc print data=myhadlib.table;
where x<0;
run;
The log indicates that the WHERE clause selected no observations. Hadoop interprets the missing value as the absence of data and does not evaluate it with the less-than (<) comparison operator.
When there is the potential for inconsistency, consider using one of these strategies.
Use the NULLCHAR= data set option to specify how the DBMS interprets missing SAS character values when updating DBMS data or inserting rows into a DBMS table.
You can use the first of these strategies to force SAS to process the data in this example.
libname myhadlib hadoop user=myusr1 password=mypwd1
direct_sql=nowhere; /* forces SAS to process WHERE clauses */
data myhadlib.table;
x=.; /*create a missing value */
run;
You can then print the data set using a WHERE clause
proc print data=myhadlib.table;
where x<0;
run;
This time the log indicates that one observation was read from the data set because SAS evaluates the missing value as satisfying the less-than-zero condition in the WHERE clause.