Autopartitioning for SAS/ACCESS Interface to HAWQ is a modulo (MOD) function method. For general information about this feature, see Autopartitioning Techniques in SAS/ACCESS.
SAS/ACCESS Interface to HAWQ places additional restrictions on the columns that you can use for the partitioning column during the autopartitioning phase. Here is how columns are partitioned.
If you select a nullable
column for autopartitioning, the OR<column-name>IS
NULL SQL statement is appended at the end of the SQL
code that is generated for the threaded Read. This ensures that any
possible NULL values are returned in the result set.
Autopartitioning does not select a column to be the partitioning column if it appears in a SAS WHERE clause. For example, this DATA step cannot use a threaded Read to retrieve the data because all numeric columns in the table are in the WHERE clause:
data work.locemp;
set trlib.MYEMPS;
where EMPNUM<=30 and ISTENURE=0 and
SALARY<=35000 and NUMCLASS>2;
run;
Although SAS/ACCESS Interface to HAWQ defaults to three threads when you use autopartitioning, do not specify a maximum number of threads for the threaded Read in DBSLICEPARM= LIBNAME option.
You might achieve the best possible performance when using threaded Reads by specifying the DBSLICE= data set option for HAWQ in your SAS operation. This is especially true if your HAWQ data is evenly distributed across multiple partitions in a HAWQ database system.
When you create a HAWQ
table using the HAWQ database partition model, you can specify the
partitioning key that you want to use by appending the PARTITION
BY<column-name> clause
to your CREATE TABLE statement. Here is how you can accomplish this
by using the DB_CREATE_TABLE_OPTS=LIBNAME option within the SAS environment.
/* Points to a triple-node server. */
libname mylib hawq server=hawq03 user=myuser pw=mypwd db=HAWQ;
DBCREATE_TABLE_OPTS='PARTITION BY(EMPNUM);
proc datasets library=mylib;
delete MYEMPS1;run;
data mylib.myemps(drop=morf whatstate
DBTYPE=(HIREDATE="date" SALARY="numeric(8,2)"
NUMCLASS="smallint" GENDER="char(1)" ISTENURE="numeric(1)" STATE="char(2)"
EMPNUM="int NOT NULL Primary Key"));
format HIREDATE mmddyy10.;
do EMPNUM=1 to 100;
morf=mod(EMPNUM,2)+1;
if(morf eq 1) then
GENDER='F';
else
GENDER='M';
SALARY=(ranuni(0)*5000);
HIREDATE=int(ranuni(13131)*3650);
whatstate=int(EMPNUM/5);
if(whatstate eq 1) then
STATE='FL';
if(whatstate eq 2) then
STATE='GA';
if(whatstate eq 3) then
STATE='SC';
if(whatstate eq 4) then
STATE='VA';
else
state='NC';
ISTENURE=mod(EMPNUM,2);
NUMCLASS=int(EMPNUM/5)+2;
output;
end;
run;
After the MYEMPS table is created on this three-node database, a third of the rows reside on each of the three nodes.
Using DBSLICE= works well when the table that you want to read is not stored in multiple partitions. It gives you flexibility in column selection. For example, if you know that the STATE column in your employee table contains only a few distinct values, you can modify your DBSLICE= option accordingly.
data work.locemp;
set mylib.MYEMPS (DBSLICE=("STATE='GA'"
"STATE='SC'" "STATE='VA'" "STATE='NC'"));
where EMPNUM<=30 and ISTENURE=0 and SALARY<=35000 and NUMCLASS>2;
run;