Autopartitioning for SAS/ACCESS Interface to Greenplum is a modulo (MOD) function method. For general information about this feature, see Autopartitioning Techniques in SAS/ACCESS.
SAS/ACCESS Interface to Greenplum places additional restrictions on the columns that you can use for the partitioning column during the autopartitioning phase. Here is how columns are partitioned.
If you select a nullable
column for autopartitioning, the OR<column-name>IS
NULL SQL statement is appended at the end of the SQL
code that is generated for the threaded Read. This ensures that any
possible NULL values are returned in the result set.
Autopartitioning does not select a column to be the partitioning column if it appears in a SAS WHERE clause. For example, this DATA step cannot use a threaded Read to retrieve the data because all numeric columns in the table are in the WHERE clause:
data work.locemp;
set trlib.MYEMPS;
where EMPNUM<=30 and ISTENURE=0 and
SALARY<=35000 and NUMCLASS>2;
run;
Although SAS/ACCESS Interface to Greenplum defaults to three threads when you use autopartitioning, do not specify a maximum number of threads for the threaded Read in DBSLICEPARM= LIBNAME option.
You might achieve the best possible performance when using threaded Reads by specifying the DBSLICE= data set option for Greenplum in your SAS operation. This is especially true if your Greenplum data is evenly distributed across multiple partitions in a Greenplum database system.
When you create a Greenplum
table using the Greenplum database partition model, you can specify
the partitioning key that you want to use by appending the PARTITION
BY<column-name> clause
to your CREATE TABLE statement. Here is how you can accomplish this
by using the DB_CREATE_TABLE_OPTS=LIBNAME option within the SAS environment.
/* Points to a triple-node server. */
libname mylib greenplm server=gplum03 user=myuser pw=mypwd db=greenplum;
DBCREATE_TABLE_OPTS='PARTITION BY(EMPNUM);
proc datasets library=mylib;
delete MYEMPS1;run;
data mylib.myemps(drop=morf whatstate
DBTYPE=(HIREDATE="date" SALARY="numeric(8,2)"
NUMCLASS="smallint" GENDER="char(1)" ISTENURE="numeric(1)" STATE="char(2)"
EMPNUM="int NOT NULL Primary Key"));
format HIREDATE mmddyy10.;
do EMPNUM=1 to 100;
morf=mod(EMPNUM,2)+1;
if(morf eq 1) then
GENDER='F';
else
GENDER='M';
SALARY=(ranuni(0)*5000);
HIREDATE=int(ranuni(13131)*3650);
whatstate=int(EMPNUM/5);
if(whatstate eq 1) then
STATE='FL';
if(whatstate eq 2) then
STATE='GA';
if(whatstate eq 3) then
STATE='SC';
if(whatstate eq 4) then
STATE='VA';
else
state='NC';
ISTENURE=mod(EMPNUM,2);
NUMCLASS=int(EMPNUM/5)+2;
output;
end;
run;
After the MYEMPS table is created on this three-node database, a third of the rows reside on each of the three nodes.
Using DBSLICE= works well when the table that you want to read is not stored in multiple partitions. It gives you flexibility in column selection. For example, if you know that the STATE column in your employee table contains only a few distinct values, you can modify your DBSLICE= option accordingly.
data work.locemp;
set mylib.MYEMPS (DBSLICE=("STATE='GA'"
"STATE='SC'" "STATE='VA'" "STATE='NC'"));
where EMPNUM<=30 and ISTENURE=0 and SALARY<=35000 and NUMCLASS>2;
run;