The code snippets in this section resemble those for most other SAS/ACCESS interfaces.
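These snippets assume that the spk libref has already been assigned to Spark with a LIBNAME statement. Here is a minimal sketch, using the same <connection-options> placeholder that appears in the examples later in this section:
libname spk spark <connection-options>;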
This snippet shows a list of available Spark tables.
proc datasets lib=spk; quit;
This snippet displays the metadata for the mytab Spark table.
proc contents data=spk.mytab; quit;
This snippet extracts mytab data into SAS.
data work.a;
set spk.mytab;
run;
This snippet extracts a subset of the mytab rows and columns into SAS. Subsetting the rows (with a WHERE statement, for example) can help you avoid extracting too much data into SAS. An equivalent form that uses the WHERE= data set option is shown after this step.
data work.a;
set spk.mytab (keep=col1 col2);
where col2=10;
run;
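The WHERE= data set option expresses the same subset in a single data set option list. This sketch produces the same result as the previous step:
data work.a;
set spk.mytab (keep=col1 col2 where=(col2=10));
run;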
This example uses the DBSASTYPE= data set option to read Spark textual dates, timestamps, and times into corresponding SAS DATE, DATETIME, and TIME values. The first step reads the dt column as a SAS character string to display the raw data and to make clear what happens in the successive steps. Each step's PUT output follows its code.
data;
set spk.testSparkDate;
put dt;
run;
2011-10-17 2009-07-30 12:58:59 11:30:01
data;
set spk.testSparkDate(dbsastype=(dt='date'));
put dt;
run;
17OCT2011 30JUL2009 .
data;
set spk.testSparkDate(dbsastype=(dt='datetime'));
put dt;
run;
17OCT2011:00:00:00 30JUL2009:12:58:59 .
data;
set spk.testSparkDate(dbsastype=(dt='time'));
put dt;
run;
. 12:58:59 11:30:01
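After DBSASTYPE= converts the column, it behaves like any other SAS date value, so you can attach any SAS date format to it. Here is a sketch, assuming a hypothetical output data set named work.dates:
data work.dates;
set spk.testSparkDate(dbsastype=(dt='date'));
format dt yymmdd10.;
run;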
This code uses PROC SQL to access a Spark table.
proc sql;
create table work.a as select * from spk.newtab;
quit;
SAS data is then loaded into Spark.
data spk.newtab2;
set work.a;
run;
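When the Spark table already exists, PROC APPEND is an alternative way to add SAS rows to it. Here is a sketch, assuming that spk.newtab2 exists and its structure matches work.a:
proc append base=spk.newtab2 data=work.a;
run;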
Use explicit pass-through SQL to extract only 10 rows from the newtab table and load the results into the work.a SAS data set.
proc sql;
connect to spark (<connection-options>);
create table work.a as
select * from connection to spark (select * from newtab limit 10);
quit;
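Explicit pass-through can also send nonquery statements to Spark with the EXECUTE statement. Here is a sketch that drops a hypothetical Spark table named oldtab:
proc sql;
connect to spark (<connection-options>);
execute (drop table oldtab) by spark;
disconnect from spark;
quit;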
Use the DBCREATE_TABLE_OPTS= data set option with the value PARTITIONED BY (column data-type) STORED AS SEQUENCEFILE to create a partitioned Spark table that is stored as a sequence file.
libname spk SPARK <connection-options>;
data spk.part_tab (DBCREATE_TABLE_OPTS="PARTITIONED BY (s2 int, s3 string)
STORED AS SEQUENCEFILE");
set work.part_tab;
run;
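To confirm the columns of the new table from SAS, you can run PROC CONTENTS against it; the partitioning and storage details remain visible on the Spark side.
proc contents data=spk.part_tab; quit;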
You can view additional sample programs for this SAS/ACCESS interface in the SAS Software GitHub repository.