Bulk loading with the Impala engine can be accomplished in two ways: by using WebHDFS or by using Java JAR files to upload the table data to HDFS. For details, see Configuration Details.
During the bulk-loading process, the Impala engine writes the table data to a data file, transfers that file to HDFS (by using either WebHDFS or Java), and then loads the data from the file into the target Impala table.
The BULKLOAD= data set option is required for bulk loading; all other Impala bulk-load data set options, such as BL_DATAFILE=, BL_HOST=, BL_PORT=, BL_DELETE_DATAFILE=, and HDFS_PRINCIPAL=, are optional. For more information, see Data Set Options.
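For instance, here is a minimal sketch in which only the required BULKLOAD= data set option is specified; the libref and table names are illustrative, and all other bulk-load options take their default values:

data mydblib.mytable (BULKLOAD=YES);   /* BULKLOAD=YES is the only required bulk-load option */
   set work.mydata;
run;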
Bulk loading to HDFS requires Java JAR files, HDFS host-specific configuration information, or both. The HDFS host that is required when you use Java or WebHDFS might differ from the Impala host; for example, it might be another machine in the same cluster. Depending on your configuration, you must specify this information when you bulk load data to Impala.
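For example, here is a minimal sketch, assuming a cluster in which the Impala server (impala-node1) and the HDFS NameNode (hdfs-node1) are different machines; all host, user, and table names are hypothetical:

libname mydblib impala host=impala-node1
        db=users user=myusr1 password=mypwd1;
data mydblib.mytable (BULKLOAD=YES
                      BL_HOST='hdfs-node1'   /* HDFS host differs from the Impala host */
                      BL_PORT=50070);
   set work.mydata;
run;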
Here is what to specify if you use WebHDFS to upload table data:

SAS_HADOOP_RESTFUL=1 (required).

SAS_HADOOP_CONFIG_PATH=<configuration-directory> (contains the hdfs-site.xml file for the specific host). As an alternative, you can use the BL_HOST= data set option to specify the HDFS host name. Using SAS_HADOOP_CONFIG_PATH= is the preferred solution.
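For example, here is a minimal sketch of the WebHDFS settings; the configuration directory shown is hypothetical:

option set=SAS_HADOOP_RESTFUL 1;   /* required: transfer data with WebHDFS */
option set=SAS_HADOOP_CONFIG_PATH "/configs/myhdfshost";   /* directory that contains hdfs-site.xml */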
Here is what to specify if you use Java to upload table data:

SAS_HADOOP_RESTFUL=0 (optional).

SAS_HADOOP_JAR_PATH=<jar-directory>. For instructions, see SAS Hadoop Configuration Guide for Base SAS and SAS/ACCESS.

SAS_HADOOP_CONFIG_PATH=<configuration-directory> (contains the hdfs-site.xml file for the specific host). As an alternative, you can use the BL_HOST= data set option to specify the HDFS host name. Using SAS_HADOOP_CONFIG_PATH= is the preferred solution.
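Likewise, here is a minimal sketch of the Java settings; the JAR and configuration directories shown are hypothetical:

option set=SAS_HADOOP_RESTFUL 0;   /* optional: transfer data with Java rather than WebHDFS */
option set=SAS_HADOOP_JAR_PATH "/hadoop/jars";   /* directory that contains the Hadoop JAR files */
option set=SAS_HADOOP_CONFIG_PATH "/configs/myhdfshost";   /* directory that contains hdfs-site.xml */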
For more information about the SAS_HADOOP_RESTFUL, SAS_HADOOP_JAR_PATH, and SAS_HADOOP_CONFIG_PATH environment variables, see SAS Hadoop Configuration Guide for Base SAS and SAS/ACCESS. Here are ways that you can specify them:

At SAS invocation:

SAS ... -SET SAS_HADOOP_RESTFUL 1

In SAS code, with the OPTIONS statement:

option set=SAS_HADOOP_RESTFUL 1;

This example shows how you can use a SAS data set, SASFLT.FLT98, to create and load an Impala table, FLIGHTS98.
libname sasflt 'SAS-data-library';
libname mydblib impala host=mysrv1
db=users user=myusr1 password=mypwd1;
proc sql;
create table mydblib.flights98
(BULKLOAD=YES
BL_DATAFILE='/tmp/mytable.dat'
BL_HOST='192.168.x.x'
BL_PORT=50070)
as select * from sasflt.flt98;
quit;
This example shows how you can append the SAS data set SASFLT.FLT98 to the existing Impala table FLIGHTS98. In this example, the HDFS_PRINCIPAL= data set option is specified as well. You specify the HDFS_PRINCIPAL= data set option when you configure HDFS to allow Kerberos authentication. BL_DELETE_DATAFILE=NO causes the engine to leave the data file in place after the load has completed, rather than deleting it.
proc append base=mydblib.flights98
(BULKLOAD=yes
BL_DATAFILE='/tmp/mytable.dat'
BL_DELETE_DATAFILE=no
HDFS_PRINCIPAL='hdfs/hdfs_host.example.com@test.example.com'
BL_HOST='192.168.x.x'
BL_PORT=50070)
data=sasflt.flt98;
run;
This example shows how to use a SAS data set, SASFLT.FLT98, to create and load an Impala table, FLIGHTS98, using WebHDFS and configuration files.
option set=SAS_HADOOP_RESTFUL 1;
option set=SAS_HADOOP_CONFIG_PATH "/configs/myhdfshost";
/* This path should point to the directory that contains */
/* the hdfs-site.xml file for the cluster that hosts the */
/* 'mysrv1' Impala host */
libname sasflt 'SAS-data-library';
libname mydblib impala host=mysrv1
db=users user=myusr1 password=mypwd1;
proc sql;
create table mydblib.flights98
(BULKLOAD=YES
BL_DATAFILE='/tmp/mytable.dat')
/* no BL_HOST= or BL_PORT= needed; the configuration file supplies the HDFS host */
as select * from sasflt.flt98;
quit;