LIBNAME Statement for the Hadoop Engine

Overview

This section describes the LIBNAME statement that SAS/ACCESS Interface to Hadoop supports. For general information about this feature, see LIBNAME Statement for Relational Databases.

IMPORTANT SAS 9.4M9 has new required locations for Hadoop JAR files and third-party JDBC driver files. Using the old location for the JAR files causes an error. The system ignores driver files installed in other locations. For more information about the requirements, see "Configuring SAS/ACCESS Interface to Hadoop" in SAS Hadoop Configuration Guide for Base SAS and SAS/ACCESS. You must specify the location of the JAR files in the SAS_HADOOP_JAR_PATH= environment variable. These requirements do not apply to SAS Viya 3.5.
Note: The Cloudera JDBC Connector for Apache Hive is the reference driver for SAS/ACCESS Interface to Hadoop in SAS 9.4M9. You must obtain and install this driver, or a similar driver such as the Simba JDBC Data Connector for Apache Hive, to take advantage of the full functionality of SAS 9.4M9. The open source Apache Hive driver is the reference driver for SAS Viya 3.5.

Here is the LIBNAME statement syntax for accessing Hadoop:

LIBNAME libref hadoop <connection-options> <LIBNAME-options>;

For general information about the LIBNAME statement that is not specific to SAS/ACCESS, see LIBNAME Statement in SAS Global Statements: Reference.

JDBC Read Security

SAS/ACCESS can access Hadoop data through a JDBC connection to a HiveServer2 service. Depending on what release of Hive you have, Hive might not implement Read security. If Hive does not implement Read security, a successful connection from SAS allows Read access to all data that is accessible to the Hive service. SAS/ACCESS can connect to a Hive or HiveServer2 service that is unsecured, user name and password secured, or secured by Kerberos.

HDFS Write Security

SAS/ACCESS creates and appends to Hive tables using the HDFS service. SAS/ACCESS can connect to a Hive or HiveServer2 service that is unsecured, user name and password secured, or secured by Kerberos. Your HDFS connection needs Write access to the HDFS /tmp directory. After data is written to /tmp, a Hive LOAD command is issued on your JDBC connection to associate the data with a Hive table. Therefore, the JDBC Hive session also needs Write access to /tmp.

Note: If HDFS /tmp has enabled the sticky bit, the LOAD command can fail. To resolve this, either disable the /tmp sticky bit or use the HDFS_TEMPDIR option to specify an alternative HDFS directory for SAS/ACCESS to write data to.
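
For example, here is a minimal sketch that redirects SAS temporary data to an alternative HDFS directory (the server name, credentials, and directory are placeholders):

libname hdp hadoop user='user-name' pwd='password'
 server='hive-server' database='default'
 hdfs_tempdir='/sas/tempdata';  /* use this directory instead of /tmp */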

HDFS Permission Requirements for Optimized Reads

To optimize big data Reads, SAS/ACCESS creates a temporary table in HDFS /tmp. This requires that the SAS JDBC connection have Write access to /tmp. The temporary table is read using HDFS, so the SAS HDFS connection needs Read access to the temporary table that is written to /tmp. Alternatively, use the HDFS_TEMPDIR option to specify an HDFS directory to use instead of /tmp.

Arguments

libref

specifies any SAS name that serves as an alias to associate SAS with a database, schema, server, or group of tables and views.

hadoop

specifies the SAS/ACCESS engine name for the Hadoop interface.

connection-options

provides connection information and controls how SAS manages timing and concurrence of the connection to Hadoop. All connection options are also valid in the CONNECT statement when you use the SQL pass-through facility (SQL procedure) to connect to your DBMS.
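
For example, here is a minimal pass-through sketch that supplies connection options in the CONNECT statement (the server name, credentials, and table name are placeholders):

proc sql;
 /* the connection options are the same ones that the LIBNAME statement accepts */
 connect to hadoop (user='user-name' pwd='password'
  server='hive-server' port=10000 schema=default);
 /* the inner query is passed to Hive; LIMIT keeps the result small */
 select * from connection to hadoop
  (select * from mytable limit 10);
 disconnect from hadoop;
quit;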

ALLOW_SELF_SIGNED_CERTS= NO | YES

specifies whether the connection accepts self-signed SSL certificates from the server when SSL wire encryption is used for client requests.

When specified in the LIBNAME statement, this argument overrides the value in the hive-site.xml file.

Note: Support for this option starts in SAS 9.4M9.

CLASSPATH="directory-path"

This option behaves differently in SAS 9.4M9 than in SAS 9.4M8 and SAS Viya 3.5.

[SAS 9.4M9] specifies additional resources, including JDBC drivers, to add to the JDBC class path. If you specify CLASSPATH=, specify the DRIVERCLASS= option as well to ensure that the correct driver is selected.

  • On UNIX, this is the required location in SAS 9.4M9:
    /SASHOME/AccessClients/9.4/DataDrivers/jdbc

    or, if SASHOME is not set:

    /usr/local/SAS/AccessClients/9.4/DataDrivers/jdbc
  • On Windows, this is the required location:
    SASHOME\AccessClients\9.4\DataDrivers\jdbc

    or, if SASHOME is not set:

C:\Program Files\SASHome\AccessClients\9.4\DataDrivers\jdbc

[SAS 9.4M8 and SAS Viya 3.5] specifies the path to the JDBC JAR files. The value that you specify for CLASSPATH= overrides the value that is set for the SAS_ACCESS_CLASSPATH environment variable. You must set a value for the CLASSPATH= connection option or the SAS_ACCESS_CLASSPATH environment variable. The default value is the value that is set for the SAS_ACCESS_CLASSPATH environment variable.

Example:

libname x hadoop user=hive pwd='XXXXXX'
 classpath='/usr/local/SAS/AccessClients/9.4/DataDrivers/jdbc/alt'
 driverclass='class'
 server=cdp71p3hive
 database=default;

DRIVERCLASS="value"

specifies the fully qualified class name of the JDBC driver to use for your database connection.

Alias: DRIVER=, CLASS=, HIVECLASS=

Note: The DRIVERCLASS= option is required when using a JDBC driver other than the Cloudera JDBC Connector for Apache Hive.

Examples:

  • Cloudera JDBC Connector for Apache Hive:
    driverclass="com.cloudera.hive.jdbc.HS2Driver"
  • Open source Apache Hive driver:
    driverclass="org.apache.hive.jdbc.HiveDriver"

HDFS_DATADIR='path'

when not in Hive mode, specifies the path to the Hadoop directory where SAS reads and writes data (for example, '/sas/hpa'). For details, see Accessing Data Independently from Hive in Base SAS Procedures Guide.

Alias: HDFS_PERMDIR=

Use this option only when you are not using Hive or HiveServer2.

Note: This option is deprecated in SAS 9.4M9.

HDFS_METADIR='path'

specifies the path to an HDFS directory that contains XML-based table definitions, called SASHDMD descriptors. Through these descriptors, SAS then accesses the data using HDFS instead of Hive.

Use this option only when you are not using Hive or HiveServer2.

Note: This option is deprecated in SAS 9.4M9.
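
For example, in SAS 9.4M8 or SAS Viya 3.5, here is a sketch of a libref that reads data directly through HDFS (the server name and HDFS paths are placeholders, and SASHDMD descriptors must already exist in the HDFS_METADIR= directory):

libname hdp hadoop server='hadoop-server'
 hdfs_metadir='/user/myuser/sashdmd'  /* SASHDMD table descriptors */
 hdfs_datadir='/user/myuser/data';    /* data files read and written via HDFS */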

HDFS_TEMPDIR='path'

specifies the path to the HDFS directory where SAS reads and writes temporary data.

Default: HDFS_TEMPDIR='/tmp'

Note: This option is deprecated in SAS 9.4M9.

HIVE_PRINCIPAL='Hive-principal-name'

specifies the principal name of the Hive server in an environment that uses Kerberos.

Default: none

This argument is not required if hive-site.xml exists in the directory that is specified by the SAS_HADOOP_CONFIG_PATH environment variable.

Alias: HIVE_KERBEROS_PRINCIPAL=
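
For example, here is a sketch of a Kerberos connection that supplies the principal explicitly (the server name and principal are placeholders; hive/_HOST@EXAMPLE.COM follows the common Hive service principal pattern, and USER= is omitted because the cluster is secured by Kerberos):

libname hdp hadoop server='hive-server' database='default'
 hive_principal='hive/_HOST@EXAMPLE.COM';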

HTTPPATH="path"

specifies the HTTP or HTTPS endpoint for building the JDBC URL connection string. This option is ignored when the URL= option is specified.

Default: none

Note: Support for this option starts in SAS 9.4M9.
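
For example, here is a minimal sketch of a connection to a HiveServer2 instance that runs in HTTP transport mode (the server name, port, and credentials are placeholders; cliservice is a common HiveServer2 HTTP path, and depending on your JDBC driver, additional transport properties might be required):

libname hdp hadoop user='user-name' pwd='password'
 server='hive-server' port=10001
 httppath='cliservice' database='default';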

MAPREDUCE_JOB_NAME="name"

specifies the MapReduce job name.

When specified in the LIBNAME statement, this argument overrides the value in the hive-site.xml file.

Note: Support for this option starts in SAS 9.4M9.

MAPREDUCE_JOB_QUEUENAME="name"

specifies the MapReduce queue name.

When specified in the LIBNAME statement, this argument overrides the value in the hive-site.xml file.

Note: Support for this option starts in SAS 9.4M9.

PASSWORD=<'>Hadoop-password<'>

specifies the Hadoop password that is associated with your user ID. If the password contains spaces or nonalphanumeric characters, you must enclose it in quotation marks. If you do not want to enter your Hadoop password in uncoded text on this statement, see PROC PWENCODE in the Base SAS Procedures Guide for a method to encode it.

Alias: PASS=, PWD=, PW=
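
For example, here is a sketch that encodes the password first (all values are placeholders; PROC PWENCODE writes the encoded string to the SAS log):

proc pwencode in='MyHivePassword' method=sas004;
run;

libname hdp hadoop user='user-name'
 pwd='{SAS004}encoded-value-from-log'  /* paste the encoded string from the log */
 server='hive-server' database='default';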

PORT=port

specifies the port number to use to connect to the specified Hive service.

Alias: SERVICE=

Default: 10000

SCHEMA=Hive-schema

specifies the Hive schema.

Alias: DATABASE=, DB=

Default: default

SERVER=<'>Hadoop-server-name<'>

specifies the Hadoop server name that runs the Hive service. If the server name contains spaces or nonalphanumeric characters, you must enclose it in quotation marks.

Alias: HOST=

SSL_TRUST_STORE="path"

specifies the path to the SSL trust store.

When specified in the LIBNAME statement, this argument overrides the value stored in the hive-site.xml file.

Note: Support for this option starts in SAS 9.4M9.

TEZ_QUEUE_NAME="name"

specifies an Apache Tez queue name in which to execute client requests.

When specified in the LIBNAME statement, this argument overrides the value in the hive-site.xml file.

Note: Support for this option starts in SAS 9.4M9.

TRUST_STORE_PASSWORD="password"

specifies the password for the trust store.

When specified in the LIBNAME statement, this argument overrides the value in the hive-site.xml file.

Note: Support for this option starts in SAS 9.4M9.

URI='jdbc:driver-name://driver-connection-options'

specifies the full Hive JDBC connection string. The user’s input is used exactly as it is specified to connect to Hive, and SAS does not modify the connection string. The JDBC URL options in the connection string are driver-specific. For details about the driver options to include in the string, refer to your driver documentation. See Hadoop LIBNAME Statement Examples for examples of URL values.

Alias: URL=

Default: none

Note: The URI= option can be used to indicate which driver to use. Typically, the DRIVERCLASS= option takes precedence over the URI= option.

USER=<'>Hadoop-user-name<'>

specifies the user name for Read (JDBC) and Write (HDFS) operations. Do not use the USER= argument if your Hadoop cluster is secured by Kerberos.

Alias: UID=

ZOOKEEPER_NAMESPACE=namespace

specifies the ZooKeeper namespace.

When specified in the LIBNAME statement, this argument overrides the value in the hive-site.xml file.

Note: Support for this option starts in SAS 9.4M9.

ZOOKEEPER_PORT=port-number

specifies the port number that ZooKeeper listens to for client requests.

Note: Support for this option starts in SAS 9.4M9.

ZOOKEEPER_QUORUM=node-name-list

specifies the server nodes that are available to run client requests. Separate node names using commas.

When specified in the LIBNAME statement, this argument overrides the value in the hive-site.xml file.

Note: Support for this option starts in SAS 9.4M9.

LIBNAME-options

specifies how SAS processes DBMS objects. The following table describes the LIBNAME options for SAS/ACCESS Interface to Hadoop, with the applicable default values. For details, including which LIBNAME options are also valid in the CONNECT statement in the SQL procedure, see LIBNAME Options for Relational Databases.

SAS/ACCESS LIBNAME Options for Hadoop

Option                        Default Value
ACCESS=                       none
ANALYZE=                      NO
BATCH_UPDATE=                 YES if BULKLOAD=NO and your JDBC driver supports batch updates; otherwise NO
BL_FORMAT=                    TEXT
BL_PORT=                      8020
BULKLOAD=                     YES
CONNECTION=                   SHAREDREAD
DBCLIENT_MAX_BYTES=           matches the maximum number of bytes per single character of the SAS session encoding
DBCONINIT=                    none
DBCONTERM=                    none
DBCREATE_TABLE_EXTERNAL=      NO
DBCREATE_TABLE_LOCATION=      none
DBCREATE_TABLE_OPTS=          none
DBGEN_NAME=                   DBMS
DBLIBINIT=                    none
DBLIBTERM=                    none
DBMAX_TEXT=                   1024
DBMAX_TEXT_TYPES=             LONG
DBMSTEMP=                     NO
DBSASLABEL=                   COMPAT
DEFER=                        NO
DIRECT_SQL=                   YES
DRIVER_TRACE=                 none
DRIVER_TRACEFILE=             none
DRIVER_TRACEOPTIONS=          the trace log is overwritten and includes no time stamps or thread identification
IGNORE_BASELINE=              NO
INSERTBUFF=                   automatically calculated based on row length
LOGIN_TIMEOUT=                30
MULTI_DATASRC_OPT=            NONE
POST_STMT_OPTS=               none
PRESERVE_COL_NAMES=           NO
PRESERVE_TAB_NAMES=           NO
PROPERTIES=                   none
QUERY_TIMEOUT=                0
READ_METHOD=                  none
READBUFF=                     automatically calculated based on row length
SCHEMA=                       default
SCRATCH_DB=                   none
SPOOL=                        YES
SQL_FUNCTIONS=                none
SQL_FUNCTIONS_COPY=           none
SUB_CHAR=                     none
TEMP_CTAS=                    NO
TRANSCODE_FAIL=               ERROR

Hadoop LIBNAME Statement Examples

The following examples provide a basic introduction to common SAS/ACCESS to Hadoop configurations. These examples assume that the administrator has previously run the Hadoop tracer tool and copied the Hadoop JAR and configuration files to the specified locations. The tracer tool gathers Java JAR files from the Hadoop deployment. The path to this directory is specified in the SAS_HADOOP_JAR_PATH environment variable. The XML configuration files are copied to the directory specified by the SAS_HADOOP_CONFIG_PATH environment variable. HDFS access is achieved using Java-based APIs. In SAS 9.4M9, JAR files have a required location. For more information, see SAS Hadoop Configuration Guide for Base SAS and SAS/ACCESS.

Connecting to Hive with Dynamic Service Discovery (ZooKeeper)

A typical HADOOP LIBNAME statement specifies ZooKeeper information that provides for Dynamic Service Discovery. Apache ZooKeeper is a distributed coordination system that allows Hive to run in HiveServer2 high availability mode. The following processes occur in high availability mode:

  • When running multiple HiveServer2 instances, each server registers itself in ZooKeeper.
  • Clients (such as the Hive JDBC driver that is provided by SAS) query ZooKeeper to get the list of available HiveServer2 instances. This ensures automatic failover if one HiveServer2 instance goes down.

To enable Dynamic Service Discovery with SAS/ACCESS to Hadoop, specify the ZooKeeper options ZOOKEEPER_QUORUM=, ZOOKEEPER_NAMESPACE=, and ZOOKEEPER_PORT= in the LIBNAME statement. The values for these options can be found in the hive-site.xml file for your Hadoop installation.


option set=SAS_HADOOP_JAR_PATH="/usr/local/SAS/AccessClients/9.4/DataDrivers/mysrv1_jars";
option set=SAS_HADOOP_CONFIG_PATH="/mysrv1_cfg";

libname x hadoop user='user-name' pwd='password'
 database='default'
 zookeeper_quorum='node1.company.com,node2.company.com,node3.company.com'
 zookeeper_namespace='hiveserver2'
 zookeeper_port=2181;

Specifying the USER= and PWD= options indicates that LDAP authentication is used. Using the AUTHDOMAIN= option instead is preferred because it avoids exposing sensitive credentials.
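
For example, here is a sketch of the same connection that references stored credentials instead of specifying them inline (this assumes that an authentication domain named HadoopAuth has been defined and contains your Hadoop credentials):

libname x hadoop authdomain=HadoopAuth
 database='default'
 zookeeper_quorum='node1.company.com,node2.company.com,node3.company.com'
 zookeeper_namespace='hiveserver2'
 zookeeper_port=2181;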

Connecting to Hive without Dynamic Service Discovery

To connect without using ZooKeeper, specify the SERVER= option, which identifies the HiveServer2 endpoint. When SERVER= is specified, SAS/ACCESS to Hadoop does not use HiveServer2 high availability. This approach is appropriate when your Hadoop endpoint is not configured with ZooKeeper.


option set=SAS_HADOOP_JAR_PATH="/usr/local/SAS/AccessClients/9.4/DataDrivers/mysrv1_jars";
option set=SAS_HADOOP_CONFIG_PATH="/mysrv1_cfg";

libname x hadoop user='user-name' pwd='password'
 server='hive-server' database='default' port=10000;

Connecting to Hive Using Apache Knox

Apache Knox is a gateway service that provides secure access to Apache Hadoop clusters. It acts as a reverse proxy, enabling users and applications to interact with Hadoop services (such as HDFS, Hive, and YARN) without directly exposing them to external networks.

This type of connection has the advantage of not requiring any JAR files to be pulled from the Hadoop cluster, although Hadoop configuration files are still required. Here are important things to know about the connection:

  • The administrator has previously run the Hadoop tracer tool and copied the Hadoop configuration files to the locations specified by the SAS_HADOOP_CONFIG_PATH= environment variable.
  • When you use Knox, you must also define the KNOX_GATEWAY_URL= environment variable and you must set the SAS_HADOOP_RESTFUL environment variable to 1.
  • The JDBC driver URI= options shown in this example are specific to the Cloudera JDBC Connector for Apache Hive. For details about the driver options to include in the URI= string, refer to your driver documentation.

options set=SAS_HADOOP_RESTFUL=1;
options set=SAS_HADOOP_CONFIG_PATH="/mysrv1_cfg";
options set=KNOX_GATEWAY_URL="cloudera-URL";

libname x hadoop user=user-name pw='password'
driverclass=com.cloudera.hive.jdbc.HS2Driver
uri="jdbc:hive2://cloudera-server-id:port/database;transportMode=http;ssl=1;httpPath=myHttpPath;
useNativeQuery=1";

Connecting to Hive with Kerberos Authentication

Before Kerberos can be used, an administrator must first set up Kerberos for JDBC connections using "kinit". For instructions, see information about SAS/ACCESS Interface to Hadoop in SAS Hadoop Configuration Guide for Base SAS and SAS/ACCESS. In this example, the URL= option connects to a Hive server with Kerberos enabled, but without SSL enabled. The Kerberos service principal name is hive/node1.example.com@EXAMPLE.COM, the host name for the data source is node1.example.com, and the server is listening on port 10000 for JDBC connections.

The KrbAuthType= value in the URL= string is specific to the operating environment. The value 3 is for AIX. For Windows and Linux, use 0 or 2. For details about the driver options to include in the URL= string, refer to your driver documentation.


option set=SAS_HADOOP_JAR_PATH="/usr/local/SAS/AccessClients/9.4/DataDrivers/mysrv1_jars";
option set=SAS_HADOOP_CONFIG_PATH="/mysrv1_cfg";

%kerberos_kinit_domain(database);

libname x hadoop
driverclass=com.cloudera.hive.jdbc.HS2Driver
url='jdbc:hive2://node1.example.com:10000/default;AuthMech=1;KrbAuthType=3;KrbRealm=example.com;
KrbHostFQDN=hs2node1.example.com;KrbServiceName=hive'
;