This section describes the LIBNAME statement that SAS/ACCESS Interface to Spark supports. For general information about this feature, see LIBNAME Statement for Relational Databases.
Information about using SAS/ACCESS Interface to Spark with Hadoop distributions that support the Hadoop Distributed File System (HDFS) is provided in the LIBNAME Statement for the Hadoop Engine. This includes information about Read and Write security and related HDFS permissions.
Here is the LIBNAME statement syntax for accessing Spark:

LIBNAME libref spark <connection-options> <LIBNAME-options>;

For general information about the LIBNAME statement that is not specific to SAS/ACCESS, see LIBNAME Statement in SAS Global Statements: Reference.

libref
specifies any SAS name that serves as an alias to associate SAS with a database, schema, server, or group of tables and views.

spark
specifies the SAS/ACCESS engine name for the Spark interface.

connection-options
provide options that are required to connect to Spark SQL.
All connection options are also valid in the CONNECT statement when you use the SQL pass-through facility (SQL procedure) to connect to your DBMS.
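For example, here is a minimal sketch of a pass-through query that reuses the same connection options; the server, port, credentials, and table name are placeholders:

proc sql;
   connect to spark (server='spark-server' port=10001
      user=myuserid password='myuserpwd');
   select * from connection to spark
      (select * from mytable limit 10);
   disconnect from spark;
quit;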
enables or disables SSL wire encryption of client requests.
When specified in the LIBNAME statement, this argument overrides the value in the hive-site.xml file.
Note: Support for this option starts in SAS 9.4M9.
CLASSPATH=directory-path
This option behaves differently in SAS 9.4M9 than in SAS 9.4M8 and SAS Viya 3.5.
For SAS 9.4M9, the option specifies additional resources, including JDBC drivers, to add to the JDBC class path. The directory-path value must be a subdirectory of the default JDBC driver location. If you specify CLASSPATH=, also specify the DRIVERCLASS= option to ensure that the correct driver is selected.

UNIX:
/SASHOME/AccessClients/9.4/DataDrivers/jdbc
or, if SASHOME is not set:
/usr/local/SAS/AccessClients/9.4/DataDrivers/jdbc

Windows:
SASHOME\AccessClients\9.4\DataDrivers\jdbc
or, if SASHOME is not set:
C:\Program Files\SASHome\AccessClients\9.4\DataDrivers\jdbc
For SAS 9.4M8 and SAS Viya 3.5, the option specifies the path to the JDBC JAR files. The value that you specify for CLASSPATH= overrides the value that is set for the SAS_ACCESS_CLASSPATH environment variable. You must set a value for either the CLASSPATH= connection option or the SAS_ACCESS_CLASSPATH environment variable. The default value is the value that is set for the SAS_ACCESS_CLASSPATH environment variable.
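For SAS 9.4M8 and SAS Viya 3.5, you might set the environment variable in a program as shown in the following sketch; the path is a placeholder:

option set=SAS_ACCESS_CLASSPATH="/usr/local/SAS/AccessClients/9.4/DataDrivers/jdbc";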
Example:
libname x spark platform=databricks user=token pwd='mytoken'
classpath='/usr/local/SAS/AccessClients/9.4/DataDrivers/jdbc/alt'
driverclass='class'
...more-connection-options...;
DRIVERCLASS=class-name
specifies the fully qualified class name of the JDBC driver to use for your connection.
Alias: DRIVER=
Here are some examples that show how the DRIVERCLASS= value interacts with the PLATFORM= option. If the PLATFORM= option is omitted, SAS attempts to determine it from the DRIVERCLASS= value. If both PLATFORM= and DRIVERCLASS= are omitted, the Cloudera driver is used.
driverclass='com.simba.databricks.jdbc.Driver' (Simba Databricks driver)
driverclass='org.apache.hive.jdbc.HiveDriver' (Apache Hive driver)
driverclass='com.databricks.client.jdbc.Driver' (Databricks JDBC driver)
driverclass='com.cloudera.hive.jdbc.HS2Driver' (Cloudera Hive driver)
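As a sketch, pairing PLATFORM= with an explicit driver class might look like the following; the server, port, and credentials are placeholders:

libname hive spark platform=spark
   driverclass='org.apache.hive.jdbc.HiveDriver'
   server='spark-server' port=10001
   user=myuserid password='myuserpwd';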
When the HDFS_METADIR= option is specified, this option specifies the path to the directory where SAS reads and writes Spark data (for example, '/sas/hpa').
Note: This option is valid only when SAS is configured with access to HDFS.
Note: This option is deprecated in SAS 9.4M9.
HDFS_METADIR=path
specifies the path to an HDFS directory that contains XML-based table definitions. Through these descriptors, SAS then accesses the data using HDFS instead of Spark. If you want the Spark engine to connect using Spark SQL, do not specify this option.
Note: This option is valid only when SAS is configured with access to HDFS.
Note: This option is deprecated in SAS 9.4M9.
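For example, a connection that reads through HDFS table descriptors might look like the following sketch; the server, credentials, and directory path are placeholders:

libname hdp spark server='spark-server' port=10001
   user=myuserid password='myuserpwd'
   hdfs_metadir='/sas/metadata';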
HDFS_TEMPDIR=path
specifies the path to the HDFS directory where SAS reads and writes temporary data.
Default: HDFS_TEMPDIR='/tmp'
Note: This option is valid only when SAS is configured with access to HDFS.
Note: This option is deprecated in SAS 9.4M9.
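For example, the following sketch redirects temporary data to a user-specific HDFS directory; the server, credentials, and path are placeholders:

libname tmp spark server='spark-server' port=10001
   user=myuserid password='myuserpwd'
   hdfs_tempdir='/user/myuserid/tmp';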
specifies the principal name of the Spark server in an environment that uses Kerberos.
Default: none
HTTPPATH=http-path
specifies the HTTP or HTTPS endpoint for building the JDBC URL connection string. This option is ignored when the URL= option is specified.
Default: none
Note: Support for this option starts in SAS 9.4M9.
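For example, the following sketch supplies the endpoint for a Databricks connection; the server, token, and HTTP path values are placeholders:

libname h spark platform=databricks
   server='databricks-server' port=443
   user=token pwd='personal-access-token'
   httpPath='myHttpPath';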
specifies the MapReduce job name.
When specified in the LIBNAME statement, this argument overrides the value in the hive-site.xml file.
Note: Support for this option starts in SAS 9.4M9.
specifies the MapReduce queue name.
When specified in the LIBNAME statement, this argument overrides the value in the hive-site.xml file.
Note: Support for this option starts in SAS 9.4M9.
PASSWORD=password
specifies the Spark password that is associated with your user ID. If the password contains spaces or nonalphanumeric characters, you must enclose the password in quotation marks. If you do not want to enter your Spark password in uncoded text on this statement, see PROC PWENCODE in the Base SAS Procedures Guide for a method to encode it.
Alias: PASS=, PWD=, PW=
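For example, here is a sketch of encoding a password with PROC PWENCODE and then supplying the encoded string; the password and encoded value are placeholders:

proc pwencode in='myuserpwd' method=sas002;
run;
/* The encoded string is written to the SAS log. */
libname x spark server='spark-server' port=10001
   user=myuserid password='{SAS002}encoded-value';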
PLATFORM=SPARK | DATABRICKS
specifies the target platform. SPARK specifies that the target platform is a Spark SQL data source. DATABRICKS specifies that the target platform is Databricks. The selected value is used internally to construct an appropriate JDBC URL connection string from SAS connection options.
Set PLATFORM=DATABRICKS when connecting to Databricks.
Default: SPARK
Note: Support for this option starts in SAS 9.4M9.
SCHEMA=schema-name
specifies the Spark schema to use for your database connection.
Alias: DATABASE=, DB=
Default: default
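For example, the following sketch connects to a schema other than the default; the schema name and other values are placeholders:

libname s spark server='spark-server' port=10001
   user=myuserid password='myuserpwd'
   schema=sales;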
specifies the path to the SSL trust store.
When specified in the LIBNAME statement, this argument overrides the value stored in the hive-site.xml file.
Note: Support for this option starts in SAS 9.4M9.
specifies an Apache Tez queue name in which to execute client requests.
When specified in the LIBNAME statement, this argument overrides the value in the hive-site.xml file.
Note: Support for this option starts in SAS 9.4M9.
specifies the password for the trust store.
When specified in the LIBNAME statement, this argument overrides the value in the hive-site.xml file.
Note: Support for this option starts in SAS 9.4M9.
specifies the JDBC URL to connect to Spark SQL.
See the Databricks documentation for information about how to construct a Databricks JDBC driver connection string.
See the documentation for your JDBC driver for information to construct a connection string for other Spark data sources.
Alias: URL=
Default: none
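For example, the following sketch supplies a complete JDBC URL instead of individual connection options; the URL is illustrative, and its exact form depends on your JDBC driver:

libname u spark user=token pwd='personal-access-token'
   url='jdbc:databricks://databricks-server:443/default;transportMode=http;httpPath=myHttpPath';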
USER=user-name
specifies the user name for Read (JDBC) and Write (HDFS) operations. Do not use the USER= argument if your Hadoop cluster is secured by Kerberos.
Alias: UID=
specifies the ZooKeeper namespace.
When specified in the LIBNAME statement, this argument overrides the value in the hive-site.xml file.
Note: Support for this option starts in SAS 9.4M9.
specifies the port number that ZooKeeper listens to for client requests.
Note: Support for this option starts in SAS 9.4M9.
specifies the server nodes that are available to run client requests. Use commas to separate node names.
When specified in the LIBNAME statement, this argument overrides the value in the hive-site.xml file.
Note: Support for this option starts in SAS 9.4M9.
LIBNAME options specify how SAS processes DBMS objects. The following table describes the LIBNAME options for SAS/ACCESS Interface to Spark, with the applicable default values. This table also identifies LIBNAME options that are valid in the CONNECT statement in the SQL procedure. For details, see LIBNAME Options for Relational Databases.
[Table: LIBNAME options for SAS/ACCESS Interface to Spark, listing each option, its default value, and whether it is valid in the CONNECT statement.]
The following examples provide a basic introduction to common SAS/ACCESS to Spark configurations.
Use the following example to establish a connection between SAS and Databricks. Authentication is through a Databricks personal access token. When using a personal access token, specify USER=token and specify the token value in PASSWORD=. Use of the AUTHDOMAIN= option is preferred to avoid exposing sensitive credentials.
This example includes the LIBNAME option PLATFORM=DATABRICKS. The PLATFORM= LIBNAME option, new in SAS 9.4M9, optimizes the connection string that is generated internally from the options that are specified in the LIBNAME statement for the specified platform. The option also selects the appropriate JDBC driver for the connection from the default drivers.
Starting in SAS 9.4M9, bulk loading and bulk unloading are available for Azure connections when appropriate Azure options are specified. Bulk loading to Databricks is enabled by default so that there is no need to specify BULKLOAD=YES. To enable bulk unloading, you must specify BULKUNLOAD=YES.
Important: When connecting to Azure, you must specify the Azure tenant identifier using the AZURETENANTID= system option in addition to supplying Azure connection options in the LIBNAME statement.
options azuretenantid="tenant-user-id";
libname x spark platform=databricks bulkunload=yes
server='databricks-server' database='database' port=443
user=token pwd='personal-access-token'
httpPath='myHttpPath'
bl_applicationid='azure-application-id'
bl_accountname='adls2-storage-account'
bl_filesystem='adls2-blob-container'
bl_folder='adls2-folder';
For more information about the options for bulk loading and unloading in Azure, see Bulk Loading and Unloading to Databricks in Azure.
Starting in SAS 9.4M9, bulk loading is available for AWS when appropriate AWS options are specified. AWS connections use S3 storage. Bulk loading to Databricks is enabled by default so that there is no need to specify BULKLOAD=YES. Bulk unloading is currently not available for Databricks in AWS.
libname y spark platform=databricks
server='databricks-server'
database=database port=443
user=token pwd='personal-access-token'
httpPath='myHttpPath'
bl_bucket='mybucket'
bl_aws_config='path-to-config-file'
bl_aws_credentials_file='path-to-credentials-file'
bl_delete_datafile=no;
For more information about the options for bulk loading to Databricks in AWS, see Bulk Loading to Databricks in AWS.
Use the following example to connect SAS to Apache Spark on a Hadoop distribution that has access to HDFS. When HDFS is available, SAS/ACCESS Interface to Spark can use HDFS for bulk loading. BULKLOAD=YES is enabled by default. When HDFS is not available, specify BULKLOAD=NO.
Specifying PLATFORM=SPARK is optional, because SPARK is the default PLATFORM= value.
option set=SAS_HADOOP_JAR_PATH="/usr/local/SAS/AccessClients/9.4/DataDrivers/mysrv1_jars";
option set=SAS_HADOOP_CONFIG_PATH="/mysrv1_cfg";
libname spkhdfs spark server='spark-server' port=10001 database='default'
user=myuserid password="myuserpwd";
For more information, see Bulk Loading to Spark Using HDFS.