Here are the Spark data types that the Spark engine supports:

- BIGINT
- BOOLEAN
- DECIMAL
- DOUBLE
- FLOAT
- INT
- SMALLINT
- TINYINT
- BINARY
- CHAR(n)
- STRING
- VARCHAR(n)
- DATE
- TIMESTAMP
- INTERVAL
- ARRAY
- MAP
- STRUCT
This table shows the default formats that SAS/ACCESS Interface to Spark assigns to SAS variables when using the LIBNAME statement to read from a Spark table. These default formats are based on Spark column attributes.
| Spark Data Type | SAS Data Type | Default SAS Format |
|---|---|---|
| ARRAY | character | none |
| BINARY | character | $HEX32767. |
| BOOLEAN | numeric | w. (1.) |
| CHAR(n)¹ | character | $w. |
| DATE | numeric | DATE9. |
| DECIMAL(p,s)² | numeric | w.d |
| DOUBLE | numeric | w. |
| FLOAT | numeric | w. |
| INT | numeric | w. (11.) |
| INTERVAL | numeric | DATETIME25.6 |
| MAP | character | none |
| SMALLINT | numeric | w. (6.) |
| STRING³ | character | $32767. |
| STRUCT | character | none |
| TIMESTAMP | numeric | DATETIME25.6 |
| TINYINT | numeric | w. (4.) |
| VARCHAR(n)¹ | character | $w. |

¹ n in Spark data types is equivalent to w. in SAS formats.
² p,s in Spark data types is equivalent to w.d in SAS formats.
³ The STRING data type can be mapped to the VARCHAR data type by the JDBC client driver. For more information, check the JDBC vendor’s documentation.
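You can verify these read-time defaults by assigning a libref with the SPARK engine and running PROC CONTENTS against a table. This is a minimal sketch; the libref `spk` and the table name `mysparktab` are placeholders for illustration.

```sas
/* Sketch: list the SAS types and default formats assigned to a Spark table's columns. */
/* The libref, connection options, and table name are placeholders.                    */
libname spk spark <connection-options>;

proc contents data=spk.mysparktab;
run;
```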
This table shows the default Spark data types that SAS/ACCESS Interface to Spark assigns to columns, based on the SAS variable formats, during output operations when you use the LIBNAME statement.

| SAS Variable Format | Spark Data Type |
|---|---|
| w.d | DOUBLE |
| w. | INT, SMALLINT, TINYINT, BIGINT |
| $w. | VARCHAR |
| datetime formats | TIMESTAMP |
| date formats | DATE |
| time formats¹ | VARCHAR |

¹ A column created in Spark from a SAS variable with a TIME format is created as a VARCHAR column with a SASFMT table property.
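For example, the following DATA step sketch (the libref, connection options, and table name are placeholders) creates one variable for each row of the table above; the comments note the Spark data type that the defaults imply.

```sas
/* Sketch: default Spark data types when writing through the LIBNAME engine. */
/* The libref, connection options, and table name are placeholders.          */
libname spk spark <connection-options>;

data spk.outdemo;
   d   = 3.14;       format d   8.2;           /* w.d format      -> DOUBLE                             */
   n   = 42;         format n   8.;            /* w. format       -> INT, SMALLINT, TINYINT, or BIGINT  */
   s   = 'abc';      format s   $10.;          /* $w. format      -> VARCHAR                            */
   dt  = datetime(); format dt  datetime25.6;  /* datetime format -> TIMESTAMP                          */
   day = today();    format day date9.;        /* date format     -> DATE                               */
   t   = time();     format t   time8.;        /* time format     -> VARCHAR with a SASFMT property     */
run;
```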
Spark has a special value called NULL. A Spark NULL value means an absence of information and is analogous to a SAS missing value. When SAS/ACCESS reads a Spark NULL value, it interprets it as a SAS missing value. For more information about how SAS handles NULL values, see Potential Result Set Differences When Processing Null Data.
To control how the DBMS handles SAS missing character values, use the NULLCHAR= and NULLCHARVAL= data set options.
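For example, the NULLCHAR= and NULLCHARVAL= options can be combined so that missing character values are written as a substitute string rather than as NULL. This is a minimal sketch; the libref and the table names `target` and `source` are placeholders.

```sas
/* Sketch: substitute a string for SAS missing character values instead of writing NULL. */
/* The libref, connection options, and table names are placeholders.                     */
libname spk spark <connection-options>;

data spk.target (nullchar=no nullcharval='N/A');
   set work.source;   /* missing character values are written as 'N/A' */
run;
```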
When creating a table in HiveQL or Spark SQL with STRING columns, SAS might add extended type information to the table. The extended type information is added by including a TBLPROPERTIES clause on the CREATE TABLE query with SASFMT key-value pairs. This feature saves the data type information, as it is known to SAS, in the foreign table’s metadata. SASFMT table properties were originally developed for older versions of HiveQL and Spark SQL that did not support basic SQL data types such as CHAR, DATE, TIMESTAMP, and VARCHAR. As current versions of HiveQL and Spark SQL have come to support these data types, it is no longer necessary to specify SASFMT table properties, although they are still supported for backward compatibility. The CHAR, DATE, TIMESTAMP, and VARCHAR Hive and Spark types are used instead.
There are two cases where the SASFMT table properties can still be used, as the following examples show.
```sas
libname x spark <connection-options>;

/* Drop the table if it already exists */
proc sql;
   connect using x;
   execute (drop table m1test) by x;
quit;

/* DBTYPE= creates column v as a Spark STRING while SAS treats it as CHAR(10) */
proc sql;
   create table x.m1test (dbtype=(v=string)) (v char(10));
quit;
```
Here is the output in the SAS log:

```
CREATE TABLE `accesstesting`.`TIMETEST` (`t` VARCHAR(20)) TBLPROPERTIES
  ('SAS OS Name'='Linux','SAS Version'='version-number','SASFMT:t'='TIME(8.0)')
```
```sas
/* Create a STRING column and add the SASFMT table property manually */
proc sql;
   connect using x;
   execute (create table stringtest (s string)) by x;
   execute (alter table stringtest SET TBLPROPERTIES ('SASFMT:s'='TIME(8.0)')) by x;
quit;

/* SAS reads the SASFMT table property and applies the TIME8. format */
proc sql;
   describe table x.stringtest;
quit;
```
Here is the output from the DESCRIBE TABLE statement:
```
NOTE: SQL table X.STRINGTEST was created like:
create table X.STRINGTEST ( s num format=TIME8. informat=TIME8. label='s' );
```