Data Types for Spark

Supported Spark Data Types

Here are the Spark data types that the Spark engine supports.

LIBNAME Statement Data Conversions

This table shows the default formats that SAS/ACCESS Interface to Spark assigns to SAS variables when using the LIBNAME statement to read from a Spark table. These default formats are based on Spark column attributes.

LIBNAME Statement: Default SAS Formats for Spark Data Types

Spark Data Type

SAS Data Type

Default SAS Format

ARRAY

character

none

BINARY

character

$HEX32767

BOOLEAN

numeric

w. (1.)

CHAR(n)1

character

$w.

DATE

numeric

DATE9.

DECIMAL(p,s)2

numeric

w.d

DOUBLE

numeric

w.

FLOAT

numeric

w.

INT

numeric

w. (11.)

INTERVAL

numeric

DATETIME25.6

MAP

character

none

SMALLINT

numeric

w. (6.)

STRING3

character

$32767.

STRUCT

character

none

TIMESTAMP

numeric

DATETIME25.6

TINYINT

numeric

w. (4.)

VARCHAR(n)1

character

$w.

1 n in Spark data types is equivalent to w. in SAS formats.
2 p,s in Spark data types is equivalent to w.d in SAS formats
3 The STRING data type can be mapped to the VARCHAR data type by the JDBC client driver. For more information, check the JDBC vendor’s documentation.

This table shows the default Spark data types that SAS/ACCESS assigns to SAS variable formats during output operations when you use the LIBNAME statement.

LIBNAME Statement: Default Spark Data Types for SAS Variable Formats

SAS Variable Format

Spark Data Type

w.d

DOUBLE

w.

INT, SMALLINT, TINYINT, BIGINT

$w.

VARCHAR

datetime formats

TIMESTAMP

date formats

DATE

time formats1

VARCHAR

1 A column created in Spark using a TIME format will be created as a VARCHAR column with a SASFMT format.

Spark Null Values

Spark has a special value called NULL. A Spark NULL value means an absence of information and is analogous to a SAS missing value. When SAS/ACCESS reads a Spark NULL value, it interprets it as a SAS missing value. For more information about how SAS handles NULL values, see Potential Result Set Differences When Processing Null Data.

To control how the DBMS handles SAS missing character values, use the NULLCHAR= and NULLCHARVAL= data set options.

SAS Table Properties for Hive and Spark

When creating a table in HiveQL or Spark SQL with STRING columns, SAS might add extended type information to the table. The extended type information is added by including a TBLPROPERTIES clause on the CREATE TABLE query with SASFMT key-value pairs. This feature saves the data type information, as it is known to SAS, in the foreign table’s metadata. SASFMT table properties were originally developed for older versions of HiveQL and Spark SQL that did not support basic SQL data types such as CHAR, DATE, TIMESTAMP, and VARCHAR. As current versions of HiveQL and Spark SQL have come to support these data types, it is no longer necessary to specify SASFMT table properties, although they are still supported for backward compatibility. The CHAR, DATE, TIMESTAMP, and VARCHAR Hive and Spark types are used instead.

There are two cases where the SASFMT table properties can still be used.

proc sql;
 connect using x;
 execute (create table stringtest (s string)) by x;
 execute (alter table stringtest SET TBLPROPERTIES ('SASFMT:s'='TIME(8.0)')) by x;
quit;

proc sql;
 describe table x.stringtest;
quit;

Here is the output from the DESCRIBE TABLE statement:

NOTE: SQL table X.STRINGTEST was created like:
create table X.STRINGTEST  (
   s num format=TIME8. informat=TIME8. label='s'
  );
Last updated: February 3, 2026