Here are the Spark data types that the Spark engine supports:

- BIGINT
- BOOLEAN
- DECIMAL
- DOUBLE
- FLOAT
- INT
- SMALLINT
- TINYINT
- BINARY
- CHAR(n)
- STRING
- VARCHAR(n)
- DATE
- TIMESTAMP
- INTERVAL
- ARRAY
- MAP
- STRUCT
This table shows the default formats that SAS/ACCESS Interface to Spark assigns to SAS variables when using the LIBNAME statement to read from a Spark table. These default formats are based on Spark column attributes.
| Spark Data Type | SAS Data Type | Default SAS Format |
|---|---|---|
| ARRAY | character | none |
| BINARY | character | $HEX32767. |
| BOOLEAN | numeric | w. (1.) |
| CHAR(n)¹ | character | $w. |
| DATE | numeric | DATE9. |
| DECIMAL(p,s)² | numeric | w.d |
| DOUBLE | numeric | w. |
| FLOAT | numeric | w. |
| INT | numeric | w. (11.) |
| INTERVAL | numeric | DATETIME25.6 |
| MAP | character | none |
| SMALLINT | numeric | w. (6.) |
| STRING³ | character | $32767. |
| STRUCT | character | none |
| TIMESTAMP | numeric | DATETIME25.6 |
| TINYINT | numeric | w. (4.) |
| VARCHAR(n)¹ | character | $w. |

¹ n in Spark data types is equivalent to w. in SAS formats.
² p,s in Spark data types is equivalent to w.d in SAS formats.
³ The STRING data type can be mapped to the VARCHAR data type by the JDBC client driver. For more information, check the JDBC vendor’s documentation.
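You can verify these read-time defaults by assigning a libref with the SPARK engine and running PROC CONTENTS against a table. This is a minimal sketch; the libref `spk` and the table name `mysparktab` are placeholders for illustration.

```sas
/* Sketch: list the SAS types and default formats assigned to a Spark table's columns. */
/* The libref, connection options, and table name are placeholders.                    */
libname spk spark <connection-options>;

proc contents data=spk.mysparktab;
run;
```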
This table shows the default Spark data types that SAS/ACCESS Interface to Spark assigns to columns, based on the SAS variable formats, during output operations when you use the LIBNAME statement.

| SAS Variable Format | Spark Data Type |
|---|---|
| w.d | DOUBLE |
| w. | INT, SMALLINT, TINYINT, BIGINT |
| $w. | VARCHAR |
| datetime formats | TIMESTAMP |
| date formats | DATE |
| time formats¹ | VARCHAR |

¹ A column created in Spark from a SAS variable with a TIME format is created as a VARCHAR column with a SASFMT table property.
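For example, the following DATA step sketch (the libref, connection options, and table name are placeholders) creates one variable for each row of the table above; the comments note the Spark data type that the defaults imply.

```sas
/* Sketch: default Spark data types when writing through the LIBNAME engine. */
/* The libref, connection options, and table name are placeholders.          */
libname spk spark <connection-options>;

data spk.outdemo;
   d   = 3.14;       format d   8.2;           /* w.d format      -> DOUBLE                             */
   n   = 42;         format n   8.;            /* w. format       -> INT, SMALLINT, TINYINT, or BIGINT  */
   s   = 'abc';      format s   $10.;          /* $w. format      -> VARCHAR                            */
   dt  = datetime(); format dt  datetime25.6;  /* datetime format -> TIMESTAMP                          */
   day = today();    format day date9.;        /* date format     -> DATE                               */
   t   = time();     format t   time8.;        /* time format     -> VARCHAR with a SASFMT property     */
run;
```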
Spark has a special value called NULL. A Spark NULL value means an absence of information and is analogous to a SAS missing value. When SAS/ACCESS reads a Spark NULL value, it interprets it as a SAS missing value. For more information about how SAS handles NULL values, see Potential Result Set Differences When Processing Null Data.
To control how the DBMS handles SAS missing character values, use the NULLCHAR= and NULLCHARVAL= data set options.
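For example, the NULLCHAR= and NULLCHARVAL= options can be combined so that missing character values are written as a substitute string rather than as NULL. This is a minimal sketch; the libref and the table names `target` and `source` are placeholders.

```sas
/* Sketch: substitute a string for SAS missing character values instead of writing NULL. */
/* The libref, connection options, and table names are placeholders.                     */
libname spk spark <connection-options>;

data spk.target (nullchar=no nullcharval='N/A');
   set work.source;   /* missing character values are written as 'N/A' */
run;
```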
When creating a table in HiveQL or Spark SQL with STRING columns, SAS might add extended type information to the table. The extended type information is added by including a TBLPROPERTIES clause on the CREATE TABLE query with SASFMT key-value pairs. This feature saves the data type information, as it is known to SAS, in the foreign table’s metadata. SASFMT table properties were originally developed for older versions of HiveQL and Spark SQL that did not support basic SQL data types such as CHAR, DATE, TIMESTAMP, and VARCHAR. As current versions of HiveQL and Spark SQL have come to support these data types, it is no longer necessary to specify SASFMT table properties, although they are still supported for backward compatibility. The CHAR, DATE, TIMESTAMP, and VARCHAR Hive and Spark types are used instead.
There are two cases where the SASFMT table properties can still be used, as the following examples show.
```sas
libname x spark <connection-options>;

/* Drop the table if it already exists */
proc sql;
   connect using x;
   execute (drop table m1test) by x;
quit;

/* DBTYPE= creates column v as a Spark STRING while SAS treats it as CHAR(10) */
proc sql;
   create table x.m1test (dbtype=(v=string)) (v char(10));
quit;
```
Here is the output in the SAS log:

```
CREATE TABLE `accesstesting`.`TIMETEST` (`t` VARCHAR(20)) TBLPROPERTIES
  ('SAS OS Name'='Linux','SAS Version'='version-number','SASFMT:t'='TIME(8.0)')
```
```sas
/* Create a STRING column and add the SASFMT table property manually */
proc sql;
   connect using x;
   execute (create table stringtest (s string)) by x;
   execute (alter table stringtest SET TBLPROPERTIES ('SASFMT:s'='TIME(8.0)')) by x;
quit;

/* SAS reads the SASFMT table property and applies the TIME8. format */
proc sql;
   describe table x.stringtest;
quit;
```
Here is the output from the DESCRIBE TABLE statement:
```
NOTE: SQL table X.STRINGTEST was created like:
create table X.STRINGTEST ( s num format=TIME8. informat=TIME8. label='s' );
```