Specifies the format of the Hadoop bulk load staging file.
| Category: | Performance |
|---|---|
| Default: | TEXT [Hadoop, Spark in HDFS]; CSV [Spark in Databricks] |
| Restriction: | This option is available only on Linux. |
| Requirement: | To specify this option, you must specify BULKLOAD=YES. BULKLOAD=YES is the default value. |
| Data source: | Hadoop, Spark |
| Notes: | Support for this option was added in SAS 9.4M8. Support for Spark was added in SAS 9.4M9. |
**CSV**

specifies a comma-separated value (CSV) file format.

| Restriction | CSV is the only valid value for Databricks. |
|---|---|
**ORC**

specifies the Apache ORC (Optimized Row Columnar) open-source file format.

| Note | When you specify BL_FORMAT=ORC, TIMESTAMP fractional seconds are limited to three digits. |
|---|---|
**PARQUET**

specifies the Apache Parquet open-source file format.
**TEXT**

specifies plain text format. Specify the delimiter with the BL_DELIMITER= LIBNAME option.
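As a sketch of how these values might be specified in practice (the server name, credentials, and delimiter below are hypothetical; BULKLOAD=YES is shown explicitly even though it is the default):

```sas
/* Hypothetical Hadoop connection; BL_FORMAT=ORC stages the
   bulk load file in ORC format. */
libname hdp hadoop server="mysrv01" user=myuser password=mypwd
        bulkload=yes bl_format=orc;

/* TEXT staging with an explicit delimiter via BL_DELIMITER= */
libname hdptxt hadoop server="mysrv01" user=myuser password=mypwd
        bulkload=yes bl_format=text bl_delimiter='|';
```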
Compared to using a text file, the ORC and Parquet formats might offer advantages in size, performance, and precision because numeric data values do not need to be converted to text in order to be loaded.
Hadoop: The Hadoop bulk load staging file can be created as a TEXT, ORC, or Parquet file.
Spark: The Spark engine supports bulk loading to Spark in HDFS and bulk loading to Databricks. When bulk loading to Spark in HDFS, the staging file can be created as a TEXT, ORC, or Parquet file. When bulk loading to Databricks, the staging files use the CSV format.
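For the Spark case described above, a minimal sketch might look like the following. The connection options are illustrative only; when the target is Databricks, BL_FORMAT= would be omitted or set to CSV, since CSV is the only valid value there.

```sas
/* Hypothetical Spark-in-HDFS connection; Parquet staging files
   avoid converting numeric values to text during the bulk load. */
libname spk spark server="sparksrv" user=myuser password=mypwd
        bulkload=yes bl_format=parquet;
```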