Specifies the format of the Hadoop bulk load staging file.
| Category: | Performance |
|---|---|
| Default: | TEXT [Hadoop, Spark in HDFS]; CSV [Spark in Databricks] |
| Restriction: | This option is available only on Linux. |
| Requirement: | To specify this option, you must specify BULKLOAD=YES. BULKLOAD=YES is the default value. |
| Data source: | Hadoop, Spark |
| Notes: | Support for this option was added in SAS 9.4M8. Support for Spark was added in SAS 9.4M9. |
**CSV**

specifies a comma-separated value (CSV) file format.

| Restriction | CSV is the only valid value for Databricks. |
|---|---|
**ORC**

specifies the Apache ORC (Optimized Row Columnar) open-source file format.

| Note | When you specify BL_FORMAT=ORC, TIMESTAMP fractional seconds are limited to three digits. |
|---|---|
**PARQUET**

specifies the Apache Parquet open-source file format.
**TEXT**

specifies plain text format. Specify the delimiter with the BL_DELIMITER= LIBNAME option.
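As a sketch of how these values might be specified in practice (the server name, credentials, and delimiter below are hypothetical; BULKLOAD=YES is shown explicitly even though it is the default):

```sas
/* Hypothetical Hadoop connection; BL_FORMAT=ORC stages the
   bulk load file in ORC format. */
libname hdp hadoop server="mysrv01" user=myuser password=mypwd
        bulkload=yes bl_format=orc;

/* TEXT staging with an explicit delimiter via BL_DELIMITER= */
libname hdptxt hadoop server="mysrv01" user=myuser password=mypwd
        bulkload=yes bl_format=text bl_delimiter='|';
```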
Compared to using a text file, the ORC and Parquet formats might offer advantages in size, performance, and precision because numeric data values do not need to be converted to text in order to be loaded.
Hadoop: The Hadoop bulk load staging file can be created as a TEXT, ORC, or Parquet file.
Spark: The Spark engine supports bulk loading to Spark in HDFS and bulk loading to Databricks. When bulk loading to Spark in HDFS, the staging file can be created as a TEXT, ORC, or Parquet file. When bulk loading to Databricks, the staging files use the CSV format.
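For the Spark case described above, a minimal sketch might look like the following. The connection options are illustrative only; when the target is Databricks, BL_FORMAT= would be omitted or set to CSV, since CSV is the only valid value there.

```sas
/* Hypothetical Spark-in-HDFS connection; Parquet staging files
   avoid converting numeric values to text during the bulk load. */
libname spk spark server="sparksrv" user=myuser password=mypwd
        bulkload=yes bl_format=parquet;
```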