DISTRIBUTED_BY= Data Set Option

Uses one or multiple columns to distribute table rows across database segments.

Valid in:	DATA and PROC steps (when accessing DBMS data using SAS/ACCESS software)
Category:	Data Set Control
Default:	RANDOMLY DISTRIBUTED
Data source:	Greenplum, HAWQ

Syntax

DISTRIBUTED_BY='column-1 <…,column-n>' | DISTRIBUTED RANDOMLY

Syntax Description

column-1 <…,column-n>

specifies one or more DBMS column names to use as the distribution key.

DISTRIBUTED RANDOMLY

distributes table rows across database segments without using a distribution key, so no column or set of columns determines where a row is stored. This is known as round-robin distribution.

Details

For uniform distribution, that is, for table records to be stored evenly across the segments (machines) that are part of the database configuration, the distribution key should be as unique as possible. A key with few distinct values concentrates rows on a few segments.
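One way to check how evenly rows are spread is to count rows per segment. The following is a minimal sketch, not part of the original example: it assumes that a table named SALES already exists in Greenplum, that the connection values match your site, and that explicit SQL pass-through is available. It relies on the Greenplum system column gp_segment_id; the alias row_count is chosen here for illustration.

```sas
proc sql;
    /* Connection values are placeholders; substitute your own. */
    connect to greenplm (user=myusr1 password=mypwd1 dsn=mysrv1);

    /* Count the rows stored on each segment. Roughly equal counts */
    /* suggest that the distribution key spreads rows evenly.      */
    select * from connection to greenplm
        (select gp_segment_id, count(*) as row_count
           from sales
          group by gp_segment_id
          order by gp_segment_id);

    disconnect from greenplm;
quit;
```

If a few segments hold most of the rows, a more selective distribution key is probably a better choice.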

Example: Create a Table By Specifying a Distribution Key

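libname x greenplm user=myusr1 password=mypwd1 dsn=mysrv1;

/* DBTYPE= sets the DBMS types for id, qty, and amt. sales_date has no */
/* DBTYPE= entry, so it is created as double precision.                */
data x.sales (dbtype=(id=int qty=int amt=int)
    distributed_by='distributed by (id)');
    id = 1;
    qty = 100;
    sales_date = '27Aug2009'd;
    amt = 20000;
run;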

This DATA step creates the SALES table with the following statement.

CREATE TABLE SALES
(id int,
 qty int,
 sales_date double precision,
 amt int
) distributed by (id)

Last updated: February 3, 2026