Selects observations from SAS data sets that meet a particular condition.
| Valid in: | DATA step or PROC step |
|---|---|
| Categories: | Action |
| CAS | |
| Type: | Declarative |
| Note: | Using a random number function in a WHERE statement might generate a different result set from using a random number function in a subsetting IF statement. This difference can be caused by how the criteria are optimized internally by SAS and is expected behavior. |
| Tip: | The SAS Tutorial video Filtering a SAS Table in a DATA Step shows you how to filter data using the WHERE statement. |
Table of Contents
is an arithmetic or logical expression that generally consists of a sequence of operands and operators.
| Tips | The operands and operators described in the next several sections are also valid for the WHERE= data set option. |
|---|---|
| You can specify multiple where-expressions. |
can be AND, AND NOT, OR, or OR NOT.
Using the WHERE statement might improve the efficiency of your SAS programs because SAS is not required to read all observations from the input data set.
The WHERE statement cannot be executed conditionally. That is, you cannot use it as part of an IF-THEN statement.
WHERE statements can contain multiple WHERE expressions that are joined by logical operators.
The WHERE statement applies to all data sets in the preceding SET, MERGE, MODIFY, or UPDATE statement, and variables that are used in the WHERE statement must appear in all of those data sets. You cannot use the WHERE statement with the POINT= option in the SET and MODIFY statements.
You can apply OBS= and FIRSTOBS= processing to WHERE processing. For more information, see Processing a Segment of Data That Is Conditionally Selected in SAS Language Reference: Concepts.
The WHERE statement requires one or more input SAS data sets. When specifying the WHERE statement in a DATA step that reads raw data, you must specify either a SET, MERGE, MODIFY, or UPDATE statement in the DATA step as well.
Raw data is data that SAS reads from an external file or instream data that exists in a SAS program and that SAS reads using the DATALINES statement.
For each iteration of the DATA step, the first operation SAS performs in each execution of a SET, MERGE, MODIFY, or UPDATE statement is to determine whether the observation in the input data set meets the condition of the WHERE statement. The WHERE statement takes effect immediately after the input data set options are applied and before any other statement in the DATA step is executed. If a DATA step combines observations using a WHERE statement with a MERGE, MODIFY, or UPDATE statement, SAS selects observations from each input data set before it combines them.
If a DATA step contains both a WHERE statement and a BY statement, the WHERE statement executes before BY groups are created. Therefore, BY groups reflect groups of observations in the subset of observations that are selected by the WHERE statement, not the actual BY groups of observations in the original input data set.
For a complete discussion of BY-group processing, see BY-Group Processing in the DATA Step in SAS Language Reference: Concepts.
You can use the WHERE statement with any SAS procedure that reads a SAS data set. The WHERE statement is useful in order to subset the original data set for processing by the procedure. The Base SAS Procedures Guide documents the action of the WHERE statement only in those procedures for which you can specify more than one data set. In all other cases, the WHERE statement performs as documented here.
A DATA or PROC step attempts to use an available index to optimize the selection of data when an indexed variable is used in combination with one of these operators and functions:
SUBSTR requires these arguments:
where substr(variable,position,length)
='character-string';
An index is used in processing when the arguments of the SUBSTR function meet all of these conditions:
Operands in WHERE expressions can contain these values:
You cannot use variables that are created within the DATA step (for example, FIRST.variable, LAST.variable, _N_, or variables that are created in assignment statements) in a WHERE expression because the WHERE statement is executed before SAS brings observations into the DATA or PROC step. When WHERE expressions contain comparisons, the unformatted values of variables are compared.
Here are some examples of using operands in WHERE expressions:
where score>50;
where date>='01jan1999'd and time>='9:00't;
where state='Mississippi';
As in other SAS expressions, the names of numeric variables can stand alone. SAS treats values of 0 or missing as false; other values are true. These examples are WHERE expressions that contain the numeric variables EMPNUM and SSN:
where empnum;
where empnum and ssn;
Character literals or the names of character variables can also stand alone in WHERE expressions. If you use the name of a character variable by itself as a WHERE expression, SAS selects observations where the value of the character variable is not blank.
You can include both SAS operators and special WHERE-expression operators in the WHERE statement. For a complete list of the operators, see WHERE Statement Operators. For the rules that SAS follows when it evaluates WHERE expressions, see WHERE-Expression Processing in SAS Language Reference: Concepts.
This table lists the operators for the WHERE statement.
|
Operator Type |
Symbol or Mnemonic |
Description |
|---|---|---|
|
Arithmetic |
|
|
|
|
* |
multiplication |
|
|
/ |
division |
|
|
+ |
addition |
|
|
− |
subtraction |
|
|
** |
exponentiation |
|
Comparison 4 |
|
|
|
|
= or EQ |
equal to |
|
|
^=, ¬=, ~=, or NE1 |
not equal to |
|
|
> or GT |
greater than |
|
|
< or LT |
less than |
|
|
>= or GE |
greater than or equal to |
|
|
<= or LE |
less than or equal to |
|
|
IN |
equal to one of a list |
|
Logical (Boolean) |
|
|
|
|
& or AND |
logical and |
|
|
| or OR2 |
logical or |
|
|
~,^ , ¬, or NOT1 |
logical not |
|
Other |
|
|
|
|
||3 |
concatenation of character variables |
|
|
( ) |
indicate order of evaluation |
|
|
+ prefix |
positive number |
|
|
− prefix |
negative number |
|
WHERE Expression Only |
|
|
|
|
BETWEEN–AND |
an inclusive range |
|
|
? or CONTAINS |
a character string |
|
|
IS NULL or IS MISSING |
missing values |
|
|
LIKE |
match patterns |
|
|
=* |
sounds-like |
|
|
SAME-AND |
add clauses to an existing WHERE statement without retyping original one |
| 1 The caret (^), tilde (~), and the not sign (¬ ) all indicate a logical not. Use the character available on your keyboard, or use the mnemonic equivalent. | ||
| 2 The OR symbol ( | ), broken vertical bar ( | ), and exclamation point (!) all indicate a logical or. Use the character available on your keyboard, or use the mnemonic equivalent. | ||
| 3 Two OR symbols (| | ), two broken vertical bars ( | | ), or two exclamation points (!!) indicate concatenation. Use the character available on your keyboard. | ||
| 4 You can use the colon modifier (:) with any of the comparison operators in order to compare only a specified prefix of a character string. | ||
An escape character is a single character that, in a sequence of characters, signifies that what follows takes an alternative meaning. For the LIKE operator, an escape character signifies to search for literal instances of the % and _ characters in the variable's values instead of performing the special-character function.
For example, if the variable X contains the values abc, a_b, and axb, the following LIKE operator with an escape character selects only the value a_b. The escape character (/) specifies that the pattern searches for a literal ' _' that is surrounded by the characters a and b. The escape character (/) is not part of the search.
where x like 'a/_b' escape '/';
Without an escape character, the following LIKE operator would select the values a_b and axb. The special character underscore in the search pattern matches any single b character, including the value with the underscore:
where x like 'a_b';
To specify an escape character, include the character in the pattern-matching expression, and then the keyword ESCAPE followed by the escape-character expression. When you include an escape character, the pattern-matching expression must be enclosed in quotation marks, and it cannot contain a column name. The escape-character expression evaluates to a single character. The operands must be character or string literals. If it is a single character, enclose it in quotation marks.
LIKE 'pattern-matching-expression' ESCAPE 'escape-character-expression'
This DATA step produces
a SAS data set that contains only observations from data set customer in
which the value for name begins with Mac and
the value for city is Charleston or Atlanta.
data testmacs;
set customer;
where substr(name,1,3)='Mac' and
(city='Charleston' or city='Atlanta');
run;
This example produces
a CAS table that contains only the observations from the Sashelp.Class data
set in which the values for Age are greater
than 14.
cas casauto sessopts=(caslib='casuser'); /*1*/
caslib _all_ assign;
libname mycas cas;
data mycas.class; /*2*/
set sashelp.class;
run;
data mycas.class_out;
set mycas.class;
where Age>14; /*3*/
run;
Connect to CAS. To run the DATA step in CAS, you must first connect to a CAS server and start a CAS session. See Set Up Code for Examples in SAS Cloud Analytic Services: DATA Step Programming for information.
Load some sample data from the Sashelp library to CAS. Specify a CAS engine libref on the output table. (footnote 1)
Run the DATA step in CAS. Specify
the WHERE statement to filter the loaded CAS table mycas.class.
The DATA step writes the filtered output from mycas.class to
the output table mycas.class_out.
See SAS Cloud Analytic Services: DATA Step Programming for more information about the DATA step and SAS Cloud Analytic Services.
where empnum between 500 and 1000;
where company ? 'bay';
where company contains 'bay';
where name is null;
where name is missing;
where name like 'D%';
| Diana |
| Diane |
| Dianna |
| Dianthus |
| Dyan |
|
WHERE Statement |
Name Selected |
|---|---|
|
Dyan |
|
Diana, Diane |
|
Dianna |
|
all names from list |
where lastname=*'Smith';
where year>1991;
...more SAS statements...
where same and year<1999;
In this example, the second WHERE statement is equivalent to the following WHERE statement:
where year>1991 and year<1999;