IF Statement: Subsetting

Continues processing only those observations that meet the condition of the specified expression.

Valid in: DATA step
Categories: Action, CAS
Type: Executable
Note: Using a random number function in a WHERE statement might generate a different result set from using a random number function in a subsetting IF statement. This difference can be caused by how the criteria are optimized internally by SAS and is expected behavior.

Syntax

IF expression;

Arguments

expression

is any SAS expression.

Details

The Basics

The subsetting IF statement causes the DATA step to continue processing only those raw data records or those observations from a SAS data set that meet the condition of the expression that is specified in the IF statement. If the expression is true for the observation (its value is neither 0 nor missing), SAS continues to execute the DATA step and includes the observation in the output data set. The resulting SAS data set or data sets contain a subset of the original external file or SAS data set.

If the expression is false (its value is 0 or missing), SAS stops processing the current observation or record: the remaining statements in the DATA step are not executed, and the observation is not written to the output data set. SAS immediately returns to the beginning of the DATA step for the next iteration. No additional statement is needed to stop processing the observation.
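
The following is a minimal sketch of this behavior, not taken from the original reference. The data set name adults and the inline records are made up for illustration. Statements before the subsetting IF run for every record, and statements after it run only for records that satisfy the expression.

```sas
data adults;
   input name $ age;
   /* Runs for every record that is read */
   put 'Read: ' name= age=;
   /* Records with an age below 18, or a missing age, stop here */
   if age >= 18;
   /* Runs only for records that passed the subsetting IF */
   put 'Kept: ' name= age=;
   datalines;
Ann 34
Bob 12
Cara .
Dev 18
;
run;
```

Under these assumptions, the log shows a Read line for all four records but a Kept line only for Ann and Dev, and only those two observations are written to adults. A missing numeric value compares as less than any number, so the record for Cara is also excluded.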

Using the Equivalent of the CONTAINS and LIKE Operators in an IF Statement

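The LIKE operator in a WHERE clause matches character patterns. When the pattern tests only the beginning of a string (for example, LIKE 'D%'), you can get the equivalent result in an IF statement with the =: operator. The colon modifier truncates the longer operand to the length of the shorter one before comparing, so the comparison matches values that begin with the specified string. Here is an example.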

data test;
   input name $;
   datalines;
John
Diana
Diane
Sally
Doug
David
;
run;

data test;
   set test;
   if name =: 'D';
run;

proc print;
run;

The CONTAINS operator in a WHERE clause checks for a character string within a value. To get the equivalent result in an IF statement, the INDEX function can be used. INDEX returns the position of the first occurrence of the string, or 0 if the string is not found. For example:

data test;
   set test;
   if index(name,'ian') ge 1;
run;

proc print;
run;
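
For comparison, the following sketch places each IF statement next to the WHERE statement that it mirrors. The data set names (names, like_d, if_d, contains_ian, if_ian) are hypothetical and chosen only for this illustration. The input data repeats the names used above.

```sas
data names;
   input name $;
   datalines;
John
Diana
Diane
Sally
Doug
David
;
run;

/* WHERE with LIKE: names that begin with D */
data like_d;
   set names;
   where name like 'D%';
run;

/* Subsetting IF with =: selects the same observations */
data if_d;
   set names;
   if name =: 'D';
run;

/* WHERE with CONTAINS: names that include the string ian */
data contains_ian;
   set names;
   where name contains 'ian';
run;

/* Subsetting IF with INDEX selects the same observations */
data if_ian;
   set names;
   if index(name, 'ian') ge 1;
run;
```

With this input, like_d and if_d each contain Diana, Diane, Doug, and David, and contains_ian and if_ian each contain Diana and Diane.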

Comparisons

  • The subsetting IF statement is equivalent to this IF-THEN statement:
    if not (expression)
       then delete;
  • When you create SAS data sets, use the subsetting IF statement when it is easier to specify a condition for including observations. When it is easier to specify a condition for excluding observations, use the DELETE statement.
  • The subsetting IF and the WHERE statements are not equivalent. The two statements work differently and produce different output data sets in some cases. The most important differences are summarized as follows:
    • The subsetting IF statement selects observations that have been read into the program data vector. The WHERE statement selects observations before they are brought into the program data vector. The subsetting IF might be less efficient than the WHERE statement because it must read each observation from the input data set into the program data vector.
    • The subsetting IF statement and WHERE statement can produce different results in DATA steps that interleave, merge, or update SAS data sets.
    • When the subsetting IF statement is used with the MERGE statement, SAS selects observations after the current observations are combined. When the WHERE statement is used with the MERGE statement, SAS applies the selection criteria to each input data set before combining the current observations.
    • The subsetting IF statement can select observations from an existing SAS data set or from raw data that are read with the INPUT statement. The WHERE statement can select observations only from existing SAS data sets.
    • The subsetting IF statement is executable; the WHERE statement is not.
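
The difference with the MERGE statement can be illustrated with a small sketch. The data set names (first_try, second_try, merged_if, merged_where) and the values are assumptions made up for this illustration. Both input data sets contain a variable named score, so in a merge the value from the data set listed last replaces the earlier value.

```sas
data first_try;
   input id score;
   datalines;
1 50
2 90
;
run;

data second_try;
   input id score;
   datalines;
1 95
2 40
;
run;

/* Subsetting IF: applied after the observations are combined, */
/* so it tests the score value that came from second_try       */
data merged_if;
   merge first_try second_try;
   by id;
   if score > 80;
run;

/* WHERE: applied to each input data set before combining */
data merged_where;
   merge first_try second_try;
   by id;
   where score > 80;
run;
```

With these values, merged_if contains one observation (id 1, score 95), because the combined observation for id 2 carries the score 40. merged_where contains two observations (id 1 with score 95 and id 2 with score 90), because each input data set is filtered separately before the merge.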

Example: Limiting Observations

  • This example results in a data set that contains only those observations with the value F for the variable SEX:
    if sex='F';
  • This example results in a data set that contains all observations for which the value of the variable AGE is neither missing nor 0:
    if age;
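
The two statements above can be tried in complete DATA steps. The following sketch assumes a hypothetical input data set named class with made-up values. The output data set names females and has_age are likewise only illustrative.

```sas
data class;
   input name $ sex $ age;
   datalines;
Alice F 13
Barb F .
Carl M 14
Dan M 0
;
run;

/* Keeps only observations where SEX is F */
data females;
   set class;
   if sex='F';
run;

/* Keeps only observations where AGE is neither missing nor 0 */
data has_age;
   set class;
   if age;
run;
```

With this input, females contains Alice and Barb, and has_age contains Alice and Carl.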

Last updated: June 17, 2025