This example shows how to use PROC SURVEYSELECT to select a stratified random sample. The sampling frame (list of customers) is stratified by the variables State and Type. This stratification divides the sampling frame into nonoverlapping subgroups that are determined by the values of State and Type. Samples are then selected independently within the strata.
PROC SURVEYSELECT requires that the input data set be sorted by the STRATA variables. The following PROC SORT statements sort the Customers data set by the stratification variables State and Type:
proc sort data=Customers;
   by State Type;
run;
The following PROC FREQ statements display the crosstabulation of the Customers data set by State and Type:
title1 'Customer Satisfaction Survey';
title2 'Strata of Customers';
proc freq data=Customers;
   tables State*Type;
run;
Figure 4 shows the table of State by Type for the 13,471 customers. There are four states and two levels of Type, which form eight strata.
Figure 4: Stratification of Customers by State and Type
| Customer Satisfaction Survey |
| Strata of Customers |
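The stratum counts can also be written to a data set so that later steps can use them. The following statements are a minimal sketch: the output data set name StrataCounts is an assumption made for illustration, and COUNT is the frequency variable that PROC FREQ creates in an OUT= data set.

```sas
/* Write one row per State*Type stratum to a data set.       */
/* StrataCounts is a hypothetical name chosen for this sketch. */
proc freq data=Customers;
   tables State*Type / noprint out=StrataCounts;
run;

/* List the strata with their population sizes (COUNT). */
proc print data=StrataCounts;
   var State Type Count;
run;
```

Each value of COUNT is a stratum population size, which is the denominator of the selection probability when a fixed number of customers is drawn from each stratum.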
The following PROC SURVEYSELECT statements select a probability sample of customers from the Customers data set according to the stratified sample design:
title1 'Customer Satisfaction Survey';
title2 'Stratified Sampling';
proc surveyselect data=Customers method=srs n=15
                  seed=1953 out=SampleStrata;
   strata State Type;
run;
The STRATA statement names the stratification variables State and Type. In the PROC SURVEYSELECT statement, the METHOD=SRS option specifies simple random sampling, and the N= option specifies a sample size of 15 customers in each stratum. If you want to specify different sample sizes for different strata, you can use the N=SAS-data-set option to name a secondary data set that contains the stratum sample sizes. The SEED= option specifies 1953 as the initial seed for random number generation.
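The N=SAS-data-set form mentioned above can be sketched as follows. The secondary data set must contain the stratification variables, sorted in the same order as the input data set, together with the stratum sample sizes in a variable named _NSIZE_ (or SampleSize). The data set names StratumSizes and SampleUnequal and the sizes 20 and 10 are assumptions chosen for illustration, and the sketch assumes that every stratum contains at least that many customers.

```sas
/* Build one row per stratum from the sorted Customers data set. */
/* The sizes 20 and 10 are hypothetical, for illustration only.  */
data StratumSizes;
   set Customers;
   by State Type;
   if first.Type;                      /* keep the first row of each stratum */
   if Type = 'New' then _NSIZE_ = 20;  /* sample size for 'New' strata       */
   else _NSIZE_ = 10;                  /* sample size for all other strata   */
   keep State Type _NSIZE_;
run;

/* Select the sample, reading stratum sample sizes from StratumSizes. */
proc surveyselect data=Customers method=srs n=StratumSizes
                  seed=1953 out=SampleUnequal;
   strata State Type;
run;
```

Because StratumSizes is derived from the sorted Customers data set, its strata appear in the same order as in the input data set.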
Figure 5 displays the output from PROC SURVEYSELECT, which summarizes the sample selection. A total of 120 customers are selected.
Figure 5: Sample Selection Summary
| Customer Satisfaction Survey |
| Stratified Sampling |
| Selection Method | Simple Random Sampling |
|---|---|
| Strata Variables | State |
| | Type |
| Input Data Set | CUSTOMERS |
| Random Number Seed | 1953 |
| Stratum Sample Size | 15 |
| Number of Strata | 8 |
| Total Sample Size | 120 |
| Output Data Set | SAMPLESTRATA |
The following PROC PRINT statements display the first 30 observations of the output data set SampleStrata:
title1 'Customer Satisfaction Survey';
title2 'Sample Selected by Stratified Design';
title3 '(First 30 Observations)';
proc print data=SampleStrata(obs=30);
run;
Figure 6 displays the first 30 observations of the output data set SampleStrata, which contains the sample of 120 customers (15 customers from each of the eight strata). The variable SelectionProb contains the selection probability for each customer in the sample. Because customers are selected with equal probability within strata, the selection probability is the stratum sample size (15) divided by the stratum population size. The selection probabilities differ from stratum to stratum because the stratum population sizes differ. The selection probability for each customer in the first stratum (State='AL' and Type='New') is 0.012116, and the selection probability for customers in the second stratum (State='AL' and Type='Old') is 0.021246. The variable SamplingWeight contains the sampling weights, which are computed as inverse selection probabilities.
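The arithmetic behind these values can be written out as follows. Here n_h is the stratum sample size and N_h is the stratum population size. The values N_h = 1238 and N_h = 706 are not stated in the text; they are inferred by multiplying the reported sampling weights by 15.

```latex
\[
  \pi_h = \frac{n_h}{N_h}, \qquad
  w_h = \frac{1}{\pi_h} = \frac{N_h}{n_h}
\]
% First stratum (State='AL', Type='New'), with N_h inferred as 1238:
\[
  \pi_h = \frac{15}{1238} \approx 0.012116, \qquad
  w_h = \frac{1238}{15} \approx 82.5333
\]
% Second stratum (State='AL', Type='Old'), with N_h inferred as 706:
\[
  \pi_h = \frac{15}{706} \approx 0.021246, \qquad
  w_h = \frac{706}{15} \approx 47.0667
\]
```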
Figure 6: Customer Sample (First 30 Observations)
| Customer Satisfaction Survey |
| Sample Selected by Stratified Design |
| (First 30 Observations) |
| Obs | State | Type | CustomerID | Usage | SelectionProb | SamplingWeight |
|---|---|---|---|---|---|---|
| 1 | AL | New | 015-57-9903 | 26 | 0.012116 | 82.5333 |
| 2 | AL | New | 052-18-5029 | 576 | 0.012116 | 82.5333 |
| 3 | AL | New | 064-72-0145 | 88 | 0.012116 | 82.5333 |
| 4 | AL | New | 291-22-2497 | 1221 | 0.012116 | 82.5333 |
| 5 | AL | New | 305-62-6833 | 187 | 0.012116 | 82.5333 |
| 6 | AL | New | 309-63-9722 | 534 | 0.012116 | 82.5333 |
| 7 | AL | New | 413-76-0209 | 435 | 0.012116 | 82.5333 |
| 8 | AL | New | 492-18-7867 | 70 | 0.012116 | 82.5333 |
| 9 | AL | New | 508-16-8324 | 189 | 0.012116 | 82.5333 |
| 10 | AL | New | 561-82-0366 | 392 | 0.012116 | 82.5333 |
| 11 | AL | New | 685-24-1718 | 74 | 0.012116 | 82.5333 |
| 12 | AL | New | 800-20-2155 | 21 | 0.012116 | 82.5333 |
| 13 | AL | New | 857-94-2672 | 77 | 0.012116 | 82.5333 |
| 14 | AL | New | 918-29-9618 | 540 | 0.012116 | 82.5333 |
| 15 | AL | New | 963-93-4916 | 33 | 0.012116 | 82.5333 |
| 16 | AL | Old | 182-45-1938 | 160 | 0.021246 | 47.0667 |
| 17 | AL | Old | 210-85-9046 | 184 | 0.021246 | 47.0667 |
| 18 | AL | Old | 211-14-1373 | 88 | 0.021246 | 47.0667 |
| 19 | AL | Old | 229-87-9527 | 362 | 0.021246 | 47.0667 |
| 20 | AL | Old | 239-16-9426 | 22 | 0.021246 | 47.0667 |
| 21 | AL | Old | 283-78-3723 | 595 | 0.021246 | 47.0667 |
| 22 | AL | Old | 293-90-2342 | 124 | 0.021246 | 47.0667 |
| 23 | AL | Old | 360-78-7048 | 375 | 0.021246 | 47.0667 |
| 24 | AL | Old | 432-96-1275 | 2283 | 0.021246 | 47.0667 |
| 25 | AL | Old | 534-79-2367 | 167 | 0.021246 | 47.0667 |
| 26 | AL | Old | 668-77-4832 | 30 | 0.021246 | 47.0667 |
| 27 | AL | Old | 681-88-8208 | 2133 | 0.021246 | 47.0667 |
| 28 | AL | Old | 794-79-7878 | 1274 | 0.021246 | 47.0667 |
| 29 | AL | Old | 954-40-0057 | 30 | 0.021246 | 47.0667 |
| 30 | AL | Old | 954-98-4646 | 1038 | 0.021246 | 47.0667 |
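One way to check the sampling weights is to sum them within each stratum. Because each weight is the stratum population size divided by 15, the 15 weights in a stratum should sum to that stratum's population size, up to rounding. The following statements are a minimal sketch of this check, not part of the original example:

```sas
/* Sum the sampling weights within each State*Type stratum. */
/* N should be 15 in every stratum, and Sum should equal    */
/* the stratum population size.                             */
proc means data=SampleStrata n sum;
   class State Type;
   var SamplingWeight;
run;
```

For the first stratum, for example, the sum should be approximately 15 × 82.5333 ≈ 1238, and the sums over all eight strata should add up to the 13,471 customers in the sampling frame.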