The SURVEYSELECT Procedure

Getting Started: SURVEYSELECT Procedure

The following examples show how to use PROC SURVEYSELECT to select probability-based random samples. These examples use simulated data for a customer satisfaction survey.

Suppose an internet service provider plans to conduct a customer satisfaction survey by selecting a random sample from all of its current customers (the survey population). The company plans to interview the selected customers and make inferences about the survey population from the sample data.

The SAS data set Customers contains the sampling frame, which is the list of units in the survey population. The sample of customers will be selected from this sampling frame. The data set Customers is constructed from the company’s customer database and contains 13,471 observations (one observation for each customer).

The following PROC PRINT statements display the first 10 observations of the data set Customers and produce the table shown in Figure 1:

title1 'Customer Satisfaction Survey';
title2 'First 10 Observations';
proc print data=Customers(obs=10);
run;

Figure 1: Customers Data Set (First 10 Observations)

Customer Satisfaction Survey

First 10 Observations

Obs	CustomerID	State	Type	Usage
1	416-87-4322	AL	New	839
2	288-13-9763	GA	Old	224
3	339-00-8654	GA	Old	2451
4	118-98-0542	GA	New	349
5	421-67-0342	FL	New	562
6	623-18-9201	SC	New	68
7	324-55-0324	FL	Old	137
8	832-90-2397	AL	Old	1563
9	586-45-0178	GA	New	615
10	801-24-5317	SC	New	728

The variable CustomerID contains the unique customer identification number. The variable State contains the state where the customer is located. The value of the variable Type is 'Old' if the customer has subscribed to the service for more than one year; otherwise, the value of Type is 'New'. The variable Usage contains the customer’s average monthly service usage in minutes.
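Before choosing a design, it can help to see how the frame is distributed across the variables described above. The following statements are a minimal sketch, not part of the original example: they assume only that the data set Customers contains the variables State, Type, and Usage as described, and they tabulate the number of customers in each State-by-Type cell and summarize Usage within those cells.

```sas
title1 'Customer Satisfaction Survey';
title2 'Sampling Frame Summary';

/* Count customers in each State-by-Type cell;   */
/* suppress percentages to show frequencies only */
proc freq data=Customers;
   tables State*Type / nopercent norow nocol;
run;

/* Summarize average monthly usage (minutes) within each cell */
proc means data=Customers n mean min max;
   class State Type;
   var Usage;
run;
```

The cell counts indicate how many sampling units each stratum would contain if the frame were stratified by State and Type.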

The following examples show how to use PROC SURVEYSELECT to implement three different probability sample designs. The first design is simple random sampling without stratification. The second design is a stratified design in which the list of customers is stratified by state and type; the sample is then selected by simple random sampling within strata. The third design is a stratified design that uses control sorting; customers are ordered by service usage within strata, and the sample is selected by systematic random sampling.
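The three designs can be sketched with PROC SURVEYSELECT as follows. This is an illustrative outline under stated assumptions rather than the exact code of the examples: the sample size (N=100), the sampling rate (SAMPRATE=0.02), the SEED= values, and the output data set names SampleSRS, SampleStrata, SampleControl, and CustomersSorted are all hypothetical choices made here for illustration.

```sas
/* Design 1: simple random sampling without stratification. */
/* N=100 and SEED=1953 are assumed values.                  */
proc surveyselect data=Customers method=srs n=100
                  seed=1953 out=SampleSRS;
run;

/* The STRATA statement expects the input sorted by the     */
/* stratification variables.                                */
proc sort data=Customers out=CustomersSorted;
   by State Type;
run;

/* Design 2: simple random sampling within State-by-Type    */
/* strata. SAMPRATE=0.02 (2% per stratum) is an assumed     */
/* value.                                                   */
proc surveyselect data=CustomersSorted method=srs samprate=0.02
                  seed=1953 out=SampleStrata;
   strata State Type;
run;

/* Design 3: systematic random sampling within strata, with */
/* customers ordered by Usage through the CONTROL statement.*/
proc surveyselect data=CustomersSorted method=sys samprate=0.02
                  seed=1234 out=SampleControl;
   strata State Type;
   control Usage;
run;
```

Specifying SEED= makes each selection reproducible; omitting it lets the procedure choose its own seed, so repeated runs can yield different samples.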

Last updated: December 09, 2022