The SURVEYSELECT Procedure

PPS Sequential Sampling

If you specify the METHOD=PPS_SEQ option, PROC SURVEYSELECT uses Chromy’s method of sequential random sampling. For more information, see Chromy (1979) and Williams and Chromy (1980). Chromy’s method selects units sequentially with probability proportional to size and with minimum replacement. Selection with minimum replacement means that the actual number of hits for a unit can equal the integer part of the expected number of hits for that unit, or the next largest integer. This can be compared to selection without replacement, where each unit can be selected only once, so the number of hits can equal 0 or 1. The other alternative is selection with replacement, where there is no restriction on the number of hits for each unit, so the number of hits can equal $0, 1, \ldots, n_h$, where $n_h$ is the stratum sample size.
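The three replacement schemes differ only in which hit counts a single unit can receive. The following Python sketch enumerates those counts. The function name `possible_hits` and the scheme labels are hypothetical, and the sketch assumes that "the next largest integer" means the ceiling of the expected number of hits, so that an integer expectation allows only one value.

```python
from fractions import Fraction
from math import ceil, floor


def possible_hits(expected, n_h, scheme):
    """Return the set of hit counts one unit can receive.

    expected: expected number of hits for the unit
    n_h:      stratum sample size
    scheme:   hypothetical label for the replacement scheme
    """
    if scheme == "without_replacement":
        return {0, 1}
    if scheme == "with_replacement":
        return set(range(n_h + 1))
    if scheme == "minimum_replacement":
        # Integer part or the next integer up; floor and ceiling
        # coincide when the expectation is already an integer
        # (an assumption of this sketch).
        return {floor(expected), ceil(expected)}
    raise ValueError(f"unknown scheme: {scheme}")


print(possible_hits(Fraction(23, 10), 5, "minimum_replacement"))
```

For an expected number of hits of 2.3, minimum replacement allows only 2 or 3 hits, whereas selection with replacement would allow anything from 0 to $n_h$.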

Sequential random sampling controls the distribution of the sample by spreading it throughout the sampling frame or stratum, thus providing implicit stratification according to the order of units in the frame or stratum. You can use the CONTROL statement to sort the input data set by the CONTROL variables before sample selection. If you also use a STRATA statement, PROC SURVEYSELECT sorts by the CONTROL variables within strata. By default (or if you specify the SORT=SERP option), the procedure uses hierarchic serpentine ordering to sort the sampling frame by the CONTROL variables within strata. If you specify the SORT=NEST option, the procedure uses nested sorting. See the section Sorting by CONTROL Variables for descriptions of serpentine and nested sorting. If you do not specify a CONTROL statement, PROC SURVEYSELECT applies sequential selection to the observations in the order in which they appear in the input data set.
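The section Sorting by CONTROL Variables gives the authoritative definitions. As a rough illustration only, the following Python sketch orders a frame by two control variables in a serpentine fashion, assuming that the sort direction of the second variable reverses each time the first variable changes level. The function name `serpentine_sort` and the tuple layout are hypothetical and are not part of PROC SURVEYSELECT.

```python
def serpentine_sort(rows):
    """Order (first, second) tuples in a two-variable serpentine pattern.

    The first variable is sorted ascending. Within its successive
    levels, the second variable alternates ascending, descending, ...
    (an assumption of this sketch).
    """
    levels = sorted({first for first, _ in rows})
    ordered = []
    for k, level in enumerate(levels):
        group = [row for row in rows if row[0] == level]
        # Reverse the direction on every other level of the first variable.
        group.sort(key=lambda row: row[1], reverse=(k % 2 == 1))
        ordered.extend(group)
    return ordered


frame = [(2, "a"), (1, "b"), (3, "a"), (1, "a"), (2, "b")]
print(serpentine_sort(frame))
```

The point of reversing direction is that neighboring units in the sorted frame stay similar across level boundaries, which is what sequential selection relies on for implicit stratification.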

According to Chromy’s method of sequential selection, PROC SURVEYSELECT first chooses a starting unit randomly from the entire stratum, with probability proportional to size. The procedure uses this unit as the first one and treats the stratum observations as a closed loop. This is done so that all pairwise (joint) expected numbers of hits are positive and an unbiased variance estimator can be obtained. The procedure numbers observations sequentially from the random start to the end of the stratum and then continues from the beginning of the stratum until all units are numbered.
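The random start and the closed-loop numbering can be sketched in Python as follows. This is a minimal illustration, not the procedure's implementation: the function name `order_from_random_start` is hypothetical, and the size measures are assumed to be nonnegative numbers with at least one positive value.

```python
import random


def order_from_random_start(sizes, rng=None):
    """Return unit indices renumbered from a PPS-chosen starting unit.

    sizes: size measures of the units, in frame order
    rng:   optional random.Random instance for reproducibility
    """
    rng = rng or random.Random()
    n_units = len(sizes)
    # Choose the starting unit with probability proportional to size.
    start = rng.choices(range(n_units), weights=sizes, k=1)[0]
    # Treat the stratum as a closed loop: go from the start to the end
    # of the stratum, then continue from the beginning.
    return list(range(start, n_units)) + list(range(start))


print(order_from_random_start([10, 20, 30, 40], random.Random(2022)))
```

The returned list is always a rotation of the original frame order, so the implicit stratification produced by the sort is preserved.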

Beginning with the randomly chosen starting unit, Chromy’s method partitions the ordered stratum sampling frame into $n_h$ zones of equal size. There is one selection from each zone and a total of $n_h$ hits (selections), although fewer than $n_h$ distinct units might be selected. Beginning with the random start, the procedure accumulates the expected number of hits and computes

$$\mathrm{E}(S_{hi}) = n_h Z_{hi}$$

$$I_{hi} = \mathrm{Int}\left( \sum_{j=1}^{i} \mathrm{E}(S_{hj}) \right)$$

$$F_{hi} = \mathrm{Frac}\left( \sum_{j=1}^{i} \mathrm{E}(S_{hj}) \right)$$

where $\mathrm{E}(S_{hi})$ represents the expected number of hits for unit $i$ in stratum $h$, $\mathrm{Int}(\cdot)$ denotes the integer part of the number, and $\mathrm{Frac}(\cdot)$ denotes the fractional part.
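These three quantities can be tabulated with a short Python sketch. It assumes that the relative size of a unit is its size measure divided by the stratum total, which this section does not restate. The function name `expected_hit_table` is hypothetical, and exact fractions are used so that the integer and fractional parts are not affected by floating-point rounding.

```python
from fractions import Fraction
from math import floor


def expected_hit_table(sizes, n_h):
    """Return (expected hits, integer part, fractional part) per unit.

    sizes: integer size measures, in order from the random start
    n_h:   stratum sample size
    The integer and fractional parts refer to the running total of
    expected hits up to and including each unit.
    """
    total = sum(sizes)
    rows = []
    cumulative = Fraction(0)
    for size in sizes:
        expected = Fraction(n_h * size, total)  # n_h times relative size
        cumulative += expected
        int_part = floor(cumulative)
        rows.append((expected, int_part, cumulative - int_part))
    return rows


for row in expected_hit_table([10, 20, 30, 40], 2):
    print(row)
```

With sizes 10, 20, 30, 40 and a sample size of 2, the running totals are 1/5, 3/5, 6/5, and 2, so the integer parts are 0, 0, 1, 2 and the fractional parts are 1/5, 3/5, 1/5, 0.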

Considering each unit sequentially, Chromy’s method determines the actual number of hits for unit $i$ by comparing the total number of hits for the first $(i-1)$ units,

$$T_{h(i-1)} = \sum_{j=1}^{i-1} S_{hj}$$

with the value of $I_{h(i-1)}$.

If $T_{h(i-1)} = I_{h(i-1)}$, Chromy’s method determines the total number of hits for the first $i$ units as follows. If $F_{hi} = 0$ or $F_{h(i-1)} > F_{hi}$, then $T_{hi} = I_{hi}$. Otherwise, $T_{hi} = I_{hi} + 1$ with probability

$$\frac{F_{hi} - F_{h(i-1)}}{1 - F_{h(i-1)}}$$

The number of hits for unit $i$ is then $T_{hi} - T_{h(i-1)}$.
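As a worked illustration with made-up values, suppose that the running totals give $I_{h(i-1)} = I_{hi} = 0$, $F_{h(i-1)} = 0.2$, and $F_{hi} = 0.6$, and that no hits have occurred so far, so $T_{h(i-1)} = 0 = I_{h(i-1)}$. Because $F_{hi} \neq 0$ and $F_{h(i-1)} < F_{hi}$, the probabilistic branch applies:

$$\Pr\left(T_{hi} = I_{hi} + 1\right) = \frac{0.6 - 0.2}{1 - 0.2} = 0.5$$

Unit $i$ therefore receives $T_{hi} - T_{h(i-1)} = 1$ hit with probability 0.5 and 0 hits otherwise.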

If $T_{h(i-1)} = I_{h(i-1)} + 1$, Chromy’s method determines the total number of hits for the first $i$ units as follows. If $F_{hi} = 0$, then $T_{hi} = I_{hi}$. If $F_{hi} > F_{h(i-1)}$, then $T_{hi} = I_{hi} + 1$. Otherwise, $T_{hi} = I_{hi} + 1$ with probability

$$\frac{F_{hi}}{F_{h(i-1)}}$$
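The two cases can be combined into one pass over the ordered units. The following Python sketch follows the rules stated in this section. It is an illustration under stated assumptions, not the code that PROC SURVEYSELECT runs: the function name `chromy_hits` is hypothetical, the units are assumed to be already ordered from the random start, the relative size is assumed to be the size measure divided by the stratum total, and no expected number of hits is assumed to be adjusted or capped beforehand.

```python
import random
from fractions import Fraction
from math import floor


def chromy_hits(sizes, n_h, rng=None):
    """Return the number of hits for each unit, in the given order.

    sizes: integer size measures, ordered from the random start
    n_h:   stratum sample size
    rng:   optional random.Random instance for reproducibility
    """
    rng = rng or random.Random()
    total = sum(sizes)
    hits = []
    cumulative = Fraction(0)
    t_prev, i_prev, f_prev = 0, 0, Fraction(0)
    for size in sizes:
        cumulative += Fraction(n_h * size, total)
        i_cur = floor(cumulative)      # integer part of the running total
        f_cur = cumulative - i_cur     # fractional part of the running total
        if t_prev == i_prev:
            if f_cur == 0 or f_prev > f_cur:
                t_cur = i_cur
            else:
                p = (f_cur - f_prev) / (1 - f_prev)
                t_cur = i_cur + 1 if rng.random() < p else i_cur
        else:  # t_prev == i_prev + 1
            if f_cur == 0:
                t_cur = i_cur
            elif f_cur > f_prev:
                t_cur = i_cur + 1
            else:
                p = f_cur / f_prev
                t_cur = i_cur + 1 if rng.random() < p else i_cur
        hits.append(t_cur - t_prev)    # hits for this unit
        t_prev, i_prev, f_prev = t_cur, i_cur, f_cur
    return hits


print(chromy_hits([50, 10, 20, 20], 3, random.Random(2022)))
```

Because the running total of expected hits ends at exactly $n_h$ with a zero fractional part, the hit counts always sum to $n_h$, and each unit receives either the integer part of its expected number of hits or the next integer.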
Last updated: December 09, 2022