The SURVEYSELECT Procedure

PPS Sampling without Replacement

When you specify the METHOD=PPS option, PROC SURVEYSELECT selects units with probability proportional to size and without replacement. The selection probability for unit i in stratum h is $n_h Z_{hi}$, where $n_h$ is the sample size for stratum h and $Z_{hi}$ is the relative size of unit i in stratum h. The relative size is computed as $Z_{hi} = M_{hi} / M_{h\cdot}$, which is the ratio of the size measure of unit i in stratum h ($M_{hi}$) to the total of all size measures in stratum h ($M_{h\cdot}$).
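As a rough illustration of this formula, the following Python sketch computes the selection probabilities for a single stratum. The function name `pps_selection_probabilities` and the example size measures are hypothetical; they are not part of PROC SURVEYSELECT.

```python
def pps_selection_probabilities(sizes, n):
    """Return n * Z_i for each unit, where Z_i = M_i / (sum of all M_i)."""
    total = sum(sizes)
    if total <= 0:
        raise ValueError("size measures must have a positive total")
    return [n * m / total for m in sizes]


# Hypothetical stratum with five units and a sample size of 2.
probs = pps_selection_probabilities([10, 20, 30, 40, 50], n=2)
```

Because the relative sizes sum to 1, the selection probabilities in a stratum sum to the stratum sample size.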

Because selection probabilities cannot exceed 1, the relative size for each unit must not exceed $1/n_h$ for METHOD=PPS. This requirement can be expressed as $Z_{hi} \le 1/n_h$, or equivalently as $M_{hi} \le M_{h\cdot}/n_h$. If your size measures do not meet this requirement, you can adjust the size measures by using the MAXSIZE= or MINSIZE= option. You can also request certainty selection for the larger units by using the CERTSIZE= or CERTSIZE=P= option. Alternatively, you can use a selection method that does not have this relative size restriction, such as PPS with minimum replacement (METHOD=PPS_SEQ).
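One way to check this requirement before you run the procedure is sketched below in Python. The helper `units_exceeding_limit` is a hypothetical name; it flags the units whose size measure is larger than the stratum total divided by the sample size.

```python
def units_exceeding_limit(sizes, n):
    """Return the 0-based positions of units whose size measure exceeds M / n."""
    limit = sum(sizes) / n
    return [i for i, m in enumerate(sizes) if m > limit]


# Hypothetical stratum: the last unit holds more than 1/3 of the total size,
# so it cannot be selected with probability proportional to size when n = 3.
too_large = units_exceeding_limit([5, 5, 10, 80], n=3)
```

Units flagged this way are the candidates for size adjustment or certainty selection.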

PROC SURVEYSELECT performs PPS selection by using the Hanurav-Vijayan algorithm. Hanurav (1967) introduced this algorithm for the selection of two units per stratum, and Vijayan (1968) generalized it for the selection of more than two units. This algorithm enables computation of joint selection probabilities and provides joint selection probability values that usually ensure nonnegativity and stability of the Sen-Yates-Grundy variance estimator. For more information, see Fox (1989), Golmant (1990), and Watts (1991).
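For context, the Sen-Yates-Grundy estimator that these joint probabilities feed into can be sketched in Python as follows. This is a minimal illustration of the standard estimator for a fixed-size design, not code from the procedure; the function name `syg_variance` and its argument layout are assumptions made for the example.

```python
def syg_variance(y, pi, pi_joint):
    """Sen-Yates-Grundy variance estimate for the estimated total sum(y_i / pi_i).

    y        : observed values for the n sampled units
    pi       : selection probabilities of the sampled units
    pi_joint : n x n nested list; pi_joint[i][j] is the joint selection
               probability of units i and j (only entries with i < j are read)
    """
    n = len(y)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            weight = (pi[i] * pi[j] - pi_joint[i][j]) / pi_joint[i][j]
            diff = y[i] / pi[i] - y[j] / pi[j]
            total += weight * diff ** 2
    return total


# Hypothetical two-unit sample.
v = syg_variance([10.0, 20.0], [0.5, 0.5], [[0.0, 0.2], [0.2, 0.0]])
```

Each term is nonnegative whenever the joint probability does not exceed the product of the two selection probabilities, which is why the joint probability values matter for the estimator.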

The notation in the remainder of this section drops the stratum subscript h for simplicity. If you specify a stratified design, n now denotes the sample size for the current stratum, N denotes the stratum population size, $M_i$ denotes the size measure for unit i in the stratum, and M denotes the total of size measures in the stratum. For a stratified design, PROC SURVEYSELECT selects samples independently within strata by using the same selection method in each stratum.

PROC SURVEYSELECT performs the Hanurav-Vijayan selection algorithm as described by Fox (1989, p. 169). For the definition of $\theta_i$, see Golmant (1990). The sampling units are first sorted in ascending order by size measure so that $M_1 \le M_2 \le \cdots \le M_N$. The procedure then selects a PPS sample of n units as follows:

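1. The procedure randomly chooses one of the integers $1, 2, \ldots, n$ with probability $\theta_1, \theta_2, \ldots, \theta_n$, where

   $$\theta_i = n \left( Z_{N-n+i+1} - Z_{N-n+i} \right) \frac{T + i Z_{N-n+1}}{T}$$

   where $Z_j = M_j / M$ and

   $$T = \sum_{j=1}^{N-n} Z_j$$

   By definition, $Z_{N+1} = 1/n$ to ensure that $\sum_{i=1}^{n} \theta_i = 1$.

2. If the integer i is selected in step 1, the procedure includes the last $(n-i)$ units in the sample (where the units are ordered by their size measures). The procedure then selects the remaining i units by following steps 3 through 6.

3. The procedure defines new normed size measures for the remaining $(N-n+i)$ units that were not selected in steps 1 and 2:

   $$Z_j^* = \begin{cases} Z_j / (T + i Z_{N-n+1}) & j = 1, \ldots, N-n+1 \\ Z_{N-n+1} / (T + i Z_{N-n+1}) & j = N-n+2, \ldots, N-n+i \end{cases}$$

4. The procedure selects the next unit from the first $(N-n+1)$ units with probability proportional to $a_j(1)$, where

   $$a_1(1) = i Z_1^*$$

   $$a_j(1) = i Z_j^* \prod_{k=1}^{j-1} \left[ 1 - (i-1) P_k \right] \quad \text{for } j = 2, \ldots, N-n+1$$

   and

   $$P_k = M_k / (M_{k+1} + M_{k+2} + \cdots + M_{N-n+i})$$

5. Let $j_1$ denote the unit that is selected in step 4. The procedure selects the next unit from units $j_1 + 1$ through $(N-n+2)$ with probability proportional to $a_j(2, j_1)$, where

   $$a_{j_1+1}(2, j_1) = (i-1) Z_{j_1+1}^*$$

   $$a_j(2, j_1) = (i-1) Z_j^* \prod_{k=j_1+1}^{j-1} \left[ 1 - (i-2) P_k \right] \quad \text{for } j = j_1+2, \ldots, N-n+2$$

6. The procedure repeats step 5 until all n sample units are selected.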
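The step 1 probabilities can be sketched in Python as follows. The sketch assumes the form $\theta_i = n (Z_{N-n+i+1} - Z_{N-n+i})(T + i Z_{N-n+1}) / T$ with $Z_j = M_j / M$, $T = Z_1 + \cdots + Z_{N-n}$, and $Z_{N+1} = 1/n$, which is our reading of the definition cited from Golmant (1990). The function name `hanurav_vijayan_thetas` is hypothetical, and the code is an illustration rather than the procedure's implementation.

```python
def hanurav_vijayan_thetas(sizes, n):
    """Return [theta_1, ..., theta_n] for the size measures of one stratum."""
    m = sorted(sizes)                      # ascending order by size measure
    pop = len(m)                           # N, the stratum population size
    if not 0 < n < pop:
        raise ValueError("this sketch requires 0 < n < N")
    total = sum(m)                         # M, the total of the size measures
    # z[j] holds Z_j for j = 1..N (1-based); z[N + 1] is defined as 1/n.
    z = [0.0] + [x / total for x in m] + [1.0 / n]
    t = sum(z[1:pop - n + 1])              # T = Z_1 + ... + Z_{N-n}
    z_ref = z[pop - n + 1]                 # Z_{N-n+1}
    thetas = []
    for i in range(1, n + 1):
        gap = z[pop - n + i + 1] - z[pop - n + i]
        thetas.append(n * gap * (t + i * z_ref) / t)
    return thetas


# Hypothetical stratum with five units and a sample size of 2.
thetas = hanurav_vijayan_thetas([10, 20, 30, 40, 50], n=2)
```

For these size measures the two probabilities work out to 2/9 and 7/9, so the integer 2 is the more likely outcome of step 1, in which case no units are taken automatically in step 2.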

If you specify the JTPROBS option, PROC SURVEYSELECT computes the joint selection probabilities for all pairs of selected units in each stratum. The joint selection probability for units i and j is

$$P_{(ij)} = \sum_{r=1}^{n} \theta_r K_{ij}^{(r)}$$

where

$$K_{ij}^{(r)} = \begin{cases} 1 & N-n+r < i \le N-1 \\ r Z_{N-n+1} / (T + r Z_{N-n+1}) & N-n < i \le N-n+r, \; j > N-n+r \\ r Z_i / (T + r Z_{N-n+1}) & 1 \le i \le N-n, \; j > N-n+r \\ \pi_{ij}^{(r)} & j \le N-n+r \end{cases}$$

$$\pi_{ij}^{(r)} = r (r-1) P_i^{(r)} Z_j^*(r) \prod_{k=1}^{i-1} \left( 1 - P_k^{(r)} \right)$$

$$P_k^{(r)} = M_k / (M_{k+1} + M_{k+2} + \cdots + M_{N-n+r})$$

Last updated: December 09, 2022