When you specify the METHOD=PPS option, PROC SURVEYSELECT selects units with probability proportional to size and without replacement. The selection probability for unit $i$ in stratum $h$ is $n_h Z_{hi}$, where $n_h$ is the sample size for stratum $h$ and $Z_{hi}$ is the relative size of unit $i$ in stratum $h$. The relative size is computed as $Z_{hi} = M_{hi} / M_h$, which is the ratio of the size measure $M_{hi}$ of unit $i$ in stratum $h$ to the total $M_h$ of all size measures in stratum $h$.
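As a small illustration of these definitions, the following Python sketch computes the relative size $Z_{hi} = M_{hi}/M_h$ and the selection probability $n_h Z_{hi}$ for every unit. The input layout (a list of `(unit_id, stratum, size)` tuples plus a dictionary of stratum sample sizes) and the function name `pps_selection_probs` are hypothetical; the sketch only reproduces the arithmetic and does not call SAS.

```python
from collections import defaultdict

def pps_selection_probs(units, sample_sizes):
    """Relative sizes and PPS selection probabilities, stratum by stratum.

    units: list of (unit_id, stratum, size) tuples (hypothetical input layout)
    sample_sizes: dict mapping stratum -> n_h
    Returns a dict mapping unit_id -> (Z_hi, n_h * Z_hi).
    """
    totals = defaultdict(float)               # M_h for each stratum
    for _, stratum, size in units:
        totals[stratum] += size

    result = {}
    for unit_id, stratum, size in units:
        z = size / totals[stratum]            # Z_hi = M_hi / M_h
        result[unit_id] = (z, sample_sizes[stratum] * z)
    return result

units = [("a", 1, 20.0), ("b", 1, 30.0), ("c", 1, 50.0),
         ("d", 2, 5.0), ("e", 2, 15.0), ("f", 2, 20.0)]
probs = pps_selection_probs(units, {1: 2, 2: 1})
print(probs["c"])   # (0.5, 1.0): unit c is selected with certainty
```

Within each stratum the selection probabilities sum to $n_h$, which is a convenient sanity check on the size measures.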
Because selection probabilities cannot exceed 1, the relative size of each unit must not exceed $1/n_h$ for METHOD=PPS. This requirement can be expressed as $Z_{hi} \le 1/n_h$, or equivalently as $M_{hi} \le M_h / n_h$. If your size measures do not meet this requirement, you can adjust the size measures by using the MAXSIZE= or MINSIZE= option, or you can request certainty selection for the larger units by using the CERTSIZE= or CERTSIZE=P= option. Alternatively, you can use a selection method that does not have this relative size restriction, such as PPS with minimum replacement (METHOD=PPS_SEQ).
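The requirement $M_{hi} \le M_h / n_h$ is easy to check before you run the procedure. The following sketch (the function `oversized_units` is hypothetical) returns the units in one stratum whose size measures exceed $M_h / n_h$. A unit exactly at the limit has selection probability 1 and is allowed.

```python
def oversized_units(sizes, n):
    """Return 0-based indices of units whose size measure exceeds M / n.

    sizes: size measures M_i for the units in one stratum
    n: sample size for the stratum
    """
    total = sum(sizes)          # M, the stratum total
    limit = total / n           # largest allowed size, since n * M_i / M <= 1
    return [i for i, m in enumerate(sizes) if m > limit]

print(oversized_units([10, 20, 30, 140], 2))   # prints [3]: 140 > 200 / 2
print(oversized_units([50, 25, 25], 2))        # prints []: 50 == 100 / 2 is allowed
```

Flagged units are the candidates for the MAXSIZE= or CERTSIZE= adjustments mentioned above.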
PROC SURVEYSELECT performs PPS selection by using the Hanurav-Vijayan algorithm. Hanurav (1967) introduced this algorithm for the selection of two units per stratum, and Vijayan (1968) generalized it for the selection of more than two units. This algorithm enables computation of joint selection probabilities and provides joint selection probability values that usually ensure nonnegativity and stability of the Sen-Yates-Grundy variance estimator. For more information, see Fox (1989), Golmant (1990), and Watts (1991).
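For reference, the Sen-Yates-Grundy estimator of the variance of the Horvitz-Thompson estimator of a total is $\hat V_{\mathrm{SYG}} = \sum_{i<j} \frac{\pi_i \pi_j - \pi_{ij}}{\pi_{ij}} \left( \frac{y_i}{\pi_i} - \frac{y_j}{\pi_j} \right)^2$, where the sum runs over pairs of sampled units, $\pi_i$ is the selection probability of unit $i$, and $\pi_{ij}$ is the joint selection probability of units $i$ and $j$. A sufficient condition for the estimate to be nonnegative is $\pi_{ij} \le \pi_i \pi_j$ for every pair. The short Python function below (its name and argument layout are hypothetical) only evaluates this formula from values you supply.

```python
from itertools import combinations

def syg_variance(y, pi, pi_joint):
    """Sen-Yates-Grundy variance estimate for the Horvitz-Thompson total.

    y:        observed values for the sampled units
    pi:       first-order selection probabilities pi_i for the same units
    pi_joint: dict mapping (i, j), i < j (positions in y), to pi_ij
    """
    v = 0.0
    for i, j in combinations(range(len(y)), 2):   # yields pairs with i < j
        pij = pi_joint[(i, j)]
        weight = (pi[i] * pi[j] - pij) / pij      # >= 0 whenever pi_ij <= pi_i * pi_j
        v += weight * (y[i] / pi[i] - y[j] / pi[j]) ** 2
    return v

# Weight (0.4 - 0.3) / 0.3 = 1/3 and squared difference (6 - 10)^2 = 16
print(syg_variance([3.0, 8.0], [0.5, 0.8], {(0, 1): 0.3}))   # about 5.333, i.e. 16/3
```

With $\pi_{12} = 0.5 > \pi_1 \pi_2 = 0.4$, the same data would give a negative estimate, which is the situation that suitable joint selection probabilities help avoid.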
The notation in the remainder of this section drops the stratum subscript $h$ for simplicity. If you specify a stratified design, $n$ now denotes the sample size for the current stratum, $N$ denotes the stratum population size, $M_i$ denotes the size measure for unit $i$ in the stratum, $M$ denotes the total of size measures in the stratum, and $Z_i = M_i / M$ denotes the relative size of unit $i$. For a stratified design, PROC SURVEYSELECT selects samples independently within strata by using the same selection method in each stratum.
PROC SURVEYSELECT performs the Hanurav-Vijayan selection algorithm as described by Fox (1989, p. 169). For the definition of $\lambda$, see Golmant (1990). The sampling units are first sorted in ascending order by size measure so that $Z_1 \le Z_2 \le \cdots \le Z_N$. The procedure then selects a PPS sample of $n$ units as follows:
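1. The procedure randomly chooses one of the integers $1, 2, \ldots, n$, choosing the integer $i$ with probability
$$\theta_i = \frac{n \, (Z_{N-n+i+1} - Z_{N-n+i}) \, (T + i \, Z_{N-n+1})}{T}, \qquad i = 1, 2, \ldots, n,$$
where $Z_{N+1} = 1/n$ and $T = \sum_{j=1}^{N-n} Z_j$.
2. If the integer $i$ is selected in step 1, the procedure includes the last $n - i$ units in the sample (where the units are ordered by their size measures), that is, units $N-n+i+1$ through $N$. The procedure then selects the remaining $i$ units by following steps 3 through 6.
3. The procedure defines new normed size measures for the $N - n + i$ remaining units that were not selected in step 2:
$$Z_j^* = \begin{cases} Z_j \, / \, (T + i \, Z_{N-n+1}), & j = 1, \ldots, N-n, \\ Z_{N-n+1} \, / \, (T + i \, Z_{N-n+1}), & j = N-n+1, \ldots, N-n+i. \end{cases}$$
These normed size measures sum to 1.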
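4. The procedure selects the next unit from the first $N - n + 1$ units with probability proportional to $p_j(1)$, where the weights $p_j(1)$ are computed from the normed size measures defined in step 3 as described by Fox (1989, p. 169).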
5. If $j_1$ denotes the unit that is selected in step 4, the procedure selects the next unit from units $j_1 + 1$ through $N - n + 2$ with probability proportional to the weights $p_j(2)$ given by Fox (1989, p. 169).
6. The procedure repeats step 5 until all $n$ sample units are selected.
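The following Python sketch covers steps 1 through 3 only; steps 4 through 6 are left out because they depend on the selection weights defined by Fox (1989). It uses 0-based indexing, and all function names are hypothetical. It expects relative sizes sorted in ascending order, summing to 1, each at most $1/n$, with $N > n$ so that $T > 0$. The helper `implied_inclusion` checks the design arithmetic under an explicit assumption of this sketch: that steps 4 through 6 include each remaining unit $j$ with conditional probability $i Z_j^*$. Under that assumption, every unit's overall inclusion probability works out to $n Z_j$.

```python
import random

# Sketch of steps 1-3 only; steps 4-6 are NOT implemented here.
# Indices are 0-based, so 1-based unit N-n+1 is z[N - n].

def step1_theta(z, n):
    """Step 1 probabilities theta_1..theta_n, plus T.

    Assumes z is sorted ascending, sums to 1, every z <= 1/n, and len(z) > n.
    """
    N = len(z)
    T = sum(z[: N - n])                     # T = Z_1 + ... + Z_{N-n}
    zz = list(z) + [1.0 / n]                # zz[N] stands for Z_{N+1} = 1/n
    base = z[N - n]                         # Z_{N-n+1}
    theta = [n * (zz[N - n + i] - zz[N - n + i - 1]) * (T + i * base) / T
             for i in range(1, n + 1)]
    return theta, T

def normed_sizes(z, n, i, T):
    """Step 3: normed sizes Z*_j for the N - n + i units left after step 2."""
    N = len(z)
    base = z[N - n]
    denom = T + i * base
    return [z[j] / denom for j in range(N - n)] + [base / denom] * i

def steps_1_to_3(z, n, rng=random):
    """Draw i (step 1), list the certainty units (step 2), return Z* (step 3)."""
    N = len(z)
    theta, T = step1_theta(z, n)
    i = rng.choices(range(1, n + 1), weights=theta)[0]
    certain = list(range(N - n + i, N))     # the last n - i units
    return i, certain, normed_sizes(z, n, i, T)

def implied_inclusion(z, n):
    """Overall inclusion probabilities implied by steps 1-3, ASSUMING that
    steps 4-6 include remaining unit j with conditional probability i * Z*_j."""
    N = len(z)
    theta, T = step1_theta(z, n)
    pi = [0.0] * N
    for i, th in enumerate(theta, start=1):
        zs = normed_sizes(z, n, i, T)
        for j in range(N):
            # Units from index N - n + i onward are certainty selections for this i.
            cond = 1.0 if j >= N - n + i else i * zs[j]
            pi[j] += th * cond
    return pi

z = [0.1, 0.15, 0.2, 0.25, 0.3]             # sorted relative sizes, N = 5
print(implied_inclusion(z, 2))              # close to 2 * z = [0.2, 0.3, 0.4, 0.5, 0.6]
```

The check works for any valid input because the $\theta_i$ sum to 1 and the certainty units for each $i$ exactly compensate for the equalized sizes that step 3 assigns to the largest remaining units.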
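If you specify the JTPROBS option, PROC SURVEYSELECT computes the joint selection probabilities for all pairs of selected units in each stratum. The joint selection probability $\pi_{ij}$ for units $i$ and $j$ is computed from the quantities of the Hanurav-Vijayan algorithm as described by Fox (1989) and Golmant (1990).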
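Because the integer drawn in step 1 determines which units are certainty selections, a joint selection probability can be split by conditioning on that integer. The decomposition below is a sketch based on the law of total probability, not the procedure's published formula. To avoid a clash with the integer $i$, it writes the pair as units $j$ and $k$; $\theta_i$ denotes the probability that step 1 chooses the integer $i$, $C_i = \{N-n+i+1, \ldots, N\}$ is the set of units included in step 2, and $Z_j^{*(i)}$ is the normed size from step 3 for that $i$. It assumes that steps 4 through 6 include a remaining unit $j$ with conditional probability $i Z_j^{*(i)}$. The last case, in which neither unit is a certainty selection, depends on the selection weights of steps 4 and 5 and is left symbolic as $\pi'_{jk \mid i}$.

```latex
\pi_{jk} = \sum_{i=1}^{n} \theta_i \, \pi_{jk \mid i},
\qquad
\pi_{jk \mid i} =
\begin{cases}
1, & j \in C_i \text{ and } k \in C_i, \\
i \, Z_j^{*(i)}, & j \notin C_i \text{ and } k \in C_i, \\
i \, Z_k^{*(i)}, & j \in C_i \text{ and } k \notin C_i, \\
\pi'_{jk \mid i}, & j \notin C_i \text{ and } k \notin C_i.
\end{cases}
```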