The SURVEYFREQ Procedure

Jackknife Method

When you specify the VARMETHOD=JACKKNIFE option, PROC SURVEYFREQ uses the delete-1 jackknife method for variance estimation. The jackknife method can be used for stratified sample designs and for designs with no stratification. If your design is stratified, the jackknife method requires at least two PSUs in each stratum. You can provide replicate weights for jackknife variance estimation by using a REPWEIGHTS statement, or the procedure can construct replicate weights for the analysis. PROC SURVEYFREQ estimates the parameter of interest (a proportion, total, odds ratio, or other statistic) from each replicate, and then uses the variability among replicate estimates to estimate the overall variance of the parameter estimate. For more information about jackknife variance estimation, see Wolter (1985) and Lohr (2010).

Replicate Weight Construction

If you do not provide replicate weights by using a REPWEIGHTS statement, PROC SURVEYFREQ constructs the replicates. The number of replicates R is the number of PSUs, and the procedure deletes one PSU from the full sample to form each replicate. The sampling weights are modified by the jackknife coefficient for the replicate to create the replicate weights.

If your design is not stratified (no STRATA statement), the jackknife coefficient has the same value for each replicate r. The jackknife coefficient is

$$\alpha_r = \frac{R - 1}{R} \quad \text{for } r = 1, 2, \ldots, R$$

where R is the total number of replicates (or total number of PSUs). For the PSUs included in a replicate, the replicate weights are computed by dividing the original sampling weights by the jackknife coefficient. For the deleted PSU, which is not included in the replicate, the replicate weights equal 0. The replicate weight for the jth member of the ith PSU can be expressed as follows when the design is not stratified:

$$W_{ij}^{r} = \begin{cases} W_{ij} / \alpha_r & \text{if PSU } i \text{ is included in replicate } r \\ 0 & \text{otherwise} \end{cases}$$

where $W_{ij}$ is the original sampling weight of unit $(ij)$, $r$ is the replicate number, and $\alpha_r$ is the jackknife coefficient.
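The unstratified construction can be illustrated outside SAS with a short Python sketch. It is not the procedure's implementation; the function name `jackknife_weights_unstratified` and its arguments are hypothetical, and the sketch assumes that each observation carries a PSU label and a sampling weight.

```python
def jackknife_weights_unstratified(weights, psu_ids):
    """Illustrative delete-1 jackknife replicate weights without strata.

    weights -- original sampling weights, one per observation
    psu_ids -- PSU label of each observation (same length as weights)
    Returns (alpha, replicates), where replicates[r] holds the weights
    for the replicate that deletes the r-th PSU (in sorted label order).
    """
    psus = sorted(set(psu_ids))      # one replicate per PSU, so R = number of PSUs
    R = len(psus)
    if R < 2:
        raise ValueError("at least two PSUs are needed")
    alpha = (R - 1) / R              # same coefficient for every replicate
    replicates = []
    for deleted in psus:
        replicates.append([
            0.0 if psu == deleted else w / alpha   # zero out the deleted PSU
            for w, psu in zip(weights, psu_ids)
        ])
    return alpha, replicates


# Hypothetical data: three PSUs (A, B, C), four observations.
alpha, reps = jackknife_weights_unstratified(
    [10.0, 10.0, 20.0, 30.0], ["A", "A", "B", "C"]
)
```

With three PSUs the coefficient is 2/3, so in the replicate that deletes PSU A the two A observations get weight 0 and the remaining weights are inflated by a factor of 3/2.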

If your design is stratified, the jackknife method requires at least two PSUs in each stratum. Let stratum $h'_r$ be the stratum from which a PSU is deleted to form the $r$th replicate. Stratum $h'_r$ is called the donor stratum. The jackknife coefficients are defined as

$$\alpha_r = \frac{n_{h'_r} - 1}{n_{h'_r}} \quad \text{for } r = 1, 2, \ldots, R$$

where $n_{h'_r}$ is the total number of PSUs in the donor stratum for replicate $r$. For all strata other than the donor stratum, the replicate $r$ weights equal the original sampling weights. For PSUs included from the donor stratum, the replicate weights are computed by dividing the original sampling weights by the jackknife coefficient. For the deleted PSU, which is not included in the replicate, the replicate weights equal 0. The replicate weight for the $j$th member of the $i$th PSU in stratum $h$ can be expressed as

$$W_{hij}^{r} = \begin{cases} W_{hij} & \text{if } h \ne h'_r \\ W_{hij} / \alpha_r & \text{if } h = h'_r \text{ and PSU } (hi) \text{ is included in replicate } r \\ 0 & \text{if } h = h'_r \text{ and PSU } (hi) \text{ is not included in replicate } r \end{cases}$$
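The three cases of the stratified formula can be sketched in Python as follows. This is an illustration under stated assumptions, not the code that PROC SURVEYFREQ runs: the function name, the argument layout, and the order in which replicates are generated (stratum by stratum, in first-seen order) are all hypothetical.

```python
def jackknife_weights_stratified(weights, strata, psu_ids):
    """Illustrative delete-1 jackknife replicate weights for a stratified design.

    weights, strata, psu_ids -- parallel lists, one entry per observation.
    Returns (alphas, replicates): one coefficient and one weight list per replicate.
    """
    # Distinct PSUs within each stratum, in first-seen order.
    psus_by_stratum = {}
    for h, i in zip(strata, psu_ids):
        psus = psus_by_stratum.setdefault(h, [])
        if i not in psus:
            psus.append(i)
    for h, psus in psus_by_stratum.items():
        if len(psus) < 2:
            raise ValueError(f"stratum {h!r} has fewer than two PSUs")

    alphas, replicates = [], []
    for donor, psus in psus_by_stratum.items():
        n_h = len(psus)
        alpha = (n_h - 1) / n_h              # depends only on the donor stratum
        for deleted in psus:
            rep = []
            for w, h, i in zip(weights, strata, psu_ids):
                if h != donor:
                    rep.append(w)            # other strata: weight unchanged
                elif i == deleted:
                    rep.append(0.0)          # deleted PSU
                else:
                    rep.append(w / alpha)    # retained PSUs in the donor stratum
            alphas.append(alpha)
            replicates.append(rep)
    return alphas, replicates


# Hypothetical data: stratum 1 has PSUs a, b, c; stratum 2 has PSUs a, b.
alphas, reps = jackknife_weights_stratified(
    [1.0, 1.0, 1.0, 2.0, 2.0], [1, 1, 1, 2, 2], ["a", "b", "c", "a", "b"]
)
```

In this example there are five replicates. The three that delete a PSU from stratum 1 use the coefficient 2/3, and the two that delete a PSU from stratum 2 use the coefficient 1/2.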

You can use the OUTWEIGHTS= method-option to store the replicate weights in a SAS data set. You can also use the OUTJKCOEFS= method-option to store the jackknife coefficients in a SAS data set. For information about the contents of these output data sets, see the sections Replicate Weight Output Data Set and Jackknife Coefficient Output Data Set. You can provide replicate weights and jackknife coefficients to the procedure for subsequent analyses by using a REPWEIGHTS statement. If you provide replicate weights but do not provide jackknife coefficients, PROC SURVEYFREQ uses $\alpha_r = (R - 1)/R$ as the jackknife coefficient for all replicates.

Variance Estimation

Let $\theta$ denote the population parameter to be estimated (for example, a proportion, total, odds ratio, or other statistic). Let $\hat{\theta}$ denote the estimate of $\theta$ from the full sample, and let $\hat{\theta}_r$ be the estimate from the $r$th jackknife replicate, which is computed by using the replicate weights. The jackknife variance estimate for $\hat{\theta}$ is computed as

$$\hat{V}(\hat{\theta}) = \sum_{r=1}^{R} \alpha_r \left( \hat{\theta}_r - \hat{\theta} \right)^2$$

where $R$ is the total number of replicates and $\alpha_r$ is the jackknife coefficient for replicate $r$.
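As a numerical check of this formula, the following Python sketch sums the weighted squared deviations of the replicate estimates from the full-sample estimate. The function name and the example values are hypothetical and are not taken from SAS.

```python
def jackknife_variance(theta_full, theta_reps, alphas):
    """Sum of alpha_r * (theta_r - theta_full)^2 over all replicates."""
    if len(theta_reps) != len(alphas):
        raise ValueError("need one jackknife coefficient per replicate")
    return sum(a * (t - theta_full) ** 2 for a, t in zip(alphas, theta_reps))


# Hypothetical unstratified example with R = 3, so every alpha_r is 2/3.
theta_hat = 0.5
theta_r = [0.4, 0.6, 0.5]
variance = jackknife_variance(theta_hat, theta_r, [2 / 3] * 3)
```

Here the squared deviations are 0.01, 0.01, and 0, so the estimate is (2/3)(0.02), or approximately 0.0133.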

If you specify the CENTER=REPLICATES method-option, the jackknife variance estimate is computed as

$$\hat{V}(\hat{\theta}) = \sum_{r=1}^{R} \alpha_r \left( \hat{\theta}_r - \bar{\theta} \right)^2$$

where $\bar{\theta}$ is the average of the replicate estimates and is computed as follows:

$$\bar{\theta} = \frac{1}{R} \sum_{r=1}^{R} \hat{\theta}_r$$
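The centered form differs only in the reference point for the deviations. A minimal Python sketch, with a hypothetical function name and made-up replicate estimates:

```python
def jackknife_variance_centered(theta_reps, alphas):
    """Sum of alpha_r * (theta_r - theta_bar)^2, centered at the replicate mean."""
    if len(theta_reps) != len(alphas):
        raise ValueError("need one jackknife coefficient per replicate")
    theta_bar = sum(theta_reps) / len(theta_reps)   # average of replicate estimates
    return sum(a * (t - theta_bar) ** 2 for a, t in zip(alphas, theta_reps))


# Hypothetical replicate estimates whose mean is 0.51.
centered = jackknife_variance_centered([0.4, 0.6, 0.53], [2 / 3] * 3)
```

The deviations from the mean 0.51 are -0.11, 0.09, and 0.02, so the estimate is (2/3)(0.0206), or approximately 0.0137.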

If a parameter cannot be estimated from one or more replicates, the variance estimate is computed by using those replicates from which the parameter can be estimated. For example, suppose the parameter is a column proportion—the proportion of column j for table cell (i, j). If a replicate r contains no observations in column j, then the column j proportion is not estimable from replicate r. In this case, the jackknife variance estimate is computed as

$$\hat{V}(\hat{\theta}) = \frac{R}{R'} \sum_{r=1}^{R'} \alpha_r \left( \hat{\theta}_r - \hat{\theta} \right)^2$$

where the summation is over the replicates for which the parameter $\theta$ is estimable and where $R'$ is the number of those replicates.
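The rescaling by the ratio of all replicates to estimable replicates can be sketched in Python as follows. Marking a non-estimable replicate with `None` is an assumption made for this illustration, as are the function name and the example values.

```python
def jackknife_variance_estimable(theta_full, theta_reps, alphas):
    """Jackknife variance that skips replicates whose estimate is None.

    The sum over the R' estimable replicates is scaled by R / R',
    where R is the total number of replicates.
    """
    if len(theta_reps) != len(alphas):
        raise ValueError("need one jackknife coefficient per replicate")
    R = len(theta_reps)
    usable = [(a, t) for a, t in zip(alphas, theta_reps) if t is not None]
    R_prime = len(usable)
    if R_prime == 0:
        raise ValueError("the parameter is not estimable from any replicate")
    total = sum(a * (t - theta_full) ** 2 for a, t in usable)
    return (R / R_prime) * total


# Hypothetical example: R = 4 replicates, the second one is not estimable.
adjusted = jackknife_variance_estimable(0.5, [0.4, None, 0.6, 0.5], [0.75] * 4)
```

Three of the four replicates are usable, so the sum 0.75(0.01 + 0.01 + 0) = 0.015 is multiplied by 4/3, giving approximately 0.02.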

Last updated: December 09, 2022