The SURVEYMEANS Procedure

Statistical Computations

The SURVEYMEANS procedure uses the Taylor series (linearization) method or replication (resampling) methods to estimate sampling errors of estimators based on complex sample designs. For more information, see Fuller (2009); Wolter (2007); Lohr (2010); Kalton (1983); Hidiroglou, Fuller, and Hickman (1980); Fuller et al. (1989); Lee, Forthofer, and Lorimor (1989); Cochran (1977); Kish (1965); Hansen, Hurwitz, and Madow (1953); Rust (1985); Dippo, Fay, and Morganstein (1984); Rao and Shao (1999); Rao, Wu, and Yue (1992); Rao and Shao (1996). You can use the VARMETHOD= option to specify the variance estimation method. By default, the procedure uses the Taylor series method.

The Taylor series method obtains a linear approximation for the estimator and then uses the variance estimate for this approximation to estimate the variance of the estimate itself (Woodruff 1971; Fuller 1975). When there are clusters, or PSUs, in the sample design, the procedure estimates variance from the variation among PSUs. When the design is stratified, the procedure pools stratum variance estimates to compute the overall variance estimate. For t tests of the estimates, the degrees of freedom equals the number of clusters minus the number of strata in the sample design.

For a multistage sample design, the Taylor series estimation depends only on the first stage of the sample design. Therefore, the required input includes only first-stage cluster (PSU) and first-stage stratum identification. You do not need to input design information about any additional stages of sampling. This variance estimation method assumes that the first-stage sampling fraction is small, or that the first-stage sample is drawn with replacement, as it often is in practice.

Quite often in complex surveys, respondents have unequal weights, which reflect unequal selection probabilities and adjustments for nonresponse. In such surveys, the appropriate sampling weights must be used to obtain valid estimates for the study population.

Replication methods have recently gained popularity for estimating variances in complex survey data analysis. One reason for this popularity is the relative simplicity of replication-based estimates, especially for nonlinear estimators; another is that modern computational capacity has made replication methods feasible for practical survey analysis.

Replication methods draw multiple replicates (also called subsamples) from a full sample according to a specific resampling scheme. The most commonly used resampling schemes are the balanced repeated replication (BRR) method, the jackknife method, and the bootstrap method. For each replicate, the original weights are modified for the PSUs in the replicates to create replicate weights. The population parameters of interest are estimated by using the replicate weights for each replicate. Then the variances of parameters of interest are estimated by the variability among the estimates derived from these replicates. You can use a REPWEIGHTS statement to provide your own replicate weights for variance estimation. For more information about using replication methods to analyze sample survey data, see the section Replication Methods for Variance Estimation.

Definitions and Notation

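For a stratified clustered sample design, together with the sampling weights, the sample can be represented by an $n \times (P+1)$ matrix

\[
(\mathbf{w}, \mathbf{Y}) = \left( w_{hij},\; \mathbf{y}_{hij} \right) = \left( w_{hij},\; y_{hij}^{(1)},\; y_{hij}^{(2)},\; \ldots,\; y_{hij}^{(P)} \right)
\]

where

$h = 1, 2, \ldots, H$ is the stratum index
$i = 1, 2, \ldots, n_h$ is the cluster index within stratum h
$j = 1, 2, \ldots, m_{hi}$ is the unit index within cluster i of stratum h
$p = 1, 2, \ldots, P$ is the analysis variable number, with a total of P variables
$n = \sum_{h=1}^{H} \sum_{i=1}^{n_h} m_{hi}$ is the total number of observations in the sample
$w_{hij}$ denotes the sampling weight for unit j in cluster i of stratum h
$\mathbf{y}_{hij} = \left( y_{hij}^{(1)}, y_{hij}^{(2)}, \ldots, y_{hij}^{(P)} \right)$ are the observed values of the analysis variables for unit j in cluster i of stratum h, including both the values of numerical variables and the values of indicator variables for levels of categorical variables.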

For a categorical variable C, let l denote the number of levels of C, and denote the level values as $c_1, c_2, \ldots, c_l$. Let $y_{hij}^{(q)}$ ($q \in \{1, 2, \ldots, P\}$) be an indicator variable for the category $C = c_k$ ($k = 1, 2, \ldots, l$) with the observed value in unit j in cluster i of stratum h:

\[
y_{hij}^{(q)} = I_{\{C = c_k\}}(h, i, j) =
\begin{cases}
1 & \text{if } C_{hij} = c_k \\
0 & \text{otherwise}
\end{cases}
\]

Note that the indicator variable $y_{hij}^{(q)}$ is set to missing when $C_{hij}$ is missing. Therefore, the total number of analysis variables, P, is the total number of numerical variables plus the total number of levels of all categorical variables.

The sampling rate $f_h$ for stratum h, which is used in Taylor series and bootstrap variance estimation, is the fraction of first-stage units (PSUs) that are selected for the sample. You can use the TOTAL= or RATE= option to input population totals or sampling rates. For more information, see the section Specification of Population Totals and Sampling Rates. If you input stratum totals, PROC SURVEYMEANS computes $f_h$ as the ratio of the stratum sample size to the stratum total. If you input stratum sampling rates, PROC SURVEYMEANS uses these values directly for $f_h$. If you do not specify the TOTAL= or RATE= option, then the procedure assumes that the stratum sampling rates are negligible, and a finite population correction is not used when computing variances. Replication methods that are specified by the VARMETHOD=BRR or VARMETHOD=JACKKNIFE option do not use this finite population correction.

Mean

When you specify the keyword MEAN, the procedure computes the estimate of the mean (mean per element) from the survey data. Also, the procedure computes the mean by default if you do not specify any statistic-keywords in the PROC SURVEYMEANS statement.

PROC SURVEYMEANS computes the estimate of the mean as

\[
\hat{\bar{Y}} = \left( \sum_{h=1}^{H} \sum_{i=1}^{n_h} \sum_{j=1}^{m_{hi}} w_{hij}\, y_{hij} \right) \Big/ \, w_{\cdot\cdot\cdot}
\]

where

\[
w_{\cdot\cdot\cdot} = \sum_{h=1}^{H} \sum_{i=1}^{n_h} \sum_{j=1}^{m_{hi}} w_{hij}
\]

is the sum of the weights over all observations in the sample.
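
The weighted mean above can be illustrated with a short Python sketch. This is not the procedure's implementation; the tuple layout (stratum, cluster, weight, y), the function name, and the sample values are assumptions made up for the example.

```python
def weighted_mean(records):
    """Return sum(w * y) / sum(w) over all observations.

    records: list of (stratum, cluster, weight, y) tuples (hypothetical layout).
    """
    w_total = sum(w for _, _, w, _ in records)           # w_... in the text
    weighted_sum = sum(w * y for _, _, w, y in records)  # sum of w_hij * y_hij
    return weighted_sum / w_total


# Made-up sample: two strata, two PSUs in each stratum.
sample = [
    ("A", 1, 10.0, 2.0),
    ("A", 1, 10.0, 4.0),
    ("A", 2, 10.0, 6.0),
    ("B", 1, 20.0, 1.0),
    ("B", 2, 20.0, 3.0),
]
mean_estimate = weighted_mean(sample)  # 200 / 70
```

The stratum and cluster labels play no role in the point estimate; they matter only for variance estimation.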

Variance and Standard Error of the Mean

When you specify the keyword STDERR, the procedure computes the standard error of the mean. Also, the procedure computes the standard error by default if you specify the keyword MEAN, or if you do not specify any statistic-keywords in the PROC SURVEYMEANS statement. The keyword VAR requests the variance of the mean.

Taylor Series Method

When you use VARMETHOD=TAYLOR, or by default if you do not specify the VARMETHOD= option, PROC SURVEYMEANS uses the Taylor series method to estimate the variance of the mean $\hat{\bar{Y}}$. The procedure computes the estimated variance as

\[
\hat{V}(\hat{\bar{Y}}) = \sum_{h=1}^{H} \hat{V}_h(\hat{\bar{Y}})
\]

where, if $n_h > 1$, then

\[
\begin{aligned}
\hat{V}_h(\hat{\bar{Y}}) &= \frac{n_h (1 - f_h)}{n_h - 1} \sum_{i=1}^{n_h} \left( e_{hi\cdot} - \bar{e}_{h\cdot\cdot} \right)^2 \\
e_{hi\cdot} &= \left( \sum_{j=1}^{m_{hi}} w_{hij} \left( y_{hij} - \hat{\bar{Y}} \right) \right) \Big/ \, w_{\cdot\cdot\cdot} \\
\bar{e}_{h\cdot\cdot} &= \left( \sum_{i=1}^{n_h} e_{hi\cdot} \right) \Big/ \, n_h
\end{aligned}
\]

and if $n_h = 1$, then

\[
\hat{V}_h(\hat{\bar{Y}}) =
\begin{cases}
\text{missing} & \text{if } n_{h'} = 1 \text{ for } h' = 1, 2, \ldots, H \\
0 & \text{if } n_{h'} > 1 \text{ for some } 1 \le h' \le H
\end{cases}
\]
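
The stratum-by-stratum computation above can be sketched in Python. This is a minimal illustration under stated assumptions, not the procedure's code: the tuple layout (stratum, cluster, weight, y), the function name, the `rates` dictionary of stratum sampling rates, and the use of `None` to stand in for a missing variance are all choices made for the example.

```python
from collections import defaultdict


def taylor_variance_of_mean(records, rates=None):
    """Taylor series variance estimate for the weighted mean.

    records: list of (stratum, cluster, weight, y) tuples (hypothetical layout).
    rates:   optional dict stratum -> f_h; strata not listed use f_h = 0.
    Returns None (standing in for "missing") when every stratum has one PSU.
    """
    rates = rates or {}
    w_total = sum(w for _, _, w, _ in records)
    mean = sum(w * y for _, _, w, y in records) / w_total

    # e_{hi.}: linearized PSU totals, grouped by stratum and cluster.
    psu = defaultdict(lambda: defaultdict(float))
    for h, i, w, y in records:
        psu[h][i] += w * (y - mean) / w_total

    if all(len(clusters) == 1 for clusters in psu.values()):
        return None  # n_h = 1 in every stratum

    variance = 0.0
    for h, clusters in psu.items():
        n_h = len(clusters)
        if n_h == 1:
            continue  # a single-PSU stratum contributes 0
        e_bar = sum(clusters.values()) / n_h
        ss = sum((e - e_bar) ** 2 for e in clusters.values())
        variance += n_h * (1.0 - rates.get(h, 0.0)) / (n_h - 1) * ss
    return variance


# Made-up sample: two strata, two PSUs in each stratum.
sample = [
    ("A", 1, 10.0, 2.0),
    ("A", 1, 10.0, 4.0),
    ("A", 2, 10.0, 6.0),
    ("B", 1, 20.0, 1.0),
    ("B", 2, 20.0, 3.0),
]
variance = taylor_variance_of_mean(sample)
```

Only the variation among PSU-level values within each stratum enters the estimate, which is why second-stage design information is not needed as input.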

Replication Methods

When you specify the VARMETHOD=BOOTSTRAP, VARMETHOD=BRR, or VARMETHOD=JACKKNIFE option, the procedure computes the variance with replication methods by using the variability among replicate estimates to estimate the overall variance. See the section Replication Methods for Variance Estimation for more details.

Standard Error

The standard error of the mean is the square root of the estimated variance.

\[
\mathrm{StdErr}(\hat{\bar{Y}}) = \sqrt{\hat{V}(\hat{\bar{Y}})}
\]

t Test for the Mean

If you specify the keyword T, PROC SURVEYMEANS computes the t-value for testing that the population mean equals zero, $H_0\colon \bar{Y} = 0$. The test statistic equals

\[
t(\hat{\bar{Y}}) = \hat{\bar{Y}} \Big/ \, \mathrm{StdErr}(\hat{\bar{Y}})
\]

The two-sided p-value for this test is

\[
\mathrm{Prob}\left( |T| > \left| t(\hat{\bar{Y}}) \right| \right)
\]

where T is a random variable with the t distribution with df degrees of freedom.

Degrees of Freedom

PROC SURVEYMEANS computes the degrees of freedom df to obtain the t distribution's percentile to construct the $100(1-\alpha)\%$ confidence limits for means, proportions, totals, ratios, and other statistics, and for hypothesis testing. The degrees of freedom computation depends on the variance estimation method that you request. Missing values can affect the degrees of freedom computation. For more information, see the section Missing Values.

Taylor Series Variance Estimation

For the Taylor series method, PROC SURVEYMEANS calculates the degrees of freedom for the t test as the number of clusters minus the number of strata. If there are no clusters, then the degrees of freedom equals the number of observations minus the number of strata. If the design is not stratified, then the degrees of freedom equals the number of PSUs minus one.
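
The three cases in the preceding paragraph reduce to "number of PSUs minus number of strata" once an unstratified design is counted as one stratum and, without clusters, each observation is counted as its own PSU. The Python sketch below illustrates that reading; the (stratum, cluster) pair layout and the function name are assumptions for the example, and the sketch ignores the missing-value adjustments described next.

```python
def taylor_df(design):
    """Degrees of freedom as number of PSUs minus number of strata.

    design: list of (stratum, cluster) pairs, one per observation
            (hypothetical layout). Use cluster=None when there is no
            cluster variable, and one common stratum value (for example
            None) for every row when the design is unstratified.
    """
    strata = {h for h, _ in design}
    psus = set()
    for row, (h, i) in enumerate(design):
        # Without clusters, the row index makes each observation its own PSU.
        psus.add((h, i) if i is not None else (h, "obs", row))
    return len(psus) - len(strata)


clustered = [("A", 1), ("A", 1), ("A", 2), ("B", 1), ("B", 2)]
no_clusters = [("A", None), ("A", None), ("A", None), ("B", None), ("B", None)]
unstratified = [(None, 1), (None, 2), (None, 3)]

df_clustered = taylor_df(clustered)        # 4 clusters - 2 strata
df_no_clusters = taylor_df(no_clusters)    # 5 observations - 2 strata
df_unstratified = taylor_df(unstratified)  # 3 PSUs - 1
```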

If all observations in a stratum are excluded from the analysis due to missing values, then that stratum is called an empty stratum. Empty strata are not counted in the total number of strata for the table. Similarly, empty clusters and missing observations are not included in the total counts of clusters and observations that are used to compute the degrees of freedom for the analysis.

If you specify the MISSING option, missing values are treated as valid nonmissing levels for a categorical variable and are included in computing degrees of freedom. If you specify the NOMCAR option for Taylor series variance estimation, observations with missing values for an analysis variable are included in computing degrees of freedom.

Replication Variance Estimation with Replicate Weights

When a REPWEIGHTS statement is specified, PROC SURVEYMEANS computes the degrees of freedom as the number of REPWEIGHTS variables, unless you specify an alternative in the DF= option in a REPWEIGHTS statement.

Replication Variance Estimation without Replicate Weights

For replication variance estimation when no REPWEIGHTS statement is specified, PROC SURVEYMEANS by default computes the degrees of freedom by using all valid observations in the input data set. A valid observation is an observation that has a positive value of the variable that you specify in a WEIGHT statement. A valid observation also must have nonmissing values for the variables that are specified in STRATA, CLUSTER, and POSTSTRATA statements unless you specify the MISSING option. For more information about valid observations, see the section Data and Sample Design Summary.

For VARMETHOD=BRR variance estimation (including Fay’s method) without a REPWEIGHTS statement, the degrees of freedom equals the number of strata.

For the VARMETHOD=JACKKNIFE and the VARMETHOD=BOOTSTRAP variance estimation methods without a REPWEIGHTS statement, PROC SURVEYMEANS calculates the degrees of freedom the same as it does when VARMETHOD=TAYLOR.

When you specify the DFADJ option, the procedure computes the degrees of freedom from the number of nonmissing strata and clusters for an analysis variable. This excludes any empty strata or clusters that occur when observations that have missing values of an analysis variable are removed.

The procedure displays the degrees of freedom for the t test if you specify the keyword DF in the PROC SURVEYMEANS statement.

Confidence Limits for the Mean

If you specify the keyword CLM, the procedure computes two-sided confidence limits for the mean. Also, the procedure includes the confidence limits by default if you do not specify any statistic-keywords in the PROC SURVEYMEANS statement.

The confidence coefficient is determined by the value of the ALPHA= option, which by default equals 0.05 and produces 95% confidence limits. The confidence limits are computed as

\[
\hat{\bar{Y}} \pm \mathrm{StdErr}(\hat{\bar{Y}}) \cdot t_{df,\,\alpha/2}
\]

where $\hat{\bar{Y}}$ is the estimate of the mean, $\mathrm{StdErr}(\hat{\bar{Y}})$ is the standard error of the mean, and $t_{df,\,\alpha/2}$ is the $100(1-\alpha/2)$th percentile of the t distribution with df calculated as in the section t Test for the Mean.

If you specify the keyword UCLM, the procedure computes the one-sided upper $100(1-\alpha)\%$ confidence limit for the mean:

\[
\hat{\bar{Y}} + \mathrm{StdErr}(\hat{\bar{Y}}) \cdot t_{df,\,\alpha}
\]

If you specify the keyword LCLM, the procedure computes the one-sided lower $100(1-\alpha)\%$ confidence limit for the mean:

\[
\hat{\bar{Y}} - \mathrm{StdErr}(\hat{\bar{Y}}) \cdot t_{df,\,\alpha}
\]
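
As a rough illustration of how the two-sided and one-sided limits differ only in the t percentile that is used, the Python sketch below takes the percentiles as inputs. The function name and argument names are hypothetical, and because the Python standard library has no t distribution, the caller is assumed to look up the percentiles (here the table values for df = 2 and alpha = 0.05).

```python
import math


def mean_confidence_limits(estimate, variance, t_two_sided, t_one_sided):
    """Confidence limits from an estimate, its variance, and t percentiles.

    t_two_sided: 100(1 - alpha/2)th percentile of t with df degrees of freedom.
    t_one_sided: 100(1 - alpha)th percentile of the same distribution.
    Both percentiles are supplied by the caller (an assumption of this sketch).
    """
    stderr = math.sqrt(variance)
    return {
        "CLM": (estimate - t_two_sided * stderr, estimate + t_two_sided * stderr),
        "UCLM": estimate + t_one_sided * stderr,
        "LCLM": estimate - t_one_sided * stderr,
    }


# Made-up estimate and variance; percentiles for df = 2, alpha = 0.05.
limits = mean_confidence_limits(10.0, 4.0, t_two_sided=4.303, t_one_sided=2.920)
```

The same arithmetic applies to the limits for a total, with the standard deviation of the total in place of the standard error of the mean.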

Coefficient of Variation

If you specify the keyword CV, PROC SURVEYMEANS computes the coefficient of variation, which is the ratio of the standard error of the mean to the estimated mean:

\[
\mathrm{cv}(\bar{Y}) = \mathrm{StdErr}(\hat{\bar{Y}}) \Big/ \, \hat{\bar{Y}}
\]

If you specify the keyword CVSUM, PROC SURVEYMEANS computes the coefficient of variation for the estimated total, which is the ratio of the standard deviation of the sum to the estimated total:

\[
\mathrm{cv}(Y) = \mathrm{Std}(\hat{Y}) \Big/ \, \hat{Y}
\]

Proportions

If you specify the keyword MEAN for a categorical variable, PROC SURVEYMEANS estimates the proportion, or relative frequency, for each level of the categorical variable. If you do not specify any statistic-keywords in the PROC SURVEYMEANS statement, the procedure estimates the proportions for levels of the categorical variables, together with their standard errors and confidence limits.

The procedure estimates the proportion in level $c_k$ for variable C as

\[
\hat{p} = \frac{\sum_{h=1}^{H} \sum_{i=1}^{n_h} \sum_{j=1}^{m_{hi}} w_{hij}\, y_{hij}^{(q)}}{\sum_{h=1}^{H} \sum_{i=1}^{n_h} \sum_{j=1}^{m_{hi}} w_{hij}}
\]

where $y_{hij}^{(q)}$ is the value of the indicator function for level $C = c_k$, defined in the section Definitions and Notation; it equals 1 if the observed value of variable C equals $c_k$, and equals 0 otherwise. Because the proportion estimator is an estimator of the mean for an indicator variable, the procedure computes its variance and standard error according to the method outlined in the section Variance and Standard Error of the Mean. Similarly, the procedure computes confidence limits for proportions as in the section Confidence Limits for the Mean.
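
The point that a proportion is just the weighted mean of a 0/1 indicator can be illustrated with the Python sketch below. The (weight, level) pair layout and the function name are assumptions for the example. The sketch also assumes that observations with a missing level (represented here by `None`) are dropped from both numerator and denominator, which is one reading of the rule that the indicator is set to missing when the categorical value is missing.

```python
def level_proportions(records):
    """Estimate the proportion in each level of a categorical variable.

    records: list of (weight, level) pairs (hypothetical layout);
             level None marks a missing value and is excluded (assumption).
    """
    valid = [(w, c) for w, c in records if c is not None]
    w_total = sum(w for w, _ in valid)
    levels = sorted({c for _, c in valid})
    proportions = {}
    for c_k in levels:
        # The indicator is 1 when the observed level equals c_k, else 0.
        proportions[c_k] = sum(w * (1 if c == c_k else 0) for w, c in valid) / w_total
    return proportions


# Made-up data: one observation has a missing level.
observations = [(10.0, "a"), (10.0, "b"), (20.0, "a"), (5.0, None)]
props = level_proportions(observations)
```

Within the nonmissing observations the estimated proportions sum to 1, as expected for a complete set of level indicators.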

Total

If you specify the keyword SUM, the procedure computes the estimate of the population total from the survey data. The estimate of the total is the weighted sum over the sample:

\[
\hat{Y} = \sum_{h=1}^{H} \sum_{i=1}^{n_h} \sum_{j=1}^{m_{hi}} w_{hij}\, y_{hij}
\]

For a categorical variable level, $\hat{Y}$ estimates its total frequency in the population.

Variance and Standard Deviation of the Total

When you specify the keyword STD or the keyword SUM, the procedure estimates the standard deviation of the total. The keyword VARSUM requests the variance of the total.

Taylor Series Method

When you use VARMETHOD=TAYLOR, or by default, PROC SURVEYMEANS uses the Taylor series method to estimate the variance of the total as

\[
\hat{V}(\hat{Y}) = \sum_{h=1}^{H} \hat{V}_h(\hat{Y})
\]

where, if $n_h > 1$, then

\[
\begin{aligned}
\hat{V}_h(\hat{Y}) &= \frac{n_h (1 - f_h)}{n_h - 1} \sum_{i=1}^{n_h} \left( y_{hi\cdot} - \bar{y}_{h\cdot\cdot} \right)^2 \\
y_{hi\cdot} &= \sum_{j=1}^{m_{hi}} w_{hij}\, y_{hij} \\
\bar{y}_{h\cdot\cdot} &= \left( \sum_{i=1}^{n_h} y_{hi\cdot} \right) \Big/ \, n_h
\end{aligned}
\]

and if $n_h = 1$, then

\[
\hat{V}_h(\hat{Y}) =
\begin{cases}
\text{missing} & \text{if } n_{h'} = 1 \text{ for } h' = 1, 2, \ldots, H \\
0 & \text{if } n_{h'} > 1 \text{ for some } 1 \le h' \le H
\end{cases}
\]
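
For the total, the PSU-level quantity is simply the weighted PSU sum, so no linearization step is needed. The Python sketch below illustrates this; the tuple layout (stratum, cluster, weight, y), the function name, the `rates` dictionary, and the use of `None` for a missing variance are assumptions for the example.

```python
from collections import defaultdict


def total_and_taylor_variance(records, rates=None):
    """Estimated total and its Taylor series variance estimate.

    records: list of (stratum, cluster, weight, y) tuples (hypothetical layout).
    rates:   optional dict stratum -> f_h; strata not listed use f_h = 0.
    The variance is None (for "missing") when every stratum has one PSU.
    """
    rates = rates or {}
    psu = defaultdict(lambda: defaultdict(float))
    for h, i, w, y in records:
        psu[h][i] += w * y  # y_{hi.}: weighted sum within the PSU

    total = sum(sum(clusters.values()) for clusters in psu.values())
    if all(len(clusters) == 1 for clusters in psu.values()):
        return total, None

    variance = 0.0
    for h, clusters in psu.items():
        n_h = len(clusters)
        if n_h == 1:
            continue  # a single-PSU stratum contributes 0
        y_bar = sum(clusters.values()) / n_h
        ss = sum((v - y_bar) ** 2 for v in clusters.values())
        variance += n_h * (1.0 - rates.get(h, 0.0)) / (n_h - 1) * ss
    return total, variance


# Made-up sample: two strata, two PSUs in each stratum.
sample = [
    ("A", 1, 10.0, 2.0),
    ("A", 1, 10.0, 4.0),
    ("A", 2, 10.0, 6.0),
    ("B", 1, 20.0, 1.0),
    ("B", 2, 20.0, 3.0),
]
total, var_total = total_and_taylor_variance(sample)
_, var_with_fpc = total_and_taylor_variance(sample, rates={"B": 0.5})
```

In this made-up sample the two PSUs of stratum A have equal weighted sums, so only stratum B contributes to the variance, and a sampling rate of 0.5 in stratum B halves the estimate.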

Replication Methods

When you specify the VARMETHOD=BOOTSTRAP, VARMETHOD=BRR, or VARMETHOD=JACKKNIFE option, the procedure computes the variance by using replication methods and measuring the variability among the estimates derived from these replicates. For more information, see the section Replication Methods for Variance Estimation.

Standard Deviation

The standard deviation of the total equals

\[
\mathrm{Std}(\hat{Y}) = \sqrt{\hat{V}(\hat{Y})}
\]

Confidence Limits for the Total

If you specify the keyword CLSUM, the procedure computes confidence limits for the total. The confidence coefficient is determined by the value of the ALPHA= option, which by default equals 0.05 and produces 95% confidence limits. The confidence limits are computed as

\[
\hat{Y} \pm \mathrm{Std}(\hat{Y}) \cdot t_{df,\,\alpha/2}
\]

where $\hat{Y}$ is the estimate of the total, $\mathrm{Std}(\hat{Y})$ is the estimated standard deviation, and $t_{df,\,\alpha/2}$ is the $100(1-\alpha/2)$th percentile of the t distribution with df calculated as described in the section t Test for the Mean.

If you specify the keyword UCLSUM, the procedure computes the one-sided upper $100(1-\alpha)\%$ confidence limit for the sum:

\[
\hat{Y} + \mathrm{Std}(\hat{Y}) \cdot t_{df,\,\alpha}
\]

If you specify the keyword LCLSUM, the procedure computes the one-sided lower $100(1-\alpha)\%$ confidence limit for the sum:

\[
\hat{Y} - \mathrm{Std}(\hat{Y}) \cdot t_{df,\,\alpha}
\]

Design Effect

If you specify the keyword DEFF in the PROC SURVEYMEANS statement, the procedure calculates the design effects for the mean. The design effect is the ratio of the variance computed under the sample design to the variance computed under the assumption of simple random sampling:

\[
\mathrm{DEFF} = \frac{\text{variance under the sample design}}{\text{variance under simple random sampling}}
\]

For more information, see Kish (1965, p. 258).

In the numerator, PROC SURVEYMEANS computes the variance of the mean according to the method outlined in the section Variance and Standard Error of the Mean. The denominator is computed under the assumption that the sample design is simple random sampling, with no stratification and no clustering.

For Taylor series or bootstrap variance estimation, PROC SURVEYMEANS computes the overall sampling fraction in the simple random sampling variance by using the value of the RATE= or TOTAL= option.

If you do not specify either of these options, PROC SURVEYMEANS assumes that the value of $f$ is negligible and does not use a finite population correction in the analysis, as described in the section Specification of Population Totals and Sampling Rates.

If you specify RATE=value, PROC SURVEYMEANS uses this value as the overall sampling fraction $f$. If you specify TOTAL=value, PROC SURVEYMEANS computes $f$ as the ratio of the number of PSUs in the sample to the specified total.

If you specify stratum sampling rates by using the RATE=SAS-data-set option, then PROC SURVEYMEANS computes stratum totals based on these stratum sampling rates and the number of sample PSUs in each stratum. The procedure sums the stratum totals to form the overall total, and it computes $f$ as the ratio of the number of sample PSUs to the overall total. Alternatively, if you specify stratum totals by using the TOTAL=SAS-data-set option, then PROC SURVEYMEANS sums these totals to compute the overall total. The overall sampling fraction $f$ is then computed as the ratio of the number of sample PSUs to the overall total.
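
The rules above for deriving the overall sampling fraction can be summarized in a short Python sketch. The function and argument names are hypothetical stand-ins for the RATE= and TOTAL= inputs, and the sketch assumes that at most one of the four inputs is supplied. Recovering a stratum total as the number of sample PSUs divided by the stratum rate is an interpretation of "computes stratum totals based on these stratum sampling rates".

```python
def overall_sampling_fraction(psu_counts, rate=None, total=None,
                              stratum_rates=None, stratum_totals=None):
    """Overall sampling fraction used in the simple random sampling variance.

    psu_counts: dict stratum -> number of sample PSUs (hypothetical layout).
    Supply at most one of rate, total, stratum_rates, stratum_totals.
    """
    n_psus = sum(psu_counts.values())
    if rate is not None:
        return rate                      # single rate: used directly
    if total is not None:
        return n_psus / total            # single total: sample PSUs / total
    if stratum_rates is not None:
        # Stratum total recovered as n_h / f_h, then summed (assumption).
        overall_total = sum(psu_counts[h] / stratum_rates[h] for h in psu_counts)
        return n_psus / overall_total
    if stratum_totals is not None:
        return n_psus / sum(stratum_totals[h] for h in psu_counts)
    return 0.0                           # no option: fraction treated as negligible


counts = {"A": 2, "B": 2}
f_from_rates = overall_sampling_fraction(counts, stratum_rates={"A": 0.1, "B": 0.05})
f_from_totals = overall_sampling_fraction(counts, stratum_totals={"A": 20, "B": 40})
f_from_total = overall_sampling_fraction(counts, total=80)
```

With these made-up inputs, the stratum rates imply stratum totals of 20 and 40, so the rate-based and total-based routes give the same overall fraction.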

Ratio

When you use a RATIO statement, the procedure produces statistics requested by the statistic-keywords in the PROC SURVEYMEANS statement.

Suppose that you want to calculate the ratio of variable Y to variable X. Let $x_{hij}$ be the value of variable X for the jth member in cluster i in the hth stratum.

The ratio of Y to X is

\[
\hat{R} = \frac{\left( \sum_{h=1}^{H} \sum_{i=1}^{n_h} \sum_{j=1}^{m_{hi}} w_{hij}\, y_{hij} \right) \Big/ \left( \sum_{h=1}^{H} \sum_{i=1}^{n_h} \sum_{j=1}^{m_{hi}} w_{hij} \right)}{\left( \sum_{h=1}^{H} \sum_{i=1}^{n_h} \sum_{j=1}^{m_{hi}} w_{hij}\, x_{hij} \right) \Big/ \left( \sum_{h=1}^{H} \sum_{i=1}^{n_h} \sum_{j=1}^{m_{hi}} w_{hij} \right)}
\]

PROC SURVEYMEANS uses the Taylor series method to estimate the variance of the ratio as

\[
\hat{V}(\hat{R}) = \sum_{h=1}^{H} \hat{V}_h(\hat{R})
\]

where, if $n_h > 1$, then

\[
\begin{aligned}
\hat{V}_h(\hat{R}) &= \frac{n_h (1 - f_h)}{n_h - 1} \sum_{i=1}^{n_h} \left( g_{hi\cdot} - \bar{g}_{h\cdot\cdot} \right)^2 \\
g_{hi\cdot} &= \frac{\sum_{j=1}^{m_{hi}} w_{hij} \left( y_{hij} - x_{hij} \hat{R} \right)}{\sum_{h=1}^{H} \sum_{i=1}^{n_h} \sum_{j=1}^{m_{hi}} w_{hij}\, x_{hij}} \\
\bar{g}_{h\cdot\cdot} &= \left( \sum_{i=1}^{n_h} g_{hi\cdot} \right) \Big/ \, n_h
\end{aligned}
\]

and if $n_h = 1$, then

\[
\hat{V}_h(\hat{R}) =
\begin{cases}
\text{missing} & \text{if } n_{h'} = 1 \text{ for } h' = 1, 2, \ldots, H \\
0 & \text{if } n_{h'} > 1 \text{ for some } 1 \le h' \le H
\end{cases}
\]

The standard error of the ratio is the square root of the estimated variance:

\[
\mathrm{StdErr}(\hat{R}) = \sqrt{\hat{V}(\hat{R})}
\]

When the denominator for a ratio is zero, then the value of the ratio is displayed as '–Infty', 'Infty', or a missing value, depending on whether the numerator is negative, positive, or zero, respectively; and the corresponding internal value is the special missing value '.M', the special missing value '.I', or the usual missing value, respectively.
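
The ratio estimate and its linearized variance can be sketched in Python as follows. The tuple layout (stratum, cluster, weight, y, x), the function name, and the `rates` dictionary are assumptions for the example. Representing the zero-denominator cases with Python's signed infinity and `None` is a choice made for this sketch only; it is not how the procedure stores those values.

```python
import math
from collections import defaultdict


def ratio_and_taylor_variance(records, rates=None):
    """Ratio of weighted totals and its Taylor series variance estimate.

    records: list of (stratum, cluster, weight, y, x) tuples (hypothetical layout).
    rates:   optional dict stratum -> f_h; strata not listed use f_h = 0.
    """
    rates = rates or {}
    y_total = sum(w * y for _, _, w, y, _ in records)
    x_total = sum(w * x for _, _, w, _, x in records)
    if x_total == 0:
        # Zero denominator: signed infinity or None, by sign of the numerator.
        ratio = None if y_total == 0 else math.copysign(math.inf, y_total)
        return ratio, None

    # The sum of weights cancels, so the ratio of means is a ratio of totals.
    ratio = y_total / x_total

    g = defaultdict(lambda: defaultdict(float))
    for h, i, w, y, x in records:
        g[h][i] += w * (y - x * ratio) / x_total  # g_{hi.}

    if all(len(clusters) == 1 for clusters in g.values()):
        return ratio, None

    variance = 0.0
    for h, clusters in g.items():
        n_h = len(clusters)
        if n_h == 1:
            continue  # a single-PSU stratum contributes 0
        g_bar = sum(clusters.values()) / n_h
        ss = sum((v - g_bar) ** 2 for v in clusters.values())
        variance += n_h * (1.0 - rates.get(h, 0.0)) / (n_h - 1) * ss
    return ratio, variance


# Made-up sample: two strata, two PSUs in each stratum.
sample = [
    ("A", 1, 10.0, 2.0, 1.0),
    ("A", 1, 10.0, 4.0, 2.0),
    ("A", 2, 10.0, 6.0, 2.0),
    ("B", 1, 20.0, 1.0, 1.0),
    ("B", 2, 20.0, 3.0, 2.0),
]
ratio, ratio_variance = ratio_and_taylor_variance(sample)
```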

Domain Statistics

When you use a DOMAIN statement to request a domain analysis, the procedure computes the requested statistics for each domain level.

For a domain D, let $I_D$ be the corresponding indicator variable:

\[
I_D(h, i, j) =
\begin{cases}
1 & \text{if observation } (h, i, j) \text{ belongs to } D \\
0 & \text{otherwise}
\end{cases}
\]

Let

\[
z_{hij} = y_{hij}\, I_D(h, i, j) =
\begin{cases}
y_{hij} & \text{if observation } (h, i, j) \text{ belongs to } D \\
0 & \text{otherwise}
\end{cases}
\]

Let

\[
v_{hij} = w_{hij}\, I_D(h, i, j) =
\begin{cases}
w_{hij} & \text{if observation } (h, i, j) \text{ belongs to } D \\
0 & \text{otherwise}
\end{cases}
\]

The requested statistics for variable y in domain D are computed by using the new weights v.

Note that $z_{hij}$ is set to missing if $y_{hij}$ represents a level of a categorical variable and is missing.

Domain Mean

The estimated mean of Y in the domain D is

\[
\hat{\bar{Y}}_D = \left( \sum_{h=1}^{H} \sum_{i=1}^{n_h} \sum_{j=1}^{m_{hi}} v_{hij}\, y_{hij} \right) \Big/ \, v_{\cdot\cdot\cdot}
\]

where

\[
v_{\cdot\cdot\cdot} = \sum_{h=1}^{H} \sum_{i=1}^{n_h} \sum_{j=1}^{m_{hi}} v_{hij}
\]
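
The domain mean is the ordinary weighted mean with the domain weights in place of the sampling weights, as the Python sketch below illustrates. The tuple layout (stratum, cluster, weight, y, domain level), the function name, and the level labels are assumptions for the example.

```python
def domain_mean(records, domain):
    """Estimated mean of y within one domain level.

    records: list of (stratum, cluster, weight, y, level) tuples
             (hypothetical layout).
    domain:  the domain level D of interest.
    """
    v_total = 0.0
    vy_sum = 0.0
    for _, _, w, y, level in records:
        v = w if level == domain else 0.0  # v_hij = w_hij * I_D(h, i, j)
        v_total += v
        vy_sum += v * y
    return vy_sum / v_total


# Made-up sample with a two-level domain variable.
sample = [
    ("A", 1, 10.0, 2.0, "m"),
    ("A", 1, 10.0, 4.0, "f"),
    ("A", 2, 10.0, 6.0, "m"),
    ("B", 1, 20.0, 1.0, "f"),
    ("B", 2, 20.0, 3.0, "m"),
]
mean_m = domain_mean(sample, "m")  # 140 / 40
mean_f = domain_mean(sample, "f")  # 60 / 30
```

Observations outside the domain stay in the data with weight zero rather than being deleted, which keeps the full set of PSUs available for variance estimation.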

When you use the Taylor series method, the variance of is estimated by

ModifyingAbove upper V With caret left-parenthesis ModifyingAbove Above upper Y overbar With caret Subscript upper D Baseline right-parenthesis equals sigma-summation Underscript h equals 1 Overscript upper H Endscripts ModifyingAbove upper V With caret Subscript h Baseline left-parenthesis ModifyingAbove Above upper Y overbar With caret Subscript upper D Baseline right-parenthesis

where, if $n_h > 1$, then

\[
\begin{aligned}
\hat{V}_h(\hat{\bar{Y}}_D) &= \frac{n_h (1 - f_h)}{n_h - 1} \sum_{i=1}^{n_h} \left( r_{hi\cdot} - \bar{r}_{h\cdot\cdot} \right)^2 \\
r_{hi\cdot} &= \left( \sum_{j=1}^{m_{hi}} v_{hij} \left( y_{hij} - \hat{\bar{Y}}_D \right) \right) / v_{\cdot\cdot\cdot} \\
\bar{r}_{h\cdot\cdot} &= \left( \sum_{i=1}^{n_h} r_{hi\cdot} \right) / n_h
\end{aligned}
\]

and if $n_h = 1$, then

\[
\hat{V}_h(\hat{\bar{Y}}_D) =
\begin{cases}
\text{missing} & \text{if } n_{h'} = 1 \text{ for } h' = 1, 2, \ldots, H \\
0 & \text{if } n_{h'} > 1 \text{ for some } 1 \le h' \le H
\end{cases}
\]

If you use replication methods to estimate the variance (by specifying the VARMETHOD=BOOTSTRAP, VARMETHOD=BRR, or VARMETHOD=JACKKNIFE option), PROC SURVEYMEANS estimates the variance of $\hat{\bar{Y}}_D$ from the variability among the replicate estimates. For more information, see the section Replication Methods for Variance Estimation.
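The following Python sketch illustrates the Taylor series computation for a domain mean. It is not PROC SURVEYMEANS code: the function name `domain_mean_taylor`, the tuple layout of `records`, and the default $f_h = 0$ are assumptions made for illustration. The PSU-level linearized values are taken to be $\sum_j v_{hij}(y_{hij} - \hat{\bar{Y}}_D) / v_{\cdot\cdot\cdot}$, which is the domain ratio form with $x_{hij} = 1$.

```python
from collections import defaultdict

def domain_mean_taylor(records, f=None):
    """Domain mean and its Taylor series variance estimate.

    records: (stratum, psu, weight, y, in_domain) tuples (hypothetical layout).
    f: optional dict of first-stage sampling fractions f_h; missing keys mean 0.
    Returns (mean, variance); variance is None when every stratum has one PSU.
    """
    f = f or {}
    # v_hij = w_hij * I_D(h, i, j): zero the weight outside the domain
    rows = [(h, i, w if d else 0.0, y) for h, i, w, y, d in records]
    v_total = sum(v for _, _, v, _ in rows)
    mean = sum(v * y for _, _, v, y in rows if v) / v_total

    # PSU-level linearized values; PSUs with no domain members still count in n_h
    psu = defaultdict(lambda: defaultdict(float))
    for h, i, v, y in rows:
        psu[h][i] += v * (y - mean) / v_total if v else 0.0

    if all(len(p) == 1 for p in psu.values()):
        return mean, None                      # variance is missing
    variance = 0.0
    for h, p in psu.items():
        n_h = len(p)
        if n_h == 1:
            continue                           # single-PSU stratum contributes 0
        r_bar = sum(p.values()) / n_h
        ss = sum((r - r_bar) ** 2 for r in p.values())
        variance += n_h * (1 - f.get(h, 0.0)) / (n_h - 1) * ss
    return mean, variance

# Two strata, two PSUs each; one observation falls outside the domain.
sample = [
    (1, 1, 2.0, 10.0, True), (1, 1, 2.0, 20.0, False), (1, 2, 2.0, 14.0, True),
    (2, 1, 1.0, 8.0, True), (2, 2, 1.0, 12.0, True),
]
mean, variance = domain_mean_taylor(sample)
```

Observations outside the domain keep their PSU in the stratum count $n_h$ but contribute zero weight, which is why the sketch zeroes the weights instead of dropping rows.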

Difference of Domain Means

If you specify the DIFFMEANS option in a DOMAIN statement, PROC SURVEYMEANS compares the means of a continuous analysis variable at different domain levels.

Let D be a domain definition that is specified in a DOMAIN statement. Let $D_1, D_2, \ldots, D_r$ be the r levels of D, and let the corresponding indicator variables be

\[
I_{D_k}(h,i,j) =
\begin{cases}
1 & \text{if observation } (h,i,j) \text{ belongs to } D_k \\
0 & \text{otherwise}
\end{cases}
\]

for $k = 1, 2, \ldots, r$.

For variable Y, the difference between the means for domain levels $D_{k_1}$ and $D_{k_2}$ ($k_1 \neq k_2$) can be expressed as

\[
\Delta(Y, D, k_1, k_2) = \hat{\bar{Y}}_{D_{k_1}} - \hat{\bar{Y}}_{D_{k_2}}
\]

The estimated variance for this difference is

\[
\hat{V}\left( \Delta(Y, D, k_1, k_2) \right) = \hat{V}\left( \hat{\bar{Y}}_{D_{k_1}} \right) + \hat{V}\left( \hat{\bar{Y}}_{D_{k_2}} \right) - 2\, \widehat{\mathrm{Cov}}\left( \hat{\bar{Y}}_{D_{k_1}}, \hat{\bar{Y}}_{D_{k_2}} \right)
\]

where the estimated variances $\hat{V}(\hat{\bar{Y}}_{D_{k_1}})$ and $\hat{V}(\hat{\bar{Y}}_{D_{k_2}})$ for means at the corresponding domain levels (in addition to the covariance between these two domain means) are described in the section Domain Mean.

For variable Y, PROC SURVEYMEANS computes the t statistic to test the significance of the difference of the domain means between the two levels $D_{k_1}$ and $D_{k_2}$ as the following ratio with df degrees of freedom:

\[
\frac{\Delta(Y, D, k_1, k_2)}{\sqrt{\hat{V}\left( \Delta(Y, D, k_1, k_2) \right)}}
\]

For more information about df, see the section Degrees of Freedom. For more information about computing these domain means and the variance and covariance of domain means with poststratification, see the section Variance of the Domain Mean and Sum. The corresponding t statistics are then computed accordingly.

Domain Total

The estimated total in domain D is

\[
\hat{Y}_D = \sum_{h=1}^{H} \sum_{i=1}^{n_h} \sum_{j=1}^{m_{hi}} v_{hij} y_{hij}
\]

and its estimated variance is

\[
\hat{V}(\hat{Y}_D) = \sum_{h=1}^{H} \hat{V}_h(\hat{Y}_D)
\]

where, if $n_h > 1$, then

\[
\begin{aligned}
\hat{V}_h(\hat{Y}_D) &= \frac{n_h (1 - f_h)}{n_h - 1} \sum_{i=1}^{n_h} \left( z_{hi\cdot} - \bar{z}_{h\cdot\cdot} \right)^2 \\
z_{hi\cdot} &= \sum_{j=1}^{m_{hi}} v_{hij} z_{hij} \\
\bar{z}_{h\cdot\cdot} &= \left( \sum_{i=1}^{n_h} z_{hi\cdot} \right) / n_h
\end{aligned}
\]

and if $n_h = 1$, then

\[
\hat{V}_h(\hat{Y}_D) =
\begin{cases}
\text{missing} & \text{if } n_{h'} = 1 \text{ for } h' = 1, 2, \ldots, H \\
0 & \text{if } n_{h'} > 1 \text{ for some } 1 \le h' \le H
\end{cases}
\]
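As an illustration of the domain total and its stratum-by-stratum variance, here is a minimal Python sketch. The function name `domain_total_taylor` and the tuple layout of `records` are hypothetical, and sampling fractions default to 0 when they are not supplied; it is a sketch of the formulas above, not the procedure's implementation.

```python
from collections import defaultdict

def domain_total_taylor(records, f=None):
    """Domain total and Taylor variance; records are
    (stratum, psu, weight, y, in_domain) tuples (hypothetical layout)."""
    f = f or {}
    psu_totals = defaultdict(lambda: defaultdict(float))
    for h, i, w, y, in_domain in records:
        # v_hij * z_hij equals w_hij * y_hij inside the domain and 0 outside
        psu_totals[h][i] += w * y if in_domain else 0.0

    total = sum(sum(p.values()) for p in psu_totals.values())
    if all(len(p) == 1 for p in psu_totals.values()):
        return total, None                     # variance is missing
    variance = 0.0
    for h, p in psu_totals.items():
        n_h = len(p)
        if n_h == 1:
            continue                           # contributes 0
        z_bar = sum(p.values()) / n_h
        ss = sum((z - z_bar) ** 2 for z in p.values())
        variance += n_h * (1 - f.get(h, 0.0)) / (n_h - 1) * ss
    return total, variance

sample = [
    (1, 1, 2.0, 10.0, True), (1, 1, 2.0, 20.0, False), (1, 2, 2.0, 14.0, True),
    (2, 1, 1.0, 8.0, True), (2, 2, 1.0, 12.0, True),
]
total, variance = domain_total_taylor(sample)
total_fpc, variance_fpc = domain_total_taylor(sample, f={1: 0.5})
```

Supplying a sampling fraction for a stratum shrinks only that stratum's contribution through the factor $(1 - f_h)$.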

Domain Ratio

The estimated ratio of Y to X in domain D is

\[
\hat{R}_D = \frac{\sum_{h=1}^{H} \sum_{i=1}^{n_h} \sum_{j=1}^{m_{hi}} v_{hij} y_{hij}}{\sum_{h=1}^{H} \sum_{i=1}^{n_h} \sum_{j=1}^{m_{hi}} v_{hij} x_{hij}}
\]

and its estimated variance is

\[
\hat{V}(\hat{R}_D) = \sum_{h=1}^{H} \hat{V}_h(\hat{R}_D)
\]

where, if $n_h > 1$, then

\[
\begin{aligned}
\hat{V}_h(\hat{R}_D) &= \frac{n_h (1 - f_h)}{n_h - 1} \sum_{i=1}^{n_h} \left( g_{hi\cdot} - \bar{g}_{h\cdot\cdot} \right)^2 \\
g_{hi\cdot} &= \frac{\sum_{j=1}^{m_{hi}} v_{hij} \left( y_{hij} - x_{hij} \hat{R}_D \right)}{\sum_{h=1}^{H} \sum_{i=1}^{n_h} \sum_{j=1}^{m_{hi}} v_{hij} x_{hij}} \\
\bar{g}_{h\cdot\cdot} &= \left( \sum_{i=1}^{n_h} g_{hi\cdot} \right) / n_h
\end{aligned}
\]

and if $n_h = 1$, then

\[
\hat{V}_h(\hat{R}_D) =
\begin{cases}
\text{missing} & \text{if } n_{h'} = 1 \text{ for } h' = 1, 2, \ldots, H \\
0 & \text{if } n_{h'} > 1 \text{ for some } 1 \le h' \le H
\end{cases}
\]

For domain analysis with poststratification, see the section Poststratification. For quantile estimation in a domain, see the section Domain Quantile. For quantile estimation in a domain with poststratification, see the section Domain Quantile Estimation with Poststratification.

Quantiles

Let Y be the variable of interest in a complex survey. Denote $F(y)$ as the cumulative distribution function of Y. For $0 < p < 1$, the pth quantile of the population cumulative distribution function is

\[
Q(p) = \inf \{ y : F(y) \ge p \}
\]

Estimate of Quantile

Let $\{y_{hij}, w_{hij}\}$ be the observed values for variable Y that are associated with sampling weights, where $(h, i, j)$ are the stratum index, cluster index, and member index, respectively, as shown in the section Definitions and Notation. Let $y_{(1)} < y_{(2)} < \cdots < y_{(d)}$ denote the sample order statistics for variable Y.

An estimate of quantile $Q(p)$ is

\[
\hat{Q}(p) =
\begin{cases}
y_{(1)} & \text{if } p < \hat{F}(y_{(1)}) \\
y_{(k)} + \dfrac{p - \hat{F}(y_{(k)})}{\hat{F}(y_{(k+1)}) - \hat{F}(y_{(k)})} \left( y_{(k+1)} - y_{(k)} \right) & \text{if } \hat{F}(y_{(k)}) \le p < \hat{F}(y_{(k+1)}) \\
y_{(d)} & \text{if } p = 1
\end{cases}
\]

where $\hat{F}(t)$ is the estimated cumulative distribution for Y,

\[
\hat{F}(t) = \frac{\sum_{h=1}^{H} \sum_{i=1}^{n_h} \sum_{j=1}^{m_{hi}} w_{hij} I(y_{hij} \le t)}{\sum_{h=1}^{H} \sum_{i=1}^{n_h} \sum_{j=1}^{m_{hi}} w_{hij}}
\]

and $I(\cdot)$ is the indicator function.
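The interpolation between adjacent order statistics can be sketched in Python as follows. The function name `weighted_quantile` is hypothetical, and the sketch assumes that tied values are collapsed into distinct order statistics with their weights summed; the design structure (strata and clusters) does not enter the point estimate, so only values and weights are passed.

```python
def weighted_quantile(values, weights, p):
    """Interpolated weighted quantile estimate (a sketch, 0 < p <= 1)."""
    # Collapse ties so that y_(1) < y_(2) < ... < y_(d) are distinct
    agg = {}
    for y, w in zip(values, weights):
        agg[y] = agg.get(y, 0.0) + w
    order = sorted(agg)
    total = sum(agg.values())

    # Estimated distribution function evaluated at each order statistic
    cdf, running = [], 0.0
    for y in order:
        running += agg[y]
        cdf.append(running / total)
    cdf[-1] = 1.0                      # guard against rounding in the last step

    if p < cdf[0]:
        return order[0]
    for k in range(len(order) - 1):
        if cdf[k] <= p < cdf[k + 1]:
            frac = (p - cdf[k]) / (cdf[k + 1] - cdf[k])
            return order[k] + frac * (order[k + 1] - order[k])
    return order[-1]                   # reached when p equals 1

median_equal = weighted_quantile([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0, 1.0], 0.5)
median_weighted = weighted_quantile([10.0, 20.0, 30.0], [1.0, 2.0, 1.0], 0.5)
```

With equal weights the median of 1, 2, 3, 4 falls exactly on the second order statistic because $\hat{F}(y_{(2)}) = 0.5$; with weights 1, 2, 1 the median is interpolated halfway between 10 and 20.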

Standard Error

How PROC SURVEYMEANS estimates the standard error of a quantile depends on which variance method you specify in the VARMETHOD= option.

Taylor Series Method

When you specify VARMETHOD=TAYLOR, or by default if you do not specify the VARMETHOD= option, PROC SURVEYMEANS uses Woodruff’s method (Dorfman and Valliant 1993; Särndal, Swensson, and Wretman 1992; Francisco and Fuller 1991) to estimate the variances of quantiles. This method first constructs a confidence interval on a quantile. Then it uses the width of the confidence interval to estimate the standard error of $\hat{Q}(p)$. To construct the confidence interval, PROC SURVEYMEANS first estimates the variance of the estimated distribution function $\hat{F}(\hat{Q}(p))$ by

\[
\hat{V}\left( \hat{F}(\hat{Q}(p)) \right) = \sum_{h=1}^{H} \frac{n_h (1 - f_h)}{n_h - 1} \sum_{i=1}^{n_h} \left( e_{hi\cdot} - \bar{e}_{h\cdot\cdot} \right)^2
\]

where

\[
\begin{aligned}
e_{hi\cdot} &= \left( \sum_{j=1}^{m_{hi}} w_{hij} \left( I(y_{hij} \le \hat{Q}(p)) - \hat{F}(\hat{Q}(p)) \right) \right) / w_{\cdot\cdot\cdot} \\
\bar{e}_{h\cdot\cdot} &= \left( \sum_{i=1}^{n_h} e_{hi\cdot} \right) / n_h \\
w_{\cdot\cdot\cdot} &= \sum_{h=1}^{H} \sum_{i=1}^{n_h} \sum_{j=1}^{m_{hi}} w_{hij}
\end{aligned}
\]

Then $100(1-\alpha)\%$ confidence limits for $\hat{F}(\hat{Q}(p))$ can be constructed by

\[
(\hat{p}_L, \hat{p}_U) = \left( \hat{F}(\hat{Q}(p)) - t_{df,\alpha/2} \sqrt{\hat{V}\left( \hat{F}(\hat{Q}(p)) \right)},\ \hat{F}(\hat{Q}(p)) + t_{df,\alpha/2} \sqrt{\hat{V}\left( \hat{F}(\hat{Q}(p)) \right)} \right)
\]

where $t_{df,\alpha/2}$ is the $100(1-\alpha/2)$th percentile of the t distribution with df degrees of freedom, described in the section Degrees of Freedom.

When $(\hat{p}_L, \hat{p}_U)$ is out of the range of [0,1], the procedure does not compute the standard error of $\hat{Q}(p)$.

The $\hat{p}_L$th quantile is defined as

\[
\hat{Q}(\hat{p}_L) =
\begin{cases}
y_{(1)} & \text{if } \hat{p}_L < \hat{F}(y_{(1)}) \\
y_{(k_L)} + \dfrac{\hat{p}_L - \hat{F}(y_{(k_L)})}{\hat{F}(y_{(k_L+1)}) - \hat{F}(y_{(k_L)})} \left( y_{(k_L+1)} - y_{(k_L)} \right) & \text{if } \hat{F}(y_{(k_L)}) \le \hat{p}_L < \hat{F}(y_{(k_L+1)}) \\
y_{(d)} & \text{if } \hat{p}_L = 1
\end{cases}
\]

and the $\hat{p}_U$th quantile is defined as

\[
\hat{Q}(\hat{p}_U) =
\begin{cases}
y_{(1)} & \text{if } \hat{p}_U < \hat{F}(y_{(1)}) \\
y_{(k_U)} + \dfrac{\hat{p}_U - \hat{F}(y_{(k_U)})}{\hat{F}(y_{(k_U+1)}) - \hat{F}(y_{(k_U)})} \left( y_{(k_U+1)} - y_{(k_U)} \right) & \text{if } \hat{F}(y_{(k_U)}) \le \hat{p}_U < \hat{F}(y_{(k_U+1)}) \\
y_{(d)} & \text{if } \hat{p}_U = 1
\end{cases}
\]

The standard error of $\hat{Q}(p)$ is then estimated by

\[
\widehat{\mathrm{StdErr}}(\hat{Q}(p)) = \frac{\hat{Q}(\hat{p}_U) - \hat{Q}(\hat{p}_L)}{2\, t_{df,\alpha/2}}
\]

where $t_{df,\alpha/2}$ is the $100(1-\alpha/2)$th percentile of the t distribution with df degrees of freedom.
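The Woodruff steps described above (estimate the quantile, estimate the variance of the distribution function at that point, form limits on the probability scale, and map them back to the data scale) can be sketched in Python as follows. The names `make_estimators` and `woodruff_stderr` and the tuple layout of `records` are hypothetical. The critical value `t_crit` must be supplied by the caller because the standard library has no t distribution, and the sketch assumes that every stratum contains at least two PSUs.

```python
from collections import defaultdict
from math import sqrt

def make_estimators(ys, ws):
    """Return (F_hat, Q_hat) built from values ys and weights ws."""
    agg = defaultdict(float)
    for y, w in zip(ys, ws):
        agg[y] += w                            # collapse ties
    order = sorted(agg)
    total = sum(agg.values())
    cdf, running = [], 0.0
    for y in order:
        running += agg[y]
        cdf.append(running / total)
    cdf[-1] = 1.0

    def F_hat(t):
        return sum(w for y, w in zip(ys, ws) if y <= t) / total

    def Q_hat(p):
        if p < cdf[0]:
            return order[0]
        for k in range(len(order) - 1):
            if cdf[k] <= p < cdf[k + 1]:
                frac = (p - cdf[k]) / (cdf[k + 1] - cdf[k])
                return order[k] + frac * (order[k + 1] - order[k])
        return order[-1]

    return F_hat, Q_hat

def woodruff_stderr(records, p, t_crit, f=None):
    """records: (stratum, psu, weight, y) tuples; t_crit: t_{df, alpha/2}.
    Returns None when the probability limits fall outside [0, 1]."""
    f = f or {}
    ys = [r[3] for r in records]
    ws = [r[2] for r in records]
    F_hat, Q_hat = make_estimators(ys, ws)
    q = Q_hat(p)
    Fq = F_hat(q)
    w_total = sum(ws)

    # PSU-level linearized values e_hi.
    e = defaultdict(lambda: defaultdict(float))
    for h, i, w, y in records:
        e[h][i] += w * ((1.0 if y <= q else 0.0) - Fq) / w_total

    var_F = 0.0
    for h, psus in e.items():
        n_h = len(psus)
        if n_h < 2:
            raise ValueError("sketch assumes at least two PSUs per stratum")
        e_bar = sum(psus.values()) / n_h
        ss = sum((v - e_bar) ** 2 for v in psus.values())
        var_F += n_h * (1 - f.get(h, 0.0)) / (n_h - 1) * ss

    half_width = t_crit * sqrt(var_F)
    p_L, p_U = Fq - half_width, Fq + half_width
    if p_L < 0.0 or p_U > 1.0:
        return None                            # standard error is not computed
    return (Q_hat(p_U) - Q_hat(p_L)) / (2 * t_crit)

sample = [(1, 1, 1.0, 1.0), (1, 2, 1.0, 3.0), (2, 1, 1.0, 2.0), (2, 2, 1.0, 4.0)]
se = woodruff_stderr(sample, p=0.5, t_crit=1.0)
```

In this toy sample a critical value of 1.0 is used only to keep the limits inside [0,1]; with four PSUs in two strata, a realistic critical value is much larger and the limits fall outside the unit interval, so the function returns `None`.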

Replication Methods

When you use the jackknife replication method as described in the section Replication Methods for Variance Estimation, the following naive replication variance estimate for a quantile can have poor properties:

\[
\hat{V}(\hat{Q}(p)) = \sum_{r=1}^{R} \alpha_r \left( \hat{Q}^{(r)}(p) - \hat{Q}(p) \right)^2
\]

Here $\alpha_r$ is a coefficient that corresponds to each replicate, R is the number of replicates, and $\hat{Q}^{(r)}(p)$ is the estimated quantile in the rth replicate.

Fuller (2009) proposes using a smoothing method similar to the following to modify the quantile estimates for each replicate before using this naive variance estimate for a quantile.

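In the rth replicate ($r = 1, 2, \ldots, R$), denote $\{y_{hij}^{(r)}, w_{hij}^{(r)}\}$ to be the observed values of variable Y that are associated with the replicate weights. Let $y_{(1)}^{(r)} < y_{(2)}^{(r)} < \cdots < y_{(n^{(r)})}^{(r)}$ be the sample order statistics for variable Y in the rth replicate, where $n^{(r)}$ is the total number of observations whose replicate weights $w_{hij}^{(r)} > 0$.

Let $\hat{Q}^{(r)}(p)$ be the estimated pth quantile and $\hat{F}^{(r)}(t)$ be the estimated cumulative distribution for Y,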

\[
\hat{Q}^{(r)}(p) =
\begin{cases}
y_{(1)}^{(r)} & \text{if } p < \hat{F}^{(r)}(y_{(1)}^{(r)}) \\
y_{(k)}^{(r)} + \dfrac{p - \hat{F}^{(r)}(y_{(k)}^{(r)})}{\hat{F}^{(r)}(y_{(k+1)}^{(r)}) - \hat{F}^{(r)}(y_{(k)}^{(r)})} \left( y_{(k+1)}^{(r)} - y_{(k)}^{(r)} \right) & \text{if } \hat{F}^{(r)}(y_{(k)}^{(r)}) \le p < \hat{F}^{(r)}(y_{(k+1)}^{(r)}) \\
y_{(n^{(r)})}^{(r)} & \text{if } p = 1
\end{cases}
\]

and

\[
\hat{F}^{(r)}(t) = \frac{\sum_{h=1}^{H} \sum_{i=1}^{n_h} \sum_{j=1}^{m_{hi}} w_{hij}^{(r)} I(y_{hij}^{(r)} \le t)}{\sum_{h=1}^{H} \sum_{i=1}^{n_h} \sum_{j=1}^{m_{hi}} w_{hij}^{(r)}}
\]

where $I(\cdot)$ is the indicator function.

Choose a segment in the distribution function by two points, $\tilde{p}_L^{(r)}$ and $\tilde{p}_U^{(r)}$,

\[
\begin{aligned}
\tilde{p}_L^{(r)} &= \max\left( \frac{w_{(1)}^{(r)}}{\sum_{h=1}^{H} \sum_{i=1}^{n_h} \sum_{j=1}^{m_{hi}} w_{hij}^{(r)}},\ \hat{F}^{(r)}(\hat{Q}^{(r)}(p)) - 2 \sqrt{p(1-p)/n^{(r)}} \right) \\
\tilde{p}_U^{(r)} &= \min\left( 1,\ \hat{F}^{(r)}(\hat{Q}^{(r)}(p)) + 2 \sqrt{p(1-p)/n^{(r)}} \right)
\end{aligned}
\]

where $w_{(1)}^{(r)}$ is the replicate weight that corresponds to $y_{(1)}^{(r)}$ in the rth replicate. Then a modified pth quantile in the rth replicate is defined by

\[
\tilde{Q}^{(r)}(p) = \hat{Q}^{(r)}(\tilde{p}_L^{(r)}) + \frac{\hat{Q}^{(r)}(\tilde{p}_U^{(r)}) - \hat{Q}^{(r)}(\tilde{p}_L^{(r)})}{\tilde{p}_U^{(r)} - \tilde{p}_L^{(r)}} \left( p - \tilde{p}_L^{(r)} \right)
\]

PROC SURVEYMEANS then uses $\tilde{Q}^{(r)}(p)$ as the quantile estimate in the rth replicate and its deviation from the mean of $\tilde{Q}^{(r)}(p)$ to estimate the variance of $\hat{Q}(p)$:

\[
\begin{aligned}
\hat{V}(\hat{Q}(p)) &= \sum_{r=1}^{R} \alpha_r \left( \tilde{Q}^{(r)}(p) - \bar{\tilde{Q}}(p) \right)^2 \\
\bar{\tilde{Q}}(p) &= \frac{1}{R} \sum_{r=1}^{R} \tilde{Q}^{(r)}(p)
\end{aligned}
\]

If you want to use the naive replication variance estimates by using $\hat{Q}^{(r)}(p)$ instead of the smoothed $\tilde{Q}^{(r)}(p)$ to estimate the variance of $\hat{Q}(p)$, you can specify the method-option NAIVEQVAR as VARMETHOD=BRR(NAIVEQVAR) or VARMETHOD=JACKKNIFE(NAIVEQVAR).
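The smoothing of a replicate quantile and the resulting variance estimate can be sketched in Python as follows. The function names are hypothetical, and two points are assumptions of the sketch, not statements about the procedure: observations with zero replicate weight are dropped before counting $n^{(r)}$, and for tied minimum values the weight at $y_{(1)}^{(r)}$ is taken as the summed weight of the ties.

```python
from math import sqrt

def _order_and_cdf(ys, ws):
    """Distinct sorted values and the estimated distribution function at each."""
    agg = {}
    for y, w in zip(ys, ws):
        agg[y] = agg.get(y, 0.0) + w
    order = sorted(agg)
    total = sum(agg.values())
    cdf, running = [], 0.0
    for y in order:
        running += agg[y]
        cdf.append(running / total)
    cdf[-1] = 1.0
    return order, cdf

def _quantile(order, cdf, p):
    if p < cdf[0]:
        return order[0]
    for k in range(len(order) - 1):
        if cdf[k] <= p < cdf[k + 1]:
            frac = (p - cdf[k]) / (cdf[k + 1] - cdf[k])
            return order[k] + frac * (order[k + 1] - order[k])
    return order[-1]

def smoothed_replicate_quantile(ys, rep_weights, p):
    """Smoothed p-th quantile for one replicate (0 < p < 1)."""
    kept = [(y, w) for y, w in zip(ys, rep_weights) if w > 0]
    n_r = len(kept)                            # n^(r)
    vals = [y for y, _ in kept]
    wts = [w for _, w in kept]
    order, cdf = _order_and_cdf(vals, wts)
    q = _quantile(order, cdf, p)
    F_q = sum(w for y, w in kept if y <= q) / sum(wts)
    half = 2 * sqrt(p * (1 - p) / n_r)
    p_lo = max(cdf[0], F_q - half)             # cdf[0] is w_(1) / total weight
    p_hi = min(1.0, F_q + half)
    if p_hi <= p_lo:
        return q                               # degenerate segment: no smoothing
    q_lo = _quantile(order, cdf, p_lo)
    q_hi = _quantile(order, cdf, p_hi)
    return q_lo + (q_hi - q_lo) / (p_hi - p_lo) * (p - p_lo)

def replicate_variance(q_tilde, alpha):
    """Sum of alpha_r times squared deviations from the replicate mean."""
    q_bar = sum(q_tilde) / len(q_tilde)
    return sum(a * (q - q_bar) ** 2 for q, a in zip(q_tilde, alpha))

q_smooth = smoothed_replicate_quantile([1.0, 2.0, 3.0, 4.0, 100.0],
                                       [1.0, 1.0, 1.0, 1.0, 0.0], 0.5)
v = replicate_variance([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

Replacing the step-like replicate quantile with a straight line through two points on the estimated distribution function is what keeps the replicate estimates from jumping between adjacent order statistics.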

Confidence Limits

Symmetric $100(1-\alpha)\%$ confidence limits are computed as

\[
\left( \hat{Q}(p) - \widehat{\mathrm{StdErr}}(\hat{Q}(p))\, t_{df,\alpha/2},\ \hat{Q}(p) + \widehat{\mathrm{StdErr}}(\hat{Q}(p))\, t_{df,\alpha/2} \right)
\]

If you specify the NONSYMCL option in the PROC SURVEYMEANS statement and you use the VARMETHOD=TAYLOR option, the procedure computes $100(1-\alpha)\%$ nonsymmetric confidence limits:

\[
\left( \hat{Q}(\hat{p}_L),\ \hat{Q}(\hat{p}_U) \right)
\]

Quantile Estimation with Poststratification

When you specify a POSTSTRATA statement, the quantile estimation and its variance estimation incorporate poststratification. For more information about poststratification, see the section Poststratification.

For a selected sample, let be the poststratum index; let be the population totals for each corresponding poststratum, and let be the indicator variable for the poststratum r that is defined by

upper I Subscript r Baseline left-parenthesis h comma i comma j right-parenthesis equals StartLayout Enlarged left-brace 1st Row 1st Column 1 2nd Column if observation left-parenthesis h comma i comma j right-parenthesis belongs to the r th poststratum 2nd Row 1st Column 0 2nd Column otherwise EndLayout

Denote the total sum of original weights in the sample for each poststratum as

\[
\psi_r = \sum_{h=1}^{H} \sum_{i=1}^{n_h} \sum_{j=1}^{m_{hi}} w_{hij} \, I_r(h,i,j)
\]

Assume that the observation (h, i, j) belongs to the rth poststratum. Then the poststratification weight for the observation (h, i, j) is

\[
\tilde{w}_{hij} = w_{hij} \, \frac{Z_r}{\psi_r}
\]

Then the estimated cumulative distribution function of Y and the estimated pth quantile can be computed as in the section Estimate of Quantile by replacing the original weights, $w_{hij}$, with the poststratification weights, $\tilde{w}_{hij}$.

When you specify VARMETHOD=TAYLOR (or by default), the variance of $\hat{Q}(p)$ is estimated as in the section Standard Error, except that the variance of the estimated distribution function is computed as follows.

For each poststratum $r$, define

\[
\hat{\theta}^{(r)}(\hat{Q}(p)) = Z_r^{-1} \sum_{h=1}^{H} \sum_{i=1}^{n_h} \sum_{j=1}^{m_{hi}} I_r(h,i,j) \, \tilde{w}_{hij} \left( I(y_{hij} \le \hat{Q}(p)) - \hat{F}(\hat{Q}(p)) \right)
\]

where $I(\cdot)$ is the indicator function.

Assume that the observation (h, i, j) belongs to the rth poststratum. Let

\[
\tilde{y}_{hij} = I(y_{hij} \le \hat{Q}(p)) - \hat{F}(\hat{Q}(p)) - \hat{\theta}^{(r)}(\hat{Q}(p))
\]

PROC SURVEYMEANS estimates the variance of the estimated distribution function with poststratification by

\[
\hat{V}(\hat{F}(\hat{Q}(p))) = \sum_{h=1}^{H} \frac{n_h (1 - f_h)}{n_h - 1} \sum_{i=1}^{n_h} \left( u_{hi\cdot} - \bar{u}_{h\cdot\cdot} \right)^2
\]

where

\[
\begin{aligned}
u_{hi\cdot} &= \left( \sum_{j=1}^{m_{hi}} \tilde{w}_{hij} \, \tilde{y}_{hij} \right) / \tilde{w}_{\cdot\cdot\cdot} \\
\bar{u}_{h\cdot\cdot} &= \left( \sum_{i=1}^{n_h} u_{hi\cdot} \right) / n_h \\
\tilde{w}_{\cdot\cdot\cdot} &= \sum_{h=1}^{H} \sum_{i=1}^{n_h} \sum_{j=1}^{m_{hi}} \tilde{w}_{hij}
\end{aligned}
\]

Domain Quantile

Let Y be the variable of interest in a complex survey, and let a subpopulation of interest be domain D. Denote $F_D(y)$ as the cumulative distribution function of Y in domain D. For $0 < p < 1$, the pth quantile of the population cumulative distribution function is

\[
Q_D(p) = \inf \{ y : F_D(y) \ge p \}
\]

Let $I_D(h,i,j)$ be the corresponding indicator variable:

\[
I_D(h,i,j) =
\begin{cases}
1 & \text{if observation } (h,i,j) \text{ belongs to domain } D \\
0 & \text{otherwise}
\end{cases}
\]

Assume that there are a total of d observations among the n observations in the entire sample that belong to domain D. Let $y_{(1)}, y_{(2)}, \ldots, y_{(d)}$ denote the order statistics of variable Y for these d observations that fall in domain D.

The cumulative distribution function of Y in domain D is estimated by

\[
\hat{F}_D(t) = \frac{\sum_{h=1}^{H} \sum_{i=1}^{n_h} \sum_{j=1}^{m_{hi}} w_{hij} \, I(y_{hij} \le t) \, I_D(h,i,j)}{\sum_{h=1}^{H} \sum_{i=1}^{n_h} \sum_{j=1}^{m_{hi}} w_{hij} \, I_D(h,i,j)}
\]

where $I(\cdot)$ is the indicator function. Then the estimated quantile in domain D is

\[
\hat{Q}_D(p) =
\begin{cases}
y_{(1)} & \text{if } p < \hat{F}_D(y_{(1)}) \\
y_{(k)} + \dfrac{p - \hat{F}_D(y_{(k)})}{\hat{F}_D(y_{(k+1)}) - \hat{F}_D(y_{(k)})} \left( y_{(k+1)} - y_{(k)} \right) & \text{if } \hat{F}_D(y_{(k)}) \le p < \hat{F}_D(y_{(k+1)}) \\
y_{(d)} & \text{if } p = 1
\end{cases}
\]
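As a rough illustration of the estimator above, the following Python sketch computes the weighted domain distribution function and the interpolated domain quantile. The function names, the flat-list inputs, and the choice to interpolate over the distinct domain values (so that the denominator of the interpolation never vanishes) are assumptions made for this example; it is not PROC SURVEYMEANS code.

```python
def domain_cdf(values, weights, in_domain, t):
    """Estimate F_D(t): weighted share of domain observations with y <= t."""
    num = sum(w for y, w, d in zip(values, weights, in_domain) if d and y <= t)
    den = sum(w for w, d in zip(weights, in_domain) if d)
    return num / den


def domain_quantile(values, weights, in_domain, p):
    """Interpolated p-th quantile of Y among observations in the domain."""
    # Assumption: the distinct domain values serve as the order statistics.
    ys = sorted({y for y, d in zip(values, in_domain) if d})
    cdf = [domain_cdf(values, weights, in_domain, y) for y in ys]
    if p < cdf[0]:
        return ys[0]
    for k in range(len(ys) - 1):
        if cdf[k] <= p < cdf[k + 1]:
            frac = (p - cdf[k]) / (cdf[k + 1] - cdf[k])
            return ys[k] + frac * (ys[k + 1] - ys[k])
    return ys[-1]  # reached when p equals 1, where the estimated CDF is 1


# Hypothetical data: six observations, the first four belong to domain D.
y = [1, 2, 3, 4, 5, 6]
w = [1, 1, 2, 2, 1, 1]
in_d = [True, True, True, True, False, False]
median_d = domain_quantile(y, w, in_d, 0.5)
```

With these inputs the domain weights sum to 6, so the estimated distribution function takes the values 1/6, 2/6, 4/6, and 1 at y = 1, 2, 3, 4, and the domain median interpolates to 2.5.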

In order to estimate the variance for $\hat{Q}_D(p)$, PROC SURVEYMEANS first estimates the variance of the estimated distribution function in domain D. When you specify VARMETHOD=TAYLOR (or by default), the variance of $\hat{F}_D(\hat{Q}_D(p))$ is estimated by

\[
\hat{V}(\hat{F}_D(\hat{Q}_D(p))) = \sum_{h=1}^{H} \frac{n_h (1 - f_h)}{n_h - 1} \sum_{i=1}^{n_h} \left( d_{hi\cdot} - \bar{d}_{h\cdot\cdot} \right)^2
\]

where

\[
\begin{aligned}
v_{hij} &= I_D(h,i,j) \, w_{hij} \\
v_{\cdot\cdot\cdot} &= \sum_{h=1}^{H} \sum_{i=1}^{n_h} \sum_{j=1}^{m_{hi}} v_{hij} \\
d_{hi\cdot} &= \left( \sum_{j=1}^{m_{hi}} v_{hij} \left( I(y_{hij} \le \hat{Q}_D(p)) - \hat{F}_D(\hat{Q}_D(p)) \right) \right) / v_{\cdot\cdot\cdot} \\
\bar{d}_{h\cdot\cdot} &= \left( \sum_{i=1}^{n_h} d_{hi\cdot} \right) / n_h
\end{aligned}
\]
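The variance formula above can be sketched in Python as follows. The tuple layout of `rows`, the function name, and the treatment of single-PSU strata (they are skipped and so contribute zero) are assumptions of this example rather than a description of the procedure's internals; sampling fractions default to zero when none are supplied.

```python
from collections import defaultdict


def taylor_var_domain_cdf(rows, q_hat, sampling_fraction=None):
    """Taylor-series variance of the estimated domain CDF evaluated at q_hat.

    rows: iterable of (stratum, psu, weight, y, in_domain) tuples.
    sampling_fraction: optional dict stratum -> f_h (treated as 0 if absent).
    """
    rows = list(rows)
    fractions = sampling_fraction or {}
    # v_... : total weight of the observations in the domain
    v_total = sum(w for _, _, w, _, d in rows if d)
    f_hat = sum(w for _, _, w, y, d in rows if d and y <= q_hat) / v_total

    # d_{hi.}: PSU totals of the linearized values
    psu_total = defaultdict(float)
    for h, i, w, y, d in rows:
        v = w if d else 0.0
        indicator = 1.0 if y <= q_hat else 0.0
        psu_total[(h, i)] += v * (indicator - f_hat) / v_total

    by_stratum = defaultdict(list)
    for (h, _), total in psu_total.items():
        by_stratum[h].append(total)

    variance = 0.0
    for h, d_hi in by_stratum.items():
        n_h = len(d_hi)
        if n_h < 2:
            continue  # simplification: a single-PSU stratum contributes nothing
        d_bar = sum(d_hi) / n_h
        f_h = fractions.get(h, 0.0)
        variance += n_h * (1.0 - f_h) / (n_h - 1) * sum((x - d_bar) ** 2 for x in d_hi)
    return variance


# Hypothetical sample: one stratum, two PSUs, unit weights, all in the domain.
sample = [(1, "A", 1.0, 1, True), (1, "A", 1.0, 2, True),
          (1, "B", 1.0, 3, True), (1, "B", 1.0, 4, True)]
var_f = taylor_var_domain_cdf(sample, q_hat=2)
```

In this small example the estimated distribution function at 2 is 0.5, the two PSU totals are 0.25 and -0.25, and the variance estimate is 0.25.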

Then $100(1-\alpha)\%$ confidence limits for $\hat{F}_D(\hat{Q}_D(p))$ can be constructed by $(\hat{p}_{DL}, \hat{p}_{DU})$, where

\[
\begin{aligned}
\hat{p}_{DL} &= \hat{F}_D(\hat{Q}_D(p)) - t_{df,\alpha/2} \sqrt{\hat{V}(\hat{F}_D(\hat{Q}_D(p)))} \\
\hat{p}_{DU} &= \hat{F}_D(\hat{Q}_D(p)) + t_{df,\alpha/2} \sqrt{\hat{V}(\hat{F}_D(\hat{Q}_D(p)))}
\end{aligned}
\]

and $t_{df,\alpha/2}$ is the $100(1-\alpha/2)$th percentile of the t distribution with df degrees of freedom, described in the section Degrees of Freedom. When $(\hat{p}_{DL}, \hat{p}_{DU})$ is out of the range of [0,1], PROC SURVEYMEANS does not compute the standard error of $\hat{Q}_D(p)$.

The $\hat{p}_{DL}$th quantile is then estimated as

\[
\hat{Q}_D(\hat{p}_{DL}) =
\begin{cases}
y_{(1)} & \text{if } \hat{p}_{DL} < \hat{F}_D(y_{(1)}) \\
y_{(k_L)} + \dfrac{\left( \hat{p}_{DL} - \hat{F}_D(y_{(k_L)}) \right) \left( y_{(k_L+1)} - y_{(k_L)} \right)}{\hat{F}_D(y_{(k_L+1)}) - \hat{F}_D(y_{(k_L)})} & \text{if } \hat{F}_D(y_{(k_L)}) \le \hat{p}_{DL} < \hat{F}_D(y_{(k_L+1)}) \\
y_{(d)} & \text{if } \hat{p}_{DL} = 1
\end{cases}
\]

The $\hat{p}_{DU}$th quantile is then estimated as

\[
\hat{Q}_D(\hat{p}_{DU}) =
\begin{cases}
y_{(1)} & \text{if } \hat{p}_{DU} < \hat{F}_D(y_{(1)}) \\
y_{(k_U)} + \dfrac{\left( \hat{p}_{DU} - \hat{F}_D(y_{(k_U)}) \right) \left( y_{(k_U+1)} - y_{(k_U)} \right)}{\hat{F}_D(y_{(k_U+1)}) - \hat{F}_D(y_{(k_U)})} & \text{if } \hat{F}_D(y_{(k_U)}) \le \hat{p}_{DU} < \hat{F}_D(y_{(k_U+1)}) \\
y_{(d)} & \text{if } \hat{p}_{DU} = 1
\end{cases}
\]

The standard error of $\hat{Q}_D(p)$ is then estimated by

\[
\widehat{\mathrm{StdErr}}(\hat{Q}_D(p)) = \frac{\hat{Q}_D(\hat{p}_{DU}) - \hat{Q}_D(\hat{p}_{DL})}{2 \, t_{df,\alpha/2}}
\]

where $t_{df,\alpha/2}$ is the $100(1-\alpha/2)$th percentile of the t distribution with df degrees of freedom.

Symmetric $100(1-\alpha)\%$ confidence limits for the domain quantile are computed as

\[
\left( \hat{Q}_D(p) - \widehat{\mathrm{StdErr}}(\hat{Q}_D(p)) \, t_{df,\alpha/2}, \;\; \hat{Q}_D(p) + \widehat{\mathrm{StdErr}}(\hat{Q}_D(p)) \, t_{df,\alpha/2} \right)
\]

If you specify the NONSYMCL option in the PROC SURVEYMEANS statement, the procedure displays $100(1-\alpha)\%$ nonsymmetric confidence limits as

\[
\left( \hat{Q}_D(\hat{p}_{DL}), \;\; \hat{Q}_D(\hat{p}_{DU}) \right)
\]
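To show how the pieces of this section fit together, here is a minimal Python sketch that turns an estimated distribution-function variance into nonsymmetric limits, a standard error, and symmetric limits. The function name, the `quantile_fn` callback, and the convention of returning `None` when the probability limits leave [0,1] are assumptions of the example. The t percentile is passed in by the caller rather than computed.

```python
import math


def quantile_limits(quantile_fn, f_at_q, var_f, q_hat, t_crit):
    """Return nonsymmetric limits, standard error, and symmetric limits.

    quantile_fn: callable p -> estimated quantile (hypothetical helper).
    f_at_q: estimated distribution function evaluated at q_hat.
    var_f: estimated variance of f_at_q.
    t_crit: t percentile t_{df, alpha/2}, supplied by the caller.
    """
    half_width = t_crit * math.sqrt(var_f)
    p_lower, p_upper = f_at_q - half_width, f_at_q + half_width
    if p_lower < 0.0 or p_upper > 1.0:
        return None  # limits leave [0, 1]: no standard error in this sketch
    q_lower, q_upper = quantile_fn(p_lower), quantile_fn(p_upper)
    std_err = (q_upper - q_lower) / (2.0 * t_crit)
    symmetric = (q_hat - t_crit * std_err, q_hat + t_crit * std_err)
    return {"nonsymmetric": (q_lower, q_upper),
            "stderr": std_err,
            "symmetric": symmetric}


# Hypothetical inputs: a linear quantile function and a round t value.
result = quantile_limits(lambda p: 10.0 * p, f_at_q=0.5, var_f=0.01,
                         q_hat=5.0, t_crit=2.0)
```

Because the quantile function in this toy call is linear, the nonsymmetric and symmetric limits coincide (about 3 and 7, with a standard error of about 1); with real survey data the two intervals generally differ.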

Domain Quantile Estimation with Poststratification

When you specify both a POSTSTRATA statement and a DOMAIN statement, the domain quantile estimation and its variance estimation incorporate poststratification. For more information about poststratification, see the section Poststratification.

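For a selected sample, let $r = 1, 2, \ldots, R$ be the poststratum index, let $Z_1, Z_2, \ldots, Z_R$ be the population totals for each corresponding poststratum, and let $I_r(h,i,j)$ be the indicator variable for the poststratum r:

\[
I_r(h,i,j) =
\begin{cases}
1 & \text{if observation } (h,i,j) \text{ belongs to the } r\text{th poststratum} \\
0 & \text{otherwise}
\end{cases}
\]

The poststratification weights, $\tilde{w}_{hij}$, are defined as in the section Quantile Estimation with Poststratification.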

For domain D, let $I_D(h,i,j)$ be the corresponding indicator variable:

\[
I_D(h,i,j) =
\begin{cases}
1 & \text{if observation } (h,i,j) \text{ belongs to domain } D \\
0 & \text{otherwise}
\end{cases}
\]

With poststratification, for variable Y, the estimated cumulative distribution function in domain D, $\hat{F}_D(t)$, and the estimated pth quantile, $\hat{Q}_D(p)$, can be computed as in the section Domain Quantile by replacing the original weights, $w_{hij}$, with the poststratification weights, $\tilde{w}_{hij}$. However, the variance of $\hat{F}_D(\hat{Q}_D(p))$, which is described in the section Domain Quantile, is computed as follows when you specify the VARMETHOD=TAYLOR option (or by default).

Define

\[
\begin{aligned}
\hat{\theta}_D^{(r)}(\hat{Q}_D(p)) &= Z_r^{-1} \sum_{h=1}^{H} \sum_{i=1}^{n_h} \sum_{j=1}^{m_{hi}} I_D(h,i,j) \, I_r(h,i,j) \, \tilde{w}_{hij} \left( I(y_{hij} \le \hat{Q}_D(p)) - \hat{F}_D(\hat{Q}_D(p)) \right) \\
\hat{\bar{I}}_D^{(r)} &= Z_r^{-1} \sum_{h=1}^{H} \sum_{i=1}^{n_h} \sum_{j=1}^{m_{hi}} I_r(h,i,j) \, I_D(h,i,j) \, \tilde{w}_{hij} \\
\hat{\theta}_D(p) &= \frac{\sum_{h=1}^{H} \sum_{i=1}^{n_h} \sum_{j=1}^{m_{hi}} I_D(h,i,j) \, \tilde{w}_{hij} \left( I(y_{hij} \le \hat{Q}_D(p)) - \hat{F}_D(\hat{Q}_D(p)) \right)}{\sum_{h=1}^{H} \sum_{i=1}^{n_h} \sum_{j=1}^{m_{hi}} I_D(h,i,j) \, \tilde{w}_{hij}}
\end{aligned}
\]

Assume that the observation (h, i, j) belongs to the rth poststratum. Then the variance of $\hat{F}_D(\hat{Q}_D(p))$ is estimated by

Geometric Mean

For a continuous variable Y that has positive values, the SURVEYMEANS procedure can compute its geometric mean and associated standard error and confidence limits. To request these statistics, you can specify statistic-keywords such as GEOMEAN, GMSTDERR, and GMCLM.

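The geometric mean of Y from a sample is computed as

\[
\hat{\bar{Y}}_G = \exp \left( \sum_{h=1}^{H} \sum_{i=1}^{n_h} \sum_{j=1}^{m_{hi}} w_{hij} \ln(y_{hij}) / w_{\cdot\cdot\cdot} \right)
\]

where

\[
w_{\cdot\cdot\cdot} = \sum_{h=1}^{H} \sum_{i=1}^{n_h} \sum_{j=1}^{m_{hi}} w_{hij}
\]

is the sum of the weights over all observations in the data set.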

When you use the Taylor series method, the variance estimation for the geometric mean is computed as

\[
\hat{V}(\hat{\bar{Y}}_G) = \left( \hat{\bar{Y}}_G \right)^2 \sum_{h=1}^{H} \frac{n_h (1 - f_h)}{n_h - 1} \sum_{i=1}^{n_h} \left( r_{hi\cdot} - \bar{r}_{h\cdot\cdot} \right)^2
\]

where

\[
\begin{aligned}
r_{hi\cdot} &= \left( \sum_{j=1}^{m_{hi}} w_{hij} \left( \ln(y_{hij}) - \ln(\hat{\bar{Y}}_G) \right) \right) / w_{\cdot\cdot\cdot} \\
\bar{r}_{h\cdot\cdot} &= \left( \sum_{i=1}^{n_h} r_{hi\cdot} \right) / n_h
\end{aligned}
\]

The standard error of the geometric mean is the square root of the estimated variance:

\[
\mathrm{StdErr}(\hat{\bar{Y}}_G) = \sqrt{\hat{V}(\hat{\bar{Y}}_G)}
\]

The confidence limits for the geometric mean are computed based on the confidence limits for the log transformation of the Y variable as

\[
\left( \exp \left( \ln(\hat{\bar{Y}}_G) - \gamma \right), \;\; \exp \left( \ln(\hat{\bar{Y}}_G) + \gamma \right) \right)
\]

where

\[
\gamma = t_{df,\alpha/2} \cdot \mathrm{StdErr}(\hat{\bar{Y}}_G) / \hat{\bar{Y}}_G
\]

and $t_{df,\alpha/2}$ is the $100(1-\alpha/2)$th percentile of the t distribution, with df calculated as in the section t Test for the Mean.
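The Taylor series computations for the geometric mean can be sketched in Python as follows. The function names, the `(stratum, psu, weight, y)` tuple layout, the zero default for the sampling fractions, and the decision to skip single-PSU strata are assumptions of this example; the t percentile is supplied by the caller.

```python
import math
from collections import defaultdict


def geometric_mean(rows):
    """Weighted geometric mean; rows are (stratum, psu, weight, y) with y > 0."""
    w_total = sum(w for _, _, w, _ in rows)
    return math.exp(sum(w * math.log(y) for _, _, w, y in rows) / w_total)


def geomean_taylor_variance(rows, sampling_fraction=None):
    """Taylor-series variance of the geometric mean, following the formulas above."""
    fractions = sampling_fraction or {}
    gm = geometric_mean(rows)
    w_total = sum(w for _, _, w, _ in rows)
    psu_total = defaultdict(float)  # r_{hi.}
    for h, i, w, y in rows:
        psu_total[(h, i)] += w * (math.log(y) - math.log(gm)) / w_total
    by_stratum = defaultdict(list)
    for (h, _), total in psu_total.items():
        by_stratum[h].append(total)
    acc = 0.0
    for h, r_hi in by_stratum.items():
        n_h = len(r_hi)
        if n_h < 2:
            continue  # simplification: skip single-PSU strata
        r_bar = sum(r_hi) / n_h
        f_h = fractions.get(h, 0.0)
        acc += n_h * (1.0 - f_h) / (n_h - 1) * sum((r - r_bar) ** 2 for r in r_hi)
    return gm ** 2 * acc


def geomean_limits(gm, std_err, t_crit):
    """Back-transformed confidence limits; t_crit is supplied by the caller."""
    gamma = t_crit * std_err / gm
    return math.exp(math.log(gm) - gamma), math.exp(math.log(gm) + gamma)


# Hypothetical sample: one stratum, two single-observation PSUs, unit weights.
sample = [(1, "A", 1.0, 1.0), (1, "B", 1.0, math.exp(2.0))]
gm = geometric_mean(sample)
se = math.sqrt(geomean_taylor_variance(sample))
lower, upper = geomean_limits(gm, se, t_crit=2.0)
```

Because the limits are built on the log scale and then exponentiated, they are not symmetric around the geometric mean, but their product equals the square of the geometric mean.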

If you use replication methods to estimate the variance by specifying the VARMETHOD=BOOTSTRAP, VARMETHOD=BRR, or VARMETHOD=JACKKNIFE option, the procedure computes the variance of a geometric mean by using the variability among replicate estimates to estimate the overall variance. See the section Replication Methods for Variance Estimation for more information.

Then the standard error is the square root of the estimated variance:

\[
\mathrm{StdErr}_R(\hat{\bar{Y}}_G) = \sqrt{\hat{V}_R(\hat{\bar{Y}}_G)}
\]

The confidence limits for the geometric mean are computed based on the confidence limits for the log transformation of the variable Y as

\[
\left( \exp \left( \ln(\hat{\bar{Y}}_G) - \lambda \right), \;\; \exp \left( \ln(\hat{\bar{Y}}_G) + \lambda \right) \right)
\]

where

\[
\lambda = t_{df,\alpha/2} \cdot \mathrm{StdErr}_R(\hat{\bar{Y}}_G) / \hat{\bar{Y}}_G
\]

and $t_{df,\alpha/2}$ is the $100(1-\alpha/2)$th percentile of the t distribution, with df calculated as in the section t Test for the Mean.

Poststratification

After a probability sample is drawn and survey data are collected, researchers sometimes want to stratify the sample according to auxiliary information about the sampled population. This process is often called poststratification.

When poststratification is done properly, it can improve the efficiency of the estimates. It can also be used to adjust the sampling weights so that their marginal distribution agrees with known auxiliary information from other sources, such as the census. The adjusted weight is often called the poststratification weight.

It is quite common for researchers to use poststratification techniques in survey data analysis.

Poststratification is also used by epidemiologists, who frequently analyze health survey data. They often compute statistics based on a process called direct standardization, a form of poststratification. For example, certain diseases, such as cancer, are more common among older populations. Therefore, to compare the prevalence rates among geographic regions that are populated with different age groups, it is necessary to make adjustments according to such demographic categories and to compute relative prevalence rates of the diseases.

For more information about poststratification, see Fuller (2009); Lohr (2010); Wolter (2007); Rao, Yung, and Hidiroglou (2002).

After you provide the population controls for each poststratum that is defined by the poststratification variables, the SURVEYMEANS procedure creates the poststratification weights accordingly. Then the procedure computes statistics that you request by using poststratification weights.

You can save the poststratification weights in an OUTPSWGT= data set to be used in subsequent analyses.

For a selected sample, let $p = 1, 2, \ldots, P$ be the poststratum index; let $Z_1, Z_2, \ldots, Z_P$ be the population totals (or poststratum totals) for the corresponding poststrata, and let $I_p(h,i,j)$ be a corresponding indicator variable for poststratum p defined by

\[
I_p(h,i,j) =
\begin{cases}
1 & \text{if observation } (h,i,j) \text{ belongs to poststratum } p \\
0 & \text{otherwise}
\end{cases}
\]

Denote the total sum of original weights in the sample for each poststratum as

\[
\psi_p = \sum_{h=1}^{H} \sum_{i=1}^{n_h} \sum_{j=1}^{m_{hi}} w_{hij} \, I_p(h,i,j)
\]

Then the poststratification weight for observation (h, i, j) is

\[
\tilde{w}_{hij} = w_{hij} \, \frac{Z_p}{\psi_p}
\]

The SURVEYMEANS procedure computes statistics by using the poststratification weights $\tilde{w}_{hij}$ instead of the original weights $w_{hij}$.
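The weight adjustment described above amounts to a per-poststratum rescaling, which the following Python sketch illustrates. The function name and the list and dictionary inputs are hypothetical; the sketch assumes every observation has a poststratum that appears in the control totals.

```python
from collections import defaultdict


def poststratify(weights, poststrata, control_totals):
    """Scale each weight by Z_p / psi_p for the poststratum p it belongs to."""
    psi = defaultdict(float)  # psi_p: sum of original weights per poststratum
    for w, p in zip(weights, poststrata):
        psi[p] += w
    return [w * control_totals[p] / psi[p] for w, p in zip(weights, poststrata)]


# Hypothetical data: two poststrata with control totals 10 and 20.
w = [1.0, 1.0, 2.0, 2.0]
ps = ["a", "a", "b", "b"]
w_ps = poststratify(w, ps, {"a": 10.0, "b": 20.0})
```

After the adjustment, the weights in each poststratum sum to that poststratum's control total (here 10 for "a" and 20 for "b").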

The standard error and confidence intervals of computed statistics are based on the estimated variances, which are computed by using either a replication method or the Taylor series method.

Replication Methods

When you specify the VARMETHOD=BOOTSTRAP, VARMETHOD=BRR, or VARMETHOD=JACKKNIFE option, PROC SURVEYMEANS computes the variance of a statistic by using replication methods, as described in the section Replication Methods for Variance Estimation. However, with poststratification, an extra step is needed to adjust the weights.

First, PROC SURVEYMEANS constructs a replicate and computes appropriate replicate weights for the replicate. Then, by using the poststratification control totals, the procedure adjusts these replicate weights in the same way as described previously for constructing the poststratification weights for the full sample. Finally, PROC SURVEYMEANS computes the estimate for a desired statistic by using the poststratification weights that are adjusted from the replicate weights in the current replicate. The final variance is then estimated by the variability among replicate estimates, as described in the section Replication Methods for Variance Estimation.
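The three steps above can be sketched in Python for the mean of Y. This example assumes the textbook delete-one-PSU stratified jackknife (replicate weights scaled by n_h/(n_h - 1) within the affected stratum, and a coefficient of (n_h - 1)/n_h on each squared deviation); the section Replication Methods for Variance Estimation is the authority on what the procedure actually does. All names and the tuple layout are hypothetical.

```python
from collections import defaultdict


def poststratify(weights, poststrata, control_totals):
    """Rescale weights so that each poststratum sums to its control total."""
    psi = defaultdict(float)
    for w, p in zip(weights, poststrata):
        psi[p] += w
    # A zero replicate weight stays zero. This sketch assumes every poststratum
    # keeps at least one positively weighted observation in each replicate.
    return [w * control_totals[p] / psi[p] if w > 0 else 0.0
            for w, p in zip(weights, poststrata)]


def weighted_mean(values, weights):
    return sum(w * y for y, w in zip(values, weights)) / sum(weights)


def jackknife_ps_variance(rows, control_totals):
    """rows: list of (stratum, psu, weight, y, poststratum) tuples."""
    values = [r[3] for r in rows]
    poststrata = [r[4] for r in rows]
    base = [r[2] for r in rows]
    full = weighted_mean(values, poststratify(base, poststrata, control_totals))

    psus = defaultdict(set)
    for h, i, *_ in rows:
        psus[h].add(i)

    variance = 0.0
    for h, ids in psus.items():
        n_h = len(ids)
        if n_h < 2:
            continue  # no replicate can be formed from a single-PSU stratum
        for dropped in sorted(ids):
            # Step 1: replicate weights (drop one PSU, reweight its stratum)
            rep = [0.0 if (rh, ri) == (h, dropped)
                   else (w * n_h / (n_h - 1) if rh == h else w)
                   for rh, ri, w, _, _ in rows]
            # Step 2: poststratify the replicate weights
            rep_ps = poststratify(rep, poststrata, control_totals)
            # Step 3: replicate estimate and its contribution to the variance
            theta_r = weighted_mean(values, rep_ps)
            variance += (n_h - 1) / n_h * (theta_r - full) ** 2
    return variance


# Hypothetical sample: one stratum, four single-observation PSUs.
sample = [(1, "A", 1.0, 1.0, "a"), (1, "B", 1.0, 3.0, "a"),
          (1, "C", 1.0, 10.0, "b"), (1, "D", 1.0, 20.0, "b")]
var_mean = jackknife_ps_variance(sample, {"a": 10.0, "b": 10.0})
```

In this example the full-sample poststratified mean is 8.5, the four replicate means are 9, 8, 11, and 6, and the resulting variance estimate is 0.75 × 13 = 9.75.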

Taylor Series Method

When you specify VARMETHOD=TAYLOR, or by default when you do not specify the VARMETHOD= option, PROC SURVEYMEANS uses the Taylor series method to estimate the variances of requested statistics.

Variance of the Mean and Sum

The sum and mean of variable Y under poststratification are

StartLayout 1st Row 1st Column ModifyingAbove upper Y With caret Superscript left-parenthesis upper P upper S right-parenthesis 2nd Column equals 3rd Column sigma-summation Underscript h equals 1 Overscript upper H Endscripts sigma-summation Underscript i equals 1 Overscript n Subscript h Baseline Endscripts sigma-summation Underscript j equals 1 Overscript m Subscript h i Baseline Endscripts w overTilde Subscript h i j Baseline y Subscript h i j Baseline 2nd Row 1st Column ModifyingAbove Above upper Y overbar With caret Superscript left-parenthesis upper P upper S right-parenthesis 2nd Column equals 3rd Column ModifyingAbove upper Y With caret Superscript left-parenthesis upper P upper S right-parenthesis Baseline slash w overTilde Subscript dot dot dot Baseline EndLayout

where

\[
\tilde{w}_{\cdot\cdot\cdot} = \sum_{h=1}^{H} \sum_{i=1}^{n_h} \sum_{j=1}^{m_{hi}} \tilde{w}_{hij}
\]

is the sum of the poststratification weights over all observations in the sample.

For each poststratum $p$, let the mean of variable Y be

\[
\hat{\bar{Y}}^{(p)} = \left( \sum_{h=1}^{H} \sum_{i=1}^{n_h} \sum_{j=1}^{m_{hi}} I_p(h,i,j) \, \tilde{w}_{hij} \, y_{hij} \right) / Z_p
\]

where $Z_p$ is the total of the poststratification weights in poststratum p.

For observation (h, i, j), assume that it belongs to the pth poststratum. Let

\[
\tilde{y}_{hij} = y_{hij} - \hat{\bar{Y}}^{(p)}
\]
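As a rough illustration of these definitions, the following Python sketch computes the poststratified sum and mean, the poststratum means, and the centered values for a small sample. The function name `poststratified_summary` and the data are hypothetical, and the poststratification weights are assumed to have been computed already.

```python
from collections import defaultdict

def poststratified_summary(y, w_tilde, poststratum):
    """Poststratified sum, mean, poststratum means, and centered values."""
    w_total = sum(w_tilde)                       # sum of all adjusted weights
    y_sum = sum(w * v for w, v in zip(w_tilde, y))
    y_mean = y_sum / w_total
    z = defaultdict(float)                       # Z_p: weight total per poststratum
    wy = defaultdict(float)                      # weighted y total per poststratum
    for v, w, p in zip(y, w_tilde, poststratum):
        z[p] += w
        wy[p] += w * v
    ps_mean = {p: wy[p] / z[p] for p in z}
    # Centered value: y minus the mean of the observation's own poststratum.
    y_tilde = [v - ps_mean[p] for v, p in zip(y, poststratum)]
    return y_sum, y_mean, ps_mean, y_tilde

# Hypothetical data: weights already adjusted to totals 60 (a) and 40 (b).
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
w_tilde = [20.0, 40.0 / 3.0] * 3
poststratum = ["a", "b"] * 3

y_sum, y_mean, ps_mean, y_tilde = poststratified_summary(y, w_tilde, poststratum)
```

Because each value is centered at its own poststratum mean, the weighted centered values sum to zero within every poststratum, which is a convenient check on the computation.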

PROC SURVEYMEANS estimates the variance of $\hat{\bar{Y}}^{(PS)}$ as

\[
\hat{V}\left(\hat{\bar{Y}}^{(PS)}\right) = \sum_{h=1}^{H} \hat{V}_h\left(\hat{\bar{Y}}^{(PS)}\right)
\]

where, if $n_h > 1$, then

and if $n_h = 1$, then

\[
\hat{V}_h\left(\hat{\bar{Y}}^{(PS)}\right) =
\begin{cases}
\text{missing} & \text{if } n_{h'} = 1 \text{ for } h' = 1, 2, \ldots, H \\
0 & \text{if } n_{h'} > 1 \text{ for some } 1 \le h' \le H
\end{cases}
\]

PROC SURVEYMEANS estimates the variance of $\hat{Y}^{(PS)}$ as

\[
\hat{V}\left(\hat{Y}^{(PS)}\right) = \hat{V}\left(\hat{\bar{Y}}^{(PS)}\right) \tilde{w}_{\cdot\cdot\cdot}^{\,2}
\]
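The following Python sketch shows how the stratum contributions, the single-PSU rule, and the scaling from the mean to the sum fit together. The stratum-level expression for strata with more than one PSU is an assumption of this sketch: it uses the conventional between-PSU Taylor series form, n_h(1 - f_h)/(n_h - 1) times the sum of squared deviations of PSU-level totals of the weighted centered values divided by the weight total. The function name, the `sampling_fraction` argument, and the data are hypothetical.

```python
from collections import defaultdict

def taylor_variance_ps(stratum, psu, w_tilde, y_tilde, sampling_fraction=None):
    """Return (V(mean), V(sum)) under the ASSUMED between-PSU stratum formula."""
    w_total = sum(w_tilde)
    # e[h][i]: PSU total of weighted centered values, scaled by the weight total.
    e = defaultdict(lambda: defaultdict(float))
    for h, i, w, r in zip(stratum, psu, w_tilde, y_tilde):
        e[h][i] += w * r / w_total
    if all(len(psus) == 1 for psus in e.values()):
        return None, None  # every stratum has one PSU: the variance is missing
    v_mean = 0.0
    for h, psus in e.items():
        n_h = len(psus)
        if n_h == 1:
            continue  # contributes 0 because some other stratum has n_h > 1
        # Assumption: the sampling fraction f_h is 0 unless supplied.
        f_h = 0.0 if sampling_fraction is None else sampling_fraction[h]
        e_bar = sum(psus.values()) / n_h
        spread = sum((x - e_bar) ** 2 for x in psus.values())
        v_mean += n_h * (1.0 - f_h) / (n_h - 1) * spread
    # Variance of the sum: variance of the mean times the squared weight total.
    return v_mean, v_mean * w_total ** 2

# Hypothetical data: one stratum, three PSUs, centered values already computed.
stratum = ["A"] * 6
psu = [1, 1, 2, 2, 3, 3]
w_tilde = [20.0, 40.0 / 3.0] * 3
y_tilde = [-2.0, -2.0, 0.0, 0.0, 2.0, 2.0]

v_mean, v_sum = taylor_variance_ps(stratum, psu, w_tilde, y_tilde)
```

Returning `None` stands in for a missing value when no stratum has more than one PSU; a stratum with a single PSU is otherwise skipped, which matches a contribution of 0.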

Variance of the Domain Mean and Sum

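For a domain D, let $I_D$ be the corresponding indicator variable:

\[
I_D(h,i,j) =
\begin{cases}
1 & \text{if observation } (h,i,j) \text{ belongs to } D \\
0 & \text{otherwise}
\end{cases}
\]

Let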

\[
\tilde{v}_{hij} = \tilde{w}_{hij} \, I_D(h,i,j) =
\begin{cases}
\tilde{w}_{hij} & \text{if observation } (h,i,j) \text{ belongs to } D \\
0 & \text{otherwise}
\end{cases}
\]

The sum and mean of variable Y under poststratification in domain D are

\[
\begin{aligned}
\hat{Y}_D^{(PS)} &= \sum_{h=1}^{H} \sum_{i=1}^{n_h} \sum_{j=1}^{m_{hi}} \tilde{v}_{hij} \, y_{hij} \\
\hat{\bar{Y}}_D^{(PS)} &= \hat{Y}_D^{(PS)} / \tilde{v}_{\cdot\cdot\cdot}
\end{aligned}
\]

where

\[
\tilde{v}_{\cdot\cdot\cdot} = \sum_{h=1}^{H} \sum_{i=1}^{n_h} \sum_{j=1}^{m_{hi}} \tilde{v}_{hij}
\]

is the sum of the poststratification weights over all observations in the sample that belong to domain D. For each poststratum $p$, let the mean of variable Y and the mean of the domain indicator variable in that poststratum be

\[
\begin{aligned}
\hat{\bar{Y}}_D^{(p)} &= \left( \sum_{h=1}^{H} \sum_{i=1}^{n_h} \sum_{j=1}^{m_{hi}} I_p(h,i,j) \, I_D(h,i,j) \, \tilde{w}_{hij} \, y_{hij} \right) / Z_p \\
\hat{\bar{I}}_D^{(p)} &= \left( \sum_{h=1}^{H} \sum_{i=1}^{n_h} \sum_{j=1}^{m_{hi}} I_p(h,i,j) \, I_D(h,i,j) \, \tilde{w}_{hij} \right) / Z_p
\end{aligned}
\]

Assume that the observation (h, i, j) belongs to the pth poststratum. Let

\[
\begin{aligned}
d_{hij} &= y_{hij} \, I_D(h,i,j) - \hat{\bar{Y}}_D^{(p)} \\
e_{hij} &= d_{hij} - \left( I_D(h,i,j) - \hat{\bar{I}}_D^{(p)} \right) \hat{\bar{Y}}_D^{(PS)}
\end{aligned}
\]
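The domain quantities can be traced numerically with a short Python sketch. It follows the definitions above directly; the function name `domain_linearized_values`, the 0/1 list `in_domain`, and the data are hypothetical, and the poststratification weights are assumed to be given.

```python
from collections import defaultdict

def domain_linearized_values(y, w_tilde, poststratum, in_domain):
    """Domain sum and mean plus the per-observation values d and e."""
    # Domain weights: the adjusted weight inside the domain, 0 outside.
    v_tilde = [w * ind for w, ind in zip(w_tilde, in_domain)]
    dom_sum = sum(v * t for v, t in zip(v_tilde, y))
    dom_mean = dom_sum / sum(v_tilde)
    z = defaultdict(float)    # Z_p: adjusted weight total per poststratum
    wy = defaultdict(float)   # weighted domain y total per poststratum
    wi = defaultdict(float)   # weighted domain indicator total per poststratum
    for t, w, p, ind in zip(y, w_tilde, poststratum, in_domain):
        z[p] += w
        wy[p] += w * ind * t
        wi[p] += w * ind
    # d: domain value centered at the poststratum's domain mean.
    d = [t * ind - wy[p] / z[p]
         for t, ind, p in zip(y, in_domain, poststratum)]
    # e: d minus the centered indicator times the overall domain mean.
    e = [dv - (ind - wi[p] / z[p]) * dom_mean
         for dv, ind, p in zip(d, in_domain, poststratum)]
    return dom_sum, dom_mean, d, e

# Hypothetical data: weights already adjusted to totals 60 (a) and 40 (b).
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
w_tilde = [20.0, 40.0 / 3.0] * 3
poststratum = ["a", "b"] * 3
in_domain = [1, 1, 1, 0, 0, 1]

dom_sum, dom_mean, d, e = domain_linearized_values(y, w_tilde, poststratum, in_domain)
```

Observations outside the domain still receive nonzero values of d and e, because the centering terms depend on the whole poststratum and not only on the domain members.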

Then PROC SURVEYMEANS estimates the variance of the domain sum $\hat{Y}_D^{(PS)}$ as

\[
\hat{V}\left(\hat{Y}_D^{(PS)}\right) = \sum_{h=1}^{H} \hat{V}_h\left(\hat{Y}_D^{(PS)}\right)
\]

where, if $n_h > 1$, then

and if $n_h = 1$, then

\[
\hat{V}_h\left(\hat{Y}_D^{(PS)}\right) =
\begin{cases}
\text{missing} & \text{if } n_{h'} = 1 \text{ for } h' = 1, 2, \ldots, H \\
0 & \text{if } n_{h'} > 1 \text{ for some } 1 \le h' \le H
\end{cases}
\]

PROC SURVEYMEANS estimates the variance of the domain mean $\hat{\bar{Y}}_D^{(PS)}$ as

\[
\hat{V}\left(\hat{\bar{Y}}_D^{(PS)}\right) = \sum_{h=1}^{H} \hat{V}_h\left(\hat{\bar{Y}}_D^{(PS)}\right)
\]

where, if $n_h > 1$, then

and if $n_h = 1$, then

Variance of the Ratio

Suppose you want to calculate the ratio of variable Y to variable X. Let $x_{hij}$ and $y_{hij}$ be the values of variable X and variable Y, respectively, for observation (h, i, j).

The ratio of Y to X after poststratification is