Introduction to Survey Sampling and Analysis Procedures

Overview: Survey Sampling and Analysis Procedures

This chapter introduces the SAS/STAT procedures for survey sampling and describes how you can use these procedures to analyze survey data.

Researchers often use sample survey methodology to obtain information about a large population by selecting and measuring a sample from that population. Because of variability among items, researchers apply probability-based scientific designs to select the sample. This reduces the risk of a distorted view of the population and enables statistically valid inferences to be made from the sample. For more information about statistical sampling and analysis of complex survey data, see Lohr (2010), Kalton (1983), Cochran (1977), and Kish (1965).

To select probability-based random samples from a study population, you can use the SURVEYSELECT procedure, which provides a variety of probability sampling methods. To impute missing values in survey data, you can use the SURVEYIMPUTE procedure, which provides donor-based imputation methods. To analyze sample survey data, you can use the SURVEYMEANS, SURVEYFREQ, SURVEYREG, SURVEYLOGISTIC, and SURVEYPHREG procedures, which incorporate the sample design into the analyses.
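To make the idea of probability-based selection concrete, the following Python sketch draws a stratified simple random sample without replacement and attaches a sampling weight to each selected unit. It is an illustration of the concept only, not SAS code and not a reproduction of what PROC SURVEYSELECT does internally; the function name `stratified_srs`, the household frame, and the sample sizes are all hypothetical.

```python
import random
from collections import defaultdict

def stratified_srs(frame, stratum_of, n_per_stratum, seed=0):
    """Select a simple random sample without replacement within each stratum.

    Returns a list of (unit, sampling_weight) pairs. The weight is N_h / n_h,
    the inverse of the selection probability of a unit in stratum h.
    """
    rng = random.Random(seed)

    # Group the sampling frame by stratum.
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)

    sample = []
    for label in sorted(strata):
        units = strata[label]
        n_h = n_per_stratum[label]
        weight = len(units) / n_h  # N_h / n_h
        for unit in rng.sample(units, n_h):  # sampling without replacement
            sample.append((unit, weight))
    return sample

# Hypothetical frame: 60 urban and 40 rural households.
frame = [("urban", i) for i in range(60)] + [("rural", i) for i in range(40)]
sample = stratified_srs(frame, lambda u: u[0], {"urban": 6, "rural": 8}, seed=1)
```

Because each weight is the inverse of the selection probability, the weights in the sample sum to the size of the frame. A design-based analysis relies on this property when it estimates population totals and means.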

Many SAS/STAT procedures, such as the MEANS, FREQ, GLM, LOGISTIC, and PHREG procedures, can compute sample means, produce crosstabulation tables, and estimate regression relationships. However, in most of these procedures, statistical inference is based on the assumption that the sample is drawn from an infinite population by simple random sampling. If the sample is in fact selected from a finite population by using a complex survey design, these procedures generally do not calculate the estimates and their variances according to the design actually used. Using analyses that are not appropriate for your sample design can lead to incorrect statistical inferences.
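The difference between the two kinds of analysis can be seen in a small numerical sketch. The Python code below is an illustration only (it is not SAS code, and the data and stratum sizes are invented): it analyzes the same eight observations once as if they were a simple random sample from an infinite population, and once as a stratified simple random sample from a finite population, using the textbook stratified estimator with a finite population correction.

```python
from statistics import mean, variance

def srs_mean_and_se(values):
    """Naive analysis: simple random sample from an infinite population."""
    n = len(values)
    return mean(values), (variance(values) / n) ** 0.5

def stratified_mean_and_se(samples, pop_sizes):
    """Design-based analysis for stratified simple random sampling.

    samples   : dict mapping stratum -> list of sampled values
    pop_sizes : dict mapping stratum -> population size N_h
    """
    N = sum(pop_sizes.values())
    est = 0.0
    var = 0.0
    for h, ys in samples.items():
        N_h, n_h = pop_sizes[h], len(ys)
        W_h = N_h / N                      # stratum share of the population
        est += W_h * mean(ys)
        fpc = 1 - n_h / N_h                # finite population correction
        var += W_h ** 2 * fpc * variance(ys) / n_h
    return est, var ** 0.5

# Hypothetical data: stratum B is heavily oversampled relative to its size.
samples = {"A": [10, 12, 11, 13], "B": [30, 34, 32, 36]}
pop_sizes = {"A": 900, "B": 100}

naive_est, naive_se = srs_mean_and_se(samples["A"] + samples["B"])
design_est, design_se = stratified_mean_and_se(samples, pop_sizes)
```

In this invented example the naive mean is pulled toward the oversampled stratum B, and the naive standard error is much larger than the design-based one because it ignores the stratification. With other designs, such as cluster sampling, ignoring the design typically understates the standard error instead.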

The SURVEYMEANS, SURVEYFREQ, SURVEYREG, SURVEYLOGISTIC, and SURVEYPHREG procedures properly analyze complex survey data by taking into account the sample design. You can use these procedures for multistage or single-stage designs, with or without stratification, and with or without unequal weighting. The survey analysis procedures provide a choice of variance estimation methods, which include Taylor series linearization, balanced repeated replication (BRR), bootstrap, and jackknife.
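As a rough illustration of how a replication method works, the following Python sketch computes a delete-one-cluster jackknife standard error for a weighted mean in an unstratified design. It is a simplified sketch under stated assumptions, not the implementation used by the survey analysis procedures: the function names and data are hypothetical, and the variance is taken as (R - 1)/R times the sum of squared deviations of the R replicate estimates from the full-sample estimate, where R is the number of clusters.

```python
def weighted_mean(records):
    """records: list of (cluster, weight, value) tuples."""
    total_weight = sum(w for _, w, _ in records)
    return sum(w * y for _, w, y in records) / total_weight

def jackknife_se(records):
    """Delete-one-cluster jackknife for a weighted mean, unstratified design.

    Each replicate drops one cluster. Rescaling the remaining weights by
    R / (R - 1) would cancel in the ratio, so it is omitted here.
    """
    clusters = sorted({c for c, _, _ in records})
    R = len(clusters)
    full = weighted_mean(records)
    sq_dev = 0.0
    for c in clusters:
        kept = [r for r in records if r[0] != c]
        sq_dev += (weighted_mean(kept) - full) ** 2
    return full, ((R - 1) / R * sq_dev) ** 0.5

# Hypothetical data: three clusters with unequal weights.
records = [
    (1, 2.0, 10.0), (1, 2.0, 12.0),
    (2, 3.0, 20.0), (2, 3.0, 18.0),
    (3, 1.5, 15.0), (3, 1.5, 17.0),
]
estimate, std_err = jackknife_se(records)
```

When every cluster holds a single observation and all weights are equal, this jackknife standard error reduces to the familiar s divided by the square root of n, which is a convenient sanity check on the formula.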

Table 1 briefly describes the SAS/STAT sampling and analysis procedures.

Table 1: Survey Sampling and Analysis Procedures

PROC SURVEYSELECT
  Selection methods
    Simple random sampling (without replacement)
    Unrestricted random sampling (with replacement)
    Balanced bootstrap
    Systematic
    Sequential
    Bernoulli
    Poisson
    Probability proportional to size (PPS) sampling, with and without replacement
    PPS systematic
    PPS for two units per stratum
    PPS sequential with minimum replacement
  Allocation methods
    Proportional
    Optimal
    Neyman
  Sampling tools
    Stratified sampling
    Cluster sampling
    Replicated sampling
    Serpentine sorting
    Random assignment

PROC SURVEYIMPUTE
  Imputation methods
    Single and multiple hot-deck
    Approximate Bayesian bootstrap
    Fully efficient fractional
    Two-stage fully efficient fractional
    Fractional hot-deck

PROC SURVEYMEANS
  Statistics
    Means and totals
    Proportions
    Quantiles
    Geometric means
    Ratios
    Standard errors
    Confidence limits
  Analyses
    Hypothesis tests
    Domain analysis
    Comparison of domain means
    Poststratification
  Graphics
    Histograms
    Box plots
    Summary panel plots
    Domain box plots

PROC SURVEYFREQ
  Tables
    One-way frequency tables
    Two-way and multiway crosstabulation tables
    Estimates of totals and proportions
    Standard errors
    Confidence limits
  Analyses
    Tests of goodness of fit
    Tests of independence
    Risks and risk differences
    Odds ratios and relative risks
    Kappa coefficients
  Graphics
    Weighted frequency and percent plots
    Mosaic plots
    Odds ratio, relative risk, and risk difference plots
    Kappa plots

PROC SURVEYREG
  Analyses
    Linear regression model fitting
    Regression coefficients
    Covariance matrices
    Confidence limits
    Hypothesis tests
    Estimable functions
    Contrasts
    Least squares means (LS-means) of effects
    Custom hypothesis tests among LS-means
    Regression with constructed effects
    Predicted values and residuals
    Domain analysis
  Graphics
    Fit plots

PROC SURVEYLOGISTIC
  Analyses
    Cumulative logit regression model fitting
    Logit, probit, and complementary log-log link functions
    Generalized logit regression model fitting
    Regression coefficients
    Covariance matrices
    Confidence limits
    Hypothesis tests
    Odds ratios
    Estimable functions
    Contrasts
    Least squares means (LS-means) of effects
    Custom hypothesis tests among LS-means
    Regression with constructed effects
    Model diagnostics
    Domain analysis

PROC SURVEYPHREG
  Analyses
    Proportional hazards regression model fitting
    Breslow and Efron likelihoods
    Regression coefficients
    Covariance matrices
    Confidence limits
    Hypothesis tests
    Hazard ratios
    Contrasts
    Predicted values and standard errors
    Martingale, Schoenfeld, score, and deviance residuals
    Domain analysis


Last updated: March 08, 2022