Introduction to Survey Sampling and Analysis Procedures

Overview: Survey Sampling and Analysis Procedures

This chapter introduces the SAS/STAT procedures for survey sampling and describes how you can use these procedures to analyze survey data.

Researchers often use sample survey methodology to obtain information about a large population by selecting and measuring a sample from that population. Because of variability among items, researchers apply probability-based scientific designs to select the sample. Probability-based selection reduces the risk of a distorted view of the population and enables statistically valid inferences to be made from the sample. For more information about statistical sampling and analysis of complex survey data, see Lohr (2010), Kalton (1983), Cochran (1977), and Kish (1965).

To select probability-based random samples from a study population, you can use the SURVEYSELECT procedure, which provides a variety of probability sampling methods.

To perform imputation of missing values in survey data, you can use the SURVEYIMPUTE procedure, which provides donor-based imputation methods.

To analyze sample survey data, you can use the SURVEYMEANS, SURVEYFREQ, SURVEYREG, SURVEYLOGISTIC, and SURVEYPHREG procedures, which incorporate the sample design into the analyses.
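To make the idea of probability-based selection concrete, the following is a minimal sketch of stratified simple random sampling without replacement, together with the sampling weights that such a design implies. It is plain Python, not SAS code, and it does not reproduce how the SURVEYSELECT procedure is implemented. The function name `stratified_srs`, the frame layout, and the stratum sizes are all hypothetical and chosen only for illustration.

```python
import random

def stratified_srs(frame, stratum_of, n_per_stratum, seed=12345):
    """Select a simple random sample without replacement within each stratum.

    frame         : list of population units (the sampling frame)
    stratum_of    : function that maps a unit to its stratum label
    n_per_stratum : dict that maps a stratum label to its sample size
    Returns a list of (unit, sampling_weight) pairs.
    """
    rng = random.Random(seed)

    # Group the frame units by stratum.
    strata = {}
    for unit in frame:
        strata.setdefault(stratum_of(unit), []).append(unit)

    sample = []
    for label, units in strata.items():
        n_h = n_per_stratum[label]
        N_h = len(units)
        # Every unit in stratum h is selected with probability n_h / N_h,
        # so its sampling weight is the inverse, N_h / n_h.
        weight = N_h / n_h
        for unit in rng.sample(units, n_h):
            sample.append((unit, weight))
    return sample

# Hypothetical frame: 800 units in stratum "A" and 200 units in stratum "B".
frame = [("A", i) for i in range(800)] + [("B", i) for i in range(200)]
sample = stratified_srs(frame, lambda unit: unit[0], {"A": 40, "B": 40})
```

Because stratum "B" is sampled at a higher rate than stratum "A", its units carry smaller weights (5 instead of 20). The weights in the selected sample sum to the frame size of 1,000, which is the property that later analyses rely on when they incorporate the design.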

Many SAS procedures, such as the MEANS, FREQ, GLM, LOGISTIC, and PHREG procedures, can compute sample means, produce crosstabulation tables, and estimate regression relationships. However, in most of these procedures, statistical inference is based on the assumption that the sample is drawn from an infinite population by simple random sampling. If the sample is in fact selected from a finite population by using a complex survey design, these procedures generally do not calculate the estimates and their variances according to the design actually used. Using analyses that are not appropriate for your sample design can lead to incorrect statistical inferences.
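The effect of ignoring the design can be seen in a small worked example. The sketch below, written in plain Python rather than SAS, compares an unweighted mean and its infinite-population standard error with the textbook estimator for stratified simple random sampling without replacement, which weights each stratum by its population share and applies a finite population correction. The data values, stratum labels, and population sizes are invented for illustration, and the function names are hypothetical.

```python
import math

def naive_mean_and_se(y):
    """Unweighted mean and standard error, assuming simple random
    sampling from an infinite population."""
    n = len(y)
    mean = sum(y) / n
    s2 = sum((v - mean) ** 2 for v in y) / (n - 1)
    return mean, math.sqrt(s2 / n)

def stratified_mean_and_se(samples, pop_sizes):
    """Design-based mean and standard error for stratified simple
    random sampling without replacement.

    samples   : dict that maps a stratum to its list of observed values
    pop_sizes : dict that maps a stratum to its population size N_h
    """
    N = sum(pop_sizes.values())
    mean = 0.0
    var = 0.0
    for h, y in samples.items():
        n_h, N_h = len(y), pop_sizes[h]
        ybar_h = sum(y) / n_h
        s2_h = sum((v - ybar_h) ** 2 for v in y) / (n_h - 1)
        W_h = N_h / N                      # population share of stratum h
        mean += W_h * ybar_h
        # (1 - n_h / N_h) is the finite population correction.
        var += W_h ** 2 * (1 - n_h / N_h) * s2_h / n_h
    return mean, math.sqrt(var)

# Invented data: stratum "B" is small in the population but sampled heavily.
samples = {"A": [10.0, 12.0, 11.0, 13.0], "B": [50.0, 54.0, 52.0, 56.0]}
pop_sizes = {"A": 900, "B": 100}

naive = naive_mean_and_se(samples["A"] + samples["B"])
design = stratified_mean_and_se(samples, pop_sizes)
```

With these numbers the unweighted mean is 32.25, whereas the design-based estimate is 15.65, because stratum "B" makes up half of the sample but only 10% of the population. The two standard errors differ as well, so both the point estimate and the inference change when the design is taken into account.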

The SURVEYMEANS, SURVEYFREQ, SURVEYREG, SURVEYLOGISTIC, and SURVEYPHREG procedures properly analyze complex survey data by taking into account the sample design. You can use these procedures for multistage or single-stage designs, with or without stratification, and with or without unequal weighting. The survey analysis procedures provide a choice of variance estimation methods, which include Taylor series linearization, balanced repeated replication (BRR), bootstrap, and jackknife.
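As an illustration of one of the replication methods named above, the following sketch computes a delete-one jackknife variance for a weighted mean in a stratified design. It follows the common textbook form: each replicate drops one primary sampling unit (PSU), the weights of the remaining PSUs in the same stratum are multiplied by n_h / (n_h - 1), and the squared deviations from the full-sample estimate are scaled by (n_h - 1) / n_h. This is plain Python under those stated assumptions, not the code that the survey analysis procedures run, and the record layout and function names are hypothetical.

```python
def weighted_mean(records):
    """records: list of (stratum, psu, weight, y) tuples.
    Returns the weighted mean of y."""
    total_weight = sum(w for _, _, w, _ in records)
    return sum(w * y for _, _, w, y in records) / total_weight

def jackknife_variance(records, estimator=weighted_mean):
    """Delete-one-PSU jackknife variance for a stratified design."""
    theta = estimator(records)

    # Collect the PSU identifiers that belong to each stratum.
    psus = {}
    for h, psu, _, _ in records:
        psus.setdefault(h, set()).add(psu)

    variance = 0.0
    for h, ids in psus.items():
        n_h = len(ids)
        if n_h < 2:
            raise ValueError("each stratum needs at least two PSUs")
        coef = (n_h - 1) / n_h
        for dropped in sorted(ids):
            replicate = []
            for s, p, w, y in records:
                if s == h and p == dropped:
                    continue                    # delete this PSU
                if s == h:
                    w = w * n_h / (n_h - 1)     # reweight the rest of stratum h
                replicate.append((s, p, w, y))
            variance += coef * (estimator(replicate) - theta) ** 2
    return variance

# Invented data: one stratum, five PSUs, one observation each, equal weights.
records = [("S", i, 1.0, float(y)) for i, y in enumerate([1, 2, 3, 4, 5])]
jk_var = jackknife_variance(records)
```

For this single-stratum, equal-weight case the jackknife variance of the mean reduces to the familiar s²/n, which here is 2.5 / 5 = 0.5. That agreement is a useful sanity check before applying the same routine to unequal weights or several strata.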

Table 1 briefly describes the SAS/STAT sampling and analysis procedures.

Table 1: Survey Sampling and Analysis Procedures

PROC SURVEYSELECT
Selection methods	Simple random sampling (without replacement)
	Unrestricted random sampling (with replacement)
	Balanced bootstrap
	Systematic
	Sequential
	Bernoulli
	Poisson
	Probability proportional to size (PPS) sampling, with and without replacement
	PPS systematic
	PPS for two units per stratum
	PPS sequential with minimum replacement
Allocation methods	Proportional
	Optimal
	Neyman
Sampling tools	Stratified sampling
	Cluster sampling
	Replicated sampling
	Serpentine sorting
	Random assignment
PROC SURVEYIMPUTE
Imputation methods	Single and multiple hot-deck
	Approximate Bayesian bootstrap
	Fully efficient fractional
	Two-stage fully efficient fractional
	Fractional hot-deck
PROC SURVEYMEANS
Statistics	Means and totals
	Proportions
	Quantiles
	Geometric means
	Ratios
	Standard errors
	Confidence limits
Analyses	Hypothesis tests
	Domain analysis
	Comparison of domain means
	Poststratification
Graphics	Histograms
	Box plots
	Summary panel plots
	Domain box plots
PROC SURVEYFREQ
Tables	One-way frequency tables
	Two-way and multiway crosstabulation tables
	Estimates of totals and proportions
	Standard errors
	Confidence limits
Analyses	Tests of goodness of fit
	Tests of independence
	Risks and risk differences
	Odds ratios and relative risks
	Kappa coefficients
Graphics	Weighted frequency and percent plots
	Mosaic plots
	Odds ratio, relative risk, and risk difference plots
	Kappa plots
PROC SURVEYREG
Analyses	Linear regression model fitting
	Regression coefficients
	Covariance matrices
	Confidence limits
	Hypothesis tests
	Estimable functions
	Contrasts
	Least squares means (LS-means) of effects
	Custom hypothesis tests among LS-means
	Regression with constructed effects
	Predicted values and residuals
	Domain analysis
Graphics	Fit plots
PROC SURVEYLOGISTIC
Analyses	Cumulative logit regression model fitting
	Logit, probit, and complementary log-log link functions
	Generalized logit regression model fitting
	Regression coefficients
	Covariance matrices
	Confidence limits
	Hypothesis tests
	Odds ratios
	Estimable functions
	Contrasts
	Least squares means (LS-means) of effects
	Custom hypothesis tests among LS-means
	Regression with constructed effects
	Model diagnostics
	Domain analysis
PROC SURVEYPHREG
Analyses	Proportional hazards regression model fitting
	Breslow and Efron likelihoods
	Regression coefficients
	Covariance matrices
	Confidence limits
	Hypothesis tests
	Hazard ratios
	Contrasts
	Predicted values and standard errors
	Martingale, Schoenfeld, score, and deviance residuals
	Domain analysis

Last updated: March 08, 2022