The DISTANCE Procedure

WEIGHT Statement

  • WGT | WEIGHT variable ;

The WEIGHT statement specifies a numeric variable in the input data set whose values are used to weight each observation. The weight variable is used only for standardizing variables; it is not used in computing the distances. Only one variable can be specified.

The WEIGHT variable values can be nonintegers. An observation is used in the analysis only if the value of the WEIGHT variable is greater than zero. The WEIGHT variable applies to variables that are standardized by the following options: STD=MEAN, STD=SUM, STD=EUCLEN, STD=USTD, STD=STD, STD=AGK, or STD=L.
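The inclusion rule above can be illustrated with a small Python sketch. This is not SAS code and not how PROC DISTANCE is implemented; the function name `usable_observations` and the sample data are hypothetical, chosen only to show that noninteger weights are accepted and that zero or negative weights exclude an observation.

```python
def usable_observations(values, weights):
    """Keep (value, weight) pairs whose weight is greater than zero."""
    return [(x, w) for x, w in zip(values, weights) if w > 0]


# Hypothetical data: weights may be nonintegers; zero and negative
# weights cause the observation to be dropped from the analysis.
values = [10.0, 12.0, 15.0, 11.0]
weights = [1.5, 0.0, 0.25, -2.0]

kept = usable_observations(values, weights)
print(kept)
```

Only the first and third observations remain, because their weights (1.5 and 0.25) are the only ones greater than zero.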

PROC DISTANCE uses the value of the WEIGHT variable $w_i$ to compute the sample mean, uncorrected sample variances, and sample variances as follows:

$$\bar{x}_w = \frac{\sum_i w_i x_i}{\sum_i w_i}$$

$$u_w^2 = \frac{\sum_i w_i x_i^2}{d}$$

$$s_w^2 = \frac{\sum_i w_i (x_i - \bar{x}_w)^2}{d}$$

Here $w_i$ is the weight value of the $i$th observation, $x_i$ is the value of the $i$th observation, and $d$ is the divisor controlled by the VARDEF= option (see the VARDEF= option in the PROC DISTANCE statement for details).
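As a rough numerical check of the three formulas, the following Python sketch evaluates them directly. It is an illustration under stated assumptions, not PROC DISTANCE itself: the function name `weighted_moments` is hypothetical, all weights are assumed to be positive already, and the divisor `d` is passed in by the caller because its value depends on the VARDEF= option. Setting `d` to the sum of the weights in the usage lines is an arbitrary choice for the example.

```python
def weighted_moments(x, w, d):
    """Return (weighted mean, uncorrected variance, variance).

    x : observation values
    w : positive observation weights (same length as x)
    d : divisor, supplied by the caller (stands in for VARDEF=)
    """
    sum_w = sum(w)
    # Weighted mean: sum(w_i * x_i) / sum(w_i)
    mean_w = sum(wi * xi for xi, wi in zip(x, w)) / sum_w
    # Uncorrected variance: sum(w_i * x_i^2) / d
    u2_w = sum(wi * xi ** 2 for xi, wi in zip(x, w)) / d
    # Variance about the weighted mean: sum(w_i * (x_i - mean)^2) / d
    s2_w = sum(wi * (xi - mean_w) ** 2 for xi, wi in zip(x, w)) / d
    return mean_w, u2_w, s2_w


# Hypothetical data; d is set to the sum of the weights for illustration only.
x = [1.0, 2.0, 4.0]
w = [0.5, 1.5, 2.0]
mean_w, u2_w, s2_w = weighted_moments(x, w, d=sum(w))
print(mean_w, u2_w, s2_w)
```

With this particular divisor the two variances satisfy $s_w^2 = u_w^2 - \bar{x}_w^2$, which is a convenient sanity check; that identity does not hold in general for other choices of $d$.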

PROC DISTANCE uses the value of the WEIGHT variable to calculate the following statistics for standardization:

MEAN

the weighted mean, $\bar{x}_w$

SUM

the weighted sum, $\sum_i w_i x_i$

USTD

the weighted uncorrected standard deviation, $\sqrt{u_w^2}$

STD

the weighted standard deviation, $\sqrt{s_w^2}$

EUCLEN

the weighted Euclidean length, computed as the square root of the weighted uncorrected sum of squares:

$$\sqrt{\sum_i w_i x_i^2}$$

AGK

the AGK estimate. This estimate is documented further in the ACECLUS procedure as the METHOD=COUNT option. See the discussion of the WEIGHT statement in Chapter 27, The ACECLUS Procedure, for information about how the WEIGHT variable is applied to the AGK estimate.

L

the $L_p$ estimate. This estimate is documented further in the FASTCLUS procedure as the LEAST= option. See the discussion of the WEIGHT statement in Chapter 45, The FASTCLUS Procedure, for information about how the WEIGHT variable is used to compute weighted cluster means. Note that the number of clusters is always 1.
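The closed-form statistics in the list above (MEAN, SUM, USTD, STD, and EUCLEN) can be sketched in a few lines of Python. This is an illustrative sketch, not the procedure's implementation: the function name `weighted_standardization_stats` is hypothetical, weights are assumed to be positive, and the divisor `d` is supplied by the caller in place of the VARDEF= option. AGK and L are left out because their weighted forms are described in the ACECLUS and FASTCLUS chapters rather than here.

```python
import math


def weighted_standardization_stats(x, w, d):
    """Compute the weighted statistics listed for MEAN, SUM, USTD, STD, EUCLEN.

    x : observation values
    w : positive observation weights (same length as x)
    d : divisor, supplied by the caller (stands in for VARDEF=)
    """
    sum_w = sum(w)
    sum_wx = sum(wi * xi for xi, wi in zip(x, w))        # weighted sum
    sum_wx2 = sum(wi * xi ** 2 for xi, wi in zip(x, w))  # weighted uncorrected sum of squares
    mean_w = sum_wx / sum_w
    # Weighted sum of squares about the weighted mean
    css_w = sum(wi * (xi - mean_w) ** 2 for xi, wi in zip(x, w))
    return {
        "MEAN": mean_w,
        "SUM": sum_wx,
        "USTD": math.sqrt(sum_wx2 / d),
        "STD": math.sqrt(css_w / d),
        "EUCLEN": math.sqrt(sum_wx2),
    }


# Hypothetical data with unit weights; d=2.0 is chosen only for illustration.
stats = weighted_standardization_stats([3.0, 4.0], [1.0, 1.0], d=2.0)
print(stats)
```

Note that EUCLEN does not involve the divisor at all, while USTD and STD differ only in whether the squares are taken about the origin or about the weighted mean.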

Last updated: December 09, 2022