The DISTANCE Procedure

Proximity Measures

The following notation is used in this section:

$v$

the number of variables, or the dimensionality

$x_j$

data for observation $x$ and the $j$th variable, where $j = 1$ to $v$

$y_j$

data for observation $y$ and the $j$th variable, where $j = 1$ to $v$

$w_j$

weight for the $j$th variable from the WEIGHTS= option in the VAR statement. The weight $w_j = 0$ when either $x_j$ or $y_j$ is missing.

$W$

the sum of total weights. The weight of every variable is added to this sum, regardless of whether the value of that variable is missing.

$\bar{x}$

mean for observation $x$

$$\bar{x} = \sum_{j=1}^{v} w_j x_j \Big/ \sum_{j=1}^{v} w_j$$

$\bar{y}$

mean for observation $y$

$$\bar{y} = \sum_{j=1}^{v} w_j y_j \Big/ \sum_{j=1}^{v} w_j$$

$d(x,y)$

the distance or dissimilarity between observations $x$ and $y$

$s(x,y)$

the similarity between observations $x$ and $y$

The factor $W \big/ \sum_{j=1}^{v} w_j$ is used to adjust some of the proximity measures for missing values.
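The following Python sketch illustrates the notation above. It is not PROC DISTANCE code: the helper names are hypothetical, and `None` is assumed to stand for a missing value.

```python
def effective_weights(x, y, w):
    """Return w_j, set to 0 wherever x_j or y_j is missing (None)."""
    return [0.0 if (xj is None or yj is None) else wj
            for xj, yj, wj in zip(x, y, w)]


def weighted_mean(x, w_eff):
    """Weighted mean of one observation over the variables that keep a weight."""
    total = sum(w_eff)
    return sum(wj * xj for xj, wj in zip(x, w_eff) if wj > 0) / total


def adjustment_factor(w, w_eff):
    """W / sum_j w_j, where W sums every weight, missing or not."""
    return sum(w) / sum(w_eff)


x = [1.0, None, 3.0, 5.0]
y = [2.0, 4.0, 6.0, None]
w = [1.0, 1.0, 1.0, 1.0]

w_eff = effective_weights(x, y, w)      # variables 2 and 4 drop out
x_bar = weighted_mean(x, w_eff)         # mean over variables 1 and 3
factor = adjustment_factor(w, w_eff)    # 4 / 2
```

Here two of the four variables are unusable for this pair, so the factor doubles any sum that is taken over the remaining two.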

Methods That Accept All Measurement Levels

GOWER

Gower's similarity

$$s_1(x,y) = \sum_{j=1}^{v} w_j \delta_{x,y}^{j} d_{x,y}^{j} \Big/ \sum_{j=1}^{v} w_j \delta_{x,y}^{j}$$

You compute $\delta_{x,y}^{j}$ as follows:

For nominal, ordinal, interval, or ratio variables,

$$\delta_{x,y}^{j} = 1$$

For asymmetric nominal variables,

$$\delta_{x,y}^{j} = \begin{cases} 1, & \text{if either } x_j \text{ or } y_j \text{ is present} \\ 0, & \text{if both } x_j \text{ and } y_j \text{ are absent} \end{cases}$$

For nominal or asymmetric nominal variables,

$$d_{x,y}^{j} = \begin{cases} 1, & \text{if } x_j = y_j \\ 0, & \text{if } x_j \ne y_j \end{cases}$$

For ordinal, interval, or ratio variables,

$$d_{x,y}^{j} = 1 - |x_j - y_j|$$

DGOWER

1 minus Gower

$$d_2(x,y) = 1 - s_1(x,y)$$
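As an illustration of the GOWER and DGOWER definitions, the following Python sketch evaluates them for one pair of observations. The function name and the level labels are hypothetical. The sketch assumes that `None` marks a missing value, that absence in an asymmetric nominal variable is coded as 0, and that ordinal, interval, and ratio values have already been scaled so that $|x_j - y_j| \le 1$.

```python
def gower_similarity(x, y, w, levels):
    """Gower similarity for one pair of observations.

    levels[j] is 'nominal', 'anominal' (asymmetric nominal), or 'numeric'
    (ordinal, interval, or ratio). These labels are illustrative only.
    """
    num = 0.0
    den = 0.0
    for xj, yj, wj, level in zip(x, y, w, levels):
        if xj is None or yj is None:
            continue  # w_j = 0 when either value is missing
        # delta: 0 only when an asymmetric nominal variable is absent in both
        if level == "anominal" and xj == 0 and yj == 0:
            delta = 0.0
        else:
            delta = 1.0
        # d: match indicator for nominal levels, 1 - |x_j - y_j| otherwise
        if level in ("nominal", "anominal"):
            d = 1.0 if xj == yj else 0.0
        else:
            d = 1.0 - abs(xj - yj)
        num += wj * delta * d
        den += wj * delta
    return num / den if den > 0 else None


x = ["red", 1, 0, 0.25]
y = ["red", 0, 0, 0.75]
levels = ["nominal", "anominal", "anominal", "numeric"]
w = [1.0, 1.0, 1.0, 1.0]

s1 = gower_similarity(x, y, w, levels)  # (1 + 0 + 0.5) / 3
d2 = 1.0 - s1                           # DGOWER
```

The third variable is absent in both observations, so it contributes to neither the numerator nor the denominator.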

Methods That Accept Ratio, Interval, and Ordinal Variables

Similar methods are presented together in the following list.

EUCLID

Euclidean distance

$$d_3(x,y) = \sqrt{\left( \sum_{j=1}^{v} w_j (x_j - y_j)^2 \right) W \Big/ \left( \sum_{j=1}^{v} w_j \right)}$$

SQEUCLID

Squared Euclidean distance

$$d_4(x,y) = \left( \sum_{j=1}^{v} w_j (x_j - y_j)^2 \right) W \Big/ \left( \sum_{j=1}^{v} w_j \right)$$

SIZE

size distance

$$d_5(x,y) = \left| \sum_{j=1}^{v} w_j (x_j - y_j) \right| \sqrt{W} \Big/ \left( \sum_{j=1}^{v} w_j \right)$$

SHAPE

shape distance

$$d_6(x,y) = \sqrt{\left( \sum_{j=1}^{v} w_j \left[ (x_j - \bar{x}) - (y_j - \bar{y}) \right]^2 \right) W \Big/ \left( \sum_{j=1}^{v} w_j \right)}$$

Note: Squared shape distance plus squared size distance equals squared Euclidean distance.
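The note above can be checked numerically. The following Python sketch implements the SQEUCLID, SIZE, and SHAPE formulas directly; the function names are hypothetical and `None` is assumed to mark a missing value.

```python
import math


def _pairs(x, y, w):
    """Return the nonmissing (x_j, y_j, w_j) triples, W, and sum_j w_j."""
    kept = [(xj, yj, wj) for xj, yj, wj in zip(x, y, w)
            if xj is not None and yj is not None]
    return kept, sum(w), sum(wj for _, _, wj in kept)


def sqeuclid(x, y, w):
    kept, W, sw = _pairs(x, y, w)
    return sum(wj * (xj - yj) ** 2 for xj, yj, wj in kept) * W / sw


def size(x, y, w):
    kept, W, sw = _pairs(x, y, w)
    return abs(sum(wj * (xj - yj) for xj, yj, wj in kept)) * math.sqrt(W) / sw


def shape(x, y, w):
    kept, W, sw = _pairs(x, y, w)
    x_bar = sum(wj * xj for xj, _, wj in kept) / sw
    y_bar = sum(wj * yj for _, yj, wj in kept) / sw
    ss = sum(wj * ((xj - x_bar) - (yj - y_bar)) ** 2 for xj, yj, wj in kept)
    return math.sqrt(ss * W / sw)


x = [1.0, 2.0, None, 4.0]
y = [3.0, 1.0, 5.0, 8.0]
w = [1.0, 1.0, 1.0, 1.0]

lhs = shape(x, y, w) ** 2 + size(x, y, w) ** 2
rhs = sqeuclid(x, y, w)
```

With these values the third variable is missing in `x`, so $W = 4$ and $\sum_j w_j = 3$; both `lhs` and `rhs` come out at 28 up to floating-point rounding.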

COV

covariance similarity coefficient

$$s_7(x,y) = \sum_{j=1}^{v} w_j (x_j - \bar{x})(y_j - \bar{y}) \Big/ \mathit{vardiv}$$

where

$$\mathit{vardiv} = \begin{cases} v & \text{if VARDEF=N} \\ v - 1 & \text{if VARDEF=DF} \\ \sum_{j=1}^{v} w_j & \text{if VARDEF=WEIGHT} \\ \sum_{j=1}^{v} w_j - 1 & \text{if VARDEF=WDF} \end{cases}$$

CORR

correlation similarity coefficient

$$s_8(x,y) = \frac{\sum_{j=1}^{v} w_j (x_j - \bar{x})(y_j - \bar{y})}{\sqrt{\sum_{j=1}^{v} w_j (x_j - \bar{x})^2 \sum_{j=1}^{v} w_j (y_j - \bar{y})^2}}$$

DCORR

correlation transformed to Euclidean distance as sqrt(1–CORR)

$$d_9(x,y) = \sqrt{1 - s_8(x,y)}$$

SQCORR

squared correlation

$$s_{10}(x,y) = \frac{\left[ \sum_{j=1}^{v} w_j (x_j - \bar{x})(y_j - \bar{y}) \right]^2}{\sum_{j=1}^{v} w_j (x_j - \bar{x})^2 \sum_{j=1}^{v} w_j (y_j - \bar{y})^2}$$

DSQCORR

squared correlation transformed to squared Euclidean distance as (1–SQCORR)

$$d_{11}(x,y) = 1 - s_{10}(x,y)$$
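The COV, CORR, DCORR, SQCORR, and DSQCORR measures share the same centered sums, which the following Python sketch makes explicit. The function name is hypothetical, and the sketch assumes complete data (no missing values), so that $v$ is simply the number of variables supplied.

```python
import math


def correlation_family(x, y, w, vardef="DF"):
    """Return COV, CORR, DCORR, SQCORR, and DSQCORR for one pair."""
    sw = sum(w)
    v = len(x)
    x_bar = sum(wj * xj for xj, wj in zip(x, w)) / sw
    y_bar = sum(wj * yj for yj, wj in zip(y, w)) / sw
    sxy = sum(wj * (xj - x_bar) * (yj - y_bar) for xj, yj, wj in zip(x, y, w))
    sxx = sum(wj * (xj - x_bar) ** 2 for xj, wj in zip(x, w))
    syy = sum(wj * (yj - y_bar) ** 2 for yj, wj in zip(y, w))
    # Divisor for COV, selected by the VARDEF= value
    vardiv = {"N": v, "DF": v - 1, "WEIGHT": sw, "WDF": sw - 1}[vardef]
    corr = sxy / math.sqrt(sxx * syy)
    sqcorr = sxy ** 2 / (sxx * syy)
    return {
        "COV": sxy / vardiv,
        "CORR": corr,
        # max() guards against a tiny negative value caused by rounding
        "DCORR": math.sqrt(max(0.0, 1.0 - corr)),
        "SQCORR": sqcorr,
        "DSQCORR": 1.0 - sqcorr,
    }


result = correlation_family([1.0, 2.0, 3.0], [1.0, 3.0, 2.0], [1.0, 1.0, 1.0])
```

For these values the cross-product sum is 1 and both sums of squares are 2, so CORR is 0.5 and SQCORR is 0.25.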

L(p)

Minkowski ($L_p$) distance, where $p$ is a positive numeric value

$$d_{12}(x,y) = \left[ \left( \sum_{j=1}^{v} w_j |x_j - y_j|^p \right) W \Big/ \left( \sum_{j=1}^{v} w_j \right) \right]^{1/p}$$

CITYBLOCK

$L_1$

$$d_{13}(x,y) = \left( \sum_{j=1}^{v} w_j |x_j - y_j| \right) W \Big/ \left( \sum_{j=1}^{v} w_j \right)$$

CHEBYCHEV

$L_\infty$

$$d_{14}(x,y) = \max_{j=1}^{v} w_j |x_j - y_j|$$

POWER(p,r)

generalized Euclidean distance, where $p$ is a nonnegative numeric value and $r$ is a positive numeric value. The distance between two observations is the $r$th root of the sum of the absolute differences between the values for the observations, each raised to the $p$th power:

$$d_{15}(x,y) = \left[ \left( \sum_{j=1}^{v} w_j |x_j - y_j|^p \right) W \Big/ \left( \sum_{j=1}^{v} w_j \right) \right]^{1/r}$$
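Comparing the formulas shows that POWER(p,r) covers several of the other measures: L(p) is POWER(p,p), CITYBLOCK is POWER(1,1), EUCLID is POWER(2,2), and SQEUCLID is POWER(2,1). The following Python sketch, with hypothetical function names and `None` assumed to mark a missing value, illustrates this.

```python
def power_distance(x, y, w, p, r):
    """POWER(p, r): r-th root of the adjusted sum of |x_j - y_j|^p."""
    kept = [(xj, yj, wj) for xj, yj, wj in zip(x, y, w)
            if xj is not None and yj is not None]
    W = sum(w)                           # every weight, missing or not
    sw = sum(wj for _, _, wj in kept)    # weights of nonmissing pairs
    total = sum(wj * abs(xj - yj) ** p for xj, yj, wj in kept) * W / sw
    return total ** (1.0 / r)


def chebychev(x, y, w):
    """CHEBYCHEV: largest weighted absolute difference."""
    return max(wj * abs(xj - yj) for xj, yj, wj in zip(x, y, w)
               if xj is not None and yj is not None)


x = [0.0, 0.0]
y = [3.0, 4.0]
w = [1.0, 1.0]

cityblock = power_distance(x, y, w, 1, 1)   # L(1)
euclid = power_distance(x, y, w, 2, 2)      # L(2)
sqeuclid = power_distance(x, y, w, 2, 1)
linf = chebychev(x, y, w)
```

For the 3-4-5 triangle used here, the four values are 7, 5, 25, and 4.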

Methods That Accept Ratio Variables

Similar methods are presented together in the following list.

SIMRATIO

similarity ratio

$$s_{16}(x,y) = \frac{\sum_{j=1}^{v} w_j (x_j y_j)}{\sum_{j=1}^{v} w_j (x_j y_j) + \sum_{j=1}^{v} w_j (x_j - y_j)^2}$$

DISRATIO

one minus similarity ratio

$$d_{17}(x,y) = 1 - s_{16}(x,y)$$

NONMETRIC

Lance-Williams nonmetric coefficient

$$d_{18}(x,y) = \frac{\sum_{j=1}^{v} w_j |x_j - y_j|}{\sum_{j=1}^{v} w_j (x_j + y_j)}$$
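A minimal Python sketch of SIMRATIO, DISRATIO, and NONMETRIC follows. The function names are hypothetical, and complete, nonnegative ratio data are assumed.

```python
def simratio(x, y, w):
    """SIMRATIO: cross products over cross products plus squared differences."""
    cross = sum(wj * xj * yj for xj, yj, wj in zip(x, y, w))
    sq_diff = sum(wj * (xj - yj) ** 2 for xj, yj, wj in zip(x, y, w))
    return cross / (cross + sq_diff)


def disratio(x, y, w):
    """DISRATIO: one minus the similarity ratio."""
    return 1.0 - simratio(x, y, w)


def nonmetric(x, y, w):
    """NONMETRIC: Lance-Williams nonmetric coefficient."""
    num = sum(wj * abs(xj - yj) for xj, yj, wj in zip(x, y, w))
    den = sum(wj * (xj + yj) for xj, yj, wj in zip(x, y, w))
    return num / den


x = [1.0, 2.0, 2.0]
y = [2.0, 1.0, 2.0]
w = [1.0, 1.0, 1.0]

s16 = simratio(x, y, w)    # 8 / (8 + 2)
d17 = disratio(x, y, w)
d18 = nonmetric(x, y, w)   # 2 / 10
```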

CANBERRA

Canberra metric distance. See Sneath and Sokal (1973, pp. 125–126).

$$d_{19}(x,y) = \sum_{j=1}^{v} w_j \frac{|x_j - y_j|}{(x_j + y_j)}$$

Note: When both $x_j$ and $y_j$ are zeros, the corresponding component fraction $\frac{|x_j - y_j|}{(x_j + y_j)}$ is defined to be zero. Two variants of this distance measure, CANSCALED and CANADKINS, follow.

CANSCALED

Canberra metric distance, scaled

$$d_{19s}(x,y) = \frac{1}{\sum_{j=1}^{v} w_j} \sum_{j=1}^{v} w_j \frac{|x_j - y_j|}{(x_j + y_j)}$$

Note: This measure is a scaled version of $d_{19}(x,y)$ so that the resulting range falls between 0 and 1.

CANADKINS

Canberra metric distance, Adkins form (Lance and Williams 1967)

$$d_{19a}(x,y) = \frac{1}{\sum_{j=1}^{v} w_j (1 - z_j)} \sum_{j=1}^{v} w_j \frac{|x_j - y_j|}{(x_j + y_j)}$$

where $z_j$ is an indicator variable for zero values in both $x_j$ and $y_j$.

Note: This measure is similar to the scaled CANBERRA distance, $d_{19s}(x,y)$, but without considering the zero-valued $(x_j, y_j)$ pairs in the scaling.
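The three Canberra variants differ only in their scaling, as the following Python sketch shows. The function names are hypothetical, and complete, nonnegative ratio data are assumed.

```python
def _term(xj, yj):
    """Component fraction, defined as 0 when both values are 0."""
    if xj == 0 and yj == 0:
        return 0.0
    return abs(xj - yj) / (xj + yj)


def canberra(x, y, w):
    return sum(wj * _term(xj, yj) for xj, yj, wj in zip(x, y, w))


def canscaled(x, y, w):
    """CANBERRA divided by the sum of all weights."""
    return canberra(x, y, w) / sum(w)


def canadkins(x, y, w):
    """CANBERRA divided by the weights of pairs that are not both zero."""
    scale = sum(wj for xj, yj, wj in zip(x, y, w)
                if not (xj == 0 and yj == 0))
    return canberra(x, y, w) / scale


x = [1.0, 0.0, 2.0, 0.0]
y = [3.0, 0.0, 2.0, 4.0]
w = [1.0, 1.0, 1.0, 1.0]

d19 = canberra(x, y, w)     # 0.5 + 0 + 0 + 1
d19s = canscaled(x, y, w)   # 1.5 / 4
d19a = canadkins(x, y, w)   # 1.5 / 3, the (0, 0) pair is left out
```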

COSINE

cosine coefficient

$$s_{20}(x,y) = \frac{\sum_{j=1}^{v} w_j (x_j y_j)}{\sqrt{\sum_{j=1}^{v} w_j x_j^2 \sum_{j=1}^{v} w_j y_j^2}}$$

DOT

dot (inner) product coefficient

$$s_{21}(x,y) = \left[ \sum_{j=1}^{v} w_j (x_j y_j) \right] \Big/ \sum_{j=1}^{v} w_j$$

OVERLAP

sum of the minimum values

$$s_{22}(x,y) = \sum_{j=1}^{v} w_j \left[ \min(x_j, y_j) \right]$$

DOVERLAP

maximum of the sum of the $x$ values and the sum of the $y$ values, minus the overlap

$$d_{23}(x,y) = \max\left( \sum_{j=1}^{v} w_j x_j, \sum_{j=1}^{v} w_j y_j \right) - s_{22}(x,y)$$
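The COSINE, DOT, OVERLAP, and DOVERLAP definitions can be sketched in Python as follows. The function name is hypothetical, and complete ratio data are assumed.

```python
import math


def ratio_similarities(x, y, w):
    """Return COSINE, DOT, OVERLAP, and DOVERLAP for one pair."""
    cross = sum(wj * xj * yj for xj, yj, wj in zip(x, y, w))
    sxx = sum(wj * xj ** 2 for xj, wj in zip(x, w))
    syy = sum(wj * yj ** 2 for yj, wj in zip(y, w))
    overlap = sum(wj * min(xj, yj) for xj, yj, wj in zip(x, y, w))
    sum_x = sum(wj * xj for xj, wj in zip(x, w))
    sum_y = sum(wj * yj for yj, wj in zip(y, w))
    return {
        "COSINE": cross / math.sqrt(sxx * syy),
        "DOT": cross / sum(w),
        "OVERLAP": overlap,
        "DOVERLAP": max(sum_x, sum_y) - overlap,
    }


result = ratio_similarities([1.0, 2.0, 2.0], [2.0, 1.0, 2.0], [1.0, 1.0, 1.0])
```

In this example the cross-product sum is 8, each sum of squares is 9, and each observation sums to 5, so COSINE is 8/9, DOT is 8/3, OVERLAP is 4, and DOVERLAP is 1.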

CHISQ

chi-square

If the data represent frequency counts, the chi-square dissimilarity between two sets of frequencies can be computed. The following 2-by-$v$ contingency table illustrates how the chi-square dissimilarity is computed:

$$
\begin{array}{lccccc}
\text{Observation} & \text{Var } 1 & \text{Var } 2 & \cdots & \text{Var } v & \text{Row Sum} \\
\hline
X & x_1 & x_2 & \cdots & x_v & r_x \\
Y & y_1 & y_2 & \cdots & y_v & r_y \\
\hline
\text{Column Sum} & c_1 & c_2 & \cdots & c_v & T
\end{array}
$$

where

$$
\begin{aligned}
r_x &= \sum_{j=1}^{v} w_j x_j \\
r_y &= \sum_{j=1}^{v} w_j y_j \\
c_j &= w_j (x_j + y_j) \\
T &= r_x + r_y = \sum_{j=1}^{v} c_j
\end{aligned}
$$

The chi-square measure is computed as follows:

$$d_{24}(x,y) = \left( \sum_{j=1}^{v} \frac{(w_j x_j - E(x_j))^2}{E(x_j)} + \sum_{j=1}^{v} \frac{(w_j y_j - E(y_j))^2}{E(y_j)} \right) W \Big/ \left( \sum_{j=1}^{v} w_j \right)$$

where, for $j = 1, 2, \ldots, v$,

$$
\begin{aligned}
E(x_j) &= r_x c_j / T \\
E(y_j) &= r_y c_j / T
\end{aligned}
$$

CHI

square root of chi-square

$$d_{25}(x,y) = \sqrt{d_{24}(x,y)}$$

PHISQ

phi-square. This is the CHISQ dissimilarity normalized by the sum of weights.

$$d_{26}(x,y) = d_{24}(x,y) \Big/ \left( \sum_{j=1}^{v} w_j \right)$$

PHI

square root of phi-square

$$d_{27}(x,y) = \sqrt{d_{26}(x,y)}$$
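The chi-square family can be illustrated with the following Python sketch. The function names are hypothetical. The sketch assumes complete data, so the adjustment factor $W / \sum_j w_j$ equals 1, and it assumes that both row sums are positive; skipping columns whose sum is zero is a choice of this sketch, not something the definitions above specify.

```python
import math


def chisq(x, y, w):
    """CHISQ for two rows of frequency counts (complete data assumed)."""
    r_x = sum(wj * xj for xj, wj in zip(x, w))
    r_y = sum(wj * yj for yj, wj in zip(y, w))
    T = r_x + r_y
    total = 0.0
    for xj, yj, wj in zip(x, y, w):
        c_j = wj * (xj + yj)
        if c_j == 0:
            continue  # sketch-only choice: skip columns with zero expected counts
        e_x = r_x * c_j / T
        e_y = r_y * c_j / T
        total += (wj * xj - e_x) ** 2 / e_x + (wj * yj - e_y) ** 2 / e_y
    return total  # times W / sum(w), which is 1 for complete data


def chi_family(x, y, w):
    d24 = chisq(x, y, w)
    d26 = d24 / sum(w)
    return {"CHISQ": d24, "CHI": math.sqrt(d24),
            "PHISQ": d26, "PHI": math.sqrt(d26)}


result = chi_family([10.0, 20.0], [20.0, 10.0], [1.0, 1.0])
```

Every expected count in this example is 15 and every observed count differs from it by 5, so CHISQ is $4 \times 25 / 15 = 20/3$ and PHISQ is $10/3$.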

Methods That Accept Symmetric Nominal Variables

The following notation is used for computing $d_{28}(x,y)$ to $s_{35}(x,y)$. Notice that only the nonmissing pairs are discussed here; all the pairs with at least one missing value are excluded from any of the computations in the following section because $w_j = 0$ if either $x_j$ or $y_j$ is missing.

$M$

nonmissing matches

$M = \sum_{j=1}^{v} w_j \delta_{x,y}^{j}$, where

$$\delta_{x,y}^{j} = \begin{cases} 1, & \text{if } x_j = y_j \\ 0, & \text{otherwise} \end{cases}$$

$X$

nonmissing mismatches

$X = \sum_{j=1}^{v} w_j \delta_{x,y}^{j}$, where

$$\delta_{x,y}^{j} = \begin{cases} 1, & \text{if } x_j \ne y_j \\ 0, & \text{otherwise} \end{cases}$$

$N$

total nonmissing pairs

$$N = \sum_{j=1}^{v} w_j$$

The following list presents distance methods that accept symmetric nominal variables. Similar methods are grouped together.

HAMMING

Hamming distance

$$d_{28}(x,y) = X$$

MATCH

simple matching coefficient

$$s_{29}(x,y) = M / N$$

DMATCH

simple matching coefficient transformed to Euclidean distance

$$d_{30}(x,y) = \sqrt{1 - M/N} = \sqrt{X/N}$$

DSQMATCH

simple matching coefficient transformed to squared Euclidean distance

$$d_{31}(x,y) = 1 - M/N = X/N$$

HAMANN

Hamann coefficient

$$s_{32}(x,y) = (M - X) / N$$

RT

Rogers and Tanimoto

$$s_{33}(x,y) = M / (M + 2X)$$

SS1

Sokal and Sneath 1

$$s_{34}(x,y) = 2M / (2M + X)$$

SS3

Sokal and Sneath 3. The coefficient between an observation and itself is always indeterminate (missing) because there is no mismatch.

$$s_{35}(x,y) = M / X$$
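All of the measures for symmetric nominal variables are functions of $M$, $X$, and $N$. The following Python sketch computes them for one pair; the function names are hypothetical, `None` is assumed to mark a missing value, and an indeterminate SS3 is returned as `None`.

```python
import math


def match_counts(x, y, w):
    """Return (M, X, N): weighted matches, mismatches, nonmissing pairs."""
    matches = 0.0
    mismatches = 0.0
    for xj, yj, wj in zip(x, y, w):
        if xj is None or yj is None:
            continue  # w_j = 0 for pairs with a missing value
        if xj == yj:
            matches += wj
        else:
            mismatches += wj
    return matches, mismatches, matches + mismatches


def matching_measures(x, y, w):
    M, X, N = match_counts(x, y, w)
    return {
        "HAMMING": X,
        "MATCH": M / N,
        "DMATCH": math.sqrt(X / N),
        "DSQMATCH": X / N,
        "HAMANN": (M - X) / N,
        "RT": M / (M + 2 * X),
        "SS1": 2 * M / (2 * M + X),
        "SS3": M / X if X > 0 else None,  # indeterminate without mismatches
    }


x = ["a", "b", "c", "d", None]
y = ["a", "b", "x", "d", "e"]
w = [1.0] * 5

result = matching_measures(x, y, w)      # M = 3, X = 1, N = 4
self_result = matching_measures(x, x, w)
```

Comparing `x` with itself gives no mismatches, which is why `self_result["SS3"]` is `None`.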

Methods That Accept Asymmetric Nominal and Ratio Variables

The following notation is used for computing $s_{36}(x,y)$ to $d_{41}(x,y)$. Notice that only the nonmissing pairs are discussed in this section; all the pairs that have at least one missing value are excluded from any of the computations here because $w_j = 0$ if either $x_j$ or $y_j$ is missing.

Also, the observed nonmissing data of an asymmetric binary variable can have only two possible outcomes: presence or absence. Therefore, PX (present mismatches) always has a value of zero for an asymmetric binary variable.

$X$

mismatches with at least one present

$X = \sum_{j=1}^{v} w_j \delta_{x,y}^{j}$, where

$$\delta_{x,y}^{j} = \begin{cases} 1, & \text{if } x_j \ne y_j \text{ and not both } x_j \text{ and } y_j \text{ are absent} \\ 0, & \text{otherwise} \end{cases}$$

$\mathit{PM}$

present matches

$\mathit{PM} = \sum_{j=1}^{v} w_j \delta_{x,y}^{j}$, where

$$\delta_{x,y}^{j} = \begin{cases} 1, & \text{if } x_j = y_j \text{ and both } x_j \text{ and } y_j \text{ are present} \\ 0, & \text{otherwise} \end{cases}$$

$\mathit{PX}$

present mismatches

$\mathit{PX} = \sum_{j=1}^{v} w_j \delta_{x,y}^{j}$, where

$$\delta_{x,y}^{j} = \begin{cases} 1, & \text{if } x_j \ne y_j \text{ and both } x_j \text{ and } y_j \text{ are present} \\ 0, & \text{otherwise} \end{cases}$$

$\mathit{PP}$

both present: $\mathit{PP} = \mathit{PM} + \mathit{PX}$

$P$

at least one present: $P = \mathit{PM} + X$

$\mathit{PAX}$

present-absent mismatches

$\mathit{PAX} = \sum_{j=1}^{v} w_j \delta_{x,y}^{j}$, where

$$\delta_{x,y}^{j} = \begin{cases} 1, & \text{if } x_j \ne y_j \text{ and either } x_j \text{ is present and } y_j \text{ is absent, or } x_j \text{ is absent and } y_j \text{ is present} \\ 0, & \text{otherwise} \end{cases}$$

$N$

total nonmissing pairs

$$N = \sum_{j=1}^{v} w_j$$

The following list presents distance methods that accept asymmetric nominal and ratio variables:

JACCARD

Jaccard similarity coefficient

The JACCARD method is equivalent to the SIMRATIO method if there are only ratio variables; if there are both ratio and asymmetric nominal variables, the coefficient is computed as the sum of the coefficient from the ratio variables (SIMRATIO) and the coefficient from the asymmetric nominal variables.

$$s_{36}(x,y) = s_{16}(x,y) + \mathit{PM} / P$$

DJACCARD

Jaccard dissimilarity coefficient

The DJACCARD method is equivalent to the DISRATIO method if there are only ratio variables; if there are both ratio and asymmetric nominal variables, the coefficient is computed as the sum of the coefficient from the ratio variables (DISRATIO) and the coefficient from the asymmetric nominal variables.

$$d_{37}(x,y) = d_{17}(x,y) + X / P$$
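The following Python sketch tallies the counts defined in this section and evaluates the asymmetric nominal parts of JACCARD and DJACCARD, that is, PM/P and X/P. The function name is hypothetical. The sketch assumes that `None` marks a missing value and that absence is coded as 0; any other value counts as present.

```python
def asymmetric_counts(x, y, w, absent=0):
    """Return the weighted counts X, PM, PX, PAX, PP, P, and N."""
    c = {"X": 0.0, "PM": 0.0, "PX": 0.0, "PAX": 0.0, "N": 0.0}
    for xj, yj, wj in zip(x, y, w):
        if xj is None or yj is None:
            continue  # w_j = 0 for pairs with a missing value
        c["N"] += wj
        x_present = xj != absent
        y_present = yj != absent
        if x_present and y_present:
            if xj == yj:
                c["PM"] += wj          # present match
            else:
                c["PX"] += wj          # present mismatch
                c["X"] += wj
        elif x_present or y_present:
            c["PAX"] += wj             # present-absent mismatch
            c["X"] += wj
        # both absent: counted in N only
    c["PP"] = c["PM"] + c["PX"]
    c["P"] = c["PM"] + c["X"]
    return c


x = [1, 1, 0, 0, 1]
y = [1, 0, 1, 0, 1]
w = [1.0] * 5

counts = asymmetric_counts(x, y, w)
jaccard_nominal = counts["PM"] / counts["P"]    # 2 / 4
djaccard_nominal = counts["X"] / counts["P"]    # 2 / 4
```

Because $P = \mathit{PM} + X$, the two nominal parts always sum to 1 whenever $P > 0$. With binary data, as here, PX is 0.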

Methods That Accept Asymmetric Nominal Variables

With reference to the notation defined in the previous section, the following list presents distance methods that accept asymmetric nominal variables:

DICE

Dice coefficient or Czekanowski/Sorensen similarity coefficient

$$s_{38}(x,y) = 2\,\mathit{PM} / (P + \mathit{PM})$$

RR

Russell and Rao. This is the binary equivalent of the dot product coefficient.

$$s_{39}(x,y) = \mathit{PM} / N$$

BLWNM | BRAYCURTIS

Binary Lance and Williams, also known as Bray and Curtis coefficient

$$d_{40}(x,y) = X / (\mathit{PAX} + 2\,\mathit{PP})$$

K1

Kulczynski 1. The coefficient between an observation and itself is always indeterminate (missing) because there is no mismatch.

$$d_{41}(x,y) = \mathit{PM} / X$$
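Given the counts from the previous section, the four measures in this list reduce to simple ratios. The following Python sketch uses hypothetical function names and made-up counts for one pair of binary observations; an indeterminate K1 is returned as `None`.

```python
def dice(pm, p):
    """DICE: 2 PM / (P + PM)."""
    return 2 * pm / (p + pm)


def russell_rao(pm, n):
    """RR: PM / N."""
    return pm / n


def blwnm(x, pax, pp):
    """BLWNM | BRAYCURTIS: X / (PAX + 2 PP)."""
    return x / (pax + 2 * pp)


def k1(pm, x):
    """K1: PM / X, indeterminate when there is no mismatch."""
    return pm / x if x > 0 else None


# Illustrative counts: 2 present matches, 2 present-absent mismatches,
# 1 pair absent in both observations.
PM, X, PAX, PP, P, N = 2.0, 2.0, 2.0, 2.0, 4.0, 5.0

s38 = dice(PM, P)          # 4 / 6
s39 = russell_rao(PM, N)   # 2 / 5
d40 = blwnm(X, PAX, PP)    # 2 / 6
d41 = k1(PM, X)            # 2 / 2
```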

Last updated: December 09, 2022