The VARIOGRAM Procedure

Autocorrelation Statistics Types

One measure of spatial autocorrelation provided by PROC VARIOGRAM is Moran’s I statistic, which was introduced by Moran (1950) and is defined as

$$I = \frac{n}{(n-1)\,S^2\,W} \sum_i \sum_j w_{ij}\, v_i v_j$$

where $S^2 = (n-1)^{-1} \sum_i v_i^2$ and $W = \sum_i \sum_{j \neq i} w_{ij}$.
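The definition of Moran's I above can be sketched in a few lines of Python. This is only an illustration, not the procedure's implementation: the function name `morans_i`, the four-point data set, and the neighbor weights are hypothetical, and the sketch assumes that the v values are the observations centered on their sample mean and that the weight matrix has a zero diagonal.

```python
def morans_i(z, w):
    """Moran's I for observations z and a square weight matrix w.

    Assumptions (not stated in this section): v_i = z_i - mean(z),
    and w[i][i] = 0 for every i.
    """
    n = len(z)
    mean = sum(z) / n
    v = [zi - mean for zi in z]                      # centered values
    s2 = sum(vi * vi for vi in v) / (n - 1)          # S^2
    big_w = sum(w[i][j] for i in range(n) for j in range(n) if j != i)  # W
    cross = sum(w[i][j] * v[i] * v[j] for i in range(n) for j in range(n))
    return n * cross / ((n - 1) * s2 * big_w)


# Hypothetical example: four points on a line, binary weights for adjacent points.
z = [1.0, 2.0, 3.0, 4.0]
w = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
print(morans_i(z, w))
```

Because neighboring values in this toy data set are similar, the statistic comes out positive.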

Another measure of spatial autocorrelation in PROC VARIOGRAM is Geary’s c statistic (Geary 1954), defined as

$$c = \frac{1}{2\,S^2\,W} \sum_i \sum_j w_{ij}\, (z_i - z_j)^2$$

These expressions indicate that Moran’s I coefficient makes use of the centered variable, whereas the Geary’s c expression uses the noncentered values in the summation.
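As a companion illustration, the Geary's c definition can be sketched the same way. The function name `gearys_c` and the data are hypothetical; the sketch assumes a zero-diagonal weight matrix and computes the variance term from values centered on the sample mean, while the summation itself uses the noncentered observations, as the text describes.

```python
def gearys_c(z, w):
    """Geary's c for observations z and a square weight matrix w.

    Assumptions (not stated in this section): S^2 is computed from
    values centered on mean(z), and w[i][i] = 0 for every i.
    """
    n = len(z)
    mean = sum(z) / n
    s2 = sum((zi - mean) ** 2 for zi in z) / (n - 1)                     # S^2
    big_w = sum(w[i][j] for i in range(n) for j in range(n) if j != i)   # W
    diff = sum(w[i][j] * (z[i] - z[j]) ** 2
               for i in range(n) for j in range(n))  # noncentered differences
    return diff / (2 * s2 * big_w)


# Hypothetical example: four points on a line, binary weights for adjacent points.
z = [1.0, 2.0, 3.0, 4.0]
w = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
print(gearys_c(z, w))
```

Adding a constant to every observation leaves the result unchanged, because only differences between observations and deviations from the mean enter the computation.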

Inference on these two statistic types comes from approximate tests based on the asymptotic distribution of I and c, which both tend to a normal distribution as n increases. To this end, PROC VARIOGRAM calculates the means and variances of I and c. The outcome depends on the assumption made regarding the distribution of $Z(\boldsymbol{s})$. In particular, you can choose to investigate any of the statistics under the normality (also known as Gaussianity) or the randomization assumption. Cliff and Ord (1981) provided the equations for the means and variances of the I and c distributions, as described in the following.
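One common way to turn a mean and a variance into an approximate test is a standard normal score. The following sketch illustrates that idea only; it is an assumption about how such a test could be carried out, not a description of the procedure's internals, and the helper names and input numbers are hypothetical.

```python
import math


def z_score(stat, mean, variance):
    """Standardize an observed statistic by its expected value and variance."""
    return (stat - mean) / math.sqrt(variance)


def two_sided_p(zs):
    """Two-sided p-value under a standard normal reference distribution."""
    return math.erfc(abs(zs) / math.sqrt(2.0))


# Illustrative numbers only: an observed I with an assumed mean and variance.
observed, expected, variance = 0.5, -0.25, 0.0625
zs = z_score(observed, expected, variance)
print(zs, two_sided_p(zs))
```

Because the normal approximation is asymptotic, a score of this kind is less reliable when n is small.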

The normality assumption asserts that the random field $Z(\boldsymbol{s})$ follows a normal distribution of constant mean ($\bar{Z}$) and variance, from which the $z_i$ values are drawn. In this case, the I statistics yield

$$\mathrm{E}_g[I] = -\frac{1}{n-1}$$

and

$$\mathrm{E}_g[I^2] = \frac{1}{(n+1)(n-1)\,W^2} \left( n^2 S_1 - n S_2 + 3 W^2 \right)$$

where $S_1 = 0.5 \sum_i \sum_{j \neq i} (w_{ij} + w_{ji})^2$ and $S_2 = \sum_i \left( \sum_j w_{ij} + \sum_j w_{ji} \right)^2$. The corresponding moments for the c statistics are

$$\mathrm{E}_g[c] = 1$$

and

$$\mathrm{Var}_g[c] = \frac{(2 S_1 + S_2)(n-1) - 4 W^2}{2 (n+1)\, W^2}$$
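The normality-based moments above depend only on n and the weights, so they can be evaluated before looking at any data values. The sketch below is illustrative: the function names and the weight matrix are hypothetical, a zero-diagonal weight matrix is assumed, and the variance of I is obtained from the standard identity Var[I] = E[I^2] - (E[I])^2. Exact rational arithmetic keeps the example free of rounding.

```python
from fractions import Fraction


def weight_sums(w):
    """Return W, S1, and S2 for a square weight matrix with zero diagonal."""
    n = len(w)
    pairs = [(i, j) for i in range(n) for j in range(n) if j != i]
    big_w = sum(w[i][j] for i, j in pairs)
    s1 = Fraction(1, 2) * sum((w[i][j] + w[j][i]) ** 2 for i, j in pairs)
    # Row sum plus column sum for each i, squared and accumulated.
    s2 = sum((sum(w[i]) + sum(row[i] for row in w)) ** 2 for i in range(n))
    return big_w, s1, s2


def normality_moments(w):
    """Return (E[I], Var[I], E[c], Var[c]) under the normality assumption."""
    n = len(w)
    big_w, s1, s2 = weight_sums(w)
    mean_i = Fraction(-1, n - 1)
    ei2 = (n * n * s1 - n * s2 + 3 * big_w ** 2) / ((n + 1) * (n - 1) * big_w ** 2)
    var_i = ei2 - mean_i ** 2          # Var[I] = E[I^2] - (E[I])^2
    var_c = ((2 * s1 + s2) * (n - 1) - 4 * big_w ** 2) / (2 * (n + 1) * big_w ** 2)
    return mean_i, var_i, Fraction(1), var_c


# Hypothetical weights: four points on a line, adjacent points are neighbors.
w = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
print(normality_moments(w))
```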

According to the randomization assumption, the I and c observations are considered in relation to all the different values that I and c could take, respectively, if the $n$ values $z_i$ were repeatedly randomly permuted around the domain $D$. The moments for the I statistics are now

$$\mathrm{E}_r[I] = -\frac{1}{n-1}$$

and

$$\mathrm{E}_r[I^2] = \frac{A_1 + A_2}{(n-1)(n-2)(n-3)\, W^2}$$

where $A_1 = n \left[ (n^2 - 3n + 3) S_1 - n S_2 + 3 W^2 \right]$ and $A_2 = -b_2 \left[ n(n-1) S_1 - 2n S_2 + 6 W^2 \right]$. The factor $b_2 = m_4 / (m_2^2)$ is the coefficient of kurtosis that uses the sample moments $m_k = \frac{1}{n} \sum_i v_i^k$ for $k = 2, 4$. Finally, the c statistics under the randomization assumption are given by

$$\mathrm{E}_r[c] = 1$$

and

$$\mathrm{Var}_r[c] = \frac{B_1 + B_2 + B_3}{n (n-2)(n-3)\, W^2}$$

with $B_1 = (n-1)\, S_1 \left[ n^2 - 3n + 3 - (n-1)\, b_2 \right]$, $B_2 = -\frac{1}{4} (n-1)\, S_2 \left[ n^2 + 3n - 6 - (n^2 - n + 2)\, b_2 \right]$, and $B_3 = W^2 \left[ n^2 - 3 - b_2 (n-1)^2 \right]$.
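Unlike the normality case, the randomization moments depend on the data through the kurtosis coefficient. The following sketch evaluates them for a small hypothetical data set. The function name is an assumption, the v values are assumed to be the observations centered on their sample mean, the weight matrix is assumed to have a zero diagonal, and the variance of I is obtained from Var[I] = E[I^2] - (E[I])^2. At least four observations are needed so that the factor n - 3 is nonzero.

```python
from fractions import Fraction


def randomization_moments(z, w):
    """Return (Var[I], Var[c]) under the randomization assumption."""
    n = len(z)
    z = [Fraction(x) for x in z]                 # exact arithmetic
    mean = sum(z) / n
    v = [zi - mean for zi in z]                  # assumed: centered values
    m2 = sum(vi ** 2 for vi in v) / n
    m4 = sum(vi ** 4 for vi in v) / n
    b2 = m4 / m2 ** 2                            # kurtosis coefficient

    pairs = [(i, j) for i in range(n) for j in range(n) if j != i]
    big_w = sum(w[i][j] for i, j in pairs)
    s1 = Fraction(1, 2) * sum((w[i][j] + w[j][i]) ** 2 for i, j in pairs)
    s2 = sum((sum(w[i]) + sum(row[i] for row in w)) ** 2 for i in range(n))
    w2 = big_w ** 2

    a1 = n * ((n * n - 3 * n + 3) * s1 - n * s2 + 3 * w2)
    a2 = -b2 * (n * (n - 1) * s1 - 2 * n * s2 + 6 * w2)
    ei2 = (a1 + a2) / ((n - 1) * (n - 2) * (n - 3) * w2)
    var_i = ei2 - Fraction(1, (n - 1) ** 2)      # E[I] = -1/(n-1)

    t1 = (n - 1) * s1 * (n * n - 3 * n + 3 - (n - 1) * b2)
    t2 = -Fraction(1, 4) * (n - 1) * s2 * (n * n + 3 * n - 6 - (n * n - n + 2) * b2)
    t3 = w2 * (n * n - 3 - b2 * (n - 1) ** 2)
    var_c = (t1 + t2 + t3) / (n * (n - 2) * (n - 3) * w2)
    return var_i, var_c


# Hypothetical example: four points on a line, adjacent points are neighbors.
z = [1, 2, 3, 4]
w = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
print(randomization_moments(z, w))
```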

If you specify a LAGDISTANCE= value that is larger than the maximum data distance in your domain, the binary weighting scheme used by the VARIOGRAM procedure leads to all weights $w_{ij} = 1$, $i \neq j$. In this extreme case, it follows from the preceding definitions that the variances of the I and c statistics become zero under either the normality or the randomization assumption.
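For the normality assumption, the zero-variance result can be checked by direct substitution. The derivation below is a sketch that assumes a zero diagonal ($w_{ii} = 0$) and uses the identity $\mathrm{Var}[I] = \mathrm{E}[I^2] - (\mathrm{E}[I])^2$; the randomization case is not worked out here.

```latex
% All off-diagonal weights equal to 1, diagonal weights assumed to be 0:
W = n(n-1), \qquad S_1 = 2n(n-1), \qquad S_2 = 4n(n-1)^2

% Second moment of I under normality:
\mathrm{E}_g[I^2]
  = \frac{2n^3(n-1) - 4n^2(n-1)^2 + 3n^2(n-1)^2}{(n+1)(n-1)\,n^2(n-1)^2}
  = \frac{n^2(n-1)(n+1)}{(n+1)\,n^2(n-1)^3}
  = \frac{1}{(n-1)^2}
  = \left(\mathrm{E}_g[I]\right)^2
  \quad\Longrightarrow\quad \mathrm{Var}_g[I] = 0

% Variance of c under normality:
(2S_1 + S_2)(n-1) - 4W^2
  = 4n^2(n-1)^2 - 4n^2(n-1)^2 = 0
  \quad\Longrightarrow\quad \mathrm{Var}_g[c] = 0
```

Intuitively, when every pair receives the same weight, the statistics no longer depend on how the values are arranged in space, so they carry no information about spatial autocorrelation.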

A similar effect might occur when you have collocated observations (see the section Pair Formation). The Moran’s I and Geary’s c statistics allow for the inclusion of such pairs in the computations. Hence, contrary to the semivariance analysis, PROC VARIOGRAM does not exclude pairs of collocated data from the autocorrelation statistics.

Last updated: December 09, 2022