-
MDATA=SAS-data-set
-
specifies the input data set that contains parameter values for the covariance or semivariogram model. The MDATA= data set must contain a variable named FORM, and it can optionally include any of the variables SCALE, RANGE, NUGGET, and SMOOTH.
The FORM variable must be a character variable. It accepts only the AUTO value or the form values that can be specified in the FORM= option in the MODEL statement. The RANGE, SCALE, NUGGET, and SMOOTH variables must be numeric or missing.
The number of observations present in the MDATA= data set corresponds to the level of nesting of the semivariogram model. Each observation line describes a structure of the model you submit for fitting.
If you specify the AUTO value for the FORM variable in an observation, then you cannot specify additional nested structures in the same data set, and any parameters you specify in the same structure are ignored. In that case, PROC VARIOGRAM performs a crude automated search among all available forms to obtain the best fit with up to three nested structures in a model. You can refine this type of search with additional suboptions when you perform it with the FORM=AUTO option instead of the MDATA= option in the MODEL statement.
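As a minimal sketch of the automated search described above, the following statements build a one-observation MDATA= data set whose FORM value is AUTO. The data set name mdauto is hypothetical. The Sashelp.Thick sample data (variables East, North, and Thick) and the LAGD= and MAXLAGS= settings are assumptions chosen only to make the step self-contained; substitute your own data and lag settings.

```sas
/* Hypothetical data set: a single observation with FORM=AUTO */
data mdauto;
   length form $ 4;
   input form $;
   datalines;
AUTO
;

/* Assumed sample data and lag settings, for illustration only */
proc variogram data=sashelp.thick;
   compute lagd=7 maxlags=10;
   coordinates xc=East yc=North;
   model mdata=mdauto;
   var Thick;
run;
```

Because the observation requests AUTO, no other structures are listed in the data set, and any SCALE, RANGE, NUGGET, or SMOOTH values placed in that observation would be ignored.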
When you have a nested model, you might want to specify parameter values for only some of the nested structures. In this case, you must specify the corresponding parameter values for the remaining model structures as missing values.
For example, you can use the following DATA step to specify a non-nested model that uses a spherical covariance within an MDATA= data set:
data md1;
input scale range form $;
datalines;
25 10 SPH
;
Then, you can use the md1 data in the MODEL statement of PROC VARIOGRAM as shown in the following statements:
proc variogram data=...;
compute ...;
model mdata=md1;
run;
This is equivalent to the following explicit specification of the semivariance model parameters:
proc variogram data=...;
compute ...;
model form=sph scale=25 range=10;
run;
The following data set md2 is an example of a nested model:
data md2;
input form $ scale range nugget smooth;
datalines;
SPH 20 8 5 .
MAT 12 3 5 0.7
GAU . 1 5 .
;
This specification is equivalent to the following explicit specification of the semivariance model parameters:
proc variogram data=...;
compute ...;
model form=(sph,mat,gau)
scale=(20,12,.) range=(8,3,1) smooth=0.7 nugget=5;
run;
Use the SMOOTH variable column in the MDATA= data set to specify the smoothing parameter in the Matérn semivariogram models. The SMOOTH variable values must be positive and no greater than 1,000,000. PROC VARIOGRAM sets this upper limit for numerical and performance reasons. In any case, if the fitting process leads the smoothness value to exceed the default threshold value 10,000, then the VARIOGRAM procedure converts the Matérn form into a Gaussian form and repeats the model fitting. To adjust the switching threshold value, you can use the MTOGTOL= option in the MODEL statement.
If you specify a SMOOTH column in the MDATA= data set, then its elements are ignored except for the rows in which the corresponding FORM is Matérn.
The NUGGET variable value is the same for all nested structures. This is the way to specify a nugget effect in the MDATA= data set. If you specify more than one nugget value for different structures, then the last nugget value specified is used.
-
METHOD=method-options
-
must be specified in the MODEL statement to fit a theoretical model to the empirical semivariance. The METHOD option has the following suboptions:
-
OLS
specifies that ordinary least squares be used for the fitting.
-
WLS
specifies that weighted least squares be used for the fitting.
The default is METHOD=WLS.
-
NEPSILON=min-nugget-factor
NEPS=min-nugget-factor
-
specifies that a minimal nugget effect be added to the theoretical semivariance in the unlikely event that the theoretical semivariance becomes zero during fitting with weighted least squares. As explained in the section Theoretical and Computational Details of the Semivariogram, the theoretical semivariance is always positive for any distance larger than zero. If a conflicting situation emerges as a result of numerical fitting issues, then the NEPSILON= option can help you alleviate the problem by adding a minimal variance at the distance lag where the issue is encountered. For more details, see the section Parameter Initialization.
If you omit the NEPSILON= option, then PROC VARIOGRAM sets a default value for min-nugget-factor. If a minimal nugget effect is used, its value is case-specific and is based on the min-nugget-factor. Specifically, its value is defined as min-nugget-factor times the sample variance of the input data set, or as min-nugget-factor when the sample variance is equal to zero.
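The rule for the minimal nugget effect can be sketched with a short DATA step. This is only an illustration of the arithmetic stated above, not the procedure's internal code. The factor and the sample variance are hypothetical numbers, and the factor shown is not claimed to be the documented default.

```sas
data _null_;
   min_nugget_factor = 1e-6;   /* hypothetical factor, not the documented default */
   sample_variance   = 12.5;   /* hypothetical sample variance of the input data */

   /* Factor times the sample variance, or the factor itself if the variance is zero */
   if sample_variance > 0 then min_nugget = min_nugget_factor * sample_variance;
   else min_nugget = min_nugget_factor;

   put min_nugget=;            /* writes the resulting minimal nugget to the log */
run;
```

Scaling by the sample variance keeps the added nugget small relative to the data, while the fallback avoids a zero nugget when the data have no variability.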
-
NUGGET=number
-
specifies the nugget effect for the model. The nugget effect is due to a discontinuity in the semivariogram as determined by plotting the sample semivariogram; see the section Theoretical Semivariogram Models for more details. The NUGGET= parameter is a nonnegative number. If you specify a nonmissing value, then it is used as a fixed parameter in the fitting process.
PROC VARIOGRAM assigns a default initial value for the nugget effect in the following cases:
The NUGGET= option is incompatible with the specification of the PARMS statement for the corresponding MODEL statement.
-
RANGE=range | (range1, …, rangek)
-
specifies the range parameter in semivariogram models.
The RANGE= option is optional. However, if you specify the RANGE= option, then you must provide range values for all structures that you have specified explicitly in the FORM= option. All nonmissing range values are considered as fixed parameters. PROC VARIOGRAM assigns a default initial value to any of the model structures for which you specify a missing range value. PROC VARIOGRAM assigns default initial values to all model structures if you omit the RANGE= option, unless you specify an associated PARMS statement and initial values for the range in it.
The range parameter is a positive number, has the units of distance, and is related to the correlation scale of the underlying spatial process.
Note: If you specify this parameter for a power model, then it does not correspond to a range. For power models, the parameter you specify in the RANGE option is a dimensionless power exponent whose value must range within [0,2) so that the power model is a valid semivariance function.
The RANGE= option is ignored when you specify the FORM=AUTO option. The RANGE= option is incompatible with the specification of the PARMS statement for the corresponding MODEL statement.
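The following sketch shows one way to mix fixed and missing range values in a nested model. The range value 30 is hypothetical, EXP is assumed here to be the keyword for the exponential form, and the Sashelp.Thick sample data and the lag settings are assumptions used only to make the example self-contained.

```sas
/* Assumed sample data and lag settings, for illustration only */
proc variogram data=sashelp.thick;
   compute lagd=7 maxlags=10;
   coordinates xc=East yc=North;
   /* The spherical range is held fixed at 30 (hypothetical value);
      the missing value leaves the exponential range to be initialized
      by the procedure and estimated during fitting */
   model form=(sph,exp) range=(30,.);
   var Thick;
run;
```

One range entry is supplied for each structure listed in the FORM= option, as the RANGE= option requires.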
-
RANGELAG=rlag-list
RLAG=rlag-list
-
specifies that you prefer to use the range of consecutive nonmissing empirical semivariance lags in the rlag-list for the semivariogram fitting process, instead of using all MAXLAGS+1 lag classes by default. You can specify rlag-list in either of the following forms:
- k
a single value that designates the width of the selected lag range, starting at lag zero. You must use at least three lags to perform model fitting, so you can specify k within [3, MAXLAGS+1].
- m TO n
a sequence in which m equals the starting lag and n equals the ending lag. The parameters m and n must be nonnegative integers that designate lag classes between zero and MAXLAGS. You must use at least three lags for model fitting; hence n − m ≥ 2.
The following two brief examples exhibit the use of the RANGELAG option. These examples assume that you have set the MAXLAGS= option to 9 or higher to indicate nonmissing empirical semivariance estimates at 10 lags or more.
In the first example,
RANGELAG=8
uses the empirical semivariance in the first eight lags to fit a theoretical model. Hence, RANGELAG=8 uses only the lag classes zero to seven. This approach enables you to account only for the correlation behavior described by the first k empirical semivariogram lag classes.
In the second example,
RANGELAG=2 TO 9
specifies that the empirical semivariance values at lag classes zero, one, and after lag class nine are excluded from the model fitting process.
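In context, the second example might look like the following sketch. It assumes that RANGELAG= is written after a slash in the MODEL statement together with the other fitting controls; confirm the placement against the MODEL statement syntax for your release. The Sashelp.Thick sample data and the lag settings are likewise assumptions, chosen so that MAXLAGS= is at least 9.

```sas
/* Assumed sample data; MAXLAGS=10 yields lag classes 0 through 10 */
proc variogram data=sashelp.thick;
   compute lagd=7 maxlags=10;
   coordinates xc=East yc=North;
   /* Fit with lag classes 2 through 9 only; classes 0, 1, and 10 are excluded.
      Placement of RANGELAG= after the slash is an assumption. */
   model form=sph / rangelag=2 to 9;
   var Thick;
run;
```

With m=2 and n=9 the fit uses eight lag classes, which satisfies the three-lag minimum.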
-
RANKEPS=reps-value
REPS=reps-value
-
specifies the minimum threshold to compare fit quality of two models for a specific criterion. Beyond this threshold the criterion values become insensitive to comparison. In particular, when you fit multiple models, PROC VARIOGRAM computes for each one the value of the fitting criterion specified in the CHOOSE= option of the MODEL statement. These values are examined in pairs at the sorting stage. If the difference of a given pair exceeds the reps-value, then the sort order of the corresponding models is reversed; otherwise, the two models retain their relative order in the rankings. Hence, the RANKEPS= option can affect model ranking in the fit summary table.
The default value for the RANKEPS= parameter accounts for the default optimization convergence tolerance at the fitting stage prior to model ranking. The convergence tolerance itself limits the accuracy with which you can compare two models under a given criterion. As a result, smaller values of the RANKEPS= parameter might not lead to a sensible or stricter model comparison, because for a smaller reps-value the ranking could depend on digits beyond the accuracy limit.
At the opposite extreme, if the specified reps-value is large compared to the criterion value differences, then the sorting process can become insensitive to the specified sorting criterion. When this happens, the fit summary table ranking reflects only the order in which the models are examined in the procedure flow. You can tell whether the criterion is bypassed: if it is, then one or more values of the specified criterion might appear out of order in the fit summary table.
The RANKEPS= parameter must be a positive number. The RANKEPS= option applies when you fit multiple models with the FORM=AUTO option of the MODEL statement; otherwise, it is ignored.
-
SCALE=scale | (scale1, …, scalek)
-
specifies the scale parameter in semivariogram models. The SCALE= option is optional. However, if you specify the SCALE= option, then you must provide sill values for all structures that you have specified explicitly in the FORM= option. All nonmissing scale values are considered as fixed parameters. PROC VARIOGRAM assigns a default initial value to any of the model structures for which you specify a missing scale value. PROC VARIOGRAM assigns default initial values to all model structures if you omit the SCALE= option, unless you specify an associated PARMS statement with initial values for scale.
The scale parameter is a positive number. It has the same units as the variance of the variable in the VAR statement. The scale of each structure in a semivariogram model represents the variance contribution of the structure to the total model variance.
In power models the SCALE= parameter does not correspond to a sill because the power model has no sill. Instead, PROC VARIOGRAM uses the SCALE= option to designate the slope (or scaling factor) in power model forms. The power model slope has the same variance units as the variable in the VAR statement.
The SCALE= option is ignored when you specify the FORM=AUTO option. The SCALE= option is incompatible with the specification of the PARMS statement for the corresponding MODEL statement.
-
SMOOTH=smooth | (smooth1, …, smoothm)
-
specifies the positive smoothness parameter ν in the Matérn type of semivariance structures. The special case ν = 0.5 is equivalent to the exponential model, whereas the theoretical limit ν → ∞ gives the Gaussian model.
The SMOOTH= option is optional. When you specify an explicit model in the FORM= option with m Matérn structures, you can provide up to m smoothness values. You can specify a value for each smoothi, i = 1, …, m, that is positive and no greater than 1,000,000. PROC VARIOGRAM sets this upper limit for the SMOOTH= option values for numerical and performance reasons. In any case, if the fitting process leads the smoothness value to exceed the default threshold value 10,000, then the VARIOGRAM procedure converts the Matérn form into a Gaussian form and repeats the model fitting. To adjust the switching threshold value, you can use the MTOGTOL= option in the MODEL statement.
If you specify fewer than m values, then the remaining Matérn structures have their smoothness parameters initialized to missing values. If you specify more than m values, then values in excess are ignored.
All nonmissing smoothness values are considered as fixed parameters of the corresponding Matérn structures. PROC VARIOGRAM assigns a default initial value to any of the model Matérn structures, if any, for which you specify a missing smoothness value. PROC VARIOGRAM assigns default initial values to all model Matérn structures if you omit the SMOOTH= option, unless you specify an associated PARMS statement and initial values for smoothness in it.
The SMOOTH= option is ignored when you specify the FORM=AUTO option. The SMOOTH= option is incompatible with the specification of the PARMS statement for the corresponding MODEL statement.
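As a sketch of supplying fewer smoothness values than there are Matérn structures, the following model has two Matérn structures but only one SMOOTH= value. It assumes that the values are matched to the Matérn structures in the order in which they appear in the FORM= option. The value 0.7, the Sashelp.Thick sample data, and the lag settings are hypothetical choices for illustration.

```sas
/* Assumed sample data and lag settings, for illustration only */
proc variogram data=sashelp.thick;
   compute lagd=7 maxlags=10;
   coordinates xc=East yc=North;
   /* Two Matérn structures, one smoothness value: the first (assumed order)
      is fixed at 0.7; the second is left missing and receives a default
      initial value */
   model form=(mat,sph,mat) smooth=0.7;
   var Thick;
run;
```

The spherical structure takes no smoothness value, so it does not count toward m.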