The FMM Procedure

RANDSTART Statement

RANDSTART <randstart-options>;

The RANDSTART statement enables you to use randomly generated sets of starting values for mixture model analysis with the maximum likelihood method. This statement is not available for a Bayesian analysis that is performed by the FMM procedure, and it is not available for model selection.

If you specify the RANDSTART statement, PROC FMM determines the maximum likelihood estimates by using a two-stage process. In the first stage, the procedure generates n1 sets of starting values. The first set consists of the typical PROC FMM starting values, which are determined either by the procedure’s default method of finding starting values or from the values that you specify in the PARAMETERS option in the MODEL and PROBMODEL statements. The remaining starting sets are generated around this first set, with random variation controlled by a scale value s, which you can set by using the STD= option.

The procedure then uses each set of starting values to initialize a separate optimization. Each optimization is terminated according to an initial convergence criterion r1. The goal of the first stage is to identify the n2 (≤ n1) sets of starting values that have the largest log-likelihood values after this optimization. In the second stage, the optimization for these n2 sets is continued according to a stricter convergence criterion, which you specify by using the convergence criteria in the PROC FMM statement. For more information about this process, see the section "Random Starting Values."
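The two-stage idea can be illustrated with a small Python sketch. It is not SAS code and not the PROC FMM implementation: the toy objective `loglik`, the gradient-ascent routine `climb`, and all default argument values are hypothetical stand-ins chosen only to show the flow (generate n1 starts, optimize loosely, keep the best n2, then refine).

```python
import math
import random

def loglik(theta):
    # Hypothetical toy objective with two local maxima (near -2 and near 3).
    return math.log(0.3 * math.exp(-0.5 * (theta + 2.0) ** 2)
                    + 0.7 * math.exp(-0.5 * (theta - 3.0) ** 2))

def grad(theta, h=1e-6):
    # Central-difference approximation of the derivative.
    return (loglik(theta + h) - loglik(theta - h)) / (2 * h)

def climb(theta, tol, step=0.1, max_iter=10_000):
    """Gradient ascent until |gradient| < tol; returns (theta, converged)."""
    for _ in range(max_iter):
        g = grad(theta)
        if abs(g) < tol:
            return theta, True
        theta += step * g
    return theta, False

def two_stage(start, n1=20, n2=3, s=2.0, r1=1e-2, r2=1e-6, seed=1):
    rng = random.Random(seed)
    # Stage 1: the first set is the typical start; the rest vary around it.
    starts = [start] + [rng.gauss(start, s) for _ in range(n1 - 1)]
    stage1 = [climb(t, r1) for t in starts]
    converged = [t for t, ok in stage1 if ok]
    # Keep the n2 converged sets with the largest log likelihood.
    best = sorted(converged, key=loglik, reverse=True)[:n2]
    # Stage 2: continue those sets under the stricter criterion r2.
    stage2 = [climb(t, r2)[0] for t in best]
    return max(stage2, key=loglik)
```

In this toy setting, a single optimization started at 0 climbs to the lower peak near -2, whereas the two-stage search started from the same point can reach the higher peak near 3.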

To fine-tune the two-stage random starting values process, you can specify the following randstart-options:

ALLITER

includes the iteration history for the first stage of estimation in the "Iteration History" table. By default, only the iteration history for the second stage is displayed.

MAXFIRST=n

specifies the maximum number of starting value sets to generate in the first stage.

This option takes effect only if the number of starting sets that converge in the first stage is less than the number that you specify in the NSECOND= option. In that case, the procedure generates more starting sets than the number that you specify in the NFIRST= option.

The procedure generates no more than n sets, where n must be greater than or equal to the values that you specify in the NFIRST= and NSECOND= options.
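As a rough illustration of how NFIRST=, NSECOND=, and MAXFIRST= interact, the following Python sketch counts generated and converged starting sets. It is not the PROC FMM implementation. The callable `converges` is a hypothetical stand-in for running one first-stage optimization, and generating the extra sets one at a time and stopping as soon as enough have converged are assumptions.

```python
def first_stage(converges, nfirst, nsecond, maxfirst):
    """Return (sets generated, sets converged) for the first stage.

    converges(i) is a hypothetical callable that reports whether the
    i-th starting set meets the first-stage convergence criterion.
    """
    if not (nsecond <= nfirst <= maxfirst):
        raise ValueError("need NSECOND <= NFIRST <= MAXFIRST")
    generated = converged = 0
    # Always generate NFIRST sets; keep going (up to MAXFIRST) only
    # while fewer than NSECOND sets have converged.
    while generated < nfirst or (converged < nsecond and generated < maxfirst):
        generated += 1
        if converges(generated):
            converged += 1
    return generated, converged
```

For example, if only every 15th set converges and NFIRST=100, NSECOND=10, MAXFIRST=200, this sketch stops after 150 sets, when the tenth converged set is found.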

The default value of the MAXFIRST= option depends on the values of the NFIRST= and NSECOND= options. If you specify NFIRST=n1, the default is 2 × n1. If you specify NSECOND=n2 but do not specify a value for the NFIRST= option, the default is 20 × n2. If you do not specify either the NFIRST= or NSECOND= option, then by default MAXFIRST=200.
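The default rule for MAXFIRST= can be written as a short Python function. The function name `default_maxfirst` is hypothetical, and the sketch only restates the rule above; it is not taken from the procedure itself.

```python
def default_maxfirst(nfirst=None, nsecond=None):
    """Default MAXFIRST= value; None means the option was not specified."""
    if nfirst is not None:
        return 2 * nfirst        # NFIRST= given: twice its value
    if nsecond is not None:
        return 20 * nsecond      # only NSECOND= given
    return 200                   # neither option given
```

For example, NFIRST=50 and NSECOND=5 (specified alone) both lead to a default of 100.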

NFIRST=n1

specifies the number of starting value sets to generate for the first stage of analysis. The procedure generates at least n1 starting sets to obtain n2 starting sets that converge according to the criterion that you specify in the RANDGCONV= option.

When you specify this option, the procedure generates no fewer than n1 starting sets, where n1 must be greater than or equal to the value of the NSECOND= option and less than or equal to the value of the MAXFIRST= option.

The default value depends on the values of the NSECOND= and MAXFIRST= options. If you specify MAXFIRST=n, the default value is n/2. If you specify NSECOND=n2 but do not specify the MAXFIRST= option, the default value is 10 × n2. If you do not specify either the NSECOND= or MAXFIRST= option, then by default NFIRST=100.
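The default rule for NFIRST= can be sketched in Python as follows. The function name `default_nfirst` is hypothetical. The text does not say how n/2 is rounded when n is odd, so the integer division used here is an assumption.

```python
def default_nfirst(maxfirst=None, nsecond=None):
    """Default NFIRST= value; None means the option was not specified."""
    if maxfirst is not None:
        return maxfirst // 2     # assumption: round down when n is odd
    if nsecond is not None:
        return 10 * nsecond      # only NSECOND= given
    return 100                   # neither option given
```

For example, MAXFIRST=300 leads to a default of 150, and NSECOND=4 (specified alone) leads to a default of 40.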

NSECOND=n2

specifies the number of starting sets to carry forward into the second stage of optimization. These starting sets achieve convergence in the first stage by meeting the convergence criterion that you specify in the RANDGCONV= option. In the second stage, these starting sets continue to be optimized under a stricter convergence criterion that you specify in the PROC FMM statement. At the end of this second stage, the solution that has the best log likelihood provides the maximum likelihood estimates that PROC FMM uses to produce all remaining tables and output.

The value of n2 must be less than or equal to the values that you specify in the NFIRST= and MAXFIRST= options.

The default value depends on the values of the NFIRST= and MAXFIRST= options. If you specify NFIRST=n1, the default is n1/10. If you do not specify a value for the NFIRST= option but you do specify MAXFIRST=n, the default is n/20. If you do not specify either the NFIRST= or MAXFIRST= option, then by default NSECOND=10.
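The default rule for NSECOND=, together with the ordering constraint n2 ≤ n1 ≤ n on the three counts, can be sketched in Python. The function names are hypothetical, and rounding down when the division is not exact is an assumption that the text does not confirm.

```python
def default_nsecond(nfirst=None, maxfirst=None):
    """Default NSECOND= value; None means the option was not specified."""
    if nfirst is not None:
        return nfirst // 10      # assumption: round down
    if maxfirst is not None:
        return maxfirst // 20    # assumption: round down
    return 10                    # neither option given

def counts_are_consistent(nfirst, nsecond, maxfirst):
    """Check NSECOND <= NFIRST <= MAXFIRST."""
    return nsecond <= nfirst <= maxfirst
```

When no option is specified, the documented defaults (NFIRST=100, NSECOND=10, MAXFIRST=200) satisfy this ordering.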

RANDGCONV=r1

specifies the convergence criterion to use in the first stage of optimization. This option is defined similarly to the GCONV= option in the PROC FMM statement, but it is used only in the first stage of the two-stage random starting method. The value of r1 must be greater than zero and greater than the value of the GCONV= option.

Each starting set in the first stage undergoes iterative optimization until it converges according to the criterion r1. If you do not specify the GCONV= option in the PROC FMM statement, then by default RANDGCONV=1E-2. If you specify GCONV=r and the value of r is between 0 and 1, the default RANDGCONV= value is the square root of r. If you specify GCONV=r and the value of r is greater than or equal to 1, then by default RANDGCONV=2r.
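The default rule for RANDGCONV= can be restated as a small Python function. The name `default_randgconv` is hypothetical, and rejecting nonpositive GCONV= values is an assumption, because the text covers only r between 0 and 1 and r greater than or equal to 1.

```python
import math

def default_randgconv(gconv=None):
    """Default RANDGCONV= value; None means GCONV= was not specified."""
    if gconv is None:
        return 1e-2
    if gconv <= 0:
        # Assumption: values outside the documented cases are rejected.
        raise ValueError("GCONV= value must be positive in this sketch")
    if gconv < 1:
        return math.sqrt(gconv)  # looser than gconv, since sqrt(r) > r for 0 < r < 1
    return 2 * gconv             # r >= 1
```

In both branches the default is larger than the GCONV= value, which agrees with the requirement that r1 exceed the GCONV= value. For example, GCONV=1E-8 gives a default of about 1E-4.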

STD=s

specifies the standard deviation for the random draw process that generates the starting sets. The value of s must be greater than or equal to 1E-1.

The random drawing process centers on the default starting values or the values that you specify in the PARAMETERS option of the MODEL or PROBMODEL statement.

In the random drawing, scale parameters are drawn from a lognormal distribution whose standard deviation is equal to s. Mean parameters are drawn from a normal distribution whose standard deviation is equal to s.

By default, STD=2.
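The random drawing can be illustrated with a Python sketch. This is not the PROC FMM implementation. The function `draw_start` and the labels "mean" and "scale" are hypothetical. The sketch also assumes that s is the standard deviation on the log scale of the lognormal distribution and that the lognormal draw is centered so that its median equals the base value; the text does not confirm either detail.

```python
import math
import random

def draw_start(base, kinds, s=2.0, rng=None):
    """Draw one starting set around the base values.

    base  : list of base starting values
    kinds : list of "mean" or "scale", one label per parameter
    s     : the STD= value (at least 0.1)
    """
    if s < 0.1:
        raise ValueError("s must be greater than or equal to 0.1")
    rng = rng or random.Random()
    draw = []
    for value, kind in zip(base, kinds):
        if kind == "scale":
            # Assumption: log-scale sigma is s, median is the base value.
            draw.append(rng.lognormvariate(math.log(value), s))
        else:
            # Mean parameters: normal draw centered at the base value.
            draw.append(rng.gauss(value, s))
    return draw
```

A lognormal draw keeps scale parameters positive, whereas mean parameters can move in either direction from the base value.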

Last updated: December 09, 2022