The CAUSALGRAPH Procedure

MODEL Statement

MODEL 'label' path <, path …> < / options >;

where label represents a name that you assign to the model and path represents either of the following specifications:

a directed-path
a covariance-path

Details about the syntax and interpretation of these different types of paths are described later in this section. Here are some examples:

model 'Example1'
   X1 ==> X2,
   X3 <== X2,       /* same as: X2 ==> X3 */
   X3 ==> X4 X5,    /* same as: X3 ==> X4, X3 ==> X5 */
   <==> {X2 X5 X6}; /* latent confounding between each pair of X2, X5, and X6 */

model 'Example2'
   X1 ==> X2 ==> X3 <== X4 <==> X5;

You must specify at least one MODEL statement in an analysis.

The label is enclosed within quotation marks and can be any string of characters. Every model that you specify using a MODEL statement must have a unique label. The labels for models are not case-sensitive.

A MODEL statement specifies a causal model in the form of a directed acyclic graph (DAG). A DAG consists of nodes that represent variables in the model and edges that represent causal relationships between pairs of variables. For more information about how to use a DAG to represent a causal model, see the section Causal Graph Theory. You specify the causal relationships in a DAG in accordance with the path syntax in the MODEL statement.

The following subsections explain the path syntax. Each path is either a directed-path or a covariance-path.

Directed-paths

A directed-path has the following form:

variables arrow variables < arrow variables …>

The directed-path continues alternating between arrows and variables and terminates with either a comma (which ends the path) or a semicolon (which ends the MODEL statement). Each variables argument contains a list of variable names. Optionally, you can enclose this list of names in curly braces, square brackets, or parentheses. This means that all the following forms are equivalent:

variables
{variables}
[variables]
(variables)

The use of braces, brackets, or parentheses for grouping variables is optional but highly recommended because it clearly identifies the edges that are associated with each variable.

Each arrow in a directed-path defines a set of edges in a DAG. Each variable in the list preceding the arrow is linked by an edge to each variable in the list following the arrow. The direction of the edge is given by the direction of the arrow. An arrow in a directed-path can be a right arrow (==>), a left arrow (<==), or a bidirected arrow (<==>). For more information about representing arrows in the CAUSALGRAPH procedure, see the section Arrow or Edge Specification.

The procedure does not allow multiple edges of the same type between two variables. If the same edge is specified more than once in a MODEL statement, the repeated specifications are ignored. Variable names are not case-sensitive in the procedure.

Here are some examples of specifying directed-paths:

model 'M1'
   Y ==> Z,
   U <== W,
   X U V ==> Y,
   W ==> Z <== M N ==> Y;

The following MODEL statement specifications of "M2" are equivalent:

model 'M2' V1 V2 ==> X1-X3 A <== B C <==> D;
model 'M2' {V1 V2} ==> {X1-X3 A} <== {B C} <==> D;
model 'M2' V1 V2 ==> X1-X3 A,
           D <==> {B C} ==> X1-X3 A;
model 'M2' V1 ==> X1, V1 ==> X2, V1 ==> X3, V1 ==> A,
   V2 ==> X1, V2 ==> X2, V2 ==> X3, V2 ==> A,
   X1 <== B, X2 <== B, X3 <== B, A <== B,
   X1 <== C, X2 <== C, X3 <== C, A <== C,
   B <==> D, C <==> D;
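
The expansion rule described above (every variable before an arrow is linked to every variable after it) can be sketched in Python. This is only an illustration of the semantics, not how PROC CAUSALGRAPH is implemented. The function names are hypothetical, tokens are assumed to be separated by whitespace, and a numbered range such as X1-X3 is assumed to expand to X1 X2 X3, as the equivalent "M2" specifications suggest.

```python
import re

ARROWS = {"==>", "<==", "<==>"}

def expand_names(tokens):
    """Expand numbered ranges such as X1-X3 and normalize case."""
    names = []
    for tok in tokens:
        m = re.fullmatch(r"([A-Za-z_]+)(\d+)-([A-Za-z_]+)(\d+)", tok)
        if m and m.group(1).upper() == m.group(3).upper():
            lo, hi = int(m.group(2)), int(m.group(4))
            names.extend(f"{m.group(1)}{i}" for i in range(lo, hi + 1))
        else:
            names.append(tok)
    # Variable names are not case-sensitive, so compare them in uppercase.
    return [n.upper() for n in names]

def expand_directed_path(path):
    """Return the set of edges implied by one directed-path."""
    # Braces, brackets, and parentheses are optional grouping symbols.
    tokens = re.sub(r"[{}\[\]()]", " ", path).split()
    groups, arrows, current = [], [], []
    for tok in tokens:
        if tok in ARROWS:
            groups.append(expand_names(current))
            arrows.append(tok)
            current = []
        else:
            current.append(tok)
    groups.append(expand_names(current))

    edges = set()  # a set silently drops repeated specifications of an edge
    for left, arrow, right in zip(groups, arrows, groups[1:]):
        for a in left:
            for b in right:
                if arrow == "==>":
                    edges.add(("==>", a, b))
                elif arrow == "<==":
                    edges.add(("==>", b, a))  # same edge as b ==> a
                else:
                    # Bidirected edges have no orientation; sort the endpoints.
                    edges.add(("<==>", *sorted((a, b))))
    return edges

compact = expand_directed_path("V1 V2 ==> X1-X3 A <== B C <==> D")
grouped = expand_directed_path("{V1 V2} ==> {X1-X3 A} <== {B C} <==> D")
print(compact == grouped)  # True
```

Both forms produce the same 18 edges, which matches the fully written-out "M2" specification.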

Covariance-paths

A covariance-path has the following form:

<==> variables

The <==> syntax represents a bidirected arrow. For more information about representing arrows in the CAUSALGRAPH procedure, see the section Arrow or Edge Specification. The variables argument contains a list of variable names. Optionally, you can enclose this list of names in curly braces, square brackets, or parentheses. This means that all the following forms are equivalent:

variables
{variables}
[variables]
(variables)

The use of braces, brackets, or parentheses for grouping variables is optional but highly recommended because it clearly identifies the edges that are associated with each variable.

You use a covariance-path to specify covariances between pairs of variables that are not explained by the causal paths in the model. Essentially, these covariances are equivalent to assuming latent confounding between each pair of variables that you specify in the covariance-path. That is, one bidirected edge is added to the model for each pair of distinct variables in the covariance-path. Thus, the following specifications are equivalent:

model 'MyModel' <==> {X1 X2};
model 'MyModel' X1 <==> X2;

Furthermore, because a bidirected edge represents latent confounding, this is also equivalent to the following specification:

model 'MyModel' X1 <== L ==> X2;
unmeasured L;

For more information about the interpretation of bidirected edges in the CAUSALGRAPH procedure, see the section Causal Graph Theory.

Here are some examples of specifying covariance-paths:

model 'M3'
   <==> {X1 X2},
   <==> {X3-X5 X8};

The following MODEL statement specifications of "M4" are equivalent:

model 'M4' <==> {X1-X3 Y Z};
model 'M4' <==> {X1 X2 X3 Y Z};
model 'M4'
   <==> {X1 X2},
   <==> {X1 X3},
   <==> {X1 Y},
   <==> {X1 Z},
   <==> {X2 X3},
   <==> {X2 Y},
   <==> {X2 Z},
   <==> {X3 Y},
   <==> {X3 Z},
   <==> {Y Z};
model 'M4'
   X1 <==> X2 X3 Y Z,
   X2 <==> X3 Y Z,
   X3 <==> Y Z,
   Y <==> Z;
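
The pairwise rule for covariance-paths can be checked with a short Python sketch. It is an illustration only: the function name is hypothetical, and the variable list is assumed to be already expanded (X1-X3 written as X1 X2 X3).

```python
from itertools import combinations

def expand_covariance_path(variables):
    """Return one bidirected edge per unordered pair of distinct variables."""
    # Names are not case-sensitive; a set removes repeated names.
    unique = sorted({v.upper() for v in variables})
    return set(combinations(unique, 2))

m4 = expand_covariance_path(["X1", "X2", "X3", "Y", "Z"])
print(len(m4))  # 10
```

Five variables give 5 × 4 / 2 = 10 bidirected edges, the same count as the pair-by-pair "M4" specification.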

Options

You can specify the following options in the MODEL statement:

NOANALYSIS

excludes the model from the identification analysis. A typical use case for this option is when you want to analyze two or more models that have very similar structures. You can specify the common structure in a base model and use the NOANALYSIS option to prevent the base model from being analyzed. You can then use additional MODEL statements, each with the REFMODEL= option, to specify the causal models that you want to analyze.

If you use the OUTMODEL= option in the PROC CAUSALGRAPH statement to save your models, the NOANALYSIS option excludes the current model from that data set.

REFMODEL='label'

specifies a reference model to use as a starting point for the current model. The label is a quoted string that specifies the label of the reference model. The reference model must already exist before you can refer to it in a REFMODEL= option. This means that the reference model must be defined in a prior MODEL statement or must be defined in the data set that you specify in the INMODEL= option. You can specify at most one reference model for each MODEL statement.

When you use the REFMODEL= option, all nodes and edges in the reference model are included in the current model. You can use the path specification in the current MODEL statement to add more nodes and edges to the model, or you can use the REMOVE= or REMOVENODES= option to remove elements from the reference model. For example, the models "M1" through "M4" are identical in the following analysis:

proc causalgraph;
   model 'Base' X -> M -> Y / noanalysis;
   model 'Full' X -> M -> Y, {U C} -> {X Y} / noanalysis;
   model 'M1' X -> M -> Y, C -> X Y;
   model 'M2' C -> X Y / refmodel='Base';
   model 'M3' / refmodel='Full' remove=(U -> X Y);
   model 'M4' / refmodel='Full' removenode=U;
   identify X -> Y;
run;

It is important to understand how PROC CAUSALGRAPH interprets the model specification when you specify the REFMODEL=, REMOVE=, and REMOVENODES= options together. The model specification options in the MODEL statement are processed in the following order:

1. All nodes and edges in the reference model (specified in the REFMODEL= option) are added to the current model.
2. Nodes that you specify in the REMOVENODES= option are removed, along with all adjacent edges.
3. All edges that you specify in the REMOVE= option are removed, along with any nodes that are left unconnected.
4. Nodes and edges in the MODEL path specification are added.
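
The processing order above can be sketched in Python to show why "M1" through "M4" in the preceding example end up identical. This is a simplified model under stated assumptions: the function name is hypothetical, only directed edges are handled, and "nodes that are left unconnected" is read as endpoints of a removed edge that no longer touch any edge.

```python
def build_model(ref=None, add_edges=(), remove_nodes=(), remove_edges=()):
    """Apply the four steps in order; a model is a pair (nodes, edges)."""
    # Step 1: start from a copy of the reference model (it is not modified).
    nodes, edges = (set(ref[0]), set(ref[1])) if ref else (set(), set())

    # Step 2: remove the listed nodes and every edge adjacent to them.
    nodes -= set(remove_nodes)
    edges = {(a, b) for a, b in edges if a in nodes and b in nodes}

    # Step 3: remove the listed edges, then drop endpoints left without edges.
    removed = edges & set(remove_edges)
    edges -= removed
    connected = {n for edge in edges for n in edge}
    nodes -= {n for edge in removed for n in edge} - connected

    # Step 4: add the nodes and edges from the MODEL path specification.
    for a, b in add_edges:
        edges.add((a, b))
        nodes.update((a, b))
    return nodes, edges

chain = [("X", "M"), ("M", "Y")]
base = build_model(add_edges=chain)
full = build_model(add_edges=chain + [("U", "X"), ("U", "Y"), ("C", "X"), ("C", "Y")])

m1 = build_model(add_edges=chain + [("C", "X"), ("C", "Y")])
m2 = build_model(ref=base, add_edges=[("C", "X"), ("C", "Y")])
m3 = build_model(ref=full, remove_edges=[("U", "X"), ("U", "Y")])
m4 = build_model(ref=full, remove_nodes=["U"])
print(m1 == m2 == m3 == m4)  # True
```

Because additions are applied last, an edge that appears in both the REMOVE= option and the path specification would remain in the model under this reading.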

REMOVE=(path <, path …>)

specifies paths to remove after the reference model has been processed. You must specify the REFMODEL= option in order to use the REMOVE= option.

A path represents either of the following specifications:

a directed-path
a covariance-path

For more information about these path specifications, see the sections Directed-paths and Covariance-paths earlier in this section. Note that the path specification for the REMOVE= option must be enclosed in parentheses. The REMOVE= option does not change the reference model.

It is important to understand how PROC CAUSALGRAPH interprets the model specification when you specify the REFMODEL=, REMOVE=, and REMOVENODES= options together. For more information, see the REFMODEL= option.

REMOVENODES={variables}
REMOVENODES=[variables]
REMOVENODES=(variables)
REMOVENODE=variable

specifies one or more nodes to remove after PROC CAUSALGRAPH processes the reference model. You must specify the REFMODEL= option in order to use the REMOVENODES= option.

The variables (or variable) argument lists the nodes to remove from the current model. When these nodes are removed, all edges that are connected to these nodes (regardless of direction) are also removed from the model.

Last updated: December 09, 2022