Operator fusion rule YAML syntax
This page is a reference for the operator fusion rule YAML syntax, which is used for automatic detection of modules in an app when compiling in o1 mode. For more information about o1, and about using o1 with manually defined modules, see Compiler optimization modes.
Each section of this document explains a different element of the syntax. All examples of the syntax are based on our internal version of GPT2’s QKV rule, because with just 3 operators it’s fairly simple.
Operator fusion rule main elements
Each operator fusion rule has the following main parts:
- A name.
- A priority: a numerical value that determines which rule wins if multiple rules match the same operators. Patterns with higher values of priority win over patterns with lower values of priority, that is, priority: 2 wins over priority: 1.
- (Optional) The name of a heuristic to apply to the fused pattern.
- A pattern that describes a set of connected PyTorch operators.
For example:
gpt2_qkv:
  priority: 0
  heuristic: GPT2_QKV
  pattern:
    layer_norm:
      op_type: layer_norm
      child: lin1
    lin1:
      op_type: linear
      child: relu0
    relu0:
      op_type: relu
This pattern has three operator entries: a layer_norm that’s connected to a linear, which is then connected to a relu. Each entry in the pattern has its own name (layer_norm, lin1, and relu0) and describes a single operator.
To describe the pattern’s edges, we use the child or children keyword.
With this rule in place, if the compiler is in o1 mode and finds a pattern of layer_norm ⇒ linear ⇒ relu within the graph, then this pattern matches.
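The priority element from the list above can be illustrated with a small sketch. This is not the compiler's implementation: the rule names custom_qkv and fallback_qkv and the helper pick_winner are hypothetical, and only the "higher priority wins" behavior comes from this page. How ties are resolved is not described here, so the sketch makes no claim about them.

```python
# Hypothetical records for rules that all matched the same operators.
# Only "name" and "priority" are modeled; everything else is omitted.
rules_matching_same_ops = [
    {"name": "gpt2_qkv", "priority": 0},
    {"name": "custom_qkv", "priority": 2},    # hypothetical rule
    {"name": "fallback_qkv", "priority": 1},  # hypothetical rule
]


def pick_winner(matching_rules):
    """Return the rule with the highest priority value."""
    return max(matching_rules, key=lambda rule: rule["priority"])


winner = pick_winner(rules_matching_same_ops)
print(winner["name"])
```

Here the rule with priority: 2 is selected over the rules with priority: 1 and priority: 0.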
Operators with more than one child
If an operator has more than one child, you can use the children keyword and a list of operators, as follows:
gpt2_qkv:
  priority: 0
  heuristic: GPT2_QKV
  pattern:
    layer_norm:
      op_type: layer_norm
      child: lin1
    lin1:
      op_type: linear
      children:
        - relu0
        - relu1
    relu0:
      op_type: relu
    relu1:
      op_type: relu
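To make the edge structure concrete, the following sketch turns the child and children keywords of a pattern into a list of (parent, child) edges. It assumes the pattern has already been loaded into a plain Python dict. The helper pattern_edges is hypothetical and is not part of the compiler.

```python
def pattern_edges(pattern):
    """Collect (parent, child) edges from 'child' and 'children' keywords."""
    edges = []
    for name, entry in pattern.items():
        targets = []
        if "child" in entry:
            targets.append(entry["child"])
        targets.extend(entry.get("children", []))
        for target in targets:
            edges.append((name, target))
    return edges


# The pattern from the example above, written as a Python dict.
pattern = {
    "layer_norm": {"op_type": "layer_norm", "child": "lin1"},
    "lin1": {"op_type": "linear", "children": ["relu0", "relu1"]},
    "relu0": {"op_type": "relu"},
    "relu1": {"op_type": "relu"},
}

print(pattern_edges(pattern))
```

The result contains one edge from layer_norm to lin1 and two edges from lin1, one to each relu entry.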
Optional operators
Some models have different configurations, so that an operator appears only if a certain flag is set. When a single operator may or may not appear in the pattern, set required: false on the operator’s entry in the pattern. For example:
gpt2_qkv:
  priority: 0
  heuristic: GPT2_QKV
  pattern:
    layer_norm:
      op_type: layer_norm
      child: lin1
    lin1:
      op_type: linear
      children:
        - relu0
        - optional_add
    optional_add:
      op_type: add
      required: false
      child: relu1
    relu0:
      op_type: relu
    relu1:
      op_type: relu
If an operator is marked required: false, the compiler skips over the operator if it isn’t found. That is, the above example pattern matches either <layer_norm, lin1, relu0, relu1> or <layer_norm, lin1, relu0, optional_add, relu1>, but not <layer_norm, lin1, relu0>.
required: false is similar to the ? operator in regular expressions.
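The regular expression analogy can be tried directly in Python. The sketch below models only one branch of the example (lin1, then the optional add, then relu1) as a flat sequence of operator types joined by spaces. That simplification is an assumption made for illustration. The real matcher works on a graph, and the helper chain_matches is hypothetical.

```python
import re

# linear, then an optional add, then relu: "( add)?" plays the role of
# required: false on the add entry.
OPTIONAL_ADD = re.compile(r"linear( add)? relu")


def chain_matches(op_types):
    """Return True if the whole operator sequence fits the optional-add chain."""
    return OPTIONAL_ADD.fullmatch(" ".join(op_types)) is not None


print(chain_matches(["linear", "relu"]))         # add is absent
print(chain_matches(["linear", "add", "relu"]))  # add is present
print(chain_matches(["linear", "add"]))          # required relu is missing
```

The first two calls succeed and the third fails, which mirrors how an optional entry can be skipped while the entries after it must still be found.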
Alternate paths
If a PyTorch module has different configuration options, then the resulting operator pattern might have slight differences as well. Use the match_first_option keyword to ensure that your pattern matches for each of the different options. For example:
gpt2_qkv:
  priority: 0
  heuristic: GPT2_QKV
  pattern:
    layer_norm:
      op_type: layer_norm
      child: lin1
    lin1:
      op_type: linear
      child: other_activation
    other_activation:
      match_first_option:
        - silu0
        - relu0
    silu0:
      op_type: silu
    relu0:
      op_type: relu
      child: relu1
    relu1:
      op_type: relu
Note that the other_activation entry above uses match_first_option by itself, with no corresponding op_type or children.
In contrast with required: false, which skips over entries that cannot be matched, at least one of other_activation’s options must match. In other words, this pattern matches either <layer_norm, lin1, relu0, relu1> or <layer_norm, lin1, silu0>.
match_first_option is like the | operator in regular expressions.
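As a rough illustration of the | analogy, the sketch below encodes the two accepted operator sequences as a regular expression over operator types joined by spaces. Flattening the pattern into a string is an assumption made for illustration, and the helper matches_alternates is hypothetical. This is not how the compiler matches graphs.

```python
import re

# After layer_norm and linear, either a single silu or a relu followed by
# another relu must appear; "(silu|relu relu)" mirrors match_first_option.
ALTERNATES = re.compile(r"layer_norm linear (silu|relu relu)")


def matches_alternates(op_types):
    """Return True if the whole sequence fits one of the two alternates."""
    return ALTERNATES.fullmatch(" ".join(op_types)) is not None


print(matches_alternates(["layer_norm", "linear", "silu"]))
print(matches_alternates(["layer_norm", "linear", "relu", "relu"]))
print(matches_alternates(["layer_norm", "linear", "gelu"]))
```

The last call fails because neither option matches, which reflects the rule that at least one option must match.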
Alternate patterns
If a PyTorch module has many different configuration options, but many are bundled together, it may be easier to describe your patterns as full alternates of each other instead of using match_first_option. For example:
gpt2_qkv:
  priority: 0
  heuristic: GPT2_QKV
  pattern1:
    layer_norm:
      op_type: layer_norm
      child: lin1
    lin1:
      op_type: linear
      child: relu0
    relu0:
      op_type: relu
  pattern2:
    layer_norm:
      op_type: layer_norm
      child: lin1
    lin1:
      op_type: linear
      child: silu0
    silu0:
      op_type: silu
In this case, the rule matches either <layer_norm, lin1, relu0> or <layer_norm, lin1, silu0>. Use whichever method you find clearest.
Alternate operators / character classes
You can specify alternatives in a pattern by listing multiple operators for a single operator, similar to using character classes in a regular expression. For example:
gpt2_qkv:
  priority: 0
  heuristic: GPT2_QKV
  pattern:
    layer_norm:
      op_type: layer_norm
      child: lin1
    lin1:
      op_type: linear
      child: activation
    activation:
      op_type:
        - relu
        - silu
In this case, the pattern matches <layer_norm, lin1, activation> and the activation operator can be of type relu or type silu.
Alternatively, if an entry can match any operator, you can use the special op_type value . (a period), for example:
gpt2_qkv:
  priority: 0
  heuristic: GPT2_QKV
  pattern:
    layer_norm:
      op_type: layer_norm
      child: lin1
    lin1:
      op_type: linear
      child: activation
    activation:
      op_type: .
This matches <layer_norm, lin1, activation>. The activation can have any op_type.
Character classes can also be inverted like this:
gpt2_qkv:
  priority: 0
  heuristic: GPT2_QKV
  pattern:
    layer_norm:
      op_type: layer_norm
      child: lin1
    lin1:
      op_type: linear
      child: activation
    activation:
      op_type:
        inverted: True
        options:
          - linear
          - addmm
This matches <layer_norm, lin1, activation>. The activation can have any op_type except linear or addmm.
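The op_type forms described in this section (a single type, a list of types, the . wildcard, and an inverted list) can be summarized in a small sketch. The helper op_type_matches is hypothetical and assumes the op_type value has already been loaded into a Python string, list, or dict. It shows the matching rules as described on this page, not the compiler's code.

```python
def op_type_matches(spec, actual):
    """Check an operator type against one op_type specification."""
    if spec == ".":
        # Wildcard: any operator type is accepted.
        return True
    if isinstance(spec, str):
        # A single operator type must match exactly.
        return spec == actual
    if isinstance(spec, list):
        # A list behaves like a character class.
        return actual in spec
    if isinstance(spec, dict):
        # Dict form with 'options' and an optional 'inverted' flag.
        in_options = actual in spec.get("options", [])
        return not in_options if spec.get("inverted", False) else in_options
    raise TypeError(f"unsupported op_type specification: {spec!r}")


inverted_spec = {"inverted": True, "options": ["linear", "addmm"]}
print(op_type_matches(["relu", "silu"], "silu"))
print(op_type_matches(".", "gelu"))
print(op_type_matches(inverted_spec, "relu"))
print(op_type_matches(inverted_spec, "addmm"))
```

Only the last call is rejected, because addmm is one of the excluded options in the inverted class.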
Matching multiple operators
To match multiple operators in a sequence, you can use the match_multiple keyword. This keyword is most useful when using the . operator, described above. For example:
gpt2_qkv:
  priority: 0
  heuristic: GPT2_QKV
  pattern:
    layer_norm:
      op_type: layer_norm
      child: lin1
    lin1:
      op_type: .
      match_multiple: true
      child: relu0
    relu0:
      op_type: relu
This pattern matches layer_norm, and then a sequence of one or more operators (of any type except relu), and then a relu.
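The behavior described for this example can be sketched over a flat list of operator types. The helper below is hypothetical and checks only this one pattern: a layer_norm, then one or more operators that are not relu, then a relu. It is a simplified illustration under that assumption, not the compiler's matching algorithm.

```python
def matches_layer_norm_any_relu(op_types):
    """layer_norm, then one or more non-relu operators, then relu."""
    if len(op_types) < 3:
        # At least one operator is needed between layer_norm and relu.
        return False
    first, middle, last = op_types[0], op_types[1:-1], op_types[-1]
    return (
        first == "layer_norm"
        and last == "relu"
        and all(op != "relu" for op in middle)
    )


print(matches_layer_norm_any_relu(["layer_norm", "linear", "relu"]))
print(matches_layer_norm_any_relu(["layer_norm", "linear", "add", "relu"]))
print(matches_layer_norm_any_relu(["layer_norm", "relu"]))
```

The third call fails because match_multiple requires at least one operator between layer_norm and relu.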
You can never end a pattern that uses match_multiple with a period (.) as an op_type.
Learn more!
- See Compiler optimization modes for an introduction to operator fusion.
- See SambaFlow compiler overview for an introduction to how the compiler works.