Operator fusion rule YAML syntax

This page is a reference for the operator fusion rule YAML syntax. The compiler uses these rules to detect modules in an app automatically when compiling in o1 mode. For more information about o1, and about using o1 with manually defined modules, see Compiler optimization modes.

Each section of this document explains a different element of the syntax. All examples are based on our internal version of GPT2's QKV rule, which is fairly simple because it has just three operators.

Operator fusion rule main elements

Each operator fusion rule has the following main parts:

A name.
A priority: a numerical value that determines which rule wins if multiple rules match the same operators. Rules with higher priority values win over rules with lower priority values; that is, priority: 2 wins over priority: 1.
(Optional) The name of a heuristic to apply to the fused pattern.
A pattern that describes a set of connected PyTorch operators.
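As a rough illustration of the priority rule, the Python sketch below picks a winner among rules that matched the same operators. The `Rule` class and `winning_rule` function are hypothetical, not part of SambaFlow. The tie-breaking behavior (the rule listed first wins) is an assumption of this sketch, because this page does not say how equal priorities are resolved.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Rule:
    """Hypothetical stand-in for a parsed fusion rule (name and priority only)."""
    name: str
    priority: int


def winning_rule(candidates: List[Rule]) -> Rule:
    """Return the candidate with the highest priority value.

    max() keeps the first of several equal maxima, so ties go to the rule
    listed first. That tie-break is an assumption of this sketch.
    """
    return max(candidates, key=lambda rule: rule.priority)


matched = [Rule("gpt2_qkv", 1), Rule("hypothetical_other_rule", 2)]
print(winning_rule(matched).name)  # hypothetical_other_rule
```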

For example:

gpt2_qkv:
    priority: 0
    heuristic: GPT2_QKV
    pattern:
        layer_norm:
            op_type: layer_norm
            child: lin1
        lin1:
            op_type: linear
            child: relu0
        relu0:
            op_type: relu

This pattern has three entries: a layer_norm that's connected to a linear, which is then connected to a relu. Each entry in the pattern has its own name (layer_norm, lin1, and relu0) and describes a single operator.

To describe the pattern’s edges, we use the child or children keyword.

With this rule in place, if the compiler is in o1 mode and finds a pattern of layer_norm ⇒ linear ⇒ relu within the graph, then this pattern matches.
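To make the matching concrete, here is a toy Python model that handles only single-child chains like the one above. It assumes that the rule has been loaded into a plain dict, which is what a YAML loader would produce, and that a path through the graph is given as a list of op type strings. `chain_op_types` and `matches_chain` are hypothetical names, and the SambaFlow compiler is not implemented this way.

```python
from typing import Dict, List

# The gpt2_qkv rule above, as the nested dict a YAML loader would produce.
GPT2_QKV = {
    "priority": 0,
    "heuristic": "GPT2_QKV",
    "pattern": {
        "layer_norm": {"op_type": "layer_norm", "child": "lin1"},
        "lin1": {"op_type": "linear", "child": "relu0"},
        "relu0": {"op_type": "relu"},
    },
}


def chain_op_types(pattern: Dict[str, dict]) -> List[str]:
    """Follow `child` links from the root entry and collect each op_type."""
    named_as_child = {e["child"] for e in pattern.values() if "child" in e}
    # The root is the one entry that no other entry names as its child.
    (root,) = [name for name in pattern if name not in named_as_child]
    op_types, name = [], root
    while name is not None:
        entry = pattern[name]
        op_types.append(entry["op_type"])
        name = entry.get("child")
    return op_types


def matches_chain(pattern: Dict[str, dict], graph_path: List[str]) -> bool:
    """True if a path of operator types from the graph equals the pattern's chain."""
    return chain_op_types(pattern) == graph_path


print(matches_chain(GPT2_QKV["pattern"], ["layer_norm", "linear", "relu"]))  # True
print(matches_chain(GPT2_QKV["pattern"], ["layer_norm", "linear", "silu"]))  # False
```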

Operators with more than one child

If an operator has more than one child, you can use the children keyword and a list of operators, as follows:

gpt2_qkv:
    priority: 0
    heuristic: GPT2_QKV
    pattern:
        layer_norm:
            op_type: layer_norm
            child: lin1
        lin1:
            op_type: linear
            children:
              - relu0
              - relu1
        relu0:
            op_type: relu
        relu1:
            op_type: relu
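Together, the child and children keywords define the pattern's edges. The small Python sketch below treats the pattern as a plain dict and lists those edges as (parent, child) pairs, which shows that lin1 fans out to both relu0 and relu1. `pattern_edges` is a hypothetical helper for illustration only.

```python
from typing import Dict, List, Tuple

# The pattern section of the two-children example above, as a plain dict.
PATTERN = {
    "layer_norm": {"op_type": "layer_norm", "child": "lin1"},
    "lin1": {"op_type": "linear", "children": ["relu0", "relu1"]},
    "relu0": {"op_type": "relu"},
    "relu1": {"op_type": "relu"},
}


def pattern_edges(pattern: Dict[str, dict]) -> List[Tuple[str, str]]:
    """List (parent, child) edges, treating `child: x` like `children: [x]`."""
    edges = []
    for name, entry in pattern.items():
        if "child" in entry:
            successors = [entry["child"]]
        else:
            successors = entry.get("children", [])
        edges.extend((name, successor) for successor in successors)
    return edges


print(pattern_edges(PATTERN))
# [('layer_norm', 'lin1'), ('lin1', 'relu0'), ('lin1', 'relu1')]
```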

Optional operators

Some models have different configurations, so that an operator appears only if a certain flag is set. When a single operator may or may not appear in the pattern, describe this by setting required: false on that operator's entry. For example:

gpt2_qkv:
    priority: 0
    heuristic: GPT2_QKV
    pattern:
        layer_norm:
            op_type: layer_norm
            child: lin1
        lin1:
            op_type: linear
            children:
              - relu0
              - optional_add
        optional_add:
            op_type: add
            required: false
            child: relu1
        relu0:
            op_type: relu
        relu1:
            op_type: relu

If an entry sets required: false, the compiler skips over that operator when it isn't found. That is, the example pattern above matches either <layer_norm, lin1, relu0, relu1> or <layer_norm, lin1, relu0, optional_add, relu1>, but not <layer_norm, lin1, relu0>.

required: false is similar to the ? operator in regular expressions.
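For a single path, the regular-expression analogy can be taken literally. The Python sketch below is a toy model, not the compiler's matcher. It flattens the example to one path (dropping the relu0 branch), writes each operator as a space-terminated token, and turns an entry with required: false into an optional regex group. `chain_to_regex` and `chain_matches` are hypothetical names.

```python
import re
from typing import List

# A single-path simplification of the example above:
# layer_norm -> lin1 -> optional_add (required: false) -> relu1.
CHAIN = [
    {"op_type": "layer_norm"},
    {"op_type": "linear"},
    {"op_type": "add", "required": False},
    {"op_type": "relu"},
]


def chain_to_regex(entries: List[dict]) -> str:
    """Translate a linear chain into a regex over space-terminated op types."""
    parts = []
    for entry in entries:
        token = re.escape(entry["op_type"]) + " "
        if entry.get("required", True) is False:
            token = f"(?:{token})?"  # required: false behaves like "?"
        parts.append(token)
    return "".join(parts)


def chain_matches(entries: List[dict], ops: List[str]) -> bool:
    """True if the whole op sequence matches the chain."""
    text = "".join(op + " " for op in ops)
    return re.fullmatch(chain_to_regex(entries), text) is not None


print(chain_matches(CHAIN, ["layer_norm", "linear", "add", "relu"]))  # True
print(chain_matches(CHAIN, ["layer_norm", "linear", "relu"]))         # True
print(chain_matches(CHAIN, ["layer_norm", "linear"]))                 # False
```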

Alternate paths

If a PyTorch module has different configuration options, then the resulting operator pattern might differ slightly as well. Use the match_first_option keyword to make your pattern match each of the different options. For example:

gpt2_qkv:
    priority: 0
    heuristic: GPT2_QKV
    pattern:
        layer_norm:
            op_type: layer_norm
            child: lin1
        lin1:
            op_type: linear
            child: other_activation
        other_activation:
            match_first_option:
              - silu0
              - relu0
        silu0:
            op_type: silu
        relu0:
            op_type: relu
            child: relu1
        relu1:
            op_type: relu

Note that the other_activation entry uses match_first_option by itself, with no op_type, child, or children of its own.

In contrast with required: false, which skips over entries that cannot be matched, at least one of other_activation's options must match. In other words, this pattern matches either <layer_norm, lin1, relu0, relu1> or <layer_norm, lin1, silu0>.

match_first_option is like the | operator in regular expressions.
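The Python sketch below models the example above as a regex. It uses the layer_norm -> lin1 prefix, followed by an alternation over the option chains in their listed order. It is a toy model over flattened op types, and `first_option_regex` and `matches` are hypothetical names. Python's `|` tries alternatives from left to right, which loosely mirrors "first option" in the keyword's name. This page does not say whether the compiler falls back to later options the way a backtracking regex engine does.

```python
import re
from typing import List

# The alternate-paths example above, flattened to op types:
# layer_norm -> lin1, then other_activation's options in listed order.
PREFIX = ["layer_norm", "linear"]
OPTIONS = [
    ["silu"],          # silu0
    ["relu", "relu"],  # relu0 -> relu1
]


def tokens(ops: List[str]) -> str:
    """Escape each op type and terminate it with a space."""
    return "".join(re.escape(op) + " " for op in ops)


def first_option_regex(prefix: List[str], options: List[List[str]]) -> str:
    """The prefix followed by a regex alternation over the option chains."""
    alternation = "|".join(tokens(option) for option in options)
    return tokens(prefix) + f"(?:{alternation})"


def matches(ops: List[str]) -> bool:
    regex = first_option_regex(PREFIX, OPTIONS)
    return re.fullmatch(regex, "".join(op + " " for op in ops)) is not None


print(matches(["layer_norm", "linear", "silu"]))          # True
print(matches(["layer_norm", "linear", "relu", "relu"]))  # True
print(matches(["layer_norm", "linear"]))                  # False
```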

Alternate patterns

If a PyTorch module has many configuration options, but many of them are bundled together, it may be easier to describe each variant as a complete alternate pattern instead of using match_first_option. For example:

gpt2_qkv:
    priority: 0
    heuristic: GPT2_QKV
    pattern1:
        layer_norm:
            op_type: layer_norm
            child: lin1
        lin1:
            op_type: linear
            child: relu0
        relu0:
            op_type: relu
    pattern2:
        layer_norm:
            op_type: layer_norm
            child: lin1
        lin1:
            op_type: linear
            child: silu0
        silu0:
            op_type: silu

In this case, the rule matches either <layer_norm, lin1, relu0> or <layer_norm, lin1, silu0>. Use whichever form you find clearest.
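The Python sketch below is a toy model of a rule with alternate patterns. The rule matches if any one of its patterns matches. It flattens each pattern to its op_type chain. Treating every key that starts with "pattern" as an alternate is an assumption that generalizes the pattern1/pattern2 example, and `rule_matches` is a hypothetical name.

```python
from typing import List

# The two alternate patterns above, each flattened to its op_type chain.
# Assumption: every key starting with "pattern" is an alternate.
RULE = {
    "priority": 0,
    "heuristic": "GPT2_QKV",
    "pattern1": ["layer_norm", "linear", "relu"],
    "pattern2": ["layer_norm", "linear", "silu"],
}


def rule_matches(rule: dict, ops: List[str]) -> bool:
    """True if any alternate pattern of the rule matches the op sequence."""
    alternates = [v for k, v in rule.items() if k.startswith("pattern")]
    return any(chain == ops for chain in alternates)


print(rule_matches(RULE, ["layer_norm", "linear", "silu"]))  # True
print(rule_matches(RULE, ["layer_norm", "linear", "gelu"]))  # False
```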

Alternate operators / character classes

You can allow several operator types for a single entry by listing them under its op_type, similar to using character classes in a regular expression. For example:

gpt2_qkv:
    priority: 0
    heuristic: GPT2_QKV
    pattern:
        layer_norm:
            op_type: layer_norm
            child: lin1
        lin1:
            op_type: linear
            child: activation
        activation:
            op_type:
              - relu
              - silu

In this case, the pattern matches <layer_norm, lin1, activation> and the activation operator can be of type relu or type silu.
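Following the character-class analogy, the hypothetical Python helper below turns a list-valued op_type into a regex alternation group. That group works like a character class over whole operator names. This is a toy model over a single path of space-terminated op types, not the compiler's matcher, and `op_token` and `matches` are illustrative names.

```python
import re
from typing import List, Union

OpType = Union[str, List[str]]


def op_token(op_type: OpType) -> str:
    """A list of op types becomes an alternation group, like a character class."""
    if isinstance(op_type, list):
        return "(?:" + "|".join(re.escape(t) for t in op_type) + ") "
    return re.escape(op_type) + " "


# layer_norm -> lin1 -> activation, where activation is relu or silu.
REGEX = "".join(op_token(t) for t in ["layer_norm", "linear", ["relu", "silu"]])


def matches(ops: List[str]) -> bool:
    return re.fullmatch(REGEX, "".join(op + " " for op in ops)) is not None


print(matches(["layer_norm", "linear", "relu"]))  # True
print(matches(["layer_norm", "linear", "silu"]))  # True
print(matches(["layer_norm", "linear", "gelu"]))  # False
```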

Alternatively, if an entry should match any operator, use the special op_type . (a period), for example:

gpt2_qkv:
    priority: 0
    heuristic: GPT2_QKV
    pattern:
        layer_norm:
            op_type: layer_norm
            child: lin1
        lin1:
            op_type: linear
            child: activation
        activation:
            op_type: .

This matches <layer_norm, lin1, activation>. The activation can have any op_type.

Character classes can also be inverted like this:

gpt2_qkv:
    priority: 0
    heuristic: GPT2_QKV
    pattern:
        layer_norm:
            op_type: layer_norm
            child: lin1
        lin1:
            op_type: linear
            child: activation
        activation:
            op_type:
                inverted: True
                options:
                  - linear
                  - addmm

This matches <layer_norm, lin1, activation>, where activation can have any op_type except linear or addmm.
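This page describes four op_type forms: a plain string, a list of alternatives, the . wildcard, and an inverted list. The Python sketch below summarizes them as one predicate. It is a toy model that assumes the YAML has been loaded into plain Python values, so inverted: True becomes the bool True. `op_type_matches` is a hypothetical name, not a SambaFlow API.

```python
from typing import List, Union

OpTypeSpec = Union[str, List[str], dict]


def op_type_matches(spec: OpTypeSpec, op: str) -> bool:
    """Check one graph operator's type against an entry's op_type spec."""
    if isinstance(spec, dict):
        # {"inverted": True, "options": [...]} matches anything NOT in options.
        hit = op in spec["options"]
        return not hit if spec.get("inverted", False) else hit
    if isinstance(spec, list):
        return op in spec  # like a character class over operator names
    return spec == "." or spec == op  # "." is the match-anything op_type


not_linear = {"inverted": True, "options": ["linear", "addmm"]}
print(op_type_matches(["relu", "silu"], "silu"))  # True
print(op_type_matches(".", "gelu"))               # True
print(op_type_matches(not_linear, "addmm"))       # False
```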

Matching multiple operators

To match multiple operators in a sequence, use the match_multiple keyword. This keyword is most useful with the . op_type described above. For example:

gpt2_qkv:
    priority: 0
    heuristic: GPT2_QKV
    pattern:
        layer_norm:
            op_type: layer_norm
            child: lin1
        lin1:
            op_type: .
            match_multiple: true
            child: relu0
        relu0:
            op_type: relu

This pattern matches a layer_norm, followed by a sequence of one or more operators of any type except relu, followed by a relu.

A pattern that uses match_multiple can never end with an entry whose op_type is a period (.).
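The Python sketch below models the match_multiple example as a regex over space-terminated op types. It is a toy model, not the compiler's matcher, and `chain_to_regex` and `chain_matches` are hypothetical names. To follow the description above ("any type but not relu"), the repeated run uses a negative lookahead, so it can never consume the op_type of the entry that follows it. That bounding is an assumption of this sketch. It also assumes that the entry after a match_multiple entry has a concrete op_type.

```python
import re
from typing import List

# The match_multiple example above as a linear chain; the middle entry (lin1)
# uses the "." wildcard with match_multiple: true.
CHAIN = [
    {"op_type": "layer_norm"},
    {"op_type": ".", "match_multiple": True},
    {"op_type": "relu"},
]


def token(op_type: str) -> str:
    # "." matches any single op type; anything else must match exactly.
    return r"\S+ " if op_type == "." else re.escape(op_type) + " "


def chain_to_regex(entries: List[dict]) -> str:
    parts = []
    for i, entry in enumerate(entries):
        tok = token(entry["op_type"])
        if entry.get("match_multiple"):
            # Toy assumption: repeat one or more times, but never consume
            # the next entry's op_type, which bounds the run.
            stop = re.escape(entries[i + 1]["op_type"])
            tok = f"(?:(?!{stop} ){tok})+"
        parts.append(tok)
    return "".join(parts)


def chain_matches(entries: List[dict], ops: List[str]) -> bool:
    text = "".join(op + " " for op in ops)
    return re.fullmatch(chain_to_regex(entries), text) is not None


print(chain_matches(CHAIN, ["layer_norm", "linear", "gelu", "relu"]))         # True
print(chain_matches(CHAIN, ["layer_norm", "relu"]))                           # False
print(chain_matches(CHAIN, ["layer_norm", "linear", "relu", "add", "relu"]))  # False
```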

Learn more!

See Compiler optimization modes for an introduction to operator fusion.
See SambaFlow compiler overview for an introduction to how the compiler works.