Model Zoo troubleshooting

How to turn on verbose logging

For additional logging to help debug compilation, add the following flags to the compile command:

+samba_compile.debug=True +samba_compile.verbose=True

Verbose logging is primarily intended to help SambaNova engineers analyze an issue. If you encounter an issue or error, turn on verbose logging and send the logs to SambaNova customer support.
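Because the verbose output is meant to be sent to support, it can help to capture it in a file while still watching it on the terminal. The following is a minimal sketch; the helper name `run_with_log` and the log file name are hypothetical, and the compile command in the comment reuses the `rdu_generate_text.py` example from later on this page.

```shell
# Run any command, show its output, and save stdout and stderr to a log file.
# run_with_log is a hypothetical helper, not part of Model Zoo.
run_with_log() {
  log_file=$1
  shift
  "$@" 2>&1 | tee "$log_file"
}

# Example usage (assumed paths; uncomment and adjust before running):
# run_with_log compile_verbose.log \
#   python rdu_generate_text.py \
#     command=compile \
#     checkpoint.model_name_or_path=PATH_TO_DOWNLOADED_MODEL \
#     samba_compile.output_folder=PATH_TO_OUTPUT \
#     +samba_compile.debug=True +samba_compile.verbose=True
```

Note that the pipeline returns the exit status of `tee`, so check the log itself to confirm whether the compile succeeded.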

"Tried to find SNML socket" error

Problem

You encounter the following nonfatal error: `Tried to find SNML socket but couldn’t. Falling back to TCP port.`

Cause

SNML could not find its Unix socket and is using TCP (port 50053 by default) for communication instead.

Solution

The error is nonfatal, so no action is required. To use a Unix socket instead, you can optionally bind the SNML Unix socket /var/snml.sock.
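To see whether the socket is visible from the environment where the message appears, a quick file test is enough. This is a minimal sketch: the function name `snml_transport` is hypothetical, the default path is the one given above, and how you bind the socket into a container depends on your container runtime and is not shown.

```shell
# Report whether a path is a Unix domain socket.
# snml_transport is a hypothetical helper; the default path comes from this page.
snml_transport() {
  sock_path=${1:-/var/snml.sock}
  if [ -S "$sock_path" ]; then
    echo "unix socket available at $sock_path"
  else
    echo "no unix socket at $sock_path; SNML falls back to TCP"
  fi
}

snml_transport
```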

"Failed to Create Session on RDU: PEF version mismatch"

Problem

You encounter the following error: `Failed to Create Session on RDU: PEF version mismatch`

Cause

This error can be caused by specifying the wrong SambaFlow runtime version during compilation.

Solution

If you see this error, specify the runtime version explicitly in the compile command:

python rdu_generate_text.py \
  command=compile \
  checkpoint.model_name_or_path=PATH_TO_DOWNLOADED_MODEL \
  samba_compile.output_folder=PATH_TO_OUTPUT \
  +samba_compile.target_sambaflow_version=MAJOR.MINOR.PATCH

The Devbox contains SambaFlow 1.21.1 and can be used with other Runtime versions such as 1.19.1 and 1.18.7.

  1. Use the following command outside the container to check the version of the sambanova-runtime package on the host machine using either rpm or dpkg:

    (rpm -q sambanova-runtime 2>/dev/null || dpkg -s sambanova-runtime 2>/dev/null) | egrep -m 1 -o "[0-9]+\.[0-9]+\.[0-9]+"
  2. If your host machine is on an older version of sambanova-runtime than the version the PEF was compiled for, add +samba_compile.target_sambaflow_version=MAJOR.MINOR.PATCH to your compile command.
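The two steps above can be combined into a small script that reads the host's runtime version and prints the matching compile override. This is a sketch under assumptions: the function names are hypothetical, and it must run on the host machine, outside the container, for the version it reports to be the right one.

```shell
# Pull the first MAJOR.MINOR.PATCH string out of text read from stdin.
extract_version() {
  grep -E -o '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1
}

# Query rpm first, then dpkg, for the installed sambanova-runtime package.
host_runtime_version() {
  (rpm -q sambanova-runtime 2>/dev/null || dpkg -s sambanova-runtime 2>/dev/null) | extract_version
}

# Print the compile override for a version; return non-zero if the version is empty.
target_version_flag() {
  [ -n "$1" ] && echo "+samba_compile.target_sambaflow_version=$1"
}

target_version_flag "$(host_runtime_version)" ||
  echo "sambanova-runtime not found on this host" >&2
```

Append the printed override to the compile command shown earlier in this section.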

Match the SambaNova Runtime version installed on your host machine, not the SambaNova Runtime version included in the Devbox.

"ValueError: Configuration is not supported"

Each Model Zoo model includes a whitelist.json file that specifies the parameter combinations with which we have tested that model. If you’re using a combination that we haven’t tested yet, you’ll see a message that has information like the following at the bottom:

ValueError: Configuration is not supported. Please see /opt/modelzoo/examples/nlp/training/sambanova_modelzoo/models/llama/configs/whitelist.json for supported configurations. Or add `validate_config=False` to proceed with execution, but be aware that it might lead to program failure.

If you want to experiment with an untested configuration, run the command again and add +validate_config=False on the command line to skip configuration validation.
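If you retry untested configurations often, a small wrapper can append the override for you. This is only an illustrative sketch: `run_unvalidated` is a hypothetical helper, and the command in the comment reuses the compile example from earlier on this page as a stand-in for whatever command produced the error.

```shell
# Re-run any command with the validation override appended as the last argument.
# run_unvalidated is a hypothetical helper, not part of Model Zoo.
run_unvalidated() {
  "$@" +validate_config=False
}

# Example usage (assumed command; uncomment and adjust before running):
# run_unvalidated python rdu_generate_text.py command=compile \
#   checkpoint.model_name_or_path=PATH_TO_DOWNLOADED_MODEL \
#   samba_compile.output_folder=PATH_TO_OUTPUT
```

As the error message warns, an untested configuration might still fail later in execution.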