Model Zoo troubleshooting
How to turn on verbose logging
For additional logging to assist with debugging compilation, add the following flags to the compile command.
Verbose logging is intended primarily to help SambaNova engineers analyze an issue. If you encounter an issue or error, turn on verbose logging and send the logs to SambaNova customer support.
+samba_compile.debug=True +samba_compile.verbose=True
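If you launch compilation from a script, the flags can be appended programmatically and the output saved to a file that you can send to support. The following is a minimal sketch, not part of Model Zoo: the helper names `with_verbose_flags` and `run_and_capture` and the log file name `compile_verbose.log` are assumptions made for illustration, and the compile arguments are the placeholders from the compile command shown on this page.

```python
import subprocess

# The two flags from this section.
VERBOSE_FLAGS = ["+samba_compile.debug=True", "+samba_compile.verbose=True"]


def with_verbose_flags(command):
    """Return a copy of `command` with the verbose-logging flags appended once."""
    extra = [flag for flag in VERBOSE_FLAGS if flag not in command]
    return list(command) + extra


def run_and_capture(command, log_path):
    """Run `command`, write stdout and stderr to `log_path`, return the exit code."""
    with open(log_path, "w", encoding="utf-8") as log_file:
        completed = subprocess.run(
            command, stdout=log_file, stderr=subprocess.STDOUT, check=False
        )
    return completed.returncode


# Placeholder compile command; replace the PATH_TO_* values with your own.
compile_command = [
    "python", "rdu_generate_text.py",
    "command=compile",
    "checkpoint.model_name_or_path=PATH_TO_DOWNLOADED_MODEL",
    "samba_compile.output_folder=PATH_TO_OUTPUT",
]
verbose_command = with_verbose_flags(compile_command)

# To run the compile and keep the log (hypothetical file name):
# run_and_capture(verbose_command, "compile_verbose.log")
```

Passing the command as a list avoids shell quoting issues, and redirecting stderr into stdout keeps error messages in the same file as the rest of the log.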
"Tried to find SNML socket" error
Problem
You encounter the following non-fatal error: `Tried to find SNML socket but couldn’t. Falling back to TCP port.`
Cause
This message indicates that SNML could not find its Unix socket and is using TCP for communication instead (port 50053 by default).
Solution
The error is non-fatal. To use a Unix socket instead of TCP, you can optionally bind the SNML Unix socket /var/snml.sock.
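To see whether the socket is visible from the environment where the message appears (for example, inside the container), you can check that the path exists and is a Unix socket. This is a small diagnostic sketch rather than an official tool; the function name `is_unix_socket` is an assumption, and only the path /var/snml.sock comes from this page.

```python
import os
import stat

SNML_SOCKET_PATH = "/var/snml.sock"  # path named in this section


def is_unix_socket(path):
    """Return True if `path` exists and is a Unix domain socket."""
    try:
        mode = os.stat(path).st_mode
    except OSError:
        # Covers a missing path as well as permission problems.
        return False
    return stat.S_ISSOCK(mode)


if is_unix_socket(SNML_SOCKET_PATH):
    print(f"{SNML_SOCKET_PATH} is present and is a Unix socket.")
else:
    print(f"{SNML_SOCKET_PATH} is not visible here; expect the TCP fallback message.")
```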
"Failed to Create Session on RDU: PEF version mismatch"
Problem
You encounter the following error: `Failed to Create Session on RDU: PEF version mismatch`
Cause
This error can be caused by specifying the wrong SambaFlow runtime version at compile time.
Solution
If you see this error, specify the runtime version explicitly in the compile command:
python rdu_generate_text.py \
command=compile \
checkpoint.model_name_or_path=PATH_TO_DOWNLOADED_MODEL \
samba_compile.output_folder=PATH_TO_OUTPUT \
+samba_compile.target_sambaflow_version=MAJOR.MINOR.PATCH
The Devbox contains SambaFlow 1.21.1 and can be used with other Runtime versions such as 1.19.1 and 1.18.7.
- Use the following command outside the container to check the version of the sambanova-runtime package on the host machine, using either rpm or dpkg:
(rpm -q sambanova-runtime 2>/dev/null || dpkg -s sambanova-runtime 2>/dev/null) | egrep -m 1 -o "[0-9]+\.[0-9]+\.[0-9]+"
- If your host machine is on an older version of sambanova-runtime than the PEF, add +samba_compile.target_sambaflow_version=MAJOR.MINOR.PATCH to your compile command.
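The two steps above can also be scripted. The sketch below queries rpm and then dpkg, mirroring the fallback in the shell command, takes the first MAJOR.MINOR.PATCH string in the output, and prints the matching override flag. The function names are assumptions made for illustration, and the script is meant to run on the host machine, outside the container.

```python
import re
import subprocess

# Same pattern as the egrep expression in the shell command.
VERSION_PATTERN = re.compile(r"[0-9]+\.[0-9]+\.[0-9]+")


def extract_version(text):
    """Return the first MAJOR.MINOR.PATCH string in `text`, or None."""
    match = VERSION_PATTERN.search(text)
    return match.group(0) if match else None


def query_host_runtime_version():
    """Ask rpm, then dpkg, for the installed sambanova-runtime version."""
    queries = (
        ["rpm", "-q", "sambanova-runtime"],
        ["dpkg", "-s", "sambanova-runtime"],
    )
    for command in queries:
        try:
            result = subprocess.run(
                command, capture_output=True, text=True, check=False
            )
        except FileNotFoundError:
            continue  # this package manager is not installed on the host
        if result.returncode == 0:
            version = extract_version(result.stdout)
            if version:
                return version
    return None


def target_version_flag(version):
    """Build the compile-command override for the given runtime version."""
    return f"+samba_compile.target_sambaflow_version={version}"


host_version = query_host_runtime_version()
if host_version:
    print(target_version_flag(host_version))
else:
    print("sambanova-runtime was not found through rpm or dpkg.")
```

Checking the return code before parsing avoids picking up a version number from an error message when the package is not installed.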
Match the SambaNova Runtime version on your SambaNova host machine, not the SambaNova Runtime included in the Devbox.
"ValueError: Configuration is not supported"
Each Model Zoo model includes a whitelist.json file that specifies the parameter combinations with which we have tested that model. If you’re using a combination that we haven’t tested yet, you’ll see a message that ends with information like the following:
ValueError: Configuration is not supported. Please see /opt/modelzoo/examples/nlp/training/sambanova_modelzoo/models/llama/configs/whitelist.json for supported configurations. Or add `validate_config=False` to proceed with execution, but be aware that it might lead to program failure.
If you want to experiment with an untested configuration, you can run the command again and add `+validate_config=False` at the command line so that the validator lets execution proceed.