Troubleshooting
This page describes how to troubleshoot common SambaFlow problems. We'll add troubleshooting items over time.
Out-of-memory when loading Hugging Face model
If you encounter an out-of-memory error when loading a Hugging Face model, experiment with lazy initialization. With lazy initialization, weights are not randomized or populated at model loading time. Here's an example code snippet:
from transformers import AutoConfig, AutoModelForCausalLM

if args.config_name:
    # Building the model from a config alone skips loading pretrained
    # weights (lazy initialization).
    config = AutoConfig.from_pretrained(args.config_name, cache_dir=args.cache_dir)
    model = AutoModelForCausalLM.from_config(config)
elif args.model_name_or_path:
    model = AutoModelForCausalLM.from_pretrained(args.model_name_or_path, cache_dir=args.cache_dir)
else:
    raise RuntimeError("Must provide --model_name_or_path or --config_name")

# Patch the model here
model = patch_model(model, args)
Problems when training a model
Here are some best practices for training a model.
- Use checkpoints. SambaFlow supports saving a checkpoint at the end of one training run, and then starting the next training run from that checkpoint. See Train using a checkpoint.
- Consider using model.bfloat. If you see problems while training, consider using model.bfloat in the model. If the model definition (FP32) uses scalar values that are interpreted as inf in BF16, problems might result.
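The save-then-resume checkpoint workflow can be sketched in generic PyTorch terms. This is a minimal sketch, not SambaFlow's own checkpoint API; the tiny stand-in model, the file name, and the `step` counter are illustrative:

```python
import os
import tempfile

import torch
import torch.nn as nn

# Tiny stand-in model; a real run would use the patched Hugging Face model.
model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# --- End of training run 1: save a checkpoint ---
ckpt_path = os.path.join(tempfile.mkdtemp(), "checkpoint.pt")
torch.save(
    {
        "model_state_dict": model.state_dict(),
        "optimizer_state_dict": optimizer.state_dict(),
        "step": 100,  # illustrative training-progress counter
    },
    ckpt_path,
)

# --- Start of training run 2: resume from the checkpoint ---
model2 = nn.Linear(4, 2)
optimizer2 = torch.optim.SGD(model2.parameters(), lr=0.1)
ckpt = torch.load(ckpt_path)
model2.load_state_dict(ckpt["model_state_dict"])
optimizer2.load_state_dict(ckpt["optimizer_state_dict"])
start_step = ckpt["step"]
```

Saving the optimizer state alongside the model weights lets the second run continue with the same learning-rate and momentum state instead of restarting cold.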
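The BF16 scalar hazard above can be demonstrated in plain PyTorch. Note that model.bfloat itself is the SambaFlow call; the `.to(torch.bfloat16)` conversion below is only the analogous PyTorch operation:

```python
import torch
import torch.nn as nn

# Analogous PyTorch conversion; SambaFlow models use model.bfloat instead.
model = nn.Linear(4, 2).to(torch.bfloat16)

# BF16 keeps FP32's 8-bit exponent but only a 7-bit mantissa, so the
# largest finite BF16 value (~3.3895e38) is slightly below FP32's
# maximum (~3.4028e38). A scalar in that gap is finite in FP32 but
# rounds up to inf when converted to BF16:
x = torch.tensor(3.4e38)  # finite as FP32
y = x.bfloat16()          # overflows to inf in BF16
```

A model definition that embeds such large FP32 scalars (for example, as masking constants) can therefore produce inf values once the model runs in BF16.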