Troubleshooting

This page describes how to troubleshoot common SambaFlow problems. We'll add troubleshooting items over time.

Out-of-memory when loading Hugging Face model

If you encounter an out-of-memory error when loading a Hugging Face model, experiment with lazy initialization.

Use lazy initialization when initializing the model. With lazy initialization, weights are not randomized or populated at model load time, which reduces the memory needed during loading. Here's an example code snippet:

from transformers import AutoConfig, AutoModelForCausalLM

# Download the model from Hugging Face
if args.config_name:
    config = AutoConfig.from_pretrained(args.config_name, cache_dir=args.cache_dir)
    model = AutoModelForCausalLM.from_config(config)
elif args.model_name_or_path:
    model = AutoModelForCausalLM.from_pretrained(args.model_name_or_path, cache_dir=args.cache_dir)
else:
    raise RuntimeError("Must provide --model_name_or_path or --config_name")

# Patch the model here
model = patch_model(model, args)
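
SambaFlow's lazy-initialization mechanism is specific to the framework. As a general illustration of the same idea, the sketch below assumes the Hugging Face accelerate library is available (with "gpt2" as a placeholder model name); it builds the module structure without allocating or randomizing any weights:

from accelerate import init_empty_weights
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("gpt2")  # placeholder model name
with init_empty_weights():
    # Modules are created on the "meta" device, so no weight memory is
    # allocated and no random initialization runs at load time.
    model = AutoModelForCausalLM.from_config(config)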

Problems when training a model

Here are some best practices for training a model.

  • Use checkpoints. SambaFlow supports saving a checkpoint at the end of one training run and then starting the next training run from that checkpoint. See Train using a checkpoint, and the first sketch after this list.

  • Consider using model.bfloat. If you see numeric problems while training, consider using model.bfloat in the model. If the FP32 model definition uses scalar constants that BF16 cannot represent, they are interpreted as inf and can destabilize training; the second sketch after this list demonstrates the overflow.
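
Here's a minimal sketch of the checkpoint pattern in plain PyTorch; the SambaFlow-specific API is described in Train using a checkpoint, and the model, optimizer, and path below are hypothetical stand-ins:

import torch

model = torch.nn.Linear(4, 2)                            # stand-in for your model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
CKPT_PATH = "checkpoint.pt"                              # hypothetical path

# Save state at the end of one training run
torch.save({"model": model.state_dict(),
            "optimizer": optimizer.state_dict()}, CKPT_PATH)

# Start the next training run from the checkpoint
state = torch.load(CKPT_PATH)
model.load_state_dict(state["model"])
optimizer.load_state_dict(state["optimizer"])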
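
To see how an FP32 scalar can become inf in BF16: BF16 keeps FP32's 8-bit exponent range but has far fewer mantissa bits, so FP32 values near the very top of the range round up to infinity. A quick PyTorch demonstration:

import torch

big = torch.tensor(torch.finfo(torch.float32).max)  # largest finite FP32 value
print(big)                     # tensor(3.4028e+38): finite in FP32
print(big.to(torch.bfloat16))  # tensor(inf, dtype=torch.bfloat16): overflows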