βš™οΈ Nano-Mistral

Mistral modeling code for use with Nanotron.

This repository also contains converted pretrained weights for Mistral-7B-v0.1: https://huggingface.co/mistralai/Mistral-7B-v0.1

πŸš€ Quickstart

# Generate a config file
python config_tiny_mistral.py

# Run training
export CUDA_DEVICE_MAX_CONNECTIONS=1 # important for some distributed operations
torchrun --nproc_per_node=8 run_train.py --config-file config_tiny_mistral.yaml
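The generated config_tiny_mistral.yaml is plain YAML, so it can be inspected or tweaked before launching training. Below is a minimal sketch of such a check; the inspect_config.py name is hypothetical and the exact top-level keys depend on the nanotron Config schema:

# inspect_config.py -- hypothetical helper to eyeball the generated config before training
import yaml  # requires pyyaml

with open("config_tiny_mistral.yaml") as f:
    config = yaml.safe_load(f)

# Print the top-level sections (model, tokenizer, parallelism, ... depending on the
# nanotron Config schema) to confirm the file looks right before running torchrun.
for section in config:
    print(section)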

πŸš€ Run generation with pretrained Mistral-7B-v0.1

export CUDA_DEVICE_MAX_CONNECTIONS=1
torchrun --nproc_per_node=1 run_generate.py --ckpt-path ./pretrained/Mistral-7B-v0.1
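To sanity-check the converted weights, you can compare generations against the original Hugging Face checkpoint. The sketch below uses the transformers library and is not part of this repo; it assumes transformers is installed and that enough GPU/CPU memory is available for the 7B model:

# reference_generate.py -- hypothetical sanity check against the original HF checkpoint
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

# Greedy decoding of a short prompt; compare this text with the nanotron generation output.
inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))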

πŸš€ Use your custom model

  • Update the MistralConfig class in config_tiny_mistral.py to match your model's configuration
  • Update the MistralForTraining class in modeling_mistral.py to match your model's architecture
  • Pass both classes to the DistributedTrainer class in run_train.py (see the sketch after this list):
trainer = DistributedTrainer(config_file, model_class=MistralForTraining, model_config_class=MistralConfig)
  • Run training as usual
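Putting the pieces together, a minimal run_train.py might look like the sketch below. The DistributedTrainer call is the one shown above; the import path, the data-loading step, and the train() call are assumptions about the surrounding nanotron training loop, so check them against the repo's actual run_train.py:

# run_train.py -- minimal sketch; exact imports and the data-loading step depend on your nanotron version
import argparse

from nanotron.trainer import DistributedTrainer  # assumed import path

from config_tiny_mistral import MistralConfig
from modeling_mistral import MistralForTraining

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--config-file", type=str, required=True)
    args = parser.parse_args()

    # Pass the custom model and config classes so nanotron builds your architecture.
    trainer = DistributedTrainer(args.config_file, model_class=MistralForTraining, model_config_class=MistralConfig)

    # Build a dataloader and launch training (APIs below are assumptions; see the repo's run_train.py):
    # dataloader = get_dataloader(trainer)
    # trainer.train(dataloader)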