---
library_name: transformers
license: apache-2.0
base_model: mistralai/Mistral-Nemo-Base-2407
tags:
- axolotl
- generated_from_trainer
datasets:
- AiAF/Pretraining-SCPWiki-032025-7B-Instruct-pretraining.jsonl
model-index:
- name: Pretraining-SCPWiki-032025-12B-Instruct
  results: []
---

[Built with Axolotl](https://github.com/axolotl-ai-cloud/axolotl)
See axolotl config

axolotl version: `0.8.0.dev0`

```yaml
base_model: mistralai/Mistral-Nemo-Base-2407
# optionally might have model_type or tokenizer_type
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

# Automatically upload checkpoint and final model to HF
hub_model_id: AiAF/Pretraining-SCPWiki-032025-12B-Instruct

load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: AiAF/Pretraining-SCPWiki-032025-7B-Instruct-pretraining.jsonl
    type: completion
dataset_prepared_path: last_run_prepared
val_set_size: 0.1
output_dir: ./outputs/out/Pretraining-SCPWiki-032025-12B-V1

sequence_len: 8192
sample_packing: true
pad_to_sequence_len: true
eval_sample_packing: false

wandb_project: "LLM-Pretraining"
wandb_entity:
wandb_watch: "all"
wandb_name: "Pretraining-SCPWiki-032025-12B-V1"
wandb_log_model: "false"

gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 1
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.000005

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

loss_watchdog_threshold: 5.0
loss_watchdog_patience: 3

save_total_limit: 30
warmup_steps: 10
evals_per_epoch: 20
eval_table_size:
eval_max_new_tokens: 128
saves_per_epoch: 20
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  eos_token: "<|im_end|>"
  pad_token: ""
  bos_token: ""
  unk_token: ""
tokens:
  - "<|im_start|>"
```

# Pretraining-SCPWiki-032025-12B-Instruct

This model is a fine-tuned version of [mistralai/Mistral-Nemo-Base-2407](https://huggingface.co/mistralai/Mistral-Nemo-Base-2407) on the AiAF/Pretraining-SCPWiki-032025-7B-Instruct-pretraining.jsonl dataset.
It achieves the following results on the evaluation set:
- Loss: 1.5467

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 8
- optimizer: 8-bit AdamW (`adamw_bnb_8bit`, bitsandbytes) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 1.0

For orientation, a rough Transformers `TrainingArguments` equivalent of these settings is sketched at the end of this card.

### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 3.1576        | 0.0018 | 1    | 3.5143          |
| 1.4459        | 0.0511 | 29   | 1.6213          |
| 1.4502        | 0.1022 | 58   | 1.6003          |
| 1.5545        | 0.1534 | 87   | 1.5870          |
| 1.3624        | 0.2045 | 116  | 1.5779          |
| 1.3053        | 0.2556 | 145  | 1.5691          |
| 1.5688        | 0.3067 | 174  | 1.5635          |
| 1.7144        | 0.3579 | 203  | 1.5594          |
| 1.5199        | 0.4090 | 232  | 1.5550          |
| 1.2483        | 0.4601 | 261  | 1.5516          |
| 1.4053        | 0.5112 | 290  | 1.5493          |
| 1.4238        | 0.5624 | 319  | 1.5486          |
| 1.4939        | 0.6135 | 348  | 1.5477          |
| 1.4072        | 0.6646 | 377  | 1.5472          |
| 1.6039        | 0.7157 | 406  | 1.5469          |
| 1.3127        | 0.7669 | 435  | 1.5468          |
| 1.4754        | 0.8180 | 464  | 1.5466          |
| 1.5992        | 0.8691 | 493  | 1.5467          |
| 1.421         | 0.9202 | 522  | 1.5467          |
| 1.2666        | 0.9714 | 551  | 1.5467          |

### Framework versions

- Transformers 4.49.0
- Pytorch 2.5.1+cu124
- Datasets 3.2.0
- Tokenizers 0.21.0
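Training was driven by Axolotl with the YAML config shown at the top of this card, not by calling the Transformers `Trainer` directly. Purely as a point of reference, the hyperparameters listed above map roughly onto the following `TrainingArguments`; this is a hedged sketch of an equivalent single-GPU setup, not the actual training invocation (Axolotl features such as sample packing and the loss watchdog have no direct counterpart here).

```python
from transformers import TrainingArguments

# Rough TrainingArguments analogue of the Axolotl settings above (sketch only).
training_args = TrainingArguments(
    output_dir="./outputs/out/Pretraining-SCPWiki-032025-12B-V1",
    per_device_train_batch_size=2,    # micro_batch_size
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=4,    # total train batch size 8 on one GPU
    num_train_epochs=1,
    learning_rate=5e-6,
    lr_scheduler_type="cosine",
    warmup_steps=10,
    optim="adamw_bnb_8bit",           # 8-bit AdamW from bitsandbytes
    weight_decay=0.0,
    bf16=True,
    gradient_checkpointing=True,
    logging_steps=1,
    save_total_limit=30,
    seed=42,
)
```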
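To try the model, a minimal inference sketch with Transformers is shown below. It assumes the checkpoint was pushed to the `hub_model_id` from the config (`AiAF/Pretraining-SCPWiki-032025-12B-Instruct`), that the tokenizer ships with the added special tokens, and that enough GPU memory is available for a 12B model in bfloat16; adjust `device_map` and the dtype for your hardware.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AiAF/Pretraining-SCPWiki-032025-12B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # roughly 24 GB of VRAM just for 12B weights in bf16
    device_map="auto",
)

# The dataset was raw SCP Wiki text trained with Axolotl's completion type,
# so plain text continuation is the most faithful way to prompt the model.
prompt = "Item #: SCP-"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(max_new_tokens=128, do_sample=True, temperature=0.8, **inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because training used `type: completion` rather than an instruction format, plain continuations like the one above will likely work better than chat-style prompts, even though `<|im_start|>` and `<|im_end|>` were added to the vocabulary.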