Update README.md
README.md CHANGED

````diff
@@ -58,23 +58,26 @@ SFT LoRA обучение было выполнено на **двух NVIDIA A10
 - Fused AdamW
 - Liger Kernel (swiglu, fused linear xentropy)
 
-**GPU hours**: ~
+**GPU hours**: ~384h of NVIDIA A100
 **GPU mem**:
 - Stage 1: 50-55GB of VRAM (both GPUs)
 - Stage 2: 79GB of VRAM (both GPUs)
 
 ### Training configuration / Конфигурация обучения
-
+The model was trained using MyLLM framework:
+
 --== [MyLLM](https://github.com/Raumberg/myllm) ==--
+
 **Model training / Обучение модели**
+
 The model was trained utilizing 2 stages:
 - Stage 1:
   - Datasets: GrandMaster, LoRA: rank=128, alpha=256
 - Stage 2:
   - Datasets: Kolmogorov-3, Russian Code, LoRA: rank=256, alpha=256
-
-
-
+
+All configs are available in MyLLM repository.
+
 ### Using the model / Как запустить?
 
 ```python
````
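The authoritative per-stage configs live in the MyLLM repository linked in the diff above. As a rough illustration of the hyperparameters mentioned there (LoRA rank/alpha per stage, fused AdamW, Liger Kernel), here is a minimal sketch expressed with Hugging Face `peft` and `transformers`; the target modules, precision, and batch size are illustrative assumptions, not values taken from the MyLLM configs.

```python
# Sketch only: mirrors the hyperparameters named in the README diff.
# Target modules, precision and batch size are assumptions, not the
# actual MyLLM configuration files.
from peft import LoraConfig
from transformers import TrainingArguments

# Stage 1 (GrandMaster): rank=128, alpha=256
stage1_lora = LoraConfig(
    r=128,
    lora_alpha=256,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    task_type="CAUSAL_LM",
)

# Stage 2 (Kolmogorov-3, Russian Code): rank=256, alpha=256
stage2_lora = LoraConfig(
    r=256,
    lora_alpha=256,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    task_type="CAUSAL_LM",
)

# Optimizer / kernel settings listed under "Training configuration":
# fused AdamW and Liger Kernel (swiglu, fused linear cross-entropy).
args = TrainingArguments(
    output_dir="out",               # placeholder
    optim="adamw_torch_fused",      # fused AdamW
    use_liger_kernel=True,          # requires the liger-kernel package
    bf16=True,                      # assumed; typical for A100 training
    per_device_train_batch_size=1,  # placeholder
)
```

Datasets, schedulers, and the exact MyLLM config format are not reproduced here; refer to the MyLLM repository for the configs actually used in both stages.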