Commit 5ba1e49 (verified) by attn-signs · 1 parent: 393f594

Update README.md

Files changed (1): README.md (+8 -5)
README.md CHANGED
@@ -58,23 +58,26 @@ SFT LoRA обучение было выполнено на **двух NVIDIA A10
 - Fused AdamW
 - Liger Kernel (swiglu, fused linear xentropy)
 
-**GPU hours**: ~384 of NVIDIA A100
+**GPU hours**: ~384h of NVIDIA A100
 **GPU mem**:
 - Stage 1: 50-55GB of VRAM (both GPUs)
 - Stage 2: 79GB of VRAM (both GPUs)
 
 ### Training configuration / Конфигурация обучения
-**The model was trained using MyLLM framework:**
+The model was trained using MyLLM framework:
+
 --== [MyLLM](https://github.com/Raumberg/myllm) ==--
+
 **Model training / Обучение модели**
+
 The model was trained utilizing 2 stages:
 - Stage 1:
 - Datasets: GrandMaster, LoRA: rank=128, alpha=256
 - Stage 2:
 - Datasets: Kolmogorov-3, Russian Code, LoRA: rank=256, alpha=256
-
-**All configs are available in MyLLM repository.**
-
+
+All configs are available in MyLLM repository.
+
 ### Using the model / Как запустить?
 
 ```python
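
The two-stage LoRA settings listed in the hunk above (rank=128 / alpha=256, then rank=256 / alpha=256) correspond to standard low-rank adapter hyperparameters. The sketch below is illustrative only and assumes a PEFT-style `LoraConfig`; the authoritative configs are the ones in the MyLLM repository linked in the README.

```python
# Illustrative sketch only: PEFT-style LoRA configs mirroring the ranks/alphas
# quoted in this README diff. The real training configs live in the MyLLM repo.
from peft import LoraConfig

# Stage 1: GrandMaster dataset
stage1_lora = LoraConfig(r=128, lora_alpha=256, task_type="CAUSAL_LM")

# Stage 2: Kolmogorov-3 + Russian Code datasets
stage2_lora = LoraConfig(r=256, lora_alpha=256, task_type="CAUSAL_LM")
```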
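
The hunk ends where the "Using the model / Как запустить?" Python example opens, so the snippet itself is not visible in this diff. Below is a minimal inference sketch for a qwen2-architecture chat model using the standard transformers chat-template API; the model id is a placeholder, and the actual example is the one in the full README.

```python
# Minimal inference sketch, assuming the standard transformers chat API.
# The model id below is a placeholder for the actual repository name.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "attn-signs/<model-name>"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

messages = [{"role": "user", "content": "Привет! Расскажи о себе."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```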