Update README.md
README.md CHANGED

````diff
@@ -58,23 +58,26 @@ SFT LoRA обучение было выполнено на **двух NVIDIA A10
 - Fused AdamW
 - Liger Kernel (swiglu, fused linear xentropy)
 
-**GPU hours**: ~
+**GPU hours**: ~384h of NVIDIA A100
 **GPU mem**:
 - Stage 1: 50-55GB of VRAM (both GPUs)
 - Stage 2: 79GB of VRAM (both GPUs)
 
 ### Training configuration / Конфигурация обучения
-
+The model was trained using MyLLM framework:
+
 --== [MyLLM](https://github.com/Raumberg/myllm) ==--
+
 **Model training / Обучение модели**
+
 The model was trained utilizing 2 stages:
 - Stage 1:
   - Datasets: GrandMaster, LoRA: rank=128, alpha=256
 - Stage 2:
   - Datasets: Kolmogorov-3, Russian Code, LoRA: rank=256, alpha=256
-
-
-
+
+All configs are available in MyLLM repository.
+
 ### Using the model / Как запустить?
 
 ```python
````
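The authoritative per-stage configs live in the MyLLM repository linked in the diff above. As a rough illustration of the hyperparameters mentioned there (LoRA rank/alpha per stage, fused AdamW, Liger Kernel), here is a minimal sketch expressed with Hugging Face `peft` and `transformers`; the target modules, precision, and batch size are illustrative assumptions, not values taken from the MyLLM configs.

```python
# Sketch only: mirrors the hyperparameters named in the README diff.
# Target modules, precision and batch size are assumptions, not the
# actual MyLLM configuration files.
from peft import LoraConfig
from transformers import TrainingArguments

# Stage 1 (GrandMaster): rank=128, alpha=256
stage1_lora = LoraConfig(
    r=128,
    lora_alpha=256,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    task_type="CAUSAL_LM",
)

# Stage 2 (Kolmogorov-3, Russian Code): rank=256, alpha=256
stage2_lora = LoraConfig(
    r=256,
    lora_alpha=256,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    task_type="CAUSAL_LM",
)

# Optimizer / kernel settings listed under "Training configuration":
# fused AdamW and Liger Kernel (swiglu, fused linear cross-entropy).
args = TrainingArguments(
    output_dir="out",               # placeholder
    optim="adamw_torch_fused",      # fused AdamW
    use_liger_kernel=True,          # requires the liger-kernel package
    bf16=True,                      # assumed; typical for A100 training
    per_device_train_batch_size=1,  # placeholder
)
```

Datasets, schedulers, and the exact MyLLM config format are not reproduced here; refer to the MyLLM repository for the configs actually used in both stages.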