---
base_model:
- mistralai/Mistral-Nemo-Instruct-2407
language:
- ku
- en
license: apache-2.0
tags:
- text-generation-inference
- transformers
- unsloth
- mistral
datasets:
- nazimali/kurdish-wikipedia-articles
library_name: transformers
---
Continued pre-training of `mistralai/Mistral-Nemo-Instruct-2407` on the Kurdish Wikipedia dataset with `unsloth`.

This model should be fine-tuned further for a downstream task, since the continued pre-training was only meant to improve Kurdish language understanding.
The model is quantized with `bitsandbytes` so that it uses less memory; see the [bitsandbytes documentation](https://huggingface.co/docs/transformers/main/en/quantization/bitsandbytes#bitsandbytes).
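A minimal loading sketch with `transformers`, assuming this repo's id is `nazimali/Mistral-Nemo-Kurdish` and a standard 4-bit NF4 setup (if the checkpoint already ships a quantization config, `from_pretrained` picks it up and the explicit config below is unnecessary):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "nazimali/Mistral-Nemo-Kurdish"  # assumed repo id

# Assumed 4-bit NF4 config; adjust to match the checkpoint's actual settings.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
```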
There isn't a standard (or even a good) Kurdish metric to evaluate the model against, at least none that I could find. My next project will be to create an evaluation so that there's a reproducible baseline for Kurdish.

I'll also look into a multi-GPU training setup so I don't have to wait all day for results, and I'd like to train with both Kurmanji and Sorani.
### Use
The model should be fine-tuned further for a specific task; a minimal starting point with `unsloth` is sketched below. For a ready-made instruction model, see [nazimali/Mistral-Nemo-Kurdish-Instruct](https://huggingface.co/nazimali/Mistral-Nemo-Kurdish-Instruct).
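A possible starting point for task-specific LoRA fine-tuning, again assuming the repo id `nazimali/Mistral-Nemo-Kurdish`; the hyperparameters are illustrative, not the settings used to train this model:

```python
from unsloth import FastLanguageModel

# Assumed repo id and illustrative hyperparameters.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="nazimali/Mistral-Nemo-Kurdish",
    max_seq_length=2048,
    load_in_4bit=True,  # matches the bitsandbytes quantization mentioned above
)

# Attach LoRA adapters for the downstream task.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)
```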
### Training
- Transformers `4.44.2`
- 1× NVIDIA A100 80GB PCIe
- Duration: 6h 31m 4s

Final training metrics:
```json
{
"total_flos": 4121524790259794000,
"train/epoch": 1,
"train/global_step": 1960,
"train/grad_norm": 3.1958093643188477,
"train/learning_rate": 0,
"train/loss": 1.2108,
"train_loss": 1.256846008738693,
"train_runtime": 23227.1752,
"train_samples_per_second": 2.7,
"train_steps_per_second": 0.084
}
```
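As a sanity check, these numbers are internally consistent: 62,720 training rows over 1,960 steps works out to an effective batch size of 32 samples per optimizer step, and 62,720 rows / 23,227 s ≈ 2.7 samples per second.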
#### Pre-training data:
- `nazimali/kurdish-wikipedia-articles`
- Dataset number of rows: 63,076
- Filtered columns: `title`, `text`
  - Each must have at least 1 character
- Number of rows used for training: 62,720 (a reconstruction of the filtering is sketched below)
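A sketch of that filtering with the `datasets` library; this is an assumed reconstruction, not the exact preprocessing code used:

```python
from datasets import load_dataset

# Assumed reconstruction of the filtering described above.
dataset = load_dataset("nazimali/kurdish-wikipedia-articles", split="train")  # 63,076 rows

# Keep only the `title` and `text` columns.
dataset = dataset.select_columns(["title", "text"])

# Drop rows where either field is empty.
dataset = dataset.filter(lambda row: len(row["title"]) >= 1 and len(row["text"]) >= 1)
print(dataset.num_rows)  # 62,720 according to the card
```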
#### Training prompt format:
```python
training_prompt = """Gotara Wikipedia
### Sernav: {}
### Gotar:
{}"""
```
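For example, filling the template with one hypothetical row:

```python
# Hypothetical row; field names follow the dataset columns above.
row = {"title": "Kurdistan", "text": "..."}

print(training_prompt.format(row["title"], row["text"]))
# Gotara Wikipedia
# ### Sernav: Kurdistan
# ### Gotar:
# ...
```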