|
---
license: apache-2.0
datasets:
- KathirKs/fineweb-edu-hindi
language:
- en
- hi
base_model:
- google/gemma-2-2b
pipeline_tag: text-generation
---
|
|
|
|
|
# Gemma-2b-hindi: |
|
|
|
Gemma-2b-hindi is a transformer model continually pretrained on 300 billion Hindi tokens, using [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) as the base model.
|
|
|
# Hyperparameters: |
|
```yaml
learning_rate: 2e-4   # set low for fine-tuning
weight_decay: 0.1
min_lr_ratio: 0.00225
warmup: 0.01
decay: 0.99
rewarmup: 0.01
```
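To illustrate how these values might interact, here is a minimal sketch of a learning-rate schedule in Python. It assumes a linear warmup over `warmup * total_steps` steps followed by a cosine decay down to `min_lr_ratio * learning_rate`; the exact schedule used in the actual training run may differ, and `total_steps` is a placeholder, not a value from this run.

```python
import math

def lr_at(step, total_steps, peak_lr=2e-4, warmup=0.01, min_lr_ratio=0.00225):
    """Illustrative schedule: linear warmup, then cosine decay to min_lr.

    A sketch of how the hyperparameters above might combine; the schedule
    actually used for Gemma-2b-hindi may differ.
    """
    warmup_steps = int(warmup * total_steps)
    min_lr = min_lr_ratio * peak_lr
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate
        return peak_lr * step / max(warmup_steps, 1)
    # Cosine decay from peak_lr down to min_lr over the remaining steps
    progress = (step - warmup_steps) / max(total_steps - warmup_steps, 1)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

The rate peaks right after warmup and bottoms out at `min_lr_ratio * peak_lr` (here 4.5e-7) at the final step.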
|
|
|
# Code: |
|
The [levanter](https://github.com/stanford-crfm/levanter) repository was used to train the model on Google Cloud TPUs.

Research supported with Cloud TPUs from Google's TPU Research Cloud (TRC).
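For orientation, the hyperparameters above would sit in the optimizer section of a levanter training config. The fragment below is a hypothetical sketch: the key names mirror levanter-style configs, but it is not the config file from the actual run.

```yaml
# Hypothetical levanter-style config fragment (not the actual run's file)
optimizer:
  learning_rate: 2e-4
  weight_decay: 0.1
  min_lr_ratio: 0.00225
  warmup: 0.01
  decay: 0.99
  rewarmup: 0.01
```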
|
|
|
# Contact: |
|
|
|
If you have any queries or issues, reach out to [Kathir](mailto:[email protected]).