---
library_name: transformers
license: apache-2.0
base_model: Qwen/Qwen2.5-7B-Instruct-1M
tags:
- sanskrit
- translation
- qwen
- axolotl
datasets:
- diabolic6045/Sanskrit-llama
model-index:
- name: Sanskrit-qwen-7B-Translate
  results: []
---

# Sanskrit-qwen-7B-Translate

This model is a fine-tuned version of [Qwen/Qwen2.5-7B-Instruct-1M](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct-1M) optimized for Sanskrit language tasks.

## Model Description

This is a merged checkpoint: a QLoRA adapter was trained on top of Qwen 2.5 7B (the Instruct-1M variant) and then merged into the base weights. Training targeted Sanskrit language understanding and Sanskrit-English translation, using the Sanskrit dataset listed under Training Data.

## Intended Uses & Limitations

### Intended Uses
- Sanskrit text understanding and generation
- Sanskrit-English translation tasks
- Sanskrit language processing
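
The snippet below is a minimal sketch of how the model might be called for translation with `transformers`. Several things in it are assumptions rather than documented facts: the repository ID `diabolic6045/Sanskrit-qwen-7B-Translate` is guessed from the model name and the dataset owner, the Alpaca-style prompt template is inferred from `type: alpaca` in the axolotl config, and the instruction wording is only illustrative. `device_map="auto"` additionally requires the `accelerate` package.

```python
# ASSUMPTION: hypothetical repo ID; replace with the actual location of the weights.
MODEL_ID = "diabolic6045/Sanskrit-qwen-7B-Translate"

# ASSUMPTION: Alpaca-style templates, inferred from `type: alpaca` in the training config.
ALPACA_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)
ALPACA_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)


def build_prompt(instruction, input_text=""):
    """Format an instruction (and optional input) as an Alpaca-style prompt."""
    if input_text:
        return ALPACA_WITH_INPUT.format(instruction=instruction, input=input_text)
    return ALPACA_NO_INPUT.format(instruction=instruction)


def translate(text, model_id=MODEL_ID, max_new_tokens=256):
    """Load the model and generate a translation for `text`."""
    # Imported lazily so build_prompt can be used without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )
    # Illustrative instruction wording, not taken from the training data.
    prompt = build_prompt("Translate the following Sanskrit text to English.", text)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Keep only the newly generated tokens, dropping the prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
```

Calling `translate("धर्मो रक्षति रक्षितः")` downloads the weights on first use. If the outputs look off, the prompt template is the first thing to check against the dataset's actual format.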

### Limitations
- Performance may vary with the complexity of the Sanskrit text; no evaluation results are reported for this model
- Model should be used within ethical and legal guidelines

## Training Data

The model was trained on the [diabolic6045/Sanskrit-llama](https://huggingface.co/datasets/diabolic6045/Sanskrit-llama) dataset.

## Training Procedure

### Training Details
- Base Model: Qwen/Qwen2.5-7B-Instruct-1M
- Training Type: QLoRA fine-tuning (4-bit quantized base model with a LoRA adapter, per the axolotl config below)
- Hardware: Multi-GPU setup
- Training Parameters:
  - Learning Rate: 2e-05
  - Epochs: 1
  - Batch Size: 2 (total)
  - Optimizer: AdamW (paged 8-bit variant, `paged_adamw_8bit`)
  - LR Scheduler: Cosine with 10 warmup steps, decaying to 20% of the peak learning rate
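
To make the schedule concrete, here is a rough sketch of a cosine schedule with linear warmup and a learning-rate floor, using the values from the axolotl config (`learning_rate: 2e-5`, `warmup_steps: 10`, `cosine_min_lr_ratio: 0.2`). The function name `lr_at_step` and the total step count are hypothetical, and the formula is a common formulation of this schedule, not a copy of the trainer's implementation.

```python
import math


def lr_at_step(step, total_steps, base_lr=2e-5, warmup_steps=10, min_lr_ratio=0.2):
    """Approximate learning rate at a given optimizer step."""
    if step < warmup_steps:
        # Linear warmup from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    # Decay from base_lr down to min_lr_ratio * base_lr instead of zero.
    return base_lr * (min_lr_ratio + (1.0 - min_lr_ratio) * cosine)


if __name__ == "__main__":
    total = 1000  # hypothetical; the real step count is not stated in this card
    for s in (0, 5, 10, 500, 1000):
        print(s, lr_at_step(s, total))
```

With these settings the rate peaks at 2e-5 once warmup ends and bottoms out at 20% of that value, rather than decaying to zero.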

## Framework Versions

- Transformers 4.49.0
- Pytorch 2.5.1+cu121
- Datasets 3.2.0
- Tokenizers 0.21.0

[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.8.0.dev0`
```yaml

base_model: Qwen/Qwen2.5-7B-Instruct-1M
load_in_8bit: false
load_in_4bit: true
strict: false

datasets:
  - path: diabolic6045/Sanskrit-llama
    type: alpaca
dataset_prepared_path:
val_set_size: 0
output_dir: ./outputs/qlora-out

adapter: qlora
lora_model_dir:

sequence_len: 1024
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true

lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:
lora_target_linear: true
lora_fan_in_fan_out:

hub_model_id: Sanskrit-qwen-8B

wandb_project: संस्कृतम्-llama
wandb_entity: 
wandb_watch: all
wandb_name: संस्कृतम्-llama
wandb_log_model: 

gradient_accumulation_steps: 1
micro_batch_size: 1
num_epochs: 1
optimizer: paged_adamw_8bit
lr_scheduler: cosine
cosine_min_lr_ratio: 0.2
learning_rate: 2e-5

train_on_inputs: false
group_by_length: false
bf16: false
fp16:
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: false

#gpu_memory_limit: 20GiB
#lora_on_cpu: true         

warmup_steps: 10
evals_per_epoch: 4
saves_per_epoch: 1
debug:
deepspeed: deepspeed_configs/zero1.json
weight_decay: 0.0
special_tokens:
   pad_token: <|end_of_text|>

```

</details><br>
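
Two derived quantities follow from the config above, sketched here as simple arithmetic. The GPU count of 2 is an inference from the reported total batch size and the multi-GPU note, not a stated fact, and the `lora_alpha / lora_r` scaling assumes the standard LoRA formulation (not a rank-stabilized variant). The helper names are hypothetical.

```python
def effective_batch_size(micro_batch_size, gradient_accumulation_steps, num_gpus):
    """Sequences contributing to each optimizer step across all devices."""
    return micro_batch_size * gradient_accumulation_steps * num_gpus


def lora_scale(lora_alpha, lora_r):
    """Multiplier applied to the low-rank update in standard LoRA."""
    return lora_alpha / lora_r


if __name__ == "__main__":
    # micro_batch_size=1 and gradient_accumulation_steps=1 come from the config;
    # num_gpus=2 is ASSUMED so that the total matches the reported batch size of 2.
    print(effective_batch_size(1, 1, 2))
    # lora_alpha=16 and lora_r=32 come from the config.
    print(lora_scale(16, 32))
```

Since `sample_packing` is enabled with `sequence_len: 1024`, each "sequence" in the batch may contain several packed training examples.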

## License
This model is released under the Apache 2.0 license.