--- 
license: mit 
datasets: 
- SoAp9035/turkish_instructions 
language: 
- tr 
base_model: 
- google/gemma-3-270m-it-qat-q4_0-unquantized 
pipeline_tag: text-generation 
library_name: transformers 
--- 
|
|
# Gemma 3 270M Turkish Instructions Fine-tuned |
|
|
|
|
|
This model is a **fine-tuned version of Google Gemma 3 270M IT**, trained on the **SoAp9035/turkish_instructions** dataset using direct fine-tuning. 
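
For fine-tuning, each instruction/response pair is rendered into Gemma's chat-turn format before tokenization. The snippet below is a minimal sketch of one way to do this; the column names `instruction` and `response` and the helper function are assumptions for illustration, not the exact preprocessing script.

```python
# Minimal sketch (not the exact preprocessing script) of rendering one
# instruction/response pair into Gemma's chat template.
# The column names "instruction" and "response" are assumptions.
def format_example(example):
    return (
        "<start_of_turn>user\n"
        f"{example['instruction']}<end_of_turn>\n"
        "<start_of_turn>model\n"
        f"{example['response']}<end_of_turn>"
    )
```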
|
|
|
|
|
## Model Details |
|
|
|
|
|
- **Base model:** `google/gemma-3-270m-it-qat-q4_0-unquantized` |
|
|
- **Fine-tuning dataset:** Turkish instruction dataset (`SoAp9035/turkish_instructions`), formatted with the Gemma chat template for `google/gemma-3-270m-it-qat-q4_0-unquantized` 
|
|
- **Fine-tuning type:** Direct fine-tuning (causal language modeling) 
|
|
- **Precision:** Full precision / BF16 (BF16 is used if the GPU supports it) 
|
|
- **Max token length:** 256 |
|
|
- **Batch size:** 2 (effective batch size = 8 with gradient accumulation) |
|
|
- **Number of epochs:** 2 |
|
|
- **Optimizer:** AdamW |
|
|
- **Scheduler:** Cosine learning-rate schedule 
|
|
- **Evaluation:** Every 100 steps; the best checkpoint is selected by `eval_loss` (see the training sketch below) 
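
The hyperparameters above roughly map onto the Hugging Face `TrainingArguments` shown below. This is a hedged sketch assuming the standard `Trainer` API was used; the output directory and save settings are illustrative, not taken from the actual training script.

```python
import torch
from transformers import TrainingArguments

# Illustrative sketch of the training configuration described above;
# the output path and save settings are assumptions, not the original script.
training_args = TrainingArguments(
    output_dir="gemma3-270m-turkish-instructions",  # hypothetical output path
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,   # 2 x 4 = effective batch size of 8
    num_train_epochs=2,
    optim="adamw_torch",             # AdamW optimizer
    lr_scheduler_type="cosine",      # cosine learning-rate schedule
    bf16=torch.cuda.is_available() and torch.cuda.is_bf16_supported(),
    eval_strategy="steps",           # named evaluation_strategy in older transformers releases
    eval_steps=100,
    save_strategy="steps",
    save_steps=100,
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)
```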
|
|
|
|
|
## Usage Example |
|
|
|
|
|
```python 
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_NAME = "Dbmaxwell/gemma3-270m-turkish-instructions"

# Load the tokenizer and make sure a padding token is defined
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.pad_token_id = tokenizer.eos_token_id
tokenizer.padding_side = "right"

# Load the model and move it to the GPU if one is available
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
model.eval()

def generate_response(prompt, max_new_tokens=200):
    # Wrap the prompt in Gemma's chat template. The tokenizer adds <bos> on its own,
    # so add_special_tokens=False avoids a duplicated BOS token.
    formatted_prompt = f"<bos><start_of_turn>user\n{prompt}<end_of_turn>\n<start_of_turn>model\n"
    inputs = tokenizer(formatted_prompt, return_tensors="pt", add_special_tokens=False).to(device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,  # passes input_ids and attention_mask
            max_new_tokens=max_new_tokens,
            temperature=0.3,
            top_p=0.8,
            do_sample=True,
            pad_token_id=tokenizer.pad_token_id,
            eos_token_id=tokenizer.eos_token_id,
            repetition_penalty=1.2,
            no_repeat_ngram_size=3,
        )
    # Decode only the newly generated tokens and cut at the end-of-turn marker
    response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
    return response.split("<end_of_turn>")[0].strip()

test_prompts = [
    "Merhaba! Ben bir AI asistanım. Sana nasıl yardımcı olabilirim?",
    "Python'da for döngüsü nasıl yazılır?",
    "İstanbul Türkiye'nin en büyük şehridir. Kısa bilgi ver.",
    "Makine öğrenmesi nedir? Basit açıklama yap.",
    "5 artı 3 çarpı 2 kaçtır?",
    "Türkiye'nin başkenti neresidir?"
]

for i, prompt in enumerate(test_prompts, 1):
    print(f"\n{i}. Question: {prompt}")
    print(f"Answer: {generate_response(prompt, max_new_tokens=100)}")
    print("-" * 60)
``` 
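
As an alternative to building the prompt string by hand, the tokenizer's chat template can render it. This short snippet continues from the example above and assumes the checkpoint keeps the base model's Gemma chat template:

```python
# Alternative to the manual prompt string: let the tokenizer apply the chat template.
messages = [{"role": "user", "content": "Türkiye'nin başkenti neresidir?"}]
formatted_prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
```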
|
|
|