lale-9b-2603

lale (Turkish for "tulip") is a Turkish instruction-following language model fine-tuned from Qwen3.5-9B. It aims to be a leading Turkish model in its size class, with a focus on general knowledge, reasoning, tool use, grammar, finance, and legal tasks.

Model Details

Property          Value
Base model        Qwen/Qwen3.5-9B
Method            LoRA SFT (r=32, alpha=32, bf16)
Training data     118,355 Turkish instruction examples (~113M tokens)
Epochs            3
Final loss        0.282
Training time     ~120 hours on 1x RTX 4090
Parameters        9.5B total, 58M trainable (0.61%)
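The LoRA figures in this table can be sanity-checked with the sketch below.

- For a frozen weight of shape d_out × d_in, LoRA trains B (d_out × r) and A (r × d_in). Each adapted projection therefore adds r·(d_in + d_out) parameters.
- Standard LoRA scales the update by alpha / r, which is 1.0 for r=32, alpha=32.
- The 4096 × 4096 projection in the example is a hypothetical shape chosen for illustration. It is not Qwen3.5-9B's actual dimensions.

```python
def lora_params(r: int, d_in: int, d_out: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight: B (d_out x r) + A (r x d_in)."""
    return r * (d_in + d_out)


def lora_scaling(r: int, alpha: int) -> float:
    """Standard (non-rsLoRA) scaling applied to the low-rank update B @ A."""
    return alpha / r


def trainable_percent(trainable: float, total: float) -> float:
    """Share of parameters that receive gradients, in percent."""
    return 100.0 * trainable / total


# Hypothetical 4096 x 4096 projection, used only to illustrate the formula.
print(lora_params(r=32, d_in=4096, d_out=4096))
print(lora_scaling(r=32, alpha=32))
# Figures from the Model Details table: 58M trainable out of 9.5B total.
print(round(trainable_percent(58e6, 9.5e9), 2))
```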

Available Formats

Format                      Size     Use case
merged/                     18 GB    Full bf16 for further fine-tuning or vLLM serving
gguf/lale-9b-q8_0.gguf      8.9 GB   High-quality inference with llama.cpp / Ollama
gguf/lale-9b-q4_k_m.gguf    5.3 GB   Fast inference on consumer hardware
adapter/                    242 MB   LoRA adapter to apply on base Qwen3.5-9B

Training Data

The training data consists of 118,355 synthetic Turkish instruction-response pairs generated using Claude Opus 4.6 and Claude Sonnet 4.6 via AWS Bedrock, across 21 categories in 3 rounds:

Round 1 (Sonnet, 61.6K examples): general, reasoning, tool_use, tool_use_advanced, finance, legal, code, translation

Round 2 (Opus, 37.1K examples): math, math_cot, multi_turn, tool_use_mcp, distill_reasoning, conversation_persona, reasoning_v2, code_v2

Round 3 (Opus+Sonnet, 19.7K examples): multi_step_tool, grammar_drill, error_recovery, legal_terms, translation_pro

All data was filtered for format validity and length bounds. Exact duplicates were removed, and tool-use messages were normalized to standard roles.
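The filtering steps can be sketched as a small pipeline. This is a minimal illustration, not the project's actual script. The following are assumptions:

- Each record is a dict holding a `messages` list of OpenAI-style role/content entries.
- The character-length bounds are placeholder values.
- The helper names and sample records are hypothetical.

The role remapping follows the Technical Notes: tool_call becomes assistant, tool_result becomes tool, and content may be None on assistant messages that carry tool calls.

```python
import hashlib
import json
from typing import Iterable, Iterator

# Role remapping described in the Technical Notes (non-standard -> OpenAI-style roles).
ROLE_MAP = {"tool_call": "assistant", "tool_result": "tool"}
VALID_ROLES = {"system", "user", "assistant", "tool"}


def normalize(example: dict) -> dict:
    """Remap tool roles; malformed records are left for is_valid() to reject."""
    msgs = example.get("messages")
    if not isinstance(msgs, list):
        return example
    fixed = [
        {**m, "role": ROLE_MAP.get(m.get("role"), m.get("role"))} if isinstance(m, dict) else m
        for m in msgs
    ]
    return {**example, "messages": fixed}


def is_valid(example: dict) -> bool:
    """Format check: non-empty message list, known roles, string content.

    content=None is accepted only on assistant messages that carry tool_calls,
    mirroring OpenAI-style function calling.
    """
    msgs = example.get("messages")
    if not isinstance(msgs, list) or not msgs:
        return False
    for m in msgs:
        if not isinstance(m, dict) or m.get("role") not in VALID_ROLES:
            return False
        content = m.get("content")
        if content is None:
            if m["role"] != "assistant" or not m.get("tool_calls"):
                return False
        elif not isinstance(content, str):
            return False
    return True


def text_length(example: dict) -> int:
    """Total characters of message content (None counts as empty)."""
    return sum(len(m.get("content") or "") for m in example["messages"])


def filter_examples(
    examples: Iterable[dict], min_chars: int = 20, max_chars: int = 20_000
) -> Iterator[dict]:
    """Normalize, validate, apply (placeholder) length bounds, then drop exact duplicates."""
    seen = set()
    for ex in examples:
        ex = normalize(ex)
        if not is_valid(ex):
            continue
        if not min_chars <= text_length(ex) <= max_chars:
            continue
        # Exact dedup: hash a canonical JSON serialization of the conversation.
        canonical = json.dumps(ex["messages"], sort_keys=True, ensure_ascii=False)
        key = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
        if key in seen:
            continue
        seen.add(key)
        yield ex


# Hypothetical sample records for illustration.
GREETING = {"messages": [
    {"role": "user", "content": "Merhaba, nasılsın?"},
    {"role": "assistant", "content": "İyiyim, teşekkürler!"},
]}
TOOL_CHAT = {"messages": [
    {"role": "user", "content": "İstanbul'da hava nasıl?"},
    {"role": "tool_call", "content": None, "tool_calls": [
        {"id": "c1", "type": "function",
         "function": {"name": "get_weather", "arguments": "{\"city\": \"İstanbul\"}"}}]},
    {"role": "tool_result", "tool_call_id": "c1", "content": "{\"temp_c\": 18}"},
    {"role": "assistant", "content": "İstanbul'da hava 18°C."},
]}
SAMPLE = [GREETING, GREETING, TOOL_CHAT, {"messages": [{"role": "narrator", "content": "x" * 50}]}]

kept = list(filter_examples(SAMPLE))
print(len(kept))  # the duplicate greeting and the unknown-role record are dropped
print([m["role"] for m in kept[1]["messages"]])
```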

Benchmark Results (terazi)

Evaluated using the terazi Turkish language model benchmark suite.

lale-9b-2602 vs lale-9b-2603

Category                 2602 (98K examples)   2603 (118K examples)   Change (relative)
core                     0.511                 0.516                  +1.0%
  common_sense           0.970                 0.980                  +1.0%
  reading_comp           0.535                 0.512                  -4.3%
  grammar                0.288                 0.337                  +17.0%
  translation            0.342                 0.333                  -2.6%
  summarization          0.421                 0.417                  -1.0%
tool                     0.411                 0.444                  +8.0%
  api_call               0.557                 0.586                  +5.2%
  multi_step             0.075                 0.168                  +124%
  param_extraction       0.506                 0.482                  -4.7%
  error_recovery         0.229                 0.215                  -6.1%
fin                      0.492                 0.454                  -7.7%
  sentiment              0.744                 0.592                  -20.4%
  numerical_reasoning    0.524                 0.557                  +6.3%
  term_understanding     0.226                 0.252                  +11.5%
legal                    n/a                   0.376                  new
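The Change column is the relative change of the 2603 score with respect to 2602, computed as (new − old) / old. It is not a difference in percentage points. For example, multi_step's +124% corresponds to an absolute gain of 0.093. The sketch below recomputes a few rows from the scores in the table.

```python
def relative_change(old: float, new: float) -> float:
    """Percent change of `new` relative to `old`."""
    return 100.0 * (new - old) / old


# Scores copied from the table above: (2602, 2603).
SCORES = {
    "grammar": (0.288, 0.337),
    "multi_step": (0.075, 0.168),
    "sentiment": (0.744, 0.592),
}

for name, (old, new) in SCORES.items():
    print(f"{name}: {relative_change(old, new):+.1f}%")
```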

Key Improvements

  • multi_step tool use: +124% (0.075 → 0.168), attributed to the targeted Round 3 multi_step_tool training data
  • grammar: +17.0% (0.288 → 0.337), attributed to the Round 3 grammar_drill exercises (vowel harmony, suffix ordering, conjugation)
  • tool use overall: +8.0% (0.411 → 0.444), attributed to the additional tool_use_mcp and multi_step_tool categories
  • numerical_reasoning: +6.3% (0.524 → 0.557), attributed to math and math_cot data
  • term_understanding: +11.5% (0.226 → 0.252), attributed to legal_terms and fin_analysis data

Usage

With llama.cpp

llama-server -m lale-9b-q8_0.gguf -ngl 99 --reasoning-budget 0 -c 4096

Note: --reasoning-budget 0 disables Qwen3.5's thinking mode. With thinking enabled, the server returns the model's output in the reasoning_content field instead of content.
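llama-server exposes an OpenAI-compatible HTTP API, so a client can be written with the standard library alone. The sketch below makes these assumptions:

- The server was started with the command above.
- It listens on its default address, 127.0.0.1:8080.
- Requests go to /v1/chat/completions.

The helper names are hypothetical. `extract_text` prefers `content` and falls back to `reasoning_content` for the case described in the note.

```python
import json
import urllib.request

# Assumption: llama-server started as above, listening on its default 127.0.0.1:8080.
BASE_URL = "http://127.0.0.1:8080"


def build_request(prompt: str, max_tokens: int = 512) -> urllib.request.Request:
    """POST request for llama-server's OpenAI-compatible chat completions endpoint."""
    body = {"messages": [{"role": "user", "content": prompt}], "max_tokens": max_tokens}
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_text(response: dict) -> str:
    """Prefer `content`; fall back to `reasoning_content`, where output lands
    when the server was started without --reasoning-budget 0."""
    message = response["choices"][0]["message"]
    return message.get("content") or message.get("reasoning_content") or ""


def chat(prompt: str) -> str:
    """Send one user turn and return the reply text (needs a running server)."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return extract_text(json.load(resp))


# Usage: print(chat("Türkiye'nin başkenti neresidir?"))
```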

With Ollama

Create a Modelfile:

FROM ./lale-9b-q8_0.gguf
PARAMETER num_ctx 4096

Then build and run the model:

ollama create lale -f Modelfile
ollama run lale
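Once the model has been created, Ollama's REST API can be called directly. This is a minimal sketch under the following assumptions:

- Ollama runs on its default address, localhost:11434.
- The model name is lale, as created above.
- Requests use the non-streaming /api/chat endpoint.

The helper names are hypothetical. Passing num_ctx under options mirrors the Modelfile parameter on a per-request basis.

```python
import json
import urllib.request

# Assumptions: Ollama on its default port, model created as "lale" with the commands above.
OLLAMA_URL = "http://localhost:11434/api/chat"


def build_chat_body(prompt: str, model: str = "lale", num_ctx: int = 4096) -> dict:
    """Request body for Ollama's /api/chat; stream=False returns a single JSON object."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
        "options": {"num_ctx": num_ctx},
    }


def chat(prompt: str) -> str:
    """Send one user turn and return the assistant reply (needs a running Ollama server)."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_chat_body(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )  # urllib sends POST because data is set
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["message"]["content"]


# Usage: print(chat("Türkiye'nin başkenti neresidir?"))
```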

With transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "comarproject/lale-9b-2603",
    subfolder="merged",
    torch_dtype="bfloat16",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(
    "comarproject/lale-9b-2603",
    subfolder="merged",
)

# "What is the capital of Türkiye?"
messages = [{"role": "user", "content": "Türkiye'nin başkenti neresidir?"}]
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, not the echoed prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
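The transformers example does not disable thinking mode, so the decoded text may contain the model's reasoning before the answer. The helper below is a sketch of separating the two. It rests on two assumptions:

- Qwen3.5 keeps the Qwen3 convention of ending reasoning with a `</think>` tag.
- That tag survives `skip_special_tokens=True`, as it does for Qwen3, where the think tags are not marked special.

Check the model's chat template before relying on this. The helper name is hypothetical.

```python
THINK_OPEN, THINK_CLOSE = "<think>", "</think>"


def split_thinking(text: str) -> tuple[str, str]:
    """Split decoded model output into (reasoning, answer).

    Assumes Qwen3-style tags: reasoning ends at the last </think>. The opening
    <think> may be absent if the chat template already placed it in the prompt.
    """
    if THINK_CLOSE not in text:
        return "", text.strip()
    reasoning, answer = text.rsplit(THINK_CLOSE, 1)
    reasoning = reasoning.strip().removeprefix(THINK_OPEN).strip()
    return reasoning, answer.strip()


# Usage with the decoded generated text (hypothetical variable name `decoded`):
# reasoning, answer = split_thinking(decoded)
print(split_thinking("<think>\nSoru başkenti soruyor.\n</think>\n\nAnkara."))
```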

Technical Notes

  • Qwen3.5-9B is a unified VLM (vision-language model) with Mamba/hybrid layers. We train only the language components.
  • Training data includes normalized tool-use formats: tool_call/tool_result roles are remapped to standard assistant/tool, and content: null is allowed for OpenAI-style function calling messages.
  • LoRA targets: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
  • Optimizer: AdamW 8-bit, cosine LR schedule, warmup 10%
  • Sample packing enabled (required patching Unsloth's VLM detection for Qwen3.5)
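The learning-rate schedule listed above (10% warmup, then cosine decay) is written out below. This is a sketch that assumes:

- Linear warmup from 0.
- Half-cosine decay to 0, the same shape as Hugging Face transformers' `get_cosine_schedule_with_warmup` with its default `num_cycles=0.5`.

The peak learning rate and step count in the example are hypothetical, because this card does not state them.

```python
import math


def lr_at(step: int, total_steps: int, peak_lr: float, warmup_frac: float = 0.1) -> float:
    """Learning rate at `step`: linear warmup over the first warmup_frac of steps,
    then half-cosine decay from peak_lr down to 0."""
    warmup_steps = int(total_steps * warmup_frac)
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Hypothetical run: 1,000 optimizer steps with a peak LR of 2e-4 (not stated in this card).
for s in (0, 50, 100, 550, 1000):
    print(s, f"{lr_at(s, 1000, 2e-4):.2e}")
```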

Limitations

  • Trained primarily on synthetic data from Claude models; may reflect Claude's style and biases
  • Training sequences were limited to 2,048 tokens (the base model supports 128K)
  • Sentiment analysis regressed by 20.4% relative to 2602 (0.744 → 0.592); this subcategory may need targeted data
  • Some long legal/financial prompts may exceed the trained context length

License

Apache 2.0

Citation

@misc{lale-9b-2603,
  title={lale-9b-2603: Turkish Instruction Model Distilled from Frontier Models},
  author={Selim Ozten},
  year={2026},
  url={https://huggingface.co/comarproject/lale-9b-2603}
}