lale-9b-2603

lale (Turkish for "tulip") is a Turkish instruction-following language model fine-tuned from Qwen3.5-9B. It aims to be the strongest Turkish language model in its size class, targeting general knowledge, reasoning, tool use, grammar, finance, and legal domains.

Model Details

Property	Value
Base model	Qwen/Qwen3.5-9B
Method	LoRA SFT (r=32, alpha=32, bf16)
Training data	118,355 Turkish instruction examples (~113M tokens)
Epochs	3
Final loss	0.282
Training time	~120 hours on 1x RTX 4090
Parameters	9.5B total, 58M trainable (0.61%)

Available Formats

Format	Size	Use case
`merged/`	18 GB	Full bf16 for further fine-tuning or vLLM serving
`gguf/lale-9b-q8_0.gguf`	8.9 GB	High quality inference with llama.cpp / Ollama
`gguf/lale-9b-q4_k_m.gguf`	5.3 GB	Fast inference on consumer hardware
`adapter/`	242 MB	LoRA adapter to apply on base Qwen3.5-9B

Training Data

The training data consists of 118,355 synthetic Turkish instruction-response pairs generated using Claude Opus 4.6 and Claude Sonnet 4.6 via AWS Bedrock, across 21 categories in 3 rounds:

Round 1 (Sonnet, 61.6K examples): general, reasoning, tool_use, tool_use_advanced, finance, legal, code, translation

Round 2 (Opus, 37.1K examples): math, math_cot, multi_turn, tool_use_mcp, distill_reasoning, conversation_persona, reasoning_v2, code_v2

Round 3 (Opus+Sonnet, 19.7K examples): multi_step_tool, grammar_drill, error_recovery, legal_terms, translation_pro

All data was checked for format validity, filtered by length bounds, deduplicated by exact match, and normalized so that tool-use messages share a consistent role format.
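
As a rough illustration of the filtering steps listed above, the sketch below applies a format check, length bounds, and exact deduplication to chat-style examples. The record layout (a `messages` list of role/content dicts), the helper names, and the character bounds are assumptions made for illustration, not the pipeline actually used for lale.

```python
import hashlib
import json

# Assumed set of roles after normalization (hypothetical, for illustration).
VALID_ROLES = {"system", "user", "assistant", "tool"}


def is_valid(example):
    """Format check: a non-empty list of messages with known roles."""
    messages = example.get("messages")
    if not isinstance(messages, list) or not messages:
        return False
    for message in messages:
        if not isinstance(message, dict) or message.get("role") not in VALID_ROLES:
            return False
        content = message.get("content")
        # content may be None for function-calling messages
        if content is not None and not isinstance(content, str):
            return False
    return True


def total_chars(example):
    """Total length of all message contents, treating None as empty."""
    return sum(len(m.get("content") or "") for m in example["messages"])


def filter_examples(examples, min_chars=20, max_chars=8000):
    """Keep valid, in-bounds examples and drop exact duplicates."""
    seen = set()
    kept = []
    for example in examples:
        if not is_valid(example):
            continue
        if not (min_chars <= total_chars(example) <= max_chars):
            continue
        # Hash the serialized conversation to detect exact duplicates.
        serialized = json.dumps(example["messages"], sort_keys=True, ensure_ascii=False)
        key = hashlib.sha256(serialized.encode("utf-8")).hexdigest()
        if key in seen:
            continue
        seen.add(key)
        kept.append(example)
    return kept
```

Hashing the serialized conversation catches only byte-identical duplicates, which matches the "exact deduplication" wording; near-duplicate detection would need a different technique.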

Benchmark Results (terazi)

Evaluated using the terazi Turkish language model benchmark suite.

lale-9b-2602 vs lale-9b-2603

Category	2602 (98K examples)	2603 (118K examples)	Relative change
core	0.511	0.516	+1.0%
common_sense	0.970	0.980	+1.0%
reading_comp	0.535	0.512	-4.3%
grammar	0.288	0.337	+17.0%
translation	0.342	0.333	-2.6%
summarization	0.421	0.417	-1.0%
tool	0.411	0.444	+8.0%
api_call	0.557	0.586	+5.2%
multi_step	0.075	0.168	+124%
param_extraction	0.506	0.482	-4.7%
error_recovery	0.229	0.215	-6.1%
fin	0.492	0.454	-7.7%
sentiment	0.744	0.592	-20.4%
numerical_reasoning	0.524	0.557	+6.3%
term_understanding	0.226	0.252	+11.5%
legal	n/a	0.376	new
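
The last column of the table is the relative change between the two scores, not a difference in absolute points. A minimal sketch of that calculation, using three rows from the table (the helper name `relative_change` is illustrative):

```python
def relative_change(old, new):
    """Percentage change from old to new, relative to old."""
    return (new - old) / old * 100.0


# (2602 score, 2603 score) pairs copied from the table above.
scores = {
    "grammar": (0.288, 0.337),
    "multi_step": (0.075, 0.168),
    "sentiment": (0.744, 0.592),
}

for name, (old, new) in scores.items():
    print(f"{name}: {relative_change(old, new):+.1f}%")
```

Note that a large relative gain can still correspond to a low absolute score, as with multi_step at 0.168.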

Key Improvements

multi_step tool use: +124% -- from targeted R3 multi_step_tool training data
grammar: +17% -- from R3 grammar_drill exercises (vowel harmony, suffix ordering, conjugation)
tool use overall: +8% -- from additional tool_use_mcp and multi_step_tool categories
numerical_reasoning: +6.3% -- from math and math_cot data
term_understanding: +11.5% -- from legal_terms and fin_analysis data

Usage

With llama.cpp

llama-server -m lale-9b-q8_0.gguf -ngl 99 --reasoning-budget 0 -c 4096

Note: --reasoning-budget 0 disables Qwen3.5's thinking mode. With thinking mode enabled, the model's output is placed in reasoning_content instead of content.
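
Once the server is running, it can be queried through its OpenAI-compatible chat endpoint. The sketch below assumes llama-server's default address (`http://localhost:8080`); the helper names are illustrative. It falls back to `reasoning_content` in case thinking mode was left enabled and `content` comes back empty.

```python
import json
import urllib.request


def build_payload(prompt, max_tokens=512):
    """Build a minimal chat-completions request body."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def extract_text(response):
    """Return the answer text, falling back to reasoning_content if content is empty."""
    message = response["choices"][0]["message"]
    return message.get("content") or message.get("reasoning_content") or ""


def chat(prompt, base_url="http://localhost:8080"):
    """POST the prompt to a running llama-server and return the reply text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return extract_text(json.load(reply))


# Example call (requires a running llama-server):
# print(chat("Türkiye'nin başkenti neresidir?"))
```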

With Ollama

Create a Modelfile:

FROM ./lale-9b-q8_0.gguf
PARAMETER num_ctx 4096

ollama create lale -f Modelfile
ollama run lale
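
The model created above can also be called from Python through Ollama's local REST API. This sketch assumes the default Ollama address (`http://localhost:11434`) and the model name `lale` from the `ollama create` command; the helper names are illustrative.

```python
import json
import urllib.request


def build_request(prompt, model="lale", num_ctx=4096):
    """Build a non-streaming /api/chat request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
        "options": {"num_ctx": num_ctx},
    }


def ask(prompt, base_url="http://localhost:11434"):
    """Send the prompt to a running Ollama instance and return the reply text."""
    request = urllib.request.Request(
        f"{base_url}/api/chat",
        data=json.dumps(build_request(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)["message"]["content"]


# Example call (requires Ollama to be running with the model created):
# print(ask("Merhaba!"))
```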

With transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "comarproject/lale-9b-2603",
    subfolder="merged",
    torch_dtype="bfloat16",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(
    "comarproject/lale-9b-2603",
    subfolder="merged",
)

# Prompt: "What is the capital of Türkiye?"
messages = [{"role": "user", "content": "Türkiye'nin başkenti neresidir?"}]
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
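# Decode only the newly generated tokens, not the echoed prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))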

Technical Notes

Qwen3.5-9B is a unified VLM (vision-language model) with Mamba/hybrid layers. We train only the language components.
Training data includes normalized tool-use formats: tool_call/tool_result roles are remapped to standard assistant/tool, and content: null is allowed for OpenAI-style function calling messages.
LoRA targets: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Optimizer: AdamW 8-bit, cosine LR schedule, warmup 10%
Sample packing enabled (required patching Unsloth's VLM detection for Qwen3.5)
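
The role remapping described in the notes above can be sketched as follows. The function name `normalize_messages` and the message layout are assumptions for illustration; the notes state only that tool_call/tool_result roles are remapped to assistant/tool and that content: null is allowed.

```python
# Non-standard roles and the standard roles they are remapped to.
ROLE_MAP = {"tool_call": "assistant", "tool_result": "tool"}


def normalize_messages(messages):
    """Return a copy of the conversation with standard roles and explicit content."""
    normalized = []
    for message in messages:
        fixed = dict(message)  # shallow copy so the input is left untouched
        fixed["role"] = ROLE_MAP.get(message["role"], message["role"])
        # OpenAI-style function calls may carry no text; keep content as None.
        fixed.setdefault("content", None)
        normalized.append(fixed)
    return normalized
```

For example, a message with role `tool_call` and no `content` key comes out with role `assistant` and `content` set to `None`, while messages that already use standard roles pass through unchanged.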

Limitations

Trained primarily on synthetic data from Claude models; may reflect Claude's style and biases
Context window limited to 2048 tokens during training (base model supports 128K)
Sentiment analysis regressed from 0.744 in 2602 to 0.592 (-20.4%) -- this subcategory may need targeted training data
Some long legal/financial prompts may exceed the trained context length
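
Because training used a 2048-token window, it may help to check prompt length before sending long legal or financial documents. The sketch below is tokenizer-agnostic: `count_tokens` is a caller-supplied function, and the names `fits_trained_context` and `TRAINED_CONTEXT` are illustrative rather than part of any library.

```python
TRAINED_CONTEXT = 2048  # training-time context length stated above


def fits_trained_context(prompt, count_tokens, max_new_tokens=512, limit=TRAINED_CONTEXT):
    """True if the prompt plus the planned generation stays within the limit."""
    return count_tokens(prompt) + max_new_tokens <= limit


def whitespace_count(text):
    """Crude stand-in counter; real tokenizers usually yield more tokens than words."""
    return len(text.split())


print(fits_trained_context("Kısa bir soru", whitespace_count))
```

With transformers, a more faithful counter would wrap the model's own tokenizer, for instance `lambda t: len(tokenizer(t)["input_ids"])`.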

License

Apache 2.0

Citation

@misc{lale-9b-2603,
  title={lale-9b-2603: Turkish Instruction Model Distilled from Frontier Models},
  author={Selim Ozten},
  year={2026},
  url={https://huggingface.co/comarproject/lale-9b-2603}
}
