Model Card

A lightweight binary classifier that decides whether a Turkish input string is purely or partially code (CODE) or ordinary natural language (NL).
The model is designed as a guard-rail component in LLM pipelines:
if a user prompt is classified as CODE, upstream orchestration can refuse to forward it to the LLM, apply rate limits, or route it to a different policy.
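As a concrete illustration, the routing step described above might look like the following minimal sketch. The threshold value and the action names (`refuse`, `forward`) are illustrative assumptions, not part of the model:

```python
# Minimal guard-rail sketch. The threshold and action names are
# illustrative assumptions, not part of the model card.
def route_prompt(label: str, score: float, threshold: float = 0.9) -> str:
    """Map a classifier verdict to an orchestration action."""
    if label == "CODE" and score >= threshold:
        return "refuse"  # or rate-limit / route to a stricter policy
    return "forward"     # ordinary natural language goes to the LLM

print(route_prompt("CODE", 0.999995))  # refuse
print(route_prompt("NL", 0.999861))   # forward
```

In practice the orchestrator would call the pipeline shown below and pass its `label`/`score` into a gate like this one.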

How to Get Started with the Model

Use the code below to get started with the model.

```python
from transformers import pipeline

clf = pipeline(
    "text-classification",
    model="yeniguno/turkish-code-detector",
    tokenizer="yeniguno/turkish-code-detector",
)

prompt = "def faktoriyel(n):\n    return 1 if n <= 1 else n * faktoriyel(n-1)"
result = clf(prompt)
print(f"Classification: {result}\n")
# Classification: [{'label': 'CODE', 'score': 0.999995231628418}]

prompt = "Linux'un yaratıcısı kimdir, biliyor musun?"  # "Do you know who created Linux?"
result = clf(prompt)
print(f"Classification: {result}\n")
# Classification: [{'label': 'NL', 'score': 0.9998611211776733}]
```

Intended Use & Limitations

| ✓ Recommended | ✗ Not a Good Fit |
| --- | --- |
| Prompt filtering in LLM stacks | Detecting specific programming languages |
| Pre-screening user inputs in chat | Judging code quality or style |
| Moderating public text fields | Detecting tiny inline code tokens in very long documents |
| Fast, low-latency inference (≈1 ms on GPU) | Multilingual detection outside Turkish |

The classifier was trained only on Turkish text and polyglot code snippets.
Text in unseen languages (e.g. Japanese) may be mislabelled NL.
Very short, ambiguous strings (e.g. "int") can be mislabelled CODE.
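For the long-document limitation above, one hedged workaround is to classify overlapping chunks and flag the document as CODE if any chunk is. The chunker below splits on whitespace for simplicity; this is an assumption, since the model's 256-token limit is measured in subword tokens, not words:

```python
# Hypothetical sliding-window chunker for long documents. Splitting on
# whitespace is a simplification; the real tokenizer's boundaries differ.
def chunk_text(text: str, size: int = 150, overlap: int = 30) -> list[str]:
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]

doc = "kelime " * 400        # a 400-word stand-in document
print(len(chunk_text(doc)))  # 4 overlapping chunks
```

Each chunk can then be fed to the pipeline individually, with the per-chunk scores aggregated by the caller.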

Training Data

| Split | Total | NL | CODE |
| --- | --- | --- | --- |
| Train | 316 732 | 251 518 | 65 214 |
| Dev | 39 591 | 31 439 | 8 152 |
| Test | 39 592 | 31 440 | 8 152 |
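A quick sanity check on the table: the NL and CODE counts sum to each split's total, and the classes are imbalanced roughly 4:1 in favour of NL.

```python
# Sanity-check the Train row above: class counts sum to the split total,
# and CODE makes up about 21% of the data (roughly a 4:1 imbalance).
train_nl, train_code = 251_518, 65_214
train_total = train_nl + train_code
print(train_total)                         # 316732, matches the Train row
print(round(train_code / train_total, 3))  # 0.206
```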

Training Hyperparameters

| Setting | Value |
| --- | --- |
| Optimiser | AdamW |
| Effective batch | 32 (2 × 16, fp16) |
| LR scheduler | linear decay, no warm-up |
| Max length | 256 tokens |
| Epochs | ≤ 10 (early stopping at 6 k steps ≈ 0.30 epoch) |
| Loss | Cross-entropy with reversed class weights (weight_NL = 10.0, weight_CODE = 1.0) |
| Label smoothing | 0.1 |
| Hardware | 1 × A100 40 GB (Google Colab) |
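The loss row above can be spelled out. The pure-Python sketch below mirrors what `torch.nn.CrossEntropyLoss(weight=torch.tensor([10.0, 1.0]), label_smoothing=0.1)` computes per example, with the per-class weights taken from the table; the example logits are made up for illustration:

```python
import math

# Pure-Python sketch of the training loss above: class-weighted
# cross-entropy with label smoothing. Equivalent in spirit to
# torch.nn.CrossEntropyLoss(weight=[10.0, 1.0], label_smoothing=0.1).
WEIGHTS = {0: 10.0, 1: 1.0}   # id2label = {0: "NL", 1: "CODE"}
EPS = 0.1                     # label-smoothing factor

def weighted_smoothed_ce(logits: list[float], label: int) -> float:
    # numerically stable log-softmax
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    log_probs = [z - log_z for z in logits]
    # smoothed target: (1 - EPS) on the true class, EPS spread uniformly
    k = len(logits)
    target = [(1.0 - EPS) * (i == label) + EPS / k for i in range(k)]
    nll = -sum(t * lp for t, lp in zip(target, log_probs))
    return WEIGHTS[label] * nll

# A confident, correct NL prediction costs far less than a wrong one:
print(weighted_smoothed_ce([10.0, -10.0], 0))  # ≈ 10 (low)
print(weighted_smoothed_ce([-10.0, 10.0], 0))  # ≈ 190 (high)
```

Note how the NL weight of 10.0 scales every NL example's loss, which is what the "reversed class weights" row describes.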

Evaluation

| Split | Acc | Prec | Recall | F1 |
| --- | --- | --- | --- | --- |
| Train | 0.9960 | 0.9978 | 0.9827 | 0.9902 |
| Dev | 0.9957 | 0.9981 | 0.9807 | 0.9894 |
| Test | 0.9954 | 0.9968 | 0.9807 | 0.9887 |

All metrics computed with `id2label = {0: "NL", 1: "CODE"}`.
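The precision/recall/F1 figures follow the standard binary definitions, assuming CODE (label 1) is the positive class. A dependency-free sketch of how such a table can be reproduced:

```python
# Dependency-free sketch of the binary metrics in the table above,
# assuming CODE (label 1) is the positive class.
def binary_metrics(y_true: list[int], y_pred: list[int], pos: int = 1):
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == pos and p == pos for t, p in pairs)   # true positives
    fp = sum(t != pos and p == pos for t, p in pairs)   # false positives
    fn = sum(t == pos and p != pos for t, p in pairs)   # false negatives
    acc = sum(t == p for t, p in pairs) / len(pairs)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return acc, prec, rec, f1

# Toy example: 0 = NL, 1 = CODE
print(binary_metrics([0, 0, 1, 1, 1, 0], [0, 0, 1, 0, 1, 0]))
```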

Safetensors · Model size: 111M params · Tensor type: F32
