---
base_model: bunyaminergen/Qwen2.5-Coder-1.5B-Instruct-SFT
library_name: transformers
language:
- en
tags:
- code
- codeqwen
- chat
- qwen
- qwen-coder
license: gpl-3.0
datasets:
- bunyaminergen/Stable-Code-Python-SFT
pipeline_tag: text-generation
license_link: https://huggingface.co/bunyaminergen/Qwen2.5-Coder-1.5B-Instruct-SFT-Distilled/blob/main/LICENSE
---
# Qwen2.5-Coder-1.5B-Instruct-SFT-Distilled
Qwen2.5-Coder-1.5B-Instruct-SFT-Distilled is a 1B-parameter student model distilled from
Qwen2.5-Coder-1.5B-Instruct-SFT using token-based knowledge distillation.
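In token-based (token-level) distillation, the student is trained to match the teacher's per-token output distribution. The generic objective is sketched below; this is the standard formulation, not necessarily the exact loss used for this checkpoint.
```latex
% Generic token-level KD objective: hard-label cross-entropy blended with a
% temperature-softened KL term between teacher and student token distributions.
\mathcal{L} \;=\; (1-\alpha)\,\mathcal{L}_{\mathrm{CE}}\bigl(y,\, p_{\text{student}}\bigr)
\;+\; \alpha\, T^{2}\, \mathrm{KL}\!\bigl(\operatorname{softmax}(z_{\text{teacher}}/T)\,\big\|\,\operatorname{softmax}(z_{\text{student}}/T)\bigr)
```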
---
### Table of Contents
- [Usage](#usage)
- [Dataset](#dataset)
- [Training](#training)
- [Licence](#licence)
- [Links](#links)
- [Team](#team)
- [Contact](#contact)
- [Citation](#citation)
---
### Usage
#### Hugging Face
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "bunyaminergen/Qwen2.5-Coder-1.5B-Instruct-SFT-Distilled"

# Load the tokenizer and model; "auto" lets transformers pick the device and dtype.
tokenizer = AutoTokenizer.from_pretrained(repo, padding_side="left")
model = AutoModelForCausalLM.from_pretrained(
    repo,
    device_map="auto",
    torch_dtype="auto",
).eval()

system = "You are a senior Python developer."
user = "Give me a Python implementation of bubble sort."
text = f"System: {system}\nUser: {user}\nAssistant:"

inputs = tokenizer(text, return_tensors="pt").to(model.device)
with torch.no_grad():
    out_ids = model.generate(**inputs, max_new_tokens=512)

print(tokenizer.decode(out_ids[0], skip_special_tokens=True))
```
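Because the base model is a Qwen2.5 Instruct variant, its tokenizer also ships a chat template. The variant below may work as well; it assumes the template was preserved through fine-tuning and reuses the `tokenizer` and `model` objects from the snippet above.
```python
# Alternative prompt construction via the tokenizer's built-in chat template
# (assumes the Qwen chat template was kept during fine-tuning).
messages = [
    {"role": "system", "content": system},
    {"role": "user", "content": user},
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out_ids = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(out_ids[0], skip_special_tokens=True))
```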
---
### Dataset
- [bunyaminergen/Stable-Code-Python-SFT](https://huggingface.co/datasets/bunyaminergen/Stable-Code-Python-SFT)
---
### Training
#### Hyperparameters
| Hyperparameter | Value |
|-------------------------------|-------------------------------------------------|
| Base Model | `bunyaminergen/Qwen2.5-Coder-1.5B-Instruct-SFT` |
| Knowledge Distillation Method | Token-based                                     |
| Task Type | `CAUSAL_LM` |
| Number of Epochs | `11` |
| Batch Size | `12` |
| Gradient Accumulation Steps | `2` |
| Effective Batch Size | `24` (12 × 2) |
| Learning Rate | `5e-5` |
| Optimizer | `AdamW` |
| Precision | `BF16 Mixed Precision` |
| Evaluation Strategy | `epoch` |
| Max Sequence Length | `256 tokens` |
| Logging Steps                 | once per `epoch`                                |
| Save Checkpoint Steps | every `10000` steps |
| Experiment Tracking | `MLflow` (local) |
| Experiment Name | `StudentKnowledgeDistillation` |
| MLflow Run Name | `StudentKD` |
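For reference, these settings map roughly onto the `transformers` `TrainingArguments` below. This is a hedged reconstruction, not the published training script: the output directory is an assumption, and the distillation itself additionally requires a custom loss (sketched after the next table).
```python
from transformers import TrainingArguments

# Illustrative reconstruction of the hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="qwen2.5-coder-1.5b-instruct-sft-distilled",  # assumed path
    num_train_epochs=11,
    per_device_train_batch_size=12,
    gradient_accumulation_steps=2,   # effective batch size: 12 x 2 = 24
    learning_rate=5e-5,
    optim="adamw_torch",             # AdamW
    bf16=True,                       # BF16 mixed precision
    eval_strategy="epoch",           # "evaluation_strategy" on older transformers
    logging_strategy="epoch",
    save_steps=10_000,
    report_to=["mlflow"],            # local MLflow tracking
    run_name="StudentKD",
)
```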
#### Knowledge Distillation Configuration
| Parameter | Value |
|---------------------|-------------|
| Distillation Weight | `0.3` |
| Temperature | `0.5` |
| Loss Reduction | `batchmean` |
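A minimal sketch of how these three values could combine into the token-level objective shown earlier, with the distillation weight mixing the hard-label and soft-label terms. Variable names are illustrative; the actual training code is not published here.
```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      distill_weight=0.3, temperature=0.5):
    """Token-level KD loss: (1 - w) * cross-entropy + w * softened KL."""
    vocab_size = student_logits.size(-1)
    # Hard-label term: next-token cross-entropy against the reference labels
    # (assumes logits and labels are already shifted/aligned).
    ce_loss = F.cross_entropy(
        student_logits.view(-1, vocab_size),
        labels.view(-1),
        ignore_index=-100,
    )
    # Soft-label term: KL between temperature-scaled teacher and student
    # token distributions, using the "batchmean" reduction listed above.
    kd_loss = F.kl_div(
        F.log_softmax(student_logits.view(-1, vocab_size) / temperature, dim=-1),
        F.softmax(teacher_logits.view(-1, vocab_size) / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    return (1.0 - distill_weight) * ce_loss + distill_weight * kd_loss
```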
#### Dataset
- **Train/Test Split:** `90%/10%`
- **Random Seed:** `42`
- **Train Batched:** `True`
- **Eval Batched:** `True`
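The split above corresponds to the following `datasets` call (a sketch assuming the dataset exposes a single `train` split):
```python
from datasets import load_dataset

dataset = load_dataset("bunyaminergen/Stable-Code-Python-SFT", split="train")
# 90% train / 10% test with a fixed seed, as listed above.
splits = dataset.train_test_split(test_size=0.1, seed=42)
train_ds, eval_ds = splits["train"], splits["test"]
```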
#### Tokenizer Configuration
- **Truncation:** Enabled (`max_length=256`)
- **Masked Language Modeling (MLM):** `False`
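In `transformers` terms this means truncating inputs to 256 tokens and using the language-modeling collator with `mlm=False`, i.e. plain causal-LM labels. The `text` column name is an assumption about the dataset schema.
```python
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained(
    "bunyaminergen/Qwen2.5-Coder-1.5B-Instruct-SFT"
)

def tokenize_fn(batch):
    # Truncate every example to the 256-token maximum used during training.
    return tokenizer(batch["text"], truncation=True, max_length=256)

# mlm=False -> causal language modeling (labels are the shifted input ids).
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)
```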
#### Speeds, Sizes, Times
- **Total Training Time:** ~7 hours
- **Checkpoint Frequency:** every `10000` steps
- **Checkpoint Steps:**
- `checkpoint-10000`
- `checkpoint-13200` *(final checkpoint)*
#### Compute Infrastructure
**Hardware:**
- GPU: **1 × NVIDIA L40S (48 GB VRAM)**
- RAM: **94 GB**
- CPU: **16 vCPU**
**Software:**
- OS: **Ubuntu 22.04**
- Frameworks: **PyTorch 2.4.0**
- CUDA Version: **12.4.1**
---
### Licence
- [LICENSE](LICENSE)
---
### Links
- [GitHub](https://github.com/bunyaminergen/)
- [Website](https://bunyaminergen.com)
- [LinkedIn](https://www.linkedin.com/in/bunyaminergen)
---
### Team
- [Bunyamin Ergen](https://www.linkedin.com/in/bunyaminergen)
---
### Contact
- [Mail](mailto:[email protected])
---
### Citation
```bibtex
@software{Qwen2.5-Coder-1.5B-Instruct-SFT-Distilled,
  author = {Bunyamin Ergen},
  title  = {{Qwen2.5-Coder-1.5B-Instruct-SFT-Distilled}},
  year   = {2025},
  month  = {04},
  url    = {https://huggingface.co/bunyaminergen/Qwen2.5-Coder-1.5B-Instruct-SFT-Distilled},
}
```
---