---
language:
- en
license: apache-2.0
base_model: Qwen/Qwen2.5-Coder-7B-Instruct
base_model_relation: finetune
tags:
- code
- coding
- programming
- algorithms
- systems-programming
- code-generation
- complexity-analysis
- qwen2.5
- fine-tuned
- vanta-research
- vanta-research-entities
- vanta-research-code-models
- wraith
model-index:
- name: wraith-coder-7b
results:
- task:
type: text-generation
name: Code Generation
metrics:
- type: conciseness
value: 62.6
name: Response Reduction
- type: coverage
value: 60
name: Complexity Analysis Coverage
library_name: transformers
---
<div align="center">
![vanta_trimmed](https://cdn-uploads.huggingface.co/production/uploads/686c460ba3fc457ad14ab6f8/hcGtMtCIizEZG_OuCvfac.png)
<h1>VANTA Research</h1>
<p><strong>Independent AI research lab building safe, resilient language models optimized for human-AI collaboration</strong></p>
<p>
<a href="https://unmodeledtyler.com"><img src="https://img.shields.io/badge/Website-unmodeledtyler.com-yellow" alt="Website"/></a>
<a href="https://x.com/vanta_research"><img src="https://img.shields.io/badge/@vanta_research-1DA1F2?logo=x" alt="X"/></a>
<a href="https://github.com/vanta-research"><img src="https://img.shields.io/badge/GitHub-vanta--research-181717?logo=github" alt="GitHub"/></a>
</p>
</div>
---
# Wraith Coder 7B
Wraith Coder 7B is a specialized code generation model fine-tuned from Qwen2.5-Coder-7B-Instruct. Through iterative training focused on algorithmic reasoning, systems programming, and technical communication optimization, Wraith achieves superior information density while maintaining implementation correctness.
## Model Description
- **Developed by:** VANTA Research
- **Base Model:** Qwen/Qwen2.5-Coder-7B-Instruct
- **Model Type:** Causal Language Model
- **Language(s):** English
- **License:** Apache 2.0
- **Fine-tuned from:** Qwen/Qwen2.5-Coder-7B-Instruct
### Model Architecture
- **Parameters:** 7.6 billion
- **Architecture:** Transformer decoder with 28 layers
- **Hidden Size:** 3584
- **Attention Heads:** 28 (4 key-value heads)
- **Context Length:** 32,768 tokens
- **Vocabulary Size:** 152,064 tokens
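The dimensions listed above can be checked against the published configuration without downloading the weights. A minimal sketch using the Hugging Face `transformers` API; the field names follow the standard Qwen2 config schema:

```python
from transformers import AutoConfig

# Inspect the architecture fields reported above
config = AutoConfig.from_pretrained("vanta-research/wraith-coder-7b")
print(config.num_hidden_layers)        # decoder layers (28)
print(config.hidden_size)              # hidden size (3584)
print(config.num_attention_heads)      # attention heads (28)
print(config.num_key_value_heads)      # key-value heads (4, grouped-query attention)
print(config.max_position_embeddings)  # context length
print(config.vocab_size)               # vocabulary size (152064)
```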
## Training Methodology
### Iterative Fine-Tuning Strategy
Wraith Coder 7B was developed through three iterations of progressive capability enhancement:
**Iteration 1: Personality Establishment (~4,250 examples)**
- The same personality dataset used for Wraith 8B in the VANTA Research Entity Series
- Identity formation and communication style
- Logical reasoning patterns
- Technical terminology usage
- Foundation for signal-dense communication
**Iteration 2: Coding Restoration/Enhancement (~5,500 examples)**
- Conversational coding examples
- Computer science fundamentals
- Mathematical reasoning problems
- Identity reinforcement examples
- Technical communication patterns
**Iteration 3: Advanced Capabilities (~4,450 examples)**
- Architectural design patterns
- Algorithm design and analysis
- Debugging techniques
- Systems programming concepts
- Identity anchors
- Communication pattern reinforcement
### Training Configuration
- **Method:** Low-Rank Adaptation (LoRA)
- **Rank:** 16
- **Alpha:** 32
- **Dropout:** 0.05
- **Target Modules:** q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- **Learning Rate:** 5e-5
- **Batch Size:** 8 (effective)
- **Epochs:** 2 per iteration
- **Optimizer:** AdamW 8-bit
- **Training Framework:** Unsloth
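For reference, the hyperparameters above correspond to a PEFT adapter configuration along the lines of the sketch below. This is an illustrative reconstruction, not the exact training script; Unsloth wraps this setup in its own loaders and optimizations.

```python
from peft import LoraConfig

# Illustrative LoRA adapter configuration matching the hyperparameters listed above
lora_config = LoraConfig(
    r=16,            # LoRA rank
    lora_alpha=32,   # scaling factor
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    bias="none",
    task_type="CAUSAL_LM",
)
```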
## Performance Evaluation
### Comprehensive 20-Question Coding Assessment
A rigorous evaluation across diverse programming challenges demonstrates measurable improvements over the base model:
#### Response Efficiency
- **Base Model:** 57,999 characters total across 20 responses (approximately 2,900 per question)
- **Wraith Coder:** 21,686 characters total (approximately 1,084 per question)
- **Improvement:** 62.6% reduction in response length while maintaining correctness
#### Technical Analysis Coverage
- **Base Model:** Complexity analysis in 40% of responses
- **Wraith Coder:** Complexity analysis in 60% of responses
- **Improvement:** 50% relative increase in Big-O notation coverage
#### Question-Specific Performance
| Category | Conciseness Gain | Key Strength |
|----------|------------------|--------------|
| Data Structures | 80-90% | Space complexity analysis |
| Algorithms | 75-85% | Time complexity trade-offs |
| Systems Design | 70-80% | Scalability considerations |
| Concurrency | 65-75% | Synchronization patterns |
| Architecture | 50-60% | Design pattern selection |
### Comparative Analysis
**Test Case: LRU Cache Implementation**
- Base Model: 120+ lines with verbose documentation
- Wraith Coder: 45 lines with design rationale
- Result: Equivalent correctness, 62% shorter, includes algorithmic justification
**Test Case: Rate Limiter Design**
- Base Model: 100+ lines, conceptual confusion between algorithms
- Wraith Coder: 25 lines, correct token bucket implementation with edge case analysis
- Result: Superior correctness and clarity
**Test Case: Binary Tree Serialization**
- Base Model: Single approach with lengthy explanation
- Wraith Coder: Two approaches (DFS and BFS) with trade-off comparison
- Result: Multiple solutions with selection guidance
## Intended Use
### Primary Applications
**Senior Software Engineering**
- Code review and optimization suggestions
- Algorithm selection and complexity analysis
- Systems design pattern recommendations
- Performance optimization strategies
**Technical Interview Preparation**
- Concise algorithmic explanations
- Multiple solution approaches
- Time and space complexity analysis
- Trade-off articulation
**Production Development**
- Efficient technical documentation
- Design decision rationale
- Scalability considerations
- Edge case identification
### Out-of-Scope Use
This model is optimized for experienced developers who value information density. It may not be suitable for:
- Beginner programming education requiring verbose step-by-step explanations
- Non-technical audiences requiring extensive context
- Applications requiring social conversational patterns
- Domains outside software engineering and computer science
## Limitations and Considerations
### Technical Limitations
1. **Condensed Communication Style**
- Assumes reader familiarity with computer science fundamentals
- May omit explanatory context that beginners require
- Prioritizes technical precision over accessibility
2. **Model Size Constraints**
- 7B parameter model has inherent knowledge limitations
- May not match larger models on extremely complex problems
- Context window limits for very large codebases
3. **Domain Specialization**
- Optimized for algorithmic and systems programming
- May have reduced performance on domain-specific applications (e.g., embedded systems, game engines)
- Training data focused on general-purpose programming
### Deployment Considerations
- **Compute Requirements:** Minimum 8GB VRAM for 4-bit quantization
- **Inference Speed:** Similar to base Qwen2.5-Coder-7B
- **Quantization:** Tested with 4-bit (Q4_K_M) quantization maintaining quality
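For the 8GB VRAM scenario above, one option is loading the full-precision repository in 4-bit via `bitsandbytes`. A minimal sketch; the quantization settings shown are common defaults, not a tested recipe for this model:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Load the model in 4-bit to fit within roughly 8GB of VRAM
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "vanta-research/wraith-coder-7b",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("vanta-research/wraith-coder-7b")
```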
## Ethical Considerations
### Training Data
All training data was synthetically generated or derived from publicly available educational resources. No proprietary code or copyrighted material was used in fine-tuning.
### Bias and Fairness
The model inherits biases present in the base Qwen2.5-Coder-7B model. Additional fine-tuning focused on technical capabilities and communication style rather than bias mitigation.
### Responsible Use
Users should:
- Validate all generated code before production deployment
- Apply appropriate code review processes
- Consider model outputs as suggestions requiring human verification
- Ensure compliance with relevant licensing for generated code
## Technical Details
### Chat Template
The model uses the Qwen ChatML format:
```
<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{user_message}<|im_end|>
<|im_start|>assistant
{assistant_message}<|im_end|>
```
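With the `transformers` tokenizer, `apply_chat_template` produces this format automatically (see the usage example below). For backends that expect a raw prompt string, the template can also be assembled by hand. A minimal single-turn sketch; `build_chatml_prompt` is a hypothetical helper, not part of the model's API:

```python
def build_chatml_prompt(system_message: str, user_message: str) -> str:
    # Assemble a single-turn ChatML prompt, ending with the assistant header
    # so the model generates the assistant turn.
    return (
        f"<|im_start|>system\n{system_message}<|im_end|>\n"
        f"<|im_start|>user\n{user_message}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = build_chatml_prompt(
    "You are a helpful coding assistant.",
    "Implement quicksort with complexity analysis.",
)
```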
### Recommended Inference Parameters
```python
{
"temperature": 0.7,
"top_p": 0.9,
"top_k": 40,
"repeat_penalty": 1.1,
"max_tokens": 2048
}
```
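Note that the keys above follow llama.cpp-style naming. In `transformers`, `repeat_penalty` corresponds to `repetition_penalty` and `max_tokens` to `max_new_tokens`; a sketch of the equivalent `generate()` keyword arguments:

```python
# transformers equivalents of the recommended parameters above
generation_kwargs = {
    "do_sample": True,
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 40,
    "repetition_penalty": 1.1,  # "repeat_penalty" in llama.cpp
    "max_new_tokens": 2048,     # "max_tokens" in llama.cpp / OpenAI-style APIs
}
# usage: model.generate(**inputs, **generation_kwargs)
```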
### Quantization Support
Tested and validated quantization formats:
- FP16: Full precision baseline
- Q8_0: Minimal quality loss
- Q4_K_M: Recommended balance (4.4GB)
- Q4_0: Maximum compression
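If a GGUF export of these quants is available, it can be served with `llama-cpp-python`. A minimal sketch; the GGUF filename is an assumption and should be replaced with the actual file:

```python
from llama_cpp import Llama

# Load a 4-bit GGUF quant (filename is illustrative)
llm = Llama(
    model_path="wraith-coder-7b-Q4_K_M.gguf",  # assumed local file name
    n_ctx=32768,      # full context window
    n_gpu_layers=-1,  # offload all layers to GPU if available
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Implement quicksort with complexity analysis."},
    ],
    temperature=0.7,
    top_p=0.9,
    max_tokens=2048,
)
print(response["choices"][0]["message"]["content"])
```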
## Usage Example
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "vanta-research/wraith-coder-7b"

# Load the tokenizer and model; device_map="auto" places weights on available devices
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# Build a ChatML-formatted prompt from structured messages
messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Implement quicksort with complexity analysis."}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# Generate, then decode only the newly generated tokens (skip the echoed prompt)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
response = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:],
    skip_special_tokens=True
)
print(response)
```
## Contact
For questions or issues regarding this model, please open an issue in the model repository.
## Citation
If you use this model in your research or applications, please cite:
```bibtex
@misc{wraith-coder-7b,
author = {VANTA Research},
title = {Wraith Coder 7B: Signal-Dense Code Generation through Iterative Fine-Tuning},
year = {2025},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/vanta-research/wraith-coder-7b}}
}
```
## Acknowledgments
This model builds upon Qwen2.5-Coder-7B-Instruct developed by Alibaba Cloud. We acknowledge their contribution to open-source language model research. Thanks to Unsloth for providing an easy-to-use training framework.
## Version History
- **v1.0.0** (2025-11-19): Initial release with iteration 3 training complete
- 62.6% response reduction while maintaining correctness
- 60% complexity analysis coverage across 20-question benchmark
- Production-ready for senior engineering applications
---
*Proudly developed in Portland, Oregon by VANTA Research*