
Virtuoso-Lite (10B) is our next-generation, 10-billion-parameter language model based on the Llama-3 architecture. It is distilled from Deepseek-v3 using ~1.1B tokens/logits, allowing it to achieve robust performance at a significantly reduced parameter count compared to larger models. Despite its compact size, Virtuoso-Lite excels in a variety of tasks, demonstrating advanced reasoning, code generation, and mathematical problem-solving capabilities.

Model Details

  • Architecture Base: Falcon-10B (based on Llama-3)
  • Parameter Count: 10B
  • Tokenizer:
    • Initially integrated with Deepseek-v3 tokenizer for logit extraction.
    • Final alignment uses the Llama-3 tokenizer, with specialized “tokenizer surgery” for cross-architecture compatibility (a hedged illustration of the underlying vocabulary-mapping problem follows this list).
  • Distillation Data:
    • ~1.1B tokens/logits from Deepseek-v3’s training data.
    • Logit-level distillation using a proprietary “fusion merging” approach for maximum fidelity.
  • License: falcon-llm-license
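
Although the exact “tokenizer surgery” procedure is proprietary, the core problem it solves can be illustrated: teacher logits are indexed by Deepseek-v3's vocabulary, while the student uses the Llama-3 vocabulary, so token IDs must be mapped between the two. The sketch below shows only a naive 1:1 mapping over shared token strings; the repository IDs are assumptions (the Llama-3 one is gated), and real cross-tokenizer alignment is considerably more involved:

from transformers import AutoTokenizer

# Assumed repository IDs; the actual tokenizers used are not public.
teacher_tok = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V3")
student_tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")

teacher_vocab = teacher_tok.get_vocab()  # token string -> teacher id
student_vocab = student_tok.get_vocab()  # token string -> student id

# Tokens whose surface forms exist in both vocabularies can be mapped
# directly; everything else needs a more elaborate alignment scheme.
shared = set(teacher_vocab) & set(student_vocab)
teacher_to_student = {teacher_vocab[t]: student_vocab[t] for t in shared}
print(f"{len(shared)} of {len(student_vocab)} student tokens map 1:1")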

Background on Deepseek Distillation

Deepseek-v3 serves as the teacher model, from which we capture logits across billions of tokens. Rather than standard supervised fine-tuning, Virtuoso-Lite applies full logit-level replication to preserve the most crucial insights from the teacher (a minimal sketch of such a distillation loss follows the list below). This approach enables:

  • Strong performance on technical/scientific queries
  • Enhanced code generation and debugging
  • Improved consistency in math-intensive tasks
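
As a point of reference, logit-level distillation is typically implemented as a KL-divergence loss between the teacher's and student's next-token distributions. The sketch below shows the standard form of that loss, assuming already-aligned vocabularies; it illustrates the general technique, not Arcee's exact recipe:

import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    # student_logits, teacher_logits: (batch, seq_len, vocab_size)
    t = temperature
    student_logp = F.log_softmax(student_logits / t, dim=-1)
    teacher_p = F.softmax(teacher_logits / t, dim=-1)
    # KL(teacher || student), rescaled by t^2 as is standard in distillation
    return F.kl_div(student_logp, teacher_p, reduction="batchmean") * (t ** 2)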

Intended Use Cases

  • Chatbots & Virtual Assistants
  • Lightweight Enterprise Data Analysis
  • Research Prototypes & Proofs of Concept
  • STEM Educational Tools (where smaller footprint is advantageous)

Evaluations

[Benchmark comparison figure]

How to Use

Below is a sample code snippet using transformers:

from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "arcee-ai/virtuoso-lite"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# add device_map="auto" (requires accelerate) to place the model on GPU
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "Provide a concise summary of quantum entanglement."
inputs = tokenizer(prompt, return_tensors="pt")
# generate up to 150 new tokens and strip special tokens when decoding
outputs = model.generate(**inputs, max_new_tokens=150)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
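
Quantized GGUF builds of the model are also published (arcee-ai/Virtuoso-Lite-GGUF) and can be run with llama.cpp-based tooling. A minimal sketch using llama-cpp-python follows; the filename pattern is an assumption, so check the repository's file list for the quantization you want:

from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="arcee-ai/Virtuoso-Lite-GGUF",
    filename="*Q4_K_M.gguf",  # assumed naming; pick any published quant
    n_ctx=8192,               # raise toward the 128k limit only if RAM allows
)
out = llm("Provide a concise summary of quantum entanglement.", max_tokens=150)
print(out["choices"][0]["text"])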

Training & Fine-Tuning

  • Initial Training: Began with Falcon-10B, optimized for large-scale text ingestion.
  • Distillation & Merging:
    • Trained on ~1.1B tokens/logits from Deepseek-v3.
    • Employed “fusion merging” to capture detailed teacher insights.
    • Final step included Direct Preference Optimization (DPO) to enhance alignment and mitigate hallucinations (a minimal sketch of the DPO objective follows this list).
  • Future Developments: We plan to incorporate additional R1 distillations to further improve specialized performance and reduce model footprint.
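
For reference, the DPO objective pushes the policy to prefer chosen responses over rejected ones relative to a frozen reference model. A minimal sketch follows; it is illustrative of the standard technique only, as the preference data and hyperparameters used for Virtuoso-Lite are not public:

import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # *_logps: summed log-probabilities of each response under the
    # trainable policy and the frozen reference model
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # -log sigmoid of the reward margin: push chosen above rejected
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()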

Performance

Virtuoso-Lite demonstrates strong results across multiple benchmarks (e.g., BBH, MMLU-PRO, MATH), often remaining competitive with models that have substantially higher parameter counts. This efficiency is largely credited to logit-level distillation, which compresses the teacher model’s capabilities into a more parameter-efficient package.

Limitations

  • Context Length: 128k tokens (may vary depending on the final tokenizer settings and system resources).
  • Knowledge Cut-off: Training data may not reflect the latest events or developments beyond June 2024.

Ethical Considerations

  • Content Generation Risks: Like any language model, Virtuoso-Lite can generate potentially harmful or biased content if prompted in certain ways.

License

Virtuoso-Lite (10B) is released under the Falcon LLM License (falcon-llm-license). You are free to use, modify, and distribute this model in both commercial and non-commercial applications, subject to the terms and conditions of the license.

If you have questions or would like to share your experiences using Virtuoso-Lite (10B), please connect with us on social media. We’re excited to see what you build—and how this model helps you innovate!
