Llama3.2-3B-karcher-drill

A compact, versatile model designed for:

  • Serving as a distillation target to learn from larger models
  • Extracting structured data
  • Constructing datasets
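Since structured-data extraction is a headline use case, the sketch below shows one way to post-process a model reply into a Python object. It is a minimal illustration, assuming the model was prompted to answer in JSON; the `extract_json` helper and the sample reply are hypothetical, not part of this model's tooling.

```python
import json
import re

def extract_json(text: str):
    """Pull the first JSON object out of a model response.

    Small models often wrap structured output in prose or code fences,
    so strip fences and search for a brace-delimited block before parsing.
    """
    # Remove markdown code fences if present
    text = re.sub(r"```(?:json)?", "", text)
    # Find the first {...} block (greedy, so nested objects are kept whole)
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model output")
    return json.loads(match.group(0))

# Example: a typical reply from a structured-extraction prompt
reply = 'Sure! Here is the result:\n```json\n{"name": "Ada", "year": 1843}\n```'
record = extract_json(reply)
```

A guard like this is useful when building datasets in bulk, where a single malformed reply should be skipped rather than crash the pipeline.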

Features

  • Small: Contains only 3 billion parameters, enabling efficient deployment
  • Diverse: Combines multiple independent general-purpose models to enhance robustness
  • Robust: Merged weights were further trained with an extremely high dropout rate of 0.97 to improve generalization
  • Precise: Training data emphasizes formatted output to enhance accuracy in structured tasks

Component Models

These models were merged in mergekit using the Karcher mean method with equal weights.

  1. lunahr/Hermes-3-Llama-3.2-3B-abliterated
  2. cognitivecomputations/Dolphin3.0-Llama3.2-3B
  3. ValiantLabs/Llama3.2-3B-ShiningValiant2
  4. bunnycore/Llama-3.2-3B-Apex
  5. nidum/Nidum-Llama-3.2-3B-Uncensored
  6. Devarui379/VersatiLlama-Llama-3.2-3B-Instruct-Abliterated
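A merge like the one above can be expressed as a mergekit configuration. The sketch below is a reconstruction, not the exact config used for this model; it assumes mergekit's `karcher` merge method, which computes the Karcher mean of the listed models (equal weighting is implicit).

```yaml
# Hypothetical mergekit config approximating the merge described above
merge_method: karcher
models:
  - model: lunahr/Hermes-3-Llama-3.2-3B-abliterated
  - model: cognitivecomputations/Dolphin3.0-Llama3.2-3B
  - model: ValiantLabs/Llama3.2-3B-ShiningValiant2
  - model: bunnycore/Llama-3.2-3B-Apex
  - model: nidum/Nidum-Llama-3.2-3B-Uncensored
  - model: Devarui379/VersatiLlama-Llama-3.2-3B-Instruct-Abliterated
dtype: bfloat16
```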

Training Hyperparameters

  • Dataset: agentlans/drill (1 epoch)
  • Learning rate: 5e-5
  • Sequence packing: enabled
  • Neat packing: enabled
  • NEFTune alpha: 5
  • LoRA: rank 64, alpha 128, dropout 0.97
  • rsLoRA: enabled
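The hyperparameters above can be collected into a training config. The card does not name the training framework, but the parameter names (neat packing, NEFTune alpha, rsLoRA) match LLaMA-Factory's config keys, so the sketch below assumes a LLaMA-Factory-style SFT run; the base-model path is a placeholder.

```yaml
# Hypothetical LLaMA-Factory-style SFT config matching the listed hyperparameters
model_name_or_path: path/to/merged-base-model   # placeholder: the Karcher-merged model
stage: sft
finetuning_type: lora
lora_rank: 64
lora_alpha: 128
lora_dropout: 0.97
use_rslora: true
dataset: drill               # agentlans/drill
num_train_epochs: 1.0
learning_rate: 5.0e-5
packing: true
neat_packing: true
neftune_noise_alpha: 5
bf16: true
```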

Limitations

  • Primarily focused on English language tasks
  • Not optimized for long context windows or extended chain-of-thought reasoning
  • Limited background knowledge with potential hallucinations, typical of small models
  • May struggle with complex math and logical reasoning, similar to most large language models
  • Not safety-tuned: neither censored nor explicitly uncensored

License

This model is licensed under the Llama 3.2 Community License Agreement.
