# Llama3.2-3B-karcher-drill
A compact, versatile model designed for:
- Serving as a distillation target to learn from larger models
- Extracting structured data
- Constructing datasets
## Features
- Small: Contains only 3 billion parameters, enabling efficient deployment
- Diverse: Combines multiple independent general-purpose models to enhance robustness
- Robust: The averaged model weights were further fine-tuned with an extremely high dropout rate (97%) to improve generalization
- Precise: Training data emphasizes formatted output to enhance accuracy in structured tasks
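Since the card emphasizes formatted output for structured-extraction tasks, here is a minimal downstream sketch for validating such output. The prompt completion shown is hypothetical, not an actual generation from this model:

```python
import json

def parse_model_json(completion: str) -> dict:
    """Extract the first JSON object from a model completion.

    Even models tuned for formatted output sometimes wrap JSON in
    surrounding text, so locate the outermost braces before parsing.
    """
    start = completion.find("{")
    end = completion.rfind("}")
    if start == -1 or end == -1 or end < start:
        raise ValueError("no JSON object found in completion")
    return json.loads(completion[start:end + 1])

# Hypothetical completion from a structured-extraction prompt
completion = 'Here is the extracted record:\n{"name": "Ada Lovelace", "born": 1815}'
record = parse_model_json(completion)
print(record["name"])  # -> Ada Lovelace
```

A wrapper like this is a cheap guard when building datasets: malformed generations raise immediately instead of silently corrupting downstream records.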
## Component Models

These models were merged in mergekit using the Karcher mean (`karcher`) method with equal weights:
- lunahr/Hermes-3-Llama-3.2-3B-abliterated
- cognitivecomputations/Dolphin3.0-Llama3.2-3B
- ValiantLabs/Llama3.2-3B-ShiningValiant2
- bunnycore/Llama-3.2-3B-Apex
- nidum/Nidum-Llama-3.2-3B-Uncensored
- Devarui379/VersatiLlama-Llama-3.2-3B-Instruct-Abliterated
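For reference, a sketch of what the mergekit configuration may have looked like. The field names follow mergekit's YAML schema; the `dtype` value is an assumption, as it is not stated above:

```yaml
# Hypothetical mergekit config reconstructing the merge described above
merge_method: karcher
models:
  - model: lunahr/Hermes-3-Llama-3.2-3B-abliterated
  - model: cognitivecomputations/Dolphin3.0-Llama3.2-3B
  - model: ValiantLabs/Llama3.2-3B-ShiningValiant2
  - model: bunnycore/Llama-3.2-3B-Apex
  - model: nidum/Nidum-Llama-3.2-3B-Uncensored
  - model: Devarui379/VersatiLlama-Llama-3.2-3B-Instruct-Abliterated
dtype: bfloat16  # assumed; not specified in the card
```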
## Training Hyperparameters
- Dataset: agentlans/drill (1 epoch)
- Learning rate: 5e-5
- Pack sequences: on
- Neat packing: on
- NEFTune alpha: 5
- LoRA: rank 64, alpha 128, dropout 0.97
- rsLoRA: enabled
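The option names above (neat packing, NEFTune, rsLoRA) suggest a LLaMA-Factory-style training setup, though the card does not name the framework. A hedged reconstruction of the relevant config fragment, with option names that may vary by version:

```yaml
# Hypothetical LLaMA-Factory-style SFT config matching the listed hyperparameters
stage: sft
finetuning_type: lora
dataset: drill                # agentlans/drill
num_train_epochs: 1
learning_rate: 5.0e-5
packing: true
neat_packing: true
neftune_noise_alpha: 5
lora_rank: 64
lora_alpha: 128
lora_dropout: 0.97
use_rslora: true
```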
## Limitations
- Primarily focused on English language tasks
- Not optimized for long context windows or extended chain-of-thought reasoning
- Limited background knowledge with potential hallucinations, typical of small models
- May struggle with complex math and logical reasoning, similar to most large language models
- Not safety-tuned: neither censored nor explicitly uncensored
## License
This model is licensed under the Llama 3.2 Community License Agreement.
## Base Model

- meta-llama/Llama-3.2-3B-Instruct