Llama3.2-3B-karcher-drill

A compact, versatile model designed for:

  • Serving as a distillation target to learn from larger models
  • Extracting structured data
  • Constructing datasets
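Since structured-data extraction is a headline use case, the sketch below shows one way to post-process a model reply into a Python object. It is a minimal illustration, assuming the model was prompted to answer in JSON; the `extract_json` helper and the sample reply are hypothetical, not part of this model's tooling.

```python
import json
import re

def extract_json(text: str):
    """Pull the first JSON object out of a model response.

    Small models often wrap structured output in prose or code fences,
    so strip fences and search for a brace-delimited block before parsing.
    """
    # Remove markdown code fences if present
    text = re.sub(r"```(?:json)?", "", text)
    # Find the first {...} block (greedy, so nested objects are kept whole)
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model output")
    return json.loads(match.group(0))

# Example: a typical reply from a structured-extraction prompt
reply = 'Sure! Here is the result:\n```json\n{"name": "Ada", "year": 1843}\n```'
record = extract_json(reply)
```

A guard like this is useful when building datasets in bulk, where a single malformed reply should be skipped rather than crash the pipeline.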

Features

  • Small: Contains only 3 billion parameters, enabling efficient deployment
  • Diverse: Combines multiple independent general-purpose models to enhance robustness
  • Robust: Merged weights were further trained with an extremely high dropout rate of 0.97 to improve generalization
  • Precise: Training data emphasizes formatted output to enhance accuracy in structured tasks

Component Models

These models were merged in mergekit using the Karcher mean method with equal weights.

  1. lunahr/Hermes-3-Llama-3.2-3B-abliterated
  2. cognitivecomputations/Dolphin3.0-Llama3.2-3B
  3. ValiantLabs/Llama3.2-3B-ShiningValiant2
  4. bunnycore/Llama-3.2-3B-Apex
  5. nidum/Nidum-Llama-3.2-3B-Uncensored
  6. Devarui379/VersatiLlama-Llama-3.2-3B-Instruct-Abliterated
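A merge like the one above can be expressed as a mergekit configuration. The sketch below is a reconstruction, not the exact config used for this model; it assumes mergekit's `karcher` merge method, which computes the Karcher mean of the listed models (equal weighting is implicit).

```yaml
# Hypothetical mergekit config approximating the merge described above
merge_method: karcher
models:
  - model: lunahr/Hermes-3-Llama-3.2-3B-abliterated
  - model: cognitivecomputations/Dolphin3.0-Llama3.2-3B
  - model: ValiantLabs/Llama3.2-3B-ShiningValiant2
  - model: bunnycore/Llama-3.2-3B-Apex
  - model: nidum/Nidum-Llama-3.2-3B-Uncensored
  - model: Devarui379/VersatiLlama-Llama-3.2-3B-Instruct-Abliterated
dtype: bfloat16
```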

Training Hyperparameters

  • Dataset: agentlans/drill (1 epoch)
  • Learning rate: 5e-5
  • Sequence packing: enabled
  • Neat packing: enabled
  • NEFTune alpha: 5
  • LoRA: rank 64, alpha 128, dropout 0.97
  • rsLoRA: enabled
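The hyperparameters above can be collected into a training config. The card does not name the training framework, but the parameter names (neat packing, NEFTune alpha, rsLoRA) match LLaMA-Factory's config keys, so the sketch below assumes a LLaMA-Factory-style SFT run; the base-model path is a placeholder.

```yaml
# Hypothetical LLaMA-Factory-style SFT config matching the listed hyperparameters
model_name_or_path: path/to/merged-base-model   # placeholder: the Karcher-merged model
stage: sft
finetuning_type: lora
lora_rank: 64
lora_alpha: 128
lora_dropout: 0.97
use_rslora: true
dataset: drill               # agentlans/drill
num_train_epochs: 1.0
learning_rate: 5.0e-5
packing: true
neat_packing: true
neftune_noise_alpha: 5
bf16: true
```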

Limitations

  • Primarily focused on English language tasks
  • Not optimized for long context windows or extended chain-of-thought reasoning
  • Limited background knowledge with potential hallucinations, typical of small models
  • May struggle with complex math and logical reasoning, similar to most large language models
  • Not safety-tuned: neither censored nor explicitly uncensored

License

This model is licensed under the Llama 3.2 Community License Agreement.
