# Adversarial 3.12B - Foundation Model

A 3.12 billion parameter foundation language model trained from scratch on adversarial conversations.
## What is This Model?

This is a foundation model trained from scratch on adversarial machine learning conversations. It has learned general language patterns but requires post-training to become task-aware and useful.

⚠️ **IMPORTANT:** This is a foundation model only. Direct use will produce limited or incoherent outputs. Post-train it on your specific task, or fine-tune it, before expecting useful results.
## Architecture

- Model Type: LlamaForCausalLM
- Hidden Size: 4096
- Layers: 16
- Attention Heads: 32 (GQA with 8 KV heads)
- Intermediate Size: 11008
- Context Length: 8192
- Vocabulary Size: 34,783
- Precision: BF16
- Parameters: 3.12B (3,120,427,008 exact)
- Model Size: ~5.9 GB
Key Features:
- Grouped Query Attention (GQA) - Faster inference
- 8K context length - Long-document support
- BF16 precision - Good balance of speed and quality
- Custom tokenizer - 34,783-token vocabulary
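These hyperparameters map onto a standard `transformers` `LlamaConfig`. The sketch below is an illustrative reconstruction (assumed default settings such as untied embeddings and no attention bias); the `config.json` shipped with the model is authoritative.

```python
import torch
from transformers import LlamaConfig, LlamaForCausalLM

# Illustrative reconstruction of the architecture described above.
config = LlamaConfig(
    vocab_size=34_783,
    hidden_size=4096,
    num_hidden_layers=16,
    num_attention_heads=32,
    num_key_value_heads=8,        # GQA: 8 KV heads shared by 32 query heads
    intermediate_size=11008,
    max_position_embeddings=8192,
)

# Build on the meta device so no real memory is allocated, just to count parameters.
with torch.device("meta"):
    model = LlamaForCausalLM(config)

# Matches the 3,120,427,008 figure above (assuming untied embeddings, no biases).
print(f"{model.num_parameters():,} parameters")
```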
## Training Details

### Hardware & Framework
- GPUs: 2x NVIDIA H200 (141GB VRAM each)
- Framework: DeepSpeed ZeRO-2 + Gradient Checkpointing
- Batch Size: 64 effective (8 per GPU × 2 GPUs × 4 gradient accumulation steps)
- Training Time: ~1h 48min
- Precision: BF16 mixed precision
### Training Configuration

- Sequence Length: 2048 tokens
- Epochs: 3
- Learning Rate: 3e-4 (cosine schedule, 100 warmup steps)
- Optimizer: AdamW (weight decay: 0.01)
- Precision: BF16
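For reference, a minimal `TrainingArguments` setup mirroring these settings might look like the sketch below. The DeepSpeed dictionary is an assumed reconstruction of a ZeRO-2 config; the original `ds_config` used for pretraining is not published here.

```python
from transformers import TrainingArguments

# Minimal ZeRO-2 config expressed as a dict (assumed reconstruction).
ds_config = {
    "zero_optimization": {"stage": 2},
    "bf16": {"enabled": True},
    "gradient_accumulation_steps": 4,
    "train_micro_batch_size_per_gpu": 8,
}

training_args = TrainingArguments(
    output_dir="./adversarial-3.12b-pretrain",
    num_train_epochs=3,
    per_device_train_batch_size=8,   # x 2 GPUs x 4 accumulation = 64 effective
    gradient_accumulation_steps=4,
    learning_rate=3e-4,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    weight_decay=0.01,
    bf16=True,
    gradient_checkpointing=True,
    deepspeed=ds_config,             # DeepSpeed ZeRO-2, matching the setup above
)
```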
### Training Loss

- Initial Loss: ~10.26 (random weights)
- Final Loss: ~0.66 (converged)
- Average Loss: 2.18
- Improvement: ~94% reduction ((10.26 - 0.66) / 10.26 ≈ 0.94)
### Dataset
Trained on pedagogical conversations focused on:
- Programming errors and debugging
- Adversarial ML patterns
- Common logical mistakes
- Code explanation and analysis
Languages: Primarily French with some English
## How to Use This Model

### Step 1: Post-Training (REQUIRED)

This foundation model needs post-training before it can be useful. You can either:

**Option A: Post-train on your task-specific dataset (Recommended)**
```python
from transformers import Trainer, TrainingArguments, AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Pacific-Prime/adversarial_3.12b",
    torch_dtype="bfloat16",
)

# Post-train on your (already tokenized) dataset
training_args = TrainingArguments(
    output_dir="./post_trained_model",
    num_train_epochs=3,      # 2-3 epochs is usually enough
    learning_rate=1e-5,
    bf16=True,
)

trainer = Trainer(model=model, args=training_args, train_dataset=your_dataset)
trainer.train()
```
**Option B: Fine-tune directly** (similar to post-training, but with a task-specific focus)

### Step 2: Inference (After Post-Training)

Once post-trained, you can use the model:

#### Quick Start with vLLM (Recommended)
```python
from vllm import LLM, SamplingParams

# Initialize with your post-trained model
llm = LLM(
    model="./post_trained_model",  # your post-trained model path
    dtype="bfloat16",
    max_model_len=2048,
    gpu_memory_utilization=0.9,
)

# Generate
sampling_params = SamplingParams(
    temperature=0.7,
    top_p=0.9,
    max_tokens=200,
)

prompts = [
    # "What is a logic error in programming?"
    "<USER> Qu'est-ce qu'une erreur de logique en programmation ?<ASSISTANT>"
]

outputs = llm.generate(prompts, sampling_params)
print(outputs[0].outputs[0].text)
```
#### Using Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load your post-trained model
tokenizer = AutoTokenizer.from_pretrained("./post_trained_model")
model = AutoModelForCausalLM.from_pretrained(
    "./post_trained_model",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Generate
messages = [
    # "Explain recursion with an example"
    {"role": "user", "content": "Explique la récursion avec un exemple"}
]
inputs = tokenizer.apply_chat_template(
    messages,
    return_tensors="pt",
    add_generation_prompt=True,
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=200,
    temperature=0.7,
    do_sample=True,
)

# Decode only the newly generated tokens
response = tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)
print(response)
```
## Advanced Post-Training / Fine-tuning

For more detailed post-training:
```python
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    TrainingArguments,
    Trainer,
)
from datasets import load_dataset

# Load base model
model = AutoModelForCausalLM.from_pretrained(
    "Pacific-Prime/adversarial_3.12b",
    torch_dtype="bfloat16",
)
tokenizer = AutoTokenizer.from_pretrained("Pacific-Prime/adversarial_3.12b")

# Load your dataset
dataset = load_dataset("your-dataset")

# Configure training
training_args = TrainingArguments(
    output_dir="./finetuned-model",
    num_train_epochs=2,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,
    learning_rate=1e-5,  # lower LR for fine-tuning
    bf16=True,
    save_strategy="epoch",
)

# Train
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    tokenizer=tokenizer,
)
trainer.train()
```
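Note that `Trainer` expects token IDs, not raw text. A minimal preprocessing step, assuming your dataset exposes a `"text"` column and the tokenizer defines a pad token, could look like this sketch:

```python
from transformers import DataCollatorForLanguageModeling

# Tokenize the raw text (the column name "text" is an assumption about your dataset).
def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset["train"].column_names)

# mlm=False gives standard causal-LM labels; requires tokenizer.pad_token to be set.
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)
```

Then pass `train_dataset=tokenized["train"]` and `data_collator=collator` to the `Trainer` above.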
## Chat Template

The model uses a simple chat template:

```
<USER> {user_message}<ASSISTANT> {assistant_response}
```

Special tokens:

- `<USER>`: User message
- `<ASSISTANT>`: Assistant message
- `<SYSTEM>`: System message
- `<BOS>`: Beginning of sequence
- `<EOS>`: End of sequence
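The released tokenizer may already apply this format via `tokenizer.apply_chat_template`; if yours does not, a small hypothetical helper like the one below reproduces it (the `<SYSTEM>` placement is an assumption, not confirmed by the card):

```python
def build_prompt(user_message: str, system_message: str | None = None) -> str:
    """Format a prompt using the <USER>/<ASSISTANT> template described above."""
    prompt = ""
    if system_message:
        # Assumed placement of the system message before the user turn.
        prompt += f"<SYSTEM> {system_message}"
    prompt += f"<USER> {user_message}<ASSISTANT>"
    return prompt

# Example: matches the vLLM prompt shown earlier
print(build_prompt("Qu'est-ce qu'une erreur de logique en programmation ?"))
# <USER> Qu'est-ce qu'une erreur de logique en programmation ?<ASSISTANT>
```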
## Performance

### Inference Speed (vLLM on H200)
- Throughput: ~1,500 tokens/s
- Latency: ~3.5s per request (200 tokens)
- VRAM Usage: 5.8 GB
- Max Batch Size: Up to 946 concurrent requests
### Model Specifications
| Metric | Value |
|---|---|
| Parameters | 3.12B |
| Model Size (BF16) | 5.9 GB |
| Context Window | 8192 tokens |
| Vocabulary | 34,783 tokens |
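The ~5.9 GB size in the table follows from the parameter count at 2 bytes per BF16 weight; the quick check below just reproduces that arithmetic.

```python
params = 3_120_427_008       # exact parameter count from the table above
bytes_per_param = 2          # BF16 stores each weight in 2 bytes

size_gib = params * bytes_per_param / 1024**3
print(f"{size_gib:.2f} GiB")  # ~5.81 GiB, i.e. roughly the ~5.9 GB listed above
```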
## Use Cases (After Post-Training)

Once post-trained, this model can be used for:
- Programming Education - Explaining errors to students
- Code Debugging - Identifying and fixing bugs
- Adversarial ML - Understanding adversarial patterns
- Research - Foundation for custom applications
Note: Direct use without post-training will produce poor results. Post-training is required for production use.
## Related Models
- v1 (3.12B): This model - 16 layers, 3 epochs, 1h 48min training
- v2 (3.8B): Pacific-Prime/adversarial_3.8.0 - 20 layers, 5 epochs, 12 hours training
Both are independent foundation models trained from scratch. Choose based on your needs:
- 3.12B: Smaller, faster training
- 3.8B: Larger, more training time
## System Requirements
Inference:
- GPU: 1x RTX 3090 / 4090 (24GB)
- RAM: 16 GB
- BF16 precision
Fine-tuning:
- GPU: 1x A100 (80GB) or 2x RTX 4090
- RAM: 64 GB
- Storage: 20 GB
## License
This model is released under the Apache 2.0 License.
- ✅ Commercial use allowed
- ✅ Modification allowed
- ✅ Distribution allowed
- ✅ Private use allowed
## Acknowledgments
- DeepSpeed - Efficient distributed training
- vLLM - Fast inference engine
- Hugging Face - Transformers library
Foundation model trained from scratch - ready for post-training!