# Adversarial 3.12B - Foundation Model

A 3.12 billion parameter foundation language model trained from scratch on adversarial conversations.
## What is This Model?

This is a foundation model trained from scratch on adversarial machine learning conversations. It has learned general language patterns but requires post-training to become task-aware and useful.

⚠️ **IMPORTANT:** This is a foundation model only. Direct use will produce limited or incoherent outputs. Post-train it on your specific task, or fine-tune it, before expecting useful results.
## Architecture

- Model Type: LlamaForCausalLM
- Hidden Size: 4096
- Layers: 16
- Attention Heads: 32 (GQA with 8 KV heads)
- Intermediate Size: 11008
- Context Length: 8192
- Vocabulary Size: 34,783
- Precision: BF16
- Parameters: 3.12B (3,120,427,008 exact)
- Model Size: ~5.9 GB
Key Features:
- Grouped Query Attention (GQA) - Faster inference
- 8K context length - Long-document support
- BF16 precision - Good balance of speed and quality
- Custom tokenizer - 34,783-token vocabulary
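These hyperparameters map onto a standard `transformers` `LlamaConfig`. The sketch below is an illustrative reconstruction (assumed default settings such as untied embeddings and no attention bias); the `config.json` shipped with the model is authoritative.

```python
import torch
from transformers import LlamaConfig, LlamaForCausalLM

# Illustrative reconstruction of the architecture described above.
config = LlamaConfig(
    vocab_size=34_783,
    hidden_size=4096,
    num_hidden_layers=16,
    num_attention_heads=32,
    num_key_value_heads=8,        # GQA: 8 KV heads shared by 32 query heads
    intermediate_size=11008,
    max_position_embeddings=8192,
)

# Build on the meta device so no real memory is allocated, just to count parameters.
with torch.device("meta"):
    model = LlamaForCausalLM(config)

# Matches the 3,120,427,008 figure above (assuming untied embeddings, no biases).
print(f"{model.num_parameters():,} parameters")
```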
## Training Details

### Hardware & Framework
- GPUs: 2x NVIDIA H200 (141GB VRAM each)
- Framework: DeepSpeed ZeRO-2 + Gradient Checkpointing
- Batch Size: 64 effective (8 per GPU × 2 GPUs × 4 gradient accumulation steps)
- Training Time: ~1h 48min
- Precision: BF16 mixed precision
### Training Configuration

- Sequence Length: 2048 tokens
- Epochs: 3
- Learning Rate: 3e-4 (cosine schedule, 100 warmup steps)
- Optimizer: AdamW (weight decay: 0.01)
- Precision: BF16
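For reference, a minimal `TrainingArguments` setup mirroring these settings might look like the sketch below. The DeepSpeed dictionary is an assumed reconstruction of a ZeRO-2 config; the original `ds_config` used for pretraining is not published here.

```python
from transformers import TrainingArguments

# Minimal ZeRO-2 config expressed as a dict (assumed reconstruction).
ds_config = {
    "zero_optimization": {"stage": 2},
    "bf16": {"enabled": True},
    "gradient_accumulation_steps": 4,
    "train_micro_batch_size_per_gpu": 8,
}

training_args = TrainingArguments(
    output_dir="./adversarial-3.12b-pretrain",
    num_train_epochs=3,
    per_device_train_batch_size=8,   # x 2 GPUs x 4 accumulation = 64 effective
    gradient_accumulation_steps=4,
    learning_rate=3e-4,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    weight_decay=0.01,
    bf16=True,
    gradient_checkpointing=True,
    deepspeed=ds_config,             # DeepSpeed ZeRO-2, matching the setup above
)
```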
### Training Loss

- Initial Loss: ~10.26 (random weights)
- Final Loss: ~0.66 (converged)
- Average Loss: 2.18
- Improvement: ~94% reduction ((10.26 - 0.66) / 10.26 ≈ 0.94)
### Dataset
Trained on pedagogical conversations focused on:
- Programming errors and debugging
- Adversarial ML patterns
- Common logical mistakes
- Code explanation and analysis
Languages: Primarily French with some English
## How to Use This Model

### Step 1: Post-Training (REQUIRED)

This foundation model needs post-training before it can be useful. You can either:

**Option A: Post-train on your task-specific dataset (Recommended)**
```python
from transformers import Trainer, TrainingArguments, AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Pacific-Prime/adversarial_3.12b",
    torch_dtype="bfloat16",
)

# Post-train on your (already tokenized) dataset
training_args = TrainingArguments(
    output_dir="./post_trained_model",
    num_train_epochs=3,      # 2-3 epochs is usually enough
    learning_rate=1e-5,
    bf16=True,
)

trainer = Trainer(model=model, args=training_args, train_dataset=your_dataset)
trainer.train()
```
**Option B: Fine-tune directly** (similar to post-training, but with a task-specific focus)

### Step 2: Inference (After Post-Training)

Once post-trained, you can use the model:

#### Quick Start with vLLM (Recommended)
```python
from vllm import LLM, SamplingParams

# Initialize with your post-trained model
llm = LLM(
    model="./post_trained_model",  # your post-trained model path
    dtype="bfloat16",
    max_model_len=2048,
    gpu_memory_utilization=0.9,
)

# Generate
sampling_params = SamplingParams(
    temperature=0.7,
    top_p=0.9,
    max_tokens=200,
)

prompts = [
    # "What is a logic error in programming?"
    "<USER> Qu'est-ce qu'une erreur de logique en programmation ?<ASSISTANT>"
]

outputs = llm.generate(prompts, sampling_params)
print(outputs[0].outputs[0].text)
```
#### Using Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load your post-trained model
tokenizer = AutoTokenizer.from_pretrained("./post_trained_model")
model = AutoModelForCausalLM.from_pretrained(
    "./post_trained_model",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Generate
messages = [
    # "Explain recursion with an example"
    {"role": "user", "content": "Explique la récursion avec un exemple"}
]
inputs = tokenizer.apply_chat_template(
    messages,
    return_tensors="pt",
    add_generation_prompt=True,
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=200,
    temperature=0.7,
    do_sample=True,
)

# Decode only the newly generated tokens
response = tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)
print(response)
```
## Advanced Post-Training / Fine-tuning

For more detailed post-training:
```python
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    TrainingArguments,
    Trainer,
)
from datasets import load_dataset

# Load base model
model = AutoModelForCausalLM.from_pretrained(
    "Pacific-Prime/adversarial_3.12b",
    torch_dtype="bfloat16",
)
tokenizer = AutoTokenizer.from_pretrained("Pacific-Prime/adversarial_3.12b")

# Load your dataset
dataset = load_dataset("your-dataset")

# Configure training
training_args = TrainingArguments(
    output_dir="./finetuned-model",
    num_train_epochs=2,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,
    learning_rate=1e-5,  # lower LR for fine-tuning
    bf16=True,
    save_strategy="epoch",
)

# Train
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    tokenizer=tokenizer,
)
trainer.train()
```
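Note that `Trainer` expects token IDs, not raw text. A minimal preprocessing step, assuming your dataset exposes a `"text"` column and the tokenizer defines a pad token, could look like this sketch:

```python
from transformers import DataCollatorForLanguageModeling

# Tokenize the raw text (the column name "text" is an assumption about your dataset).
def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset["train"].column_names)

# mlm=False gives standard causal-LM labels; requires tokenizer.pad_token to be set.
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)
```

Then pass `train_dataset=tokenized["train"]` and `data_collator=collator` to the `Trainer` above.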
## Chat Template

The model uses a simple chat template:

```
<USER> {user_message}<ASSISTANT> {assistant_response}
```

Special tokens:

- `<USER>`: User message
- `<ASSISTANT>`: Assistant message
- `<SYSTEM>`: System message
- `<BOS>`: Beginning of sequence
- `<EOS>`: End of sequence
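The released tokenizer may already apply this format via `tokenizer.apply_chat_template`; if yours does not, a small hypothetical helper like the one below reproduces it (the `<SYSTEM>` placement is an assumption, not confirmed by the card):

```python
def build_prompt(user_message: str, system_message: str | None = None) -> str:
    """Format a prompt using the <USER>/<ASSISTANT> template described above."""
    prompt = ""
    if system_message:
        # Assumed placement of the system message before the user turn.
        prompt += f"<SYSTEM> {system_message}"
    prompt += f"<USER> {user_message}<ASSISTANT>"
    return prompt

# Example: matches the vLLM prompt shown earlier
print(build_prompt("Qu'est-ce qu'une erreur de logique en programmation ?"))
# <USER> Qu'est-ce qu'une erreur de logique en programmation ?<ASSISTANT>
```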
## Performance

### Inference Speed (vLLM on H200)
- Throughput: ~1,500 tokens/s
- Latency: ~3.5s per request (200 tokens)
- VRAM Usage: 5.8 GB
- Max Batch Size: Up to 946 concurrent requests
### Model Specifications
| Metric | Value |
|---|---|
| Parameters | 3.12B |
| Model Size (BF16) | 5.9 GB |
| Context Window | 8192 tokens |
| Vocabulary | 34,783 tokens |
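The ~5.9 GB size in the table follows from the parameter count at 2 bytes per BF16 weight; the quick check below just reproduces that arithmetic.

```python
params = 3_120_427_008       # exact parameter count from the table above
bytes_per_param = 2          # BF16 stores each weight in 2 bytes

size_gib = params * bytes_per_param / 1024**3
print(f"{size_gib:.2f} GiB")  # ~5.81 GiB, i.e. roughly the ~5.9 GB listed above
```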
## Use Cases (After Post-Training)

Once post-trained, this model can be used for:
- Programming Education - Explaining errors to students
- Code Debugging - Identifying and fixing bugs
- Adversarial ML - Understanding adversarial patterns
- Research - Foundation for custom applications
Note: Direct use without post-training will produce poor results. Post-training is required for production use.
## Related Models
- v1 (3.12B): This model - 16 layers, 3 epochs, 1h 48min training
- v2 (3.8B): Pacific-Prime/adversarial_3.8.0 - 20 layers, 5 epochs, 12 hours training
Both are independent foundation models trained from scratch. Choose based on your needs:
- 3.12B: Smaller, faster training
- 3.8B: Larger, more training time
## System Requirements
Inference:
- GPU: 1x RTX 3090 / 4090 (24GB)
- RAM: 16 GB
- BF16 precision
Fine-tuning:
- GPU: 1x A100 (80GB) or 2x RTX 4090
- RAM: 64 GB
- Storage: 20 GB
## License
This model is released under the Apache 2.0 License.
- ✅ Commercial use allowed
- ✅ Modification allowed
- ✅ Distribution allowed
- ✅ Private use allowed
## Acknowledgments
- DeepSpeed - Efficient distributed training
- vLLM - Fast inference engine
- Hugging Face - Transformers library
Foundation model trained from scratch - ready for post-training!