🎧 Adversarial 3.12B - Foundation Model

A 3.12 billion parameter foundation language model trained from scratch on adversarial conversations


💡 What is This Model?

This is a foundation model trained from scratch on adversarial machine learning conversations. It has learned general language patterns but requires post-training to become task-aware and useful.

⚠️ IMPORTANT: This is a foundation model only. Direct use will produce limited/incoherent outputs. You need to post-train it first on your specific task, or fine-tune it for best results.


πŸ—οΈ Architecture

Model Type: LlamaForCausalLM
Hidden Size: 4096
Layers: 16
Attention Heads: 32 (GQA with 8 KV heads)
Intermediate Size: 11008
Context Length: 8192
Vocabulary Size: 34,783
Precision: BF16
Parameters: 3.12B (3,120,427,008 exact)
Model Size: ~5.9 GB

Key Features:

  • Grouped Query Attention (GQA) - Faster inference
  • 8K context length - Long document support
  • BF16 precision - Good speed/quality balance
  • Custom tokenizer - 34,783-token vocabulary
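
These values can be checked directly from the published config without downloading the weights. A minimal sketch, assuming the standard transformers config API; the expected values in the comments mirror the table above:

from transformers import AutoConfig

# Load only the configuration (no weights) and inspect the architecture fields
config = AutoConfig.from_pretrained("Pacific-Prime/adversarial_3.12b")

print(config.hidden_size)              # expected: 4096
print(config.num_hidden_layers)        # expected: 16
print(config.num_attention_heads)      # expected: 32
print(config.num_key_value_heads)      # expected: 8 (GQA)
print(config.intermediate_size)        # expected: 11008
print(config.max_position_embeddings)  # expected: 8192
print(config.vocab_size)               # expected: 34783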

📊 Training Details

Hardware & Framework

  • GPUs: 2x NVIDIA H200 (141GB VRAM each)
  • Framework: DeepSpeed ZeRO-2 + Gradient Checkpointing
  • Batch Size: 64 effective (8 per GPU × 2 GPUs × 4 accumulation)
  • Training Time: ~1h 48min
  • Precision: BF16 mixed precision

Training Configuration

Sequence Length: 2048 tokens
Epochs: 3
Learning Rate: 3e-4 (cosine schedule, 100 steps warmup)
Optimizer: AdamW (weight decay: 0.01)
Precision: BF16
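
For reference, the hardware setup and configuration above map onto standard Hugging Face TrainingArguments roughly as follows. This is an illustrative reconstruction, not the original training script; the output directory and the ds_zero2.json DeepSpeed config path are placeholders:

from transformers import TrainingArguments

# Illustrative reconstruction of the pre-training setup described above.
# Sequences are packed/truncated to 2048 tokens during preprocessing (not shown).
training_args = TrainingArguments(
    output_dir="./adversarial-3.12b-pretrain",  # placeholder
    num_train_epochs=3,
    per_device_train_batch_size=8,   # 8 per GPU × 2 GPUs × 4 accumulation = 64 effective
    gradient_accumulation_steps=4,
    learning_rate=3e-4,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    weight_decay=0.01,
    bf16=True,
    gradient_checkpointing=True,
    deepspeed="ds_zero2.json",       # placeholder: DeepSpeed ZeRO-2 config file
)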

Training Loss

Initial Loss: ~10.26 (random weights)
Final Loss: ~0.66 (converged)
Average Loss: 2.18
Improvement: 94% reduction

Dataset

Trained on pedagogical conversations focused on:

  • Programming errors and debugging
  • Adversarial ML patterns
  • Common logical mistakes
  • Code explanation and analysis

Languages: Primarily French with some English


🚀 How to Use This Model

Step 1: Post-Training (REQUIRED)

This foundation model needs post-training before it can be useful. You can either:

Option A: Post-train on your task-specific dataset (Recommended)

from transformers import Trainer, TrainingArguments, AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Pacific-Prime/adversarial_3.12b",
    torch_dtype="bfloat16"
)

# Post-train on your dataset
training_args = TrainingArguments(
    output_dir="./post_trained_model",
    num_train_epochs=3,  # 2-3 epochs is typically enough
    learning_rate=1e-5,
    bf16=True
)

# your_dataset: your tokenized task-specific dataset
trainer = Trainer(model=model, args=training_args, train_dataset=your_dataset)
trainer.train()

Option B: Fine-tune directly (the same procedure with a narrower, task-specific focus; see the Advanced Post-Training / Fine-tuning section below)

Step 2: Inference (After Post-Training)

Once post-trained, you can use the model:

Quick Start with vLLM (Recommended)

from vllm import LLM, SamplingParams

# Initialize with your post-trained model
llm = LLM(
    model="./post_trained_model",  # Your post-trained model path
    dtype="bfloat16",
    max_model_len=2048,
    gpu_memory_utilization=0.9
)

# Generate
sampling_params = SamplingParams(
    temperature=0.7,
    top_p=0.9,
    max_tokens=200
)

prompts = [
    # "What is a logic error in programming?" (French, the model's primary language)
    "<USER> Qu'est-ce qu'une erreur de logique en programmation ?<ASSISTANT>"
]

outputs = llm.generate(prompts, sampling_params)
print(outputs[0].outputs[0].text)

Using Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load your post-trained model
tokenizer = AutoTokenizer.from_pretrained("./post_trained_model")
model = AutoModelForCausalLM.from_pretrained(
    "./post_trained_model",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Generate
messages = [
    # "Explain recursion with an example" (the model is primarily French)
    {"role": "user", "content": "Explique la récursion avec un exemple"}
]

inputs = tokenizer.apply_chat_template(
    messages,
    return_tensors="pt",
    add_generation_prompt=True
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=200,
    temperature=0.7,
    do_sample=True
)

response = tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)
print(response)

🔧 Advanced Post-Training / Fine-tuning

For more detailed post-training:

from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    TrainingArguments,
    Trainer,
    DataCollatorForLanguageModeling
)
from datasets import load_dataset

# Load base model
model = AutoModelForCausalLM.from_pretrained(
    "Pacific-Prime/adversarial_3.12b",
    torch_dtype="bfloat16"
)
tokenizer = AutoTokenizer.from_pretrained("Pacific-Prime/adversarial_3.12b")

# Load your dataset (placeholder id) and tokenize it; assumes a "text" column
dataset = load_dataset("your-dataset")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

dataset = dataset.map(tokenize, batched=True, remove_columns=dataset["train"].column_names)

# Configure training
training_args = TrainingArguments(
    output_dir="./finetuned-model",
    num_train_epochs=2,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,
    learning_rate=1e-5,  # Lower LR for fine-tuning
    bf16=True,
    save_strategy="epoch"
)

# Train (the collator builds causal-LM labels from the input ids)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    tokenizer=tokenizer,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False)
)

trainer.train()

💬 Chat Template

The model uses a simple chat template:

<USER> {user_message}<ASSISTANT> {assistant_response}

Special tokens:

  • <USER>: User message
  • <ASSISTANT>: Assistant message
  • <SYSTEM>: System message
  • <BOS>: Beginning of sequence
  • <EOS>: End of sequence
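
If your copy of the tokenizer does not carry a chat template, the same prompt format can be assembled by hand. A minimal sketch with a hypothetical build_prompt helper that follows the template above:

# Hypothetical helper reproducing the chat template above
def build_prompt(messages):
    prompt = ""
    for message in messages:
        if message["role"] == "system":
            prompt += f"<SYSTEM> {message['content']}"
        elif message["role"] == "user":
            prompt += f"<USER> {message['content']}"
        else:  # assistant turns from earlier in the conversation
            prompt += f"<ASSISTANT> {message['content']}"
    # Leave the final assistant turn open so the model completes it
    return prompt + "<ASSISTANT>"

print(build_prompt([{"role": "user", "content": "Explique la récursion avec un exemple"}]))
# -> <USER> Explique la récursion avec un exemple<ASSISTANT>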

⚡ Performance

Inference Speed (vLLM on H200)

  • Throughput: ~1,500 tokens/s
  • Latency: ~3.5s per request (200 tokens)
  • VRAM Usage: 5.8 GB
  • Max Batch Size: Up to 946 concurrent requests
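
The figures above were measured with vLLM on an H200. On your own hardware, a rough throughput number can be reproduced with a sketch like the following (it assumes a post-trained checkpoint at ./post_trained_model):

import time
from vllm import LLM, SamplingParams

llm = LLM(model="./post_trained_model", dtype="bfloat16", max_model_len=2048)
params = SamplingParams(temperature=0.7, max_tokens=200)

# A small batch of identical prompts is enough for a rough measurement
prompts = ["<USER> Qu'est-ce qu'une erreur de logique en programmation ?<ASSISTANT>"] * 32

start = time.time()
outputs = llm.generate(prompts, params)
elapsed = time.time() - start

generated = sum(len(o.outputs[0].token_ids) for o in outputs)
print(f"~{generated / elapsed:.0f} generated tokens/s")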

Model Specifications

Parameters: 3.12B
Model Size (BF16): 5.9 GB
Context Window: 8192 tokens
Vocabulary: 34,783 tokens

🎯 Use Cases (After Post-Training)

Once post-trained, this model can be used for:

  • Programming Education - Explaining errors to students
  • Code Debugging - Identifying and fixing bugs
  • Adversarial ML - Understanding adversarial patterns
  • Research - Foundation for custom applications

Note: Direct use without post-training will produce poor results. Post-training is required for production use.


🔗 Related Models

This model and the larger 3.8B variant are independent foundation models trained from scratch. Choose based on your needs:

  • 3.12B: Smaller, faster training
  • 3.8B: Larger, more training time

⚙️ System Requirements

Inference:

  • GPU: 1x RTX 3090 / 4090 (24GB)
  • RAM: 16 GB
  • BF16 precision

Fine-tuning:

  • GPU: 1x A100 (80GB) or 2x RTX 4090
  • RAM: 64 GB
  • Storage: 20 GB
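
As a sanity check on these figures, the BF16 weight footprint follows directly from the parameter count (2 bytes per parameter), before KV cache, activations, and optimizer state:

# 3,120,427,008 parameters × 2 bytes (BF16)
params = 3_120_427_008
print(f"{params * 2 / 1024**3:.1f} GiB")  # ~5.8 GiB for the weights alone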

📄 License

This model is released under the Apache 2.0 License.

  • ✅ Commercial use allowed
  • ✅ Modification allowed
  • ✅ Distribution allowed
  • ✅ Private use allowed

🙏 Acknowledgments

  • DeepSpeed - Efficient distributed training
  • vLLM - Fast inference engine
  • Hugging Face - Transformers library

Foundation model trained from scratch - ready for post-training! 🎧
