---
base_model:
  - allura-org/Gemma-3-Glitter-12B
  - soob3123/amoral-gemma3-12B-v2-qat
  - soob3123/Veiled-Calla-12B
library_name: transformers
tags:
  - merge
  - gemma
  - text-generation
  - conversational
  - allura-org/Gemma-3-Glitter-12B
  - soob3123/amoral-gemma3-12B-v2-qat
  - soob3123/Veiled-Calla-12B
license: gemma
language:
  - en
  - pt
pipeline_tag: text-generation
---

# 🤖 gama-12b

gama-12b is a 12-billion-parameter language model created by strategically merging multiple specialized models. All of its source models are fine-tunes of the same Gemma-3 base, and combining them aims to deliver a more robust and versatile conversational experience.

## 📋 Overview

This model was developed with DARE TIES, an advanced model-merging method that combines DARE (Drop And REscale) with TIES (TrIm, Elect Sign & Merge) to fold different specializations into a single cohesive model.
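
For intuition, below is a minimal, illustrative sketch of the core idea on plain tensors. This is not mergekit's actual implementation; the function and variable names are hypothetical, while `density` and `weight` mirror the merge parameters shown later in this card.

```python
import torch

def dare_ties_merge(base, finetuned, densities, weights):
    """Toy illustration of a DARE TIES merge on a single weight tensor."""
    deltas = []
    for ft, density, w in zip(finetuned, densities, weights):
        delta = ft - base                        # task vector of one fine-tune
        mask = torch.rand_like(delta) < density  # DARE: randomly drop entries...
        delta = delta * mask / density           # ...and rescale the survivors
        deltas.append(w * delta)

    stacked = torch.stack(deltas)
    # TIES: elect the majority sign per parameter and drop disagreeing deltas
    elected = torch.sign(stacked.sum(dim=0))
    agree = torch.sign(stacked) == elected
    merged_delta = (stacked * agree).sum(dim=0)
    return base + merged_delta

# Example on random tensors standing in for one layer's weights
base = torch.zeros(4, 4)
fts = [base + torch.randn(4, 4) for _ in range(3)]
merged = dare_ties_merge(base, fts,
                         densities=[0.6, 0.6, 0.6],
                         weights=[0.33, 0.33, 0.34])
```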

## 🔧 Base Models Used

gama-12b is the result of merging the following models:

- [allura-org/Gemma-3-Glitter-12B](https://huggingface.co/allura-org/Gemma-3-Glitter-12B)
- [soob3123/amoral-gemma3-12B-v2-qat](https://huggingface.co/soob3123/amoral-gemma3-12B-v2-qat)
- [soob3123/Veiled-Calla-12B](https://huggingface.co/soob3123/Veiled-Calla-12B)

πŸ› οΈ Merge Tool

The merge was performed using LazyMergekit, a tool that facilitates the process of merging language models.
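
The same configuration can also be run directly with the mergekit command line (an assumption here: mergekit is installed and the YAML below is saved as `config.yaml`):

```bash
pip install mergekit
# Run the merge described in config.yaml and write the result to ./gama-12b
mergekit-yaml config.yaml ./gama-12b --cuda
```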

βš™οΈ Technical Configuration

Merge Parameters

```yaml
models:
  - model: soob3123/amoral-gemma3-12B-v2-qat
    parameters:
      density: 0.6
      weight: 0.33

  - model: allura-org/Gemma-3-Glitter-12B
    parameters:
      density: 0.6
      weight: 0.33

  - model: soob3123/Veiled-Calla-12B
    parameters:
      density: 0.6
      weight: 0.34

merge_method: dare_ties
base_model: unsloth/gemma-3-12b-it-qat

parameters:
  normalize: true
  int8_mask: true

device: auto
dtype: float16
```

### Technical Specifications

- Architecture: Gemma-3 12B
- Merge Method: DARE TIES
- Precision: Float16
- Quantization: QAT (Quantization-Aware Training)
- Normalization: Enabled
- Int8 Mask: Enabled
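
To verify these specifications locally, you can inspect the published configuration without downloading the weights (a quick check using the standard Transformers API; the printed values are expectations, not guarantees):

```python
from transformers import AutoConfig

# Downloads only config.json, not the model weights
config = AutoConfig.from_pretrained("rodrigomt/gama-12b")
print(config.model_type)   # expected: a Gemma-3 variant
print(config.torch_dtype)  # expected: torch.float16
```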

## 💻 How to Use

### Installing Dependencies

```bash
pip install -qU transformers accelerate torch
```

### Basic Usage Example

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import torch

# Model configuration
model_name = "rodrigomt/gama-12b"

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True
)

# Prepare the message
messages = [
    {"role": "user", "content": "What is a large language model?"}
]

# Apply chat template
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# Pipeline configuration (dtype and device placement are already set on the
# loaded model, so they are not repeated here)
generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

# Text generation
outputs = generator(
    prompt,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    top_p=0.95,
    repetition_penalty=1.1
)

print(outputs[0]["generated_text"])
```

### Advanced Usage Example

```python
# For more granular control over generation (reuses model, tokenizer, and
# prompt from the basic example)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,  # passes input_ids and the matching attention_mask
        max_new_tokens=256,
        do_sample=True,
        temperature=0.7,
        top_k=50,
        top_p=0.95,
        repetition_penalty=1.1,
        pad_token_id=tokenizer.eos_token_id
    )

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
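
For interactive use you can also stream tokens as they are generated, reusing `model`, `tokenizer`, and `inputs` from the example above (a small sketch based on the standard `TextStreamer` utility in Transformers):

```python
from transformers import TextStreamer

# Prints tokens to stdout as they are produced, omitting the echoed prompt
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

with torch.no_grad():
    model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=True,
        temperature=0.7,
        streamer=streamer,
    )
```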

## 🎯 Key Features

- Versatility: Combines capabilities from multiple specialized models
- Efficiency: Built on QAT (Quantization-Aware Training) checkpoints for efficient quantized inference
- Compatibility: Fully compatible with the Transformers library
- Scalability: Supports deployment on a range of hardware configurations

## ⚠️ System Requirements

### Recommended Minimums

- RAM: 32GB
- VRAM: 24GB (GPU)
- Storage: 50GB available

### Ideal Configuration

- RAM: 64GB+
- VRAM: 40GB+ (GPU)
- GPU: A6000, A100, or higher
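
If your hardware falls below these figures, loading the model in 4-bit can substantially reduce VRAM usage at some cost in output quality (a minimal sketch, assuming the optional `bitsandbytes` package is installed):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

# Store weights in 4-bit; matrix multiplications still run in float16
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained("rodrigomt/gama-12b")
model = AutoModelForCausalLM.from_pretrained(
    "rodrigomt/gama-12b",
    quantization_config=bnb_config,
    device_map="auto",
)
```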

πŸ“ License

This model is licensed under the Gemma License.