Tri-7B

Introduction

We introduce Tri-7B, the next-generation model following Trillion-7B-preview, which continues to push the boundaries of efficient training while achieving exceptional performance at the 7B parameter scale.

[Figure: Average Performance vs. Approximate Training FLOPs]

Key Highlights

  • Enhanced Reasoning: Modified training dataset mixture specifically optimized for reasoning capabilities
  • Advanced Post-Training: Significantly improved RL training pipeline focusing on mathematical reasoning and everyday usage
  • Extended Context: Supports up to 32K context length for long-form understanding
  • Multilingual: Specifically optimized for Korean, English, and Japanese

Our Tri-7B model represents a significant advancement over Trillion-7B-preview, achieving substantial performance improvements across all evaluated domains while maintaining the same efficient parameter count.

Model Specifications

Tri-7B

  • Type: Causal Language Model
  • Training Stage: Pre-training & Post-training
  • Architecture: Transformer Decoder with RoPE, SwiGLU, RMSNorm
  • Number of Parameters: 7.76B
  • Number of Layers: 32
  • Number of Attention Heads: 32
  • Context Length: 32,768
  • Vocab Size: 128,256
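
If you want to verify these specifications against the released checkpoint, the configuration can be inspected without downloading the weights. A minimal sketch, assuming standard Llama-style config field names (adjust if this model's config uses different keys):

from transformers import AutoConfig

# Fetch only the model configuration; no weights are downloaded.
config = AutoConfig.from_pretrained("trillionlabs/Tri-7B")

print(config.num_hidden_layers)        # expected: 32
print(config.num_attention_heads)      # expected: 32
print(config.max_position_embeddings)  # expected: 32768
print(config.vocab_size)               # expected: 128256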

Quickstart

Here is a code snippet that demonstrates how to load the tokenizer and model and generate text using apply_chat_template.

Tri-7B Usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "trillionlabs/Tri-7B"

# Load the model in bfloat16 and place it automatically across available devices.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Explain the concept of quantum computing in simple terms."
messages = [
    {"role": "user", "content": prompt}
]
# Render the conversation with the model's chat template and append the assistant turn prefix.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Strip the prompt tokens so only the newly generated completion is decoded.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
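
If you want tokens printed as they are generated rather than after the full completion, Transformers' TextStreamer can be passed to generate. A minimal sketch, reusing the model, tokenizer, and model_inputs from the snippet above:

from transformers import TextStreamer

# Stream decoded tokens to stdout as they are produced; skip echoing the prompt.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

_ = model.generate(
    **model_inputs,
    max_new_tokens=512,
    streamer=streamer
)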

Tri-7B is also available with vLLM and SGLang!

# vLLM
vllm serve trillionlabs/Tri-7B --dtype bfloat16 --max-model-len 32768

# vLLM with custom options
vllm serve trillionlabs/Tri-7B \
    --dtype bfloat16 \
    --max-model-len 32768 \
    --gpu-memory-utilization 0.95 \
    --port 8000

# SGLang
python3 -m sglang.launch_server --model-path trillionlabs/Tri-7B --dtype bfloat16

# SGLang with custom options
python3 -m sglang.launch_server \
    --model-path trillionlabs/Tri-7B \
    --dtype bfloat16 \
    --context-length 32768 \
    --port 30000 \
    --host 0.0.0.0
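
Both vLLM and SGLang expose an OpenAI-compatible API, so a served model can be queried with the standard openai client. A minimal sketch, assuming the vLLM server above is running on localhost:8000 (for the SGLang command, swap the port to 30000) and no API key is configured:

from openai import OpenAI

# The placeholder key is fine when the server was launched without an API key.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="trillionlabs/Tri-7B",
    messages=[{"role": "user", "content": "Explain the concept of quantum computing in simple terms."}],
    max_tokens=512,
)
print(response.choices[0].message.content)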

Evaluation

We evaluated Tri-7B across a comprehensive suite of benchmarks assessing general reasoning, knowledge recall, coding abilities, mathematical reasoning, and instruction-following capabilities. Compared to our previous generation model Trillion-7B-preview, Tri-7B achieves significant gains across all domains.

Full evaluation settings
Benchmark | Language | Evaluation Setting | Metric

General Reasoning and Factuality
  • HellaSwag | English | 0-shot | accuracy
  • ARC:C | English | 0-shot | accuracy
  • HAERAE | Korean | 3-shot | accuracy
  • CLIcK | Korean | 0-shot | accuracy
  • KoBEST | Korean | 5-shot | accuracy

Knowledge and Reasoning
  • KMMLU | Korean | 5-shot | accuracy
  • MMLU | English | 5-shot | accuracy
  • Global-MMLU-Lite-ja | Japanese | 5-shot | accuracy

Coding
  • HumanEval | English | 0-shot | pass@1
  • MBPPPlus | English | 0-shot | pass@1

Mathematical Reasoning
  • GSM8k | English | 0-shot, CoT | exact-match
  • MATH | English | 0-shot, CoT | exact-match
  • GPQA | English | 4-shot | accuracy
  • HRM8k | Korean | 0-shot, CoT | exact-match

Instruction Following and Chat
  • IFEval | English | 0-shot | strict-average
  • koIFEval | Korean | 0-shot | strict-average
  • MT-Bench | English | LLM-as-a-judge (gpt-4o) | LLM score
  • KO-MT-Bench | Korean | LLM-as-a-judge (gpt-4o) | LLM score
  • systemIFEval | English | 0-shot | strict-average
  • *Note that koIFEval, systemIFEval, and KoRuler are our in-house evaluation benchmarks adapted for Korean to better assess model capabilities in Korean language tasks.
  • **Note that MT-Bench, KO-MT-Bench, and LogicKor use a 10-point scale.
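
For the 0-shot, CoT exact-match settings (GSM8k, MATH, HRM8k), scoring generally lets the model produce a chain of thought, extracts the final answer, and compares it with the reference. A minimal sketch of that idea for numeric answers, not our exact evaluation harness:

import re

def extract_final_number(text: str) -> str | None:
    # Treat the last number appearing in the text as the final answer.
    matches = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return matches[-1] if matches else None

def exact_match(completion: str, reference: str) -> bool:
    predicted = extract_final_number(completion)
    return predicted is not None and predicted == extract_final_number(reference)

print(exact_match("... so the final answer is 42.", "42"))  # True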

Benchmark Results

Models compared:

  • Tri-7B (Next Generation)
  • Trillion-7B-preview (Previous Generation)

General Reasoning and Factuality

Benchmark Tri-7B Trillion-7B-preview Improvement
HellaSwag 59.52 58.94 +0.58
ARC:C 58.28 54.44 +3.84
HAERAE 82.49 80.02 +2.47
KoBEST 82.72 79.61 +3.11
CLIcK 64.43 60.41 +4.02
KMMLU 51.74 48.09 +3.65
MMLU 68.16 63.52 +4.64
Global-MMLU-Lite-ja 59.25 60.75 -1.50

Coding

Benchmark Tri-7B Trillion-7B-preview Improvement
HumanEval 53.66 55.48 -1.82
MBPPPlus 64.29 58.99 +5.30

Mathematical Reasoning

Benchmark Tri-7B Trillion-7B-preview Improvement
GSM8k 77.94 72.25 +5.69
MATH 49.40 32.70 +16.70
GPQA 34.15 32.81 +1.34
HRM8k 39.08 30.10 +8.98

Instruction Following and Chat

Benchmark Tri-7B Trillion-7B-preview Improvement
IFEval 79.26 79.13 +0.13
koIFEval 76.63 66.58 +10.05
MT-Bench 7.82 6.53 +1.29
KO-MT-Bench 7.64 6.27 +1.37
systemIFEval 66.43 27.28 +39.15

Limitations

  • Language Support: The model is optimized for English, Korean, and Japanese. Usage with other languages may result in degraded performance.
  • Knowledge Cutoff: The model's information is limited to data available up to February 2025.

License

This model is licensed under the Apache License 2.0.

Contact

For inquiries, please contact: [email protected]
