Zurich Banner

Zurich 14B GammaCorpus v2-1m

A Qwen 2.5 model fine-tuned on the GammaCorpus dataset

Overview

Zurich 14B GammaCorpus v2-1m is a fine-tune of Alibaba's Qwen 2.5 14B Instruct model. Zurich is designed to outperform other models of a similar size while also showcasing the GammaCorpus v2-1m dataset.

Model Details

  • Base Model: Qwen/Qwen2.5-14B-Instruct
  • Type: Causal Language Model
  • Architecture: Transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias
  • Number of Parameters: 14.7B
  • Number of Parameters (Non-Embedding): 13.1B
  • Number of Layers: 48
  • Number of Attention Heads (GQA): 40 for Q and 8 for KV
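
These architecture details are inherited from the Qwen 2.5 14B base model and can be verified directly from the model config:

from transformers import AutoConfig

config = AutoConfig.from_pretrained("rubenroy/Zurich-14B-GCv2-1m")
print(config.num_hidden_layers)    # 48 layers
print(config.num_attention_heads)  # 40 query heads
print(config.num_key_value_heads)  # 8 key/value heads (GQA)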

Training Details

Zurich-14B-GCv2-1m was fine-tuned on a single A100 GPU for approximately 130 minutes using the Unsloth framework, and was trained for 60 epochs.
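
The exact training script is not published here, so the following is only a minimal sketch of what a comparable Unsloth LoRA fine-tuning setup could look like. The dataset repo id, LoRA settings, and hyperparameters are illustrative assumptions, not the actual training configuration:

from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import FastLanguageModel

# Load the base model in 4-bit so it fits comfortably on a single A100
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen2.5-14B-Instruct",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters so only a small fraction of the weights is trained
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
)

# Assumed repo id; the conversations would also need to be rendered into
# a "text" field with the chat template before training
dataset = load_dataset("rubenroy/GammaCorpus-v2-1m", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        num_train_epochs=60,   # matches the 60 epochs reported above
        learning_rate=2e-4,
        output_dir="outputs",
    ),
)
trainer.train()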

Usage

Requirements

We strongly recommend that you use the latest version of the transformers package. You can install it with pip:

pip install transformers
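
The quickstart below loads the model with device_map="auto", which relies on the accelerate package, so install it as well if you don't already have it:

pip install accelerate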

Quickstart

Here is a code snippet using apply_chat_template that shows how to load the tokenizer and model and generate content:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "rubenroy/Zurich-14B-GCv2-1m"

# Load the model weights and tokenizer from the Hugging Face Hub
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "How tall is the Eiffel Tower?"
messages = [
    {"role": "system", "content": "You are Zurich, an AI assistant built on the Qwen 2.5 14B model developed by Alibaba Cloud, and fine-tuned by Ruben Roy. You are a helpful assistant."},
    {"role": "user", "content": prompt}
]

# Render the conversation with the chat template and tokenize it
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)

# Strip the prompt tokens so only the newly generated reply is decoded
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
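
Because GammaCorpus v2 is a multi-turn dataset, the model is also intended for multi-turn chat. As a minimal sketch, you can continue the conversation by appending the reply and the next user message before generating again:

# Append the assistant reply and a follow-up question, then generate again
messages.append({"role": "assistant", "content": response})
messages.append({"role": "user", "content": "And how tall is it including the antennas?"})

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
generated_ids = model.generate(**model_inputs, max_new_tokens=512)
follow_up = tokenizer.batch_decode(
    [out[len(inp):] for inp, out in zip(model_inputs.input_ids, generated_ids)],
    skip_special_tokens=True,
)[0]
print(follow_up)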

About GammaCorpus

This model, and all Zurich models, are trained with GammaCorpus. GammaCorpus is a HuggingFace dataset of structured and filtered multi-turn conversations. GammaCorpus comes in four versions, each available in multiple sizes. The versions and sizes are as follows:

GammaCorpus v1

  • 10k UNFILTERED
  • 50k UNFILTERED
  • 70k UNFILTERED

Here is a link to the GCv1 dataset collection:
https://huggingface.co/collections/rubenroy/gammacorpus-v1-67935e4e52a04215f15a7a60

GammaCorpus v2

  • 10k
  • 50k
  • 100k
  • 500k
  • 1m <-- This is the version of GammaCorpus v2 that the Zurich model you are using was trained on.
  • 5m

Here is a link to the GCv2 dataset collection:
https://huggingface.co/collections/rubenroy/gammacorpus-v2-67935e895e1259c404a579df

GammaCorpus CoT

  • Math 170k

Here is a link to the GC-CoT dataset collection:
https://huggingface.co/collections/rubenroy/gammacorpus-cot-6795bbc950b62b1ced41d14f

GammaCorpus QA

  • Fact 450k

Here is a link to the GC-QA dataset collection:
https://huggingface.co/collections/rubenroy/gammacorpus-qa-679857017bb3855234c1d8c7
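
If you want to inspect the training data itself, the conversations can be loaded with the datasets library. The repo id below is an assumption based on the collection naming; check the GCv2 collection linked above for the exact dataset name:

from datasets import load_dataset

# Assumed repo id; see the GCv2 collection link above for the exact name
dataset = load_dataset("rubenroy/GammaCorpus-v2-1m", split="train")
print(dataset[0])  # one structured multi-turn conversation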

The full set of GammaCorpus dataset collections can be found on the rubenroy HuggingFace profile.

Known Limitations

  • Bias: We have tried our best to mitigate as much bias as we can, but please be aware that the model might still generate some biased answers.

Additional Information

Licensing Information

The model is released under the Apache 2.0 License. Please refer to the license for usage rights and restrictions.
