
# TL;DR

# Model Details

## Model Description
- Developed by: https://www.tii.ae
- Model type: Causal decoder-only
- Architecture: Hybrid Transformers + Mamba architecture
- Language(s) (NLP): English, Multilingual
- License: Falcon-LLM License
# Training details
For more details about the training protocol of this model, please refer to the Falcon-H1 technical blogpost.
# Usage

Currently, you can run this model with Hugging Face `transformers`, vLLM, or our custom fork of the `llama.cpp` library.
## Inference

Make sure to install the latest version of `transformers` or `vllm`, or install these packages from source:

```bash
pip install git+https://github.com/huggingface/transformers.git
```

Refer to the official vLLM documentation for more details on building vLLM from source.
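If you prefer to build vLLM from a source checkout, a minimal sketch is shown below; the official vLLM documentation remains the authoritative reference, and prerequisites or build flags may differ on your system:

```bash
# Assumption: a plain editable install from a vLLM source checkout.
# See the vLLM docs for the currently recommended build steps.
git clone https://github.com/vllm-project/vllm.git
cd vllm
pip install -e .
```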
### 🤗 transformers

Refer to the snippet below to run H1 models using 🤗 transformers:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/Falcon-H1-1B-Base"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Perform text generation (the prompt below is just an example)
inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
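Alternatively, for quick experimentation you can use the `transformers` `pipeline` API. This is a minimal sketch, not part of the original card; the prompt and generation settings are illustrative:

```python
import torch
from transformers import pipeline

# Assumption: the text-generation pipeline is shown purely as an
# alternative to the manual generate() call above.
generator = pipeline(
    "text-generation",
    model="tiiuae/Falcon-H1-1B-Base",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
print(generator("The capital of France is", max_new_tokens=64)[0]["generated_text"])
```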
### vLLM

For vLLM, simply start a server by executing the command below:

```bash
# pip install vllm
vllm serve tiiuae/Falcon-H1-1B-Instruct --tensor-parallel-size 2 --data-parallel-size 1
```
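Once the server is running, you can query it through vLLM's OpenAI-compatible endpoint. A minimal sketch, assuming the default host and port (`http://localhost:8000/v1`) and an installed `openai` client; the prompt is illustrative:

```python
from openai import OpenAI

# Assumptions: the vLLM server started above is listening on the default
# http://localhost:8000/v1 endpoint; no API key is required locally.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.completions.create(
    model="tiiuae/Falcon-H1-1B-Instruct",
    prompt="The capital of France is",
    max_tokens=64,
)
print(response.choices[0].text)
```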
### 🦙 llama.cpp

While we are working on integrating our architecture directly into the llama.cpp library, you can install our fork of the library and use it directly: https://github.com/tiiuae/llama.cpp-Falcon-H1

Follow the same installation guidelines as for llama.cpp.
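For reference, a minimal sketch of the standard llama.cpp CMake build-and-run flow applied to the fork; the GGUF file path is a placeholder for whichever quantization you downloaded, and the fork may document additional steps:

```bash
# Assumption: a standard CMake-based llama.cpp build; adjust the GGUF path
# to the quantized file you actually downloaded.
git clone https://github.com/tiiuae/llama.cpp-Falcon-H1.git
cd llama.cpp-Falcon-H1
cmake -B build
cmake --build build --config Release
./build/bin/llama-cli -m /path/to/Falcon-H1.gguf -p "The capital of France is"
```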
# Evaluation

The Falcon-H1 series performs very well on a variety of tasks, including reasoning.
| Tasks | Falcon-H1-7B | Qwen3-8B | Qwen2.5-7B | Gemma3-12B | Llama3.1-8B | Falcon3-7B | Falcon3-10B |
| --- | --- | --- | --- | --- | --- | --- | --- |
| **General** | | | | | | | |
| BBH | 62.28 | 47.47 | 53.76 | 63.36 | 48.58 | 52.12 | 58.09 |
| ARC-C | 59.98 | 42.06 | 41.38 | 51.96 | 52.39 | 54.35 | 54.44 |
| TruthfulQA | 59.91 | 53.19 | 62.41 | 61.02 | 52.99 | 55.58 | 55.05 |
| HellaSwag | 75.92 | 60.56 | 63.4 | 55.63 | 71.28 | 71.81 | 75.57 |
| MMLU | 76.83 | 71.56 | 73.64 | 72.5 | 68.67 | 70.81 | 74.01 |
| **Math** | | | | | | | |
| GSM8k | 81.65 | 78.92 | 71.95 | 87.49 | 82.49 | 81.05 | 85.06 |
| MATH-500 | 73.4 | 83.8 | 75.8 | 86.2 | 45.8 | 69.0 | 68.6 |
| AMC-23 | 56.72 | 70.78 | 53.91 | 66.88 | 22.81 | 40.0 | 45.78 |
| AIME-24 | 16.04 | 28.33 | 12.29 | 22.5 | 5.42 | 8.75 | 9.79 |
| AIME-25 | 13.96 | 19.17 | 9.58 | 18.75 | 0.42 | 6.25 | 5.42 |
| **Science** | | | | | | | |
| GPQA | 36.33 | 25.84 | 31.79 | 33.98 | 32.72 | 31.21 | 33.39 |
| GPQA_Diamond | 56.9 | 43.1 | 33.0 | 37.71 | 31.31 | 37.21 | 34.68 |
| MMLU-Pro | 51.75 | 34.64 | 43.23 | 39.88 | 36.42 | 40.73 | 44.05 |
| MMLU-stem | 77.61 | 66.89 | 69.36 | 66.54 | 59.31 | 67.43 | 70.57 |
| **Code** | | | | | | | |
| HumanEval | 86.59 | 84.75 | 82.32 | 84.76 | 68.29 | 71.95 | 82.32 |
| HumanEval+ | 81.1 | 79.27 | 73.78 | 75.61 | 61.59 | 65.85 | 75.0 |
| MBPP | 80.69 | 71.96 | 79.63 | 85.71 | 68.25 | 77.25 | 73.28 |
| MBPP+ | 68.78 | 62.7 | 68.25 | 72.22 | 55.03 | 65.87 | 64.02 |
| LiveCodeBench | 35.03 | 45.6 | 32.68 | 30.92 | 15.85 | 12.72 | 19.77 |
| CRUXEval | 66.51 | 72.7 | 56.9 | 67.67 | 21.57 | 55.0 | 59.57 |
| **Instruction Following** | | | | | | | |
| IFEval | 85.35 | 83.43 | 75.25 | 81.51 | 77.04 | 76.59 | 78.84 |
| Alpaca-Eval | 40.23 | 46.13 | 29.48 | 43.55 | 25.48 | 27.56 | 24.31 |
| MTBench | 8.85 | 8.74 | 8.45 | 8.69 | 8.29 | 8.73 | 8.46 |
| LiveBench | 45.74 | 56.19 | 37.13 | 49.23 | 31.73 | 32.35 | 34.3 |
You can find more detailed benchmarks in our release blogpost.
# Useful links

- View our release blogpost.
- Feel free to join our Discord server if you have any questions or want to interact with our researchers and developers.
# Citation

If the Falcon-H1 family of models was helpful to your work, feel free to cite us.

```bibtex
@misc{tiifalconh1,
    title = {Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance},
    url = {https://falcon-lm.github.io/blog/falcon-h1},
    author = {Falcon-LLM Team},
    month = {May},
    year = {2025}
}
```