Typhoon T1 3B (Research Preview)
Typhoon T1 3B (Research Preview) is the first model in "Typhoon T", a new family of open reasoning models. A reasoning model is a novel type of model that thinks longer before giving a final answer.
Typhoon T1 3B (Research Preview) is built on top of Typhoon 2 3B Instruct. It improves on that base model on challenging benchmarks such as GPQA, MMLU Pro, and the AI Mathematics Olympiad validation set.
Key Points
- Typhoon T1 is a new family of open reasoning models developed by SCB 10X
- Typhoon T1 3B (Research Preview), the first in the Typhoon T family, shows improved performance across challenging benchmarks compared to the original Typhoon 2 3B Instruct
- Typhoon T1 3B (Research Preview) is a fast model with low compute requirements, yet it remains capable across a variety of tasks by scaling test-time compute, which lets the model think longer before giving a final answer. It can reason across domains, unlike many open reasoning models that are limited to mathematics and coding
- We openly share our recipe for the data pipeline and training of this model, which does not rely on distilling from other reasoning models
- We introduce a new thinking paradigm for reasoning models, structured thinking, where we add auxiliary tokens to help structure the thinking process of the model. In our experiments, this approach outperforms the common variant that separates only the thought and response parts
For more technical details, please visit our technical blog.
- To acknowledge Meta's effort in creating the foundation model and to comply with the license, we explicitly include llama-3.2 in the model name.
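To make the structured thinking idea from the key points concrete, here is a minimal sketch that separates a generation into its thinking part and its final answer. The tag names `<thoughts>` and `<response>`, the sample string, and the helper `split_thoughts_and_response` are assumptions made for illustration, not the confirmed output format of the model; check the technical blog for the actual auxiliary tokens.

```python
import re
from typing import NamedTuple, Optional


class ParsedGeneration(NamedTuple):
    thoughts: Optional[str]
    response: str


def _extract(tag: str, text: str) -> Optional[str]:
    """Return the stripped content of the first <tag>...</tag> pair, or None."""
    pattern = rf"<{re.escape(tag)}>(.*?)</{re.escape(tag)}>"
    match = re.search(pattern, text, re.DOTALL)
    return match.group(1).strip() if match else None


def split_thoughts_and_response(
    text: str,
    thoughts_tag: str = "thoughts",  # assumed tag name, hypothetical
    response_tag: str = "response",  # assumed tag name, hypothetical
) -> ParsedGeneration:
    """Split a generation into (thoughts, response).

    If no response tag is found, the whole text is treated as the response.
    """
    thoughts = _extract(thoughts_tag, text)
    response = _extract(response_tag, text)
    if response is None:
        response = text.strip()
    return ParsedGeneration(thoughts, response)


# Hand-written sample, not real model output.
sample = "<thoughts>Typhoon has two o's.</thoughts>\n<response>2</response>"
parsed = split_thoughts_and_response(sample)
print(parsed.response)
```

Keeping the tag names as parameters means the same helper can be pointed at whatever delimiters the model actually emits.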
Performance
Model name | GSM8K (↑), 8-shot | HumanEval+ (↑), Pass@10 | GPQA (↑), 0CoT | AIME (↑) |
---|---|---|---|---|
Typhoon 2 3B Instruct | 56.63 | 66 | 27.01 | 0 |
Typhoon T1 3B (semi) | 59.59 | 68.99 | 25.89 | 0 |
Typhoon T1 3B (Research Preview) | 62.40 | 69.87 | 31.7 | 2.22 |
MMLU Pro (↑), 5-shot
Model name | Average | Math | Health | Physics | Business | Biology | Chemistry | Computer Science | Economics | Engineering | Philosophy | Other | History | Psychology | Law |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Typhoon 2 3B Instruct | 26.7 | 26.8 | 33.62 | 23.4 | 25.35 | 43.38 | 19.88 | 28.29 | 35.43 | 18.37 | 28.06 | 27.92 | 25.72 | 37.84 | 13.17 |
Typhoon T1 3B (Research Preview) | 30.65 | 30.57 | 36.19 | 27.1 | 31.69 | 50.77 | 22.17 | 31.22 | 38.86 | 21.98 | 30.66 | 32.79 | 26.51 | 43.36 | 17.26 |
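As a quick reading aid for the tables above, the following sketch computes the absolute gain of Typhoon T1 3B (Research Preview) over Typhoon 2 3B Instruct. The scores are copied from the tables; the dictionary layout and the helper name `absolute_gains` are illustrative choices only.

```python
# Scores copied from the tables above as
# (Typhoon 2 3B Instruct, Typhoon T1 3B (Research Preview)).
scores = {
    "GSM8K": (56.63, 62.40),
    "HumanEval+": (66.0, 69.87),
    "GPQA": (27.01, 31.7),
    "AIME": (0.0, 2.22),
    "MMLU Pro (average)": (26.7, 30.65),
}


def absolute_gains(table):
    """Return the score difference (new - base) per benchmark, rounded to 2 decimals."""
    return {name: round(new - base, 2) for name, (base, new) in table.items()}


gains = absolute_gains(scores)
for name, gain in gains.items():
    print(f"{name}: {gain:+.2f} points")
```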
Model description
- Model type: A 3B instruct decoder-only model based on Llama architecture.
- Requirement: transformers 4.46.1 or newer.
- Primary Language(s): English 🇬🇧 and Thai 🇹🇭 (inherited from Typhoon 2 3B Instruct; however, most of the long-thought training data is in English)
- License: Llama 3.2 Community License
Usage Examples
⚠️ Please note that max_new_tokens should be at least 512, and a minimum of 1,024 is recommended to leave room for a complete generation.
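from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "scb10x/llama-3.2-typhoon-t1-3b-research-preview"

# Load the tokenizer and the model in bfloat16, placing weights automatically.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Thai prompt. In English: if the word "ไต้ฝุ่น" (typhoon) is translated into
# English, how many letters "o" does the translated word contain?
messages = [
    {"role": "user", "content": "หากแปลคำว่า \"ไต้ฝุ่น\" เป็นภาษาอังกฤษ ในคำที่ถูกแปลแล้วจะมีตัวอักษร \"o\" ทั้งหมดกี่ตัว"},
]

# Apply the chat template and move the token ids to the model's device.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

# Stop on either the end-of-sequence token or the end-of-turn token.
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

# Greedy decoding. Sampling parameters such as temperature and top_p are
# ignored when do_sample=False, so they are omitted here.
outputs = model.generate(
    input_ids,
    max_new_tokens=1024,
    eos_token_id=terminators,
    do_sample=False,
)

# Decode only the newly generated tokens, skipping the prompt.
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))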
OpenAI API-compatible Server with vLLM
pip install vllm
vllm serve scb10x/llama-3.2-typhoon-t1-3b-research-preview
# see more information at https://docs.vllm.ai/
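Once the server is up, it can be queried like any OpenAI-compatible endpoint. The sketch below uses only the Python standard library and assumes the server runs locally on vLLM's default port 8000; the helper names `build_chat_payload` and `chat` are hypothetical, and `max_tokens` is set to 1024 in line with the recommendation above.

```python
import json
import urllib.request

MODEL_ID = "scb10x/llama-3.2-typhoon-t1-3b-research-preview"
# Assumption: the server runs on this machine with vLLM's default port.
BASE_URL = "http://localhost:8000/v1"


def build_chat_payload(prompt: str, max_tokens: int = 1024) -> dict:
    """Build the JSON body for a single-turn chat completion request."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.0,  # deterministic output for reproducibility
    }


def chat(prompt: str, base_url: str = BASE_URL) -> str:
    """Send the prompt to the chat completions endpoint and return the reply text."""
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_chat_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as reply:
        body = json.load(reply)
    return body["choices"][0]["message"]["content"]


# Show the request body without contacting the server.
payload = build_chat_payload("How many letters 'o' are in the word 'typhoon'?")
print(json.dumps(payload, indent=2))
```

With the server running, calling `chat("...")` returns the model's reply as a string; adjust `BASE_URL` if the server is started on another host or port.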
Intended uses & limitations
While we made an effort to make our model safe, like all generative models, it may generate unsafe content in rare cases. The reasoning model paradigm may also introduce unforeseen behaviors, as model safety in the reasoning domain is a relatively new and ongoing area of research.
Follow Us
https://twitter.com/opentyphoon
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 6
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 6
- total_train_batch_size: 288
- total_eval_batch_size: 8
- optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- num_epochs: 2.0
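The total batch sizes listed above follow from the other values. A minimal check, assuming train_batch_size and eval_batch_size are per-device values as in the usual Hugging Face Trainer convention:

```python
# Values copied from the hyperparameter list above.
train_batch_size = 6             # assumed to be per device
eval_batch_size = 1              # assumed to be per device
num_devices = 8
gradient_accumulation_steps = 6

# Effective training batch per optimizer step:
# per-device batch x number of devices x accumulation steps.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps

# Evaluation does not accumulate gradients, so only the devices multiply.
total_eval_batch_size = eval_batch_size * num_devices

print(total_train_batch_size)  # 288
print(total_eval_batch_size)   # 8
```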
Citation
@misc{typhoon2,
  title={Typhoon 2: A Family of Open Text and Multimodal Thai Large Language Models},
  author={Kunat Pipatanakul and Potsawee Manakul and Natapong Nitarach and Warit Sirichotedumrong and Surapon Nonesung and Teetouch Jaknamon and Parinthapat Pengpun and Pittawat Taveekitworachai and Adisai Na-Thalang and Sittipong Sripaisarnmongkol and Krisanapong Jirayoot and Kasima Tharnpipitchai},
  year={2024},
  eprint={2412.13702},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2412.13702},
}