---
base_model:
- meta-llama/Llama-3.3-70B-Instruct
license: llama3.3
language:
- zh
- en
library_name: transformers
---
|
|
|
# Overview |
|
|
|
This model is a fine-tuned version of Llama 3.3 70B Instruct, optimized for multilingual benchmarks including TMMLU+, TMLU, and MMLU. The fine-tuning process focused on enhancing reasoning, comprehension, and domain-specific performance. The model was developed as part of an iterative pipeline that leverages large-scale datasets and Chain-of-Thought (CoT) methodologies.
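The model can be loaded with the standard `transformers` API. The snippet below is a minimal sketch only: it assumes this card corresponds to the `ubitus/Lilith-70B-Instruct` checkpoint listed first in the evaluation table, and a multi-GPU host with enough memory to serve a 70B model in bfloat16. Adjust the model ID and generation settings to your environment.

```python
# Minimal usage sketch (assumptions: the repo ID below matches this card, and
# the host has enough GPU memory for a 70B model in bfloat16).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ubitus/Lilith-70B-Instruct"  # assumed; replace with the actual repo ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # shard the weights across available GPUs
)

messages = [
    {"role": "system", "content": "You are a helpful assistant that answers in Traditional Chinese."},
    {"role": "user", "content": "請簡單介紹台灣的地理特色。"},  # "Briefly introduce Taiwan's geography."
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```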
|
|
|
--- |
|
# Key Features |
|
|
|
- Base Model: Llama 3.3 70B Instruct.

- Dataset Sources: Custom-generated using LLMs, focused on high-quality, multilingual tasks.

- Chain-of-Thought Fine-Tuning: Enhanced logical reasoning with curated datasets; a prompt sketch follows this list.
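Because the fine-tuning emphasized Chain-of-Thought data, prompts that explicitly ask for step-by-step reasoning tend to play to the model's strengths. The snippet below is only an illustrative sketch of such a prompt built with the standard `transformers` chat template; the system wording is an assumption, not the actual training format.

```python
# Illustrative CoT-style prompt (assumption: the exact training prompt format is
# not documented here; any instruction asking for step-by-step reasoning works).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ubitus/Lilith-70B-Instruct")  # assumed repo ID

messages = [
    {"role": "system", "content": "Think through the problem step by step before giving the final answer."},
    {"role": "user", "content": "A train travels 180 km in 2.5 hours. What is its average speed in km/h?"},
]

# Render the chat template to text; pass the result to model.generate as in the Overview sketch.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```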
|
|
|
# Data Preparation |
|
|
|
1. Custom Dataset Generation

2. Traditional Chinese Data Filtering (a filtering heuristic is sketched below)
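The card does not detail the filtering pipeline, so the following is only a minimal sketch of one common approach: keep samples whose text already uses Traditional-specific characters, detected via OpenCC conversions. The use of `opencc` and the kept/dropped criterion are assumptions, not a description of the actual pipeline.

```python
# Heuristic Traditional Chinese filter (assumption: the real pipeline is not
# documented here; this keeps samples that look Traditional rather than Simplified).
from opencc import OpenCC  # pip install opencc-python-reimplemented

to_traditional = OpenCC("s2t")  # Simplified -> Traditional
to_simplified = OpenCC("t2s")   # Traditional -> Simplified

def looks_traditional(text: str) -> bool:
    """Keep a sample if converting it to Traditional changes nothing, while
    converting it to Simplified does change it (i.e. it already contains
    Traditional-specific characters)."""
    return to_traditional.convert(text) == text and to_simplified.convert(text) != text

samples = [
    "臺灣的高速鐵路連接臺北與高雄。",  # Traditional Chinese -> kept
    "台湾的高速铁路连接台北与高雄。",  # Simplified Chinese -> dropped
    "Hello world",                      # No Chinese-specific characters -> dropped
]

filtered = [s for s in samples if looks_traditional(s)]
print(filtered)
```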
|
|
|
|
|
# Evaluation |
|
Please check out the [Open TW LLM Leaderboard](https://huggingface.co/spaces/yentinglin/open-tw-llm-leaderboard) for the full and updated list.
|
| Model | TMMLU+ | TMLU | Function Calling |
| :---------------------------------------------------------- | :-------- | :---------------------- | :--------------- |
| [ubitus/Lilith-70B-Instruct](https://huggingface.co/ubitus/Lilith-70B-Instruct) | **76.06%** | 73.70% | ✅ |
| [Llama-3-Taiwan-70B-Instruct](https://huggingface.co/yentinglin/Llama-3-Taiwan-70B-Instruct) | 67.53% | **74.76%** | ✅ |
| [Qwen1.5-110B-Chat](https://huggingface.co/Qwen/Qwen1.5-110B-Chat) | 65.81% | 75.69% | ✅ |
| [Yi-34B-Chat](https://huggingface.co/01-ai/Yi-34B-Chat) | 64.10% | 73.59% | ✅ |
| [Meta-Llama-3-70B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct) | 62.75% | 70.95% | ✅ |
| [Llama-3-Taiwan-8B-Instruct](https://huggingface.co/yentinglin/Llama-3-Taiwan-8B-Instruct) | 52.28% | 59.50% | ✅ |
| [Mixtral-8x22B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x22B-Instruct-v0.1) | 52.16% | 55.57% | ✅ |
| [Gemini-1.5-Pro](https://ai.google.dev/gemini-api/docs) | 49.92%^ | 61.40% (5-shot) | ✅ |
| [Breexe-8x7B-Instruct-v0_1](https://huggingface.co/MediaTek-Research/Breexe-8x7B-Instruct-v0_1) | 48.92% | - | ❌ |
| [Breeze-7B-Instruct-v1_0](https://huggingface.co/MediaTek-Research/Breeze-7B-Instruct-v1_0) | 41.77% | 55.57% | ❌ |
| [Llama3-TAIDE-LX-8B-Chat-Alpha1](https://huggingface.co/taide/Llama3-TAIDE-LX-8B-Chat-Alpha1) | 39.03% | 47.30% | ❌ |
| [Claude-3-Opus](https://www.anthropic.com/api) | - | 73.59% (5-shot) | ✅ |
| [GPT4-o](https://platform.openai.com/docs/api-reference/chat/create) | - | 65.56% (0-shot), 69.88% (5-shot) | ✅ |
|
|
|
## Intended Uses

This model is well-suited for:
|
|
|
1. Multilingual Comprehension Tasks: Designed to handle diverse languages and formats. |
|
2. Domain-Specific Applications: Excels in logical reasoning and structured problem-solving. |
|
3. Benchmarks and Testing: An excellent choice for academic and industrial evaluations in multilingual NLP. |