---
base_model:
- meta-llama/Llama-3.3-70B-Instruct
license: llama3.3
language:
- zh
- en
library_name: transformers
---
|
|
|
# Overview |
|
|
|
This model is a fine-tuned version of Llama 3.3 70B Instruct, optimized for multilingual benchmarks including TMMLU+, TMLU, and MMLU. The fine-tuning process focused on enhancing reasoning, comprehension, and domain-specific performance. The model was developed as part of an iterative pipeline that leverages large-scale datasets and Chain-of-Thought (CoT) methodologies.
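The model can be loaded with the standard `transformers` API. The snippet below is a minimal sketch only: it assumes this card corresponds to the `ubitus/Lilith-70B-Instruct` checkpoint listed first in the evaluation table, and a multi-GPU host with enough memory to serve a 70B model in bfloat16. Adjust the model ID and generation settings to your environment.

```python
# Minimal usage sketch (assumptions: the repo ID below matches this card, and
# the host has enough GPU memory for a 70B model in bfloat16).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ubitus/Lilith-70B-Instruct"  # assumed; replace with the actual repo ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # shard the weights across available GPUs
)

messages = [
    {"role": "system", "content": "You are a helpful assistant that answers in Traditional Chinese."},
    {"role": "user", "content": "請簡單介紹台灣的地理特色。"},  # "Briefly introduce Taiwan's geography."
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```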
|
|
|
--- |
|
# Key Features |
|
|
|
- Base Model: Llama 3.3 70B Instruct.

- Dataset Sources: Custom-generated using LLMs, focused on high-quality, multilingual tasks.

- Chain-of-Thought Fine-Tuning: Enhanced logical reasoning with curated datasets; a prompt sketch follows this list.
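Because the fine-tuning emphasized Chain-of-Thought data, prompts that explicitly ask for step-by-step reasoning tend to play to the model's strengths. The snippet below is only an illustrative sketch of such a prompt built with the standard `transformers` chat template; the system wording is an assumption, not the actual training format.

```python
# Illustrative CoT-style prompt (assumption: the exact training prompt format is
# not documented here; any instruction asking for step-by-step reasoning works).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ubitus/Lilith-70B-Instruct")  # assumed repo ID

messages = [
    {"role": "system", "content": "Think through the problem step by step before giving the final answer."},
    {"role": "user", "content": "A train travels 180 km in 2.5 hours. What is its average speed in km/h?"},
]

# Render the chat template to text; pass the result to model.generate as in the Overview sketch.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```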
|
|
|
# Data Preparation |
|
|
|
1. Custom Dataset Generation

2. Traditional Chinese Data Filtering (a filtering heuristic is sketched below)
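The card does not detail the filtering pipeline, so the following is only a minimal sketch of one common approach: keep samples whose text already uses Traditional-specific characters, detected via OpenCC conversions. The use of `opencc` and the kept/dropped criterion are assumptions, not a description of the actual pipeline.

```python
# Heuristic Traditional Chinese filter (assumption: the real pipeline is not
# documented here; this keeps samples that look Traditional rather than Simplified).
from opencc import OpenCC  # pip install opencc-python-reimplemented

to_traditional = OpenCC("s2t")  # Simplified -> Traditional
to_simplified = OpenCC("t2s")   # Traditional -> Simplified

def looks_traditional(text: str) -> bool:
    """Keep a sample if converting it to Traditional changes nothing, while
    converting it to Simplified does change it (i.e. it already contains
    Traditional-specific characters)."""
    return to_traditional.convert(text) == text and to_simplified.convert(text) != text

samples = [
    "臺灣的高速鐵路連接臺北與高雄。",  # Traditional Chinese -> kept
    "台湾的高速铁路连接台北与高雄。",  # Simplified Chinese -> dropped
    "Hello world",                      # No Chinese-specific characters -> dropped
]

filtered = [s for s in samples if looks_traditional(s)]
print(filtered)
```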
|
|
|
|
|
# Evaluation |
|
Please check out the [Open TW LLM Leaderboard](https://huggingface.co/spaces/yentinglin/open-tw-llm-leaderboard) for the full and updated list.
|
| Model | TMMLU+ | TMLU | Function Calling |
| :---------------------------------------------------------- | :-------- | :---------------------- | :--------------- |
| [ubitus/Lilith-70B-Instruct](https://huggingface.co/ubitus/Lilith-70B-Instruct) | **76.06%** | 73.70% | ✅ |
| [Llama-3-Taiwan-70B-Instruct](https://huggingface.co/yentinglin/Llama-3-Taiwan-70B-Instruct) | 67.53% | **74.76%** | ✅ |
| [Qwen1.5-110B-Chat](https://huggingface.co/Qwen/Qwen1.5-110B-Chat) | 65.81% | 75.69% | ✅ |
| [Yi-34B-Chat](https://huggingface.co/01-ai/Yi-34B-Chat) | 64.10% | 73.59% | ✅ |
| [Meta-Llama-3-70B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct) | 62.75% | 70.95% | ✅ |
| [Llama-3-Taiwan-8B-Instruct](https://huggingface.co/yentinglin/Llama-3-Taiwan-8B-Instruct) | 52.28% | 59.50% | ✅ |
| [Mixtral-8x22B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x22B-Instruct-v0.1) | 52.16% | 55.57% | ✅ |
| [Gemini-1.5-Pro](https://ai.google.dev/gemini-api/docs) | 49.92%^ | 61.40% (5-shot) | ✅ |
| [Breexe-8x7B-Instruct-v0_1](https://huggingface.co/MediaTek-Research/Breexe-8x7B-Instruct-v0_1) | 48.92% | - | ❌ |
| [Breeze-7B-Instruct-v1_0](https://huggingface.co/MediaTek-Research/Breeze-7B-Instruct-v1_0) | 41.77% | 55.57% | ❌ |
| [Llama3-TAIDE-LX-8B-Chat-Alpha1](https://huggingface.co/taide/Llama3-TAIDE-LX-8B-Chat-Alpha1) | 39.03% | 47.30% | ❌ |
| [Claude-3-Opus](https://www.anthropic.com/api) | - | 73.59% (5-shot) | ✅ |
| [GPT4-o](https://platform.openai.com/docs/api-reference/chat/create) | - | 65.56% (0-shot), 69.88% (5-shot) | ✅ |
|
|
|
## Intended Uses

This model is well-suited for:
|
|
|
1. Multilingual Comprehension Tasks: Designed to handle diverse languages and formats. |
|
2. Domain-Specific Applications: Excels in logical reasoning and structured problem-solving. |
|
3. Benchmarks and Testing: An excellent choice for academic and industrial evaluations in multilingual NLP. |