|
---
language:
- en
- zh
license: apache-2.0
library_name: transformers
tags:
- merge
- mergekit
- lazymergekit
- Locutusque/StockQwen-2.5-7B
- allknowingroger/QwenSlerp8-7B
base_model:
- allknowingroger/QwenSlerp8-7B
- Locutusque/StockQwen-2.5-7B
model-index:
- name: Qwen-2.5-Aether-SlerpFusion-7B
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 62.62
      name: strict accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=ZeroXClem/Qwen-2.5-Aether-SlerpFusion-7B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 36.01
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=ZeroXClem/Qwen-2.5-Aether-SlerpFusion-7B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 24.17
      name: exact match
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=ZeroXClem/Qwen-2.5-Aether-SlerpFusion-7B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 6.49
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=ZeroXClem/Qwen-2.5-Aether-SlerpFusion-7B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 11.29
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=ZeroXClem/Qwen-2.5-Aether-SlerpFusion-7B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 36.96
      name: accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=ZeroXClem/Qwen-2.5-Aether-SlerpFusion-7B
      name: Open LLM Leaderboard
---
|
|
|
# ZeroXClem/Qwen-2.5-Aether-SlerpFusion-7B |
|
|
|
**Qwen-2.5-Aether-SlerpFusion-7B** is a model merge that combines the strengths of two pre-trained Qwen-2.5 language models using the [mergekit](https://github.com/ZeroXClem/mergekit) framework. The fusion uses spherical linear interpolation (SLERP) to blend the corresponding layers of the two parents smoothly, aiming to carry over the capabilities of both in a single 7B model.
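
For reference, SLERP interpolates along the arc between two weight vectors rather than along the straight line between them, which tends to preserve the geometry of each parent's weights better than plain averaging. For parent weight vectors $p$ and $q$ and interpolation factor $t \in [0, 1]$:

$$
\operatorname{slerp}(p, q; t) = \frac{\sin\bigl((1 - t)\,\Omega\bigr)}{\sin \Omega}\, p + \frac{\sin(t\,\Omega)}{\sin \Omega}\, q,
\qquad
\Omega = \arccos\!\left(\frac{p \cdot q}{\lVert p \rVert\, \lVert q \rVert}\right)
$$

The per-layer values of $t$ are set in the merge configuration further below.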
|
|
|
## 🚀 Merged Models |
|
|
|
This model merge incorporates the following: |
|
|
|
- [**Locutusque/StockQwen-2.5-7B**](https://huggingface.co/Locutusque/StockQwen-2.5-7B): Serves as the foundational model, renowned for its robust language understanding and generation capabilities. |
|
- [**allknowingroger/QwenSlerp8-7B**](https://huggingface.co/allknowingroger/QwenSlerp8-7B): Contributes advanced task-specific fine-tuning, enhancing the model's adaptability across various applications. |
|
|
|
## 🧩 Merge Configuration |
|
|
|
The configuration below outlines how the models are merged using **spherical linear interpolation (SLERP)**. This method ensures smooth transitions between the layers of both models, facilitating an optimal blend of their unique attributes: |
|
|
|
```yaml
# ZeroXClem/Qwen-2.5-Aether-SlerpFusion-7B Merge Configuration
slices:
  - sources:
      - model: Locutusque/StockQwen-2.5-7B
        layer_range: [0, 28]
      - model: allknowingroger/QwenSlerp8-7B
        layer_range: [0, 28]
merge_method: slerp
base_model: Locutusque/StockQwen-2.5-7B
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
dtype: bfloat16
```
|
|
|
### 🔑 Key Parameters |
|
|
|
- **Self-Attention Filtering** (`self_attn`): The anchor values `[0, 0.5, 0.3, 0.7, 1]` set the interpolation factor for the self-attention tensors, shifting the blend between the two parent models across the depth of the network.

- **MLP Filtering** (`mlp`): The mirrored schedule `[1, 0.5, 0.7, 0.3, 0]` does the same for the feed-forward (MLP) tensors, so their blend runs in the opposite direction to the attention blend.

- **Global Weight** (`t.value`): The default interpolation factor (`0.5`) for every tensor not matched by a filter, giving both parents equal weight (see the SLERP sketch below).

- **Data Type** (`dtype`): `bfloat16` keeps memory use and compute cost down while preserving sufficient numerical precision.
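
For intuition, here is a minimal, self-contained sketch of the SLERP step applied to a single pair of weight tensors. It is an illustration under simplifying assumptions, not mergekit's actual implementation (which handles dtype casting, degenerate angles, and per-tensor scheduling internally), and the example tensors are random placeholders.

```python
import torch


def slerp(t: float, v0: torch.Tensor, v1: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Spherical linear interpolation between two weight tensors of the same shape.

    t=0 returns v0, t=1 returns v1; intermediate values follow the arc between
    the two (normalized) directions rather than the straight line.
    """
    v0_flat, v1_flat = v0.flatten().float(), v1.flatten().float()
    v0_dir = v0_flat / (v0_flat.norm() + eps)
    v1_dir = v1_flat / (v1_flat.norm() + eps)
    dot = torch.clamp(v0_dir @ v1_dir, -1.0, 1.0)
    omega = torch.arccos(dot)  # angle between the two weight directions
    if omega.abs() < 1e-4:
        # Nearly parallel tensors: fall back to plain linear interpolation.
        blended = (1 - t) * v0_flat + t * v1_flat
    else:
        sin_omega = torch.sin(omega)
        blended = (
            torch.sin((1 - t) * omega) / sin_omega * v0_flat
            + torch.sin(t * omega) / sin_omega * v1_flat
        )
    return blended.reshape(v0.shape).to(v0.dtype)


# Toy example: blend two random "weight matrices" halfway.
# In the merge config, the anchor lists (e.g. [0, 0.5, 0.3, 0.7, 1]) are
# interpolated across the layer stack, so each layer gets its own effective t.
a, b = torch.randn(4, 4), torch.randn(4, 4)
print(slerp(0.5, a, b))
```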
|
|
|
## 🗣️ Inference
|
|
|
Below is an example of how to load and use the model for text generation: |
|
|
|
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_name = "ZeroXClem/Qwen-2.5-Aether-SlerpFusion-7B"

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Build a text-generation pipeline around the already-loaded model
# (dtype and device placement are inherited from the model above)
text_generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

# Generate a completion for a sample prompt
prompt = "Explain the significance of artificial intelligence in modern healthcare."
outputs = text_generator(
    prompt,
    max_new_tokens=150,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    top_p=0.95,
)
print(outputs[0]["generated_text"])
```
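
Because both parent models are Qwen-2.5 derivatives, the merged tokenizer most likely carries the Qwen chat template; this is an assumption, so check `tokenizer_config.json` before relying on it. If the template is present, conversational prompts can reuse the `tokenizer` and `text_generator` from the snippet above:

```python
# Format a conversation with the tokenizer's chat template (assumes the merged
# model inherited the Qwen-2.5 template) and reuse the pipeline defined above.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the key benefits of model merging in two sentences."},
]
chat_prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
chat_outputs = text_generator(
    chat_prompt,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.7,
)
print(chat_outputs[0]["generated_text"])
```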
|
|
|
## 🎯 Use Case & Applications |
|
|
|
**Qwen-2.5-Aether-SlerpFusion-7B** excels in scenarios that require both robust language understanding and specialized task performance. This merged model is ideal for: |
|
|
|
- **Advanced Text Generation and Comprehension**: Crafting coherent, contextually accurate, and nuanced text for applications like content creation, summarization, and translation. |
|
- **Domain-Specific Tasks**: Enhancing performance in specialized areas such as legal document analysis, medical information processing, and technical support. |
|
- **Interactive AI Systems**: Powering conversational agents and chatbots that require both general language capabilities and task-specific expertise. |
|
|
|
## 📜 License |
|
|
|
This model is open-sourced under the **Apache-2.0 License**. |
|
|
|
## 💡 Tags |
|
|
|
- `merge` |
|
- `mergekit` |
|
- `slerp` |
|
- `Qwen` |
|
- `Locutusque/StockQwen-2.5-7B` |
|
- `allknowingroger/QwenSlerp8-7B` |
|
|
|
--- |
|
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard) |
|
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_ZeroXClem__Qwen-2.5-Aether-SlerpFusion-7B) |
|
|
|
| Metric              | Value |
|---------------------|------:|
| Avg.                | 29.59 |
| IFEval (0-Shot)     | 62.62 |
| BBH (3-Shot)        | 36.01 |
| MATH Lvl 5 (4-Shot) | 24.17 |
| GPQA (0-shot)       |  6.49 |
| MuSR (0-shot)       | 11.29 |
| MMLU-PRO (5-shot)   | 36.96 |
|
|
|
|