# Model Card for free-evo-qwen72b-v0.8
**Developed by:** Freewheelin AI Technical Team

4 May 2024: avg. 81.28 on the Open LLM Leaderboard
| Metric | Value |
|---|---|
| Avg. | 81.28 |
| ARC (25-shot) | 79.86 |
| HellaSwag (10-shot) | 91.32 |
| MMLU (5-shot) | 78.00 |
| TruthfulQA (0-shot) | 74.85 |
| Winogrande (5-shot) | 87.77 |
| GSM8k (5-shot) | 75.89 |
## Method

We were inspired by Sakana AI's Evolutionary Model Merge project.
## Process

You need two models with the same architecture.

1. Choose one model and fine-tune it to create a gap between the original model and the fine-tuned one. It doesn't matter whether the evaluation score ends up higher or lower.
2. Merge the two models (a merge sketch follows this list).
3. Evaluate the merged model.
4. If you need to raise the score on a specific benchmark, fine-tune the model on that part. (It is unlikely to work exactly as you expect, but you can try.)
5. Merge the models again.
6. Evaluate again.
7. Keep going until the merged model's average score is higher than the original's.
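The card does not say which merge algorithm was used. As a minimal sketch, assuming simple linear weight interpolation between two same-architecture checkpoints (tools such as mergekit implement more sophisticated variants), the merge step could look like this; the paths and `ALPHA` value are placeholders, not the authors' settings:

```python
# Minimal linear-interpolation merge of two same-architecture models.
# The paths and ALPHA are placeholders, not the authors' settings.
import torch
from transformers import AutoModelForCausalLM

BASE = "path/to/original-qwen2-model"     # placeholder path
TUNED = "path/to/fine-tuned-qwen2-model"  # placeholder path
ALPHA = 0.5                               # weight given to the fine-tuned model

base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16)
tuned = AutoModelForCausalLM.from_pretrained(TUNED, torch_dtype=torch.bfloat16)

tuned_state = tuned.state_dict()
merged_state = {
    # Same architecture => identical parameter names and shapes.
    name: (1 - ALPHA) * param + ALPHA * tuned_state[name]
    for name, param in base.state_dict().items()
}

base.load_state_dict(merged_state)
base.save_pretrained("merged-model")
```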
That's it. Simple. You can build a framework to automate this process; a sketch follows.
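A hedged sketch of such a loop, assuming hypothetical `fine_tune`, `merge`, and `evaluate_avg` callables that stand in for your own training, merging, and benchmark code (this is not the authors' framework):

```python
# Sketch of automating the fine-tune -> merge -> evaluate loop above.
# fine_tune, merge, and evaluate_avg are hypothetical stand-ins for your
# own training, merging, and benchmark code.
def evolve(original, fine_tune, merge, evaluate_avg, max_rounds=10):
    baseline = evaluate_avg(original)        # average score the merge must beat
    candidate = fine_tune(original)          # step 1: create a gap
    for _ in range(max_rounds):
        merged = merge(original, candidate)  # steps 2/5: merge the pair
        if evaluate_avg(merged) > baseline:  # steps 3/6: evaluate the merge
            return merged                    # step 7: average improved, stop
        candidate = fine_tune(merged)        # step 4: re-tune a weak benchmark
    raise RuntimeError("no improving merge found within max_rounds")
```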
## Base Architecture

- Qwen2

## Base Models

- several Qwen2-based models
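The card does not include a usage snippet; a standard `transformers` loading example would look like the following (the dtype, device placement, prompt, and generation settings are assumptions):

```python
# Standard transformers usage. Everything beyond the repo id (dtype,
# device_map, prompt, generation settings) is an assumption.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "freewheelin/free-evo-qwen72b-v0.8"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```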
## Evaluation results

All scores are from the Open LLM Leaderboard.

| Benchmark | Split | Metric | Value |
|---|---|---|---|
| AI2 Reasoning Challenge (25-shot) | test | normalized accuracy | 79.86 |
| HellaSwag (10-shot) | validation | normalized accuracy | 91.34 |
| MMLU (5-shot) | test | accuracy | 78.00 |
| TruthfulQA (0-shot) | validation | mc2 | 74.85 |
| Winogrande (5-shot) | validation | accuracy | 87.77 |
| GSM8k (5-shot) | test | accuracy | 75.89 |
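To reproduce a single task locally, something like the following lm-evaluation-harness call should work; the task name and shot count mirror the table, while the dtype and batch size are assumptions, and local numbers may differ slightly from the leaderboard's pinned harness version:

```python
# Sketch of re-running one leaderboard task with lm-evaluation-harness
# (pip install lm-eval). dtype and batch_size are assumptions.
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf",
    model_args="pretrained=freewheelin/free-evo-qwen72b-v0.8,dtype=bfloat16",
    tasks=["arc_challenge"],   # ARC, 25-shot, as in the table above
    num_fewshot=25,
    batch_size=1,
)
print(results["results"]["arc_challenge"])
```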