Update README.md
README.md CHANGED
@@ -26,6 +26,14 @@ By leveraging the advanced instruction-following capability derived from [rinna/
 demonstrating performance comparable to a reasoning model on Japanese MT-Bench—**without** requiring additional reasoning processes.
 It follows the Qwen2.5 chat format.
 
+| Model Type | Model Name
+| :- | :-
+| Japanese Continual Pre-Training Model | Qwen2.5 Bakeneko 32B [[HF]](https://huggingface.co/rinna/qwen2.5-bakeneko-32b)
+| Instruction-Tuning Model | Qwen2.5 Bakeneko 32B Instruct [[HF]](https://huggingface.co/rinna/qwen2.5-bakeneko-32b-instruct) [[AWQ]](https://huggingface.co/rinna/qwen2.5-bakeneko-32b-instruct-awq) [[GGUF]](https://huggingface.co/rinna/qwen2.5-bakeneko-32b-instruct-gguf) [[GPTQ int8]](https://huggingface.co/rinna/qwen2.5-bakeneko-32b-instruct-gptq-int8) [[GPTQ int4]](https://huggingface.co/rinna/qwen2.5-bakeneko-32b-instruct-gptq-int4)
+| DeepSeek R1 Distill Qwen2.5 Merged Reasoning Model | DeepSeek R1 Distill Qwen2.5 Bakeneko 32B [[HF]](https://huggingface.co/rinna/deepseek-r1-distill-qwen2.5-bakeneko-32b) [[AWQ]](https://huggingface.co/rinna/deepseek-r1-distill-qwen2.5-bakeneko-32b-awq) [[GGUF]](https://huggingface.co/rinna/deepseek-r1-distill-qwen2.5-bakeneko-32b-gguf) [[GPTQ int8]](https://huggingface.co/rinna/deepseek-r1-distill-qwen2.5-bakeneko-32b-gptq-int8) [[GPTQ int4]](https://huggingface.co/rinna/deepseek-r1-distill-qwen2.5-bakeneko-32b-gptq-int4)
+| QwQ Merged Reasoning Model | QwQ Bakeneko 32B [[HF]](https://huggingface.co/rinna/qwq-bakeneko-32b) [[AWQ]](https://huggingface.co/rinna/qwq-bakeneko-32b-awq) [[GGUF]](https://huggingface.co/rinna/qwq-bakeneko-32b-gguf) [[GPTQ int8]](https://huggingface.co/rinna/qwq-bakeneko-32b-gptq-int8) [[GPTQ int4]](https://huggingface.co/rinna/qwq-bakeneko-32b-gptq-int4)
+| QwQ Bakeneko Merged Instruction-Tuning Model | Qwen2.5 Bakeneko 32B Instruct V2 [[HF]](https://huggingface.co/rinna/qwen2.5-bakeneko-32b-instruct-v2) [[AWQ]](https://huggingface.co/rinna/qwen2.5-bakeneko-32b-instruct-v2-awq) [[GGUF]](https://huggingface.co/rinna/qwen2.5-bakeneko-32b-instruct-v2-gguf) [[GPTQ int8]](https://huggingface.co/rinna/qwen2.5-bakeneko-32b-instruct-v2-gptq-int8) [[GPTQ int4]](https://huggingface.co/rinna/qwen2.5-bakeneko-32b-instruct-v2-gptq-int4)
+
 * **Model architecture**
 
     A 64-layer, 5120-hidden-size transformer-based language model. For a comprehensive understanding of the architecture, please refer to the [Qwen2.5 Technical Report](https://arxiv.org/abs/2412.15115).

@@ -49,6 +57,10 @@ It follows the Qwen2.5 chat format.
 - [Toshiaki Wakatsuki](https://huggingface.co/t-w)
 - [Kei Sawada](https://huggingface.co/keisawada)
 
+* **Release date**
+
+    February 19, 2025
+
 ---
 
 # Benchmarking

@@ -65,7 +77,7 @@ It follows the Qwen2.5 chat format.
 | [Qwen/QwQ-32B](https://huggingface.co/Qwen/QwQ-32B) | 76.12 | 8.58 | 8.25
 | [rinna/qwq-bakeneko-32b](https://huggingface.co/rinna/qwq-bakeneko-32b) | 78.31 | 8.81 | 8.52
 
-For detailed benchmarking results, please refer to [rinna's LM benchmark page](https://rinnakk.github.io/research/benchmarks/lm/index.html).
+For detailed benchmarking results, please refer to [rinna's LM benchmark page (Sheet 20250319)](https://rinnakk.github.io/research/benchmarks/lm/index.html).
 
 ---
 
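The README states that the model follows the Qwen2.5 chat format, which is a ChatML-style layout. A minimal sketch of what that prompt layout looks like, built by hand for illustration — the helper name is hypothetical, and in practice `tokenizer.apply_chat_template` from `transformers` produces this for you:

```python
def build_qwen_chat_prompt(messages):
    """Lay out {role, content} turns in the ChatML-style format used by
    Qwen2.5 chat models: each turn is wrapped in
    <|im_start|>{role}\\n{content}<|im_end|>, and the prompt ends with an
    opened assistant turn for the model to complete."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>"
        for m in messages
    ]
    # Leave the assistant turn open so generation continues from here.
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = build_qwen_chat_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "こんにちは"},
])
print(prompt)
```

This is only a sketch of the token layout; the tokenizer's built-in chat template remains the authoritative source for the exact special tokens and whitespace.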