|
--- |
|
base_model: |
|
- Qwen/Qwen3-30B-A3B-Base |
|
- Qwen/Qwen3-30B-A3B |
|
license: apache-2.0 |
|
library_name: transformers |
|
tags: |
|
- mergekit |
|
- merge |
|
|
|
--- |
|
# CavesOfQwen3 |
|
|
|
> Hey Hey, Model Gang, KaraWitch Here. |
|
> Have you, ever merged too deeply. |
|
> And found something 'they' don't want you to know? |
|
> "[CavesOfQwen3](https://youtu.be/o_PBfLbd3zw)", who is she? And why can't I reach her? |
|
|
|
 |
|
|
|
CavesOfQwen3 is a merge between the base model and the instruct model of Qwen3-30B-A3B (i.e. Qwen3-30B-A3B) and it's base model Qwen3-30B-A3B-base. |
|
|
|
The idea for this merge is to remove the overbaked feeling that is in the instruct while retaining the instruct within the model. |
|
I've tested the model and it seems to performs reasonably well\* |
|
|
|
\*(Ignoring the fact that it's spewing something random at the end. I suspect that's on my part in the configuration of the model on vllm or SillyTavern.) |
|
|
|
 |
|
|
|
This is a merge of pre-trained language models created using ~~mergekit.~~ |
|
This model is done with mergekitty. With a couple of code patch to add qwen3 and a `o_proj` into Qwen3 arch configuration (else vllm get's very grumpy over it.) |
|
This was me when I found out it didn't have `o_proj`. |
|
|
|
 |
|
|
|
Thankfully merging can be done on CPU. But not inference! |
|
|
|
I used `TIES`. Not because I'm lazy but because it's what I had lying around that isn't `SCE` or something else. |
|
|
|
## Merge Details |
|
### Merge Method |
|
|
|
This model was merged using the [TIES](https://arxiv.org/abs/2306.01708) merge method using [Qwen/Qwen3-30B-A3B](https://huggingface.co/Qwen/Qwen3-30B-A3B) as a base. |
|
|
|
### Models Merged |
|
|
|
The following models were included in the merge: |
|
* [Qwen/Qwen3-30B-A3B-Base](https://huggingface.co/Qwen/Qwen3-30B-A3B-Base) |
|
|
|
### Configuration |
|
|
|
The following YAML configuration was used to produce this model: |
|
|
|
```yaml |
|
models: |
|
- model: Qwen/Qwen3-30B-A3B |
|
parameters: |
|
density: 0.4 |
|
weight: 0.35 |
|
- model: Qwen/Qwen3-30B-A3B-Base |
|
parameters: |
|
density: 0.7 |
|
weight: 1 |
|
|
|
merge_method: ties |
|
base_model: Qwen/Qwen3-30B-A3B |
|
parameters: |
|
normalize: true |
|
dtype: bfloat16 |
|
``` |
|
|
|
### Disclaimer. |
|
|
|
> CavesOfQwen3 and it's creator is not affiliated with Caves Of Qud or the creator of the video linked. |
|
> The reference is intentional, but it is supposed to be taken as a light hearted joke. |
|
> There's not need to take it too deeply other than "Haha, funni name." |
|
> This disclaimer is for those who think otherwise or are overthinkers. |