Update README.md
README.md CHANGED
@@ -12,4 +12,42 @@ This Mixture-of-Experts model is the combination of the following:

1. [rombodawg/Rombos-LLM-V2.5-Qwen-7b](https://huggingface.co/rombodawg/Rombos-LLM-V2.5-Qwen-7b)
2. [rombodawg/Rombos-Coder-V2.5-Qwen-7b](https://huggingface.co/rombodawg/Rombos-Coder-V2.5-Qwen-7b)
3. [Qwen/Qwen2.5-Math-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Math-7B-Instruct)

It is created using the following `mergekit-moe` config:

```
base_model: qwen_chat
gate_mode: hidden
dtype: bfloat16
experts:
  - source_model: qwen_math
    positive_prompts:
      - "Solve the equation"
      - "Derive the formula"
      - "Given the value x, solve for y"
      - "Find a function that models this"
      - "Find the integral of the function"
      - "Find the first order derivative"
      - "What is the answer to this math question"
  - source_model: qwen_code
    positive_prompts:
      - "Write a python program"
      - "Write a java program"
      - "Write a C++ program"
      - "Create a quicksort program"
      - "Implement a library that does"
      - "How can I do this in Python"
      - "How can I do this in Java"
      - "How can I do this in C++"
      - "How can I do this in Javascript"
      - "Create a website with HTML"
shared_experts:
  - source_model: qwen_chat
    positive_prompts:
      - "Hello, who are you?"
      - "I need help with"
      - "Can you explain"
      - "Help me with this"
    residual_scale: 0.1 # downweight output from shared expert to prevent overcooking the model
```
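
A rough sketch of how this config might be turned into a usable checkpoint. It assumes the YAML above is saved as `config.yml`, that `qwen_chat`, `qwen_math`, and `qwen_code` resolve to local copies (or repo IDs) of the three source models listed above, and that the output path and prompt below are purely illustrative:

```python
# Minimal sketch, not the author's exact workflow.
# Assumptions: mergekit is installed, the YAML above is saved as config.yml,
# and qwen_chat / qwen_math / qwen_code point at the three source models.
#
# The merge itself is run from the shell, e.g.:
#   mergekit-moe config.yml ./qwen2.5-moe-out
#
# The merged checkpoint then loads like any other Hugging Face causal LM:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "./qwen2.5-moe-out"  # hypothetical output directory from mergekit-moe
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,  # matches the dtype set in the merge config
    device_map="auto",
)

prompt = "Find the integral of f(x) = 3x^2 + 2x."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note that the `positive_prompts` only steer how the router gates are initialized at merge time (via hidden-state embeddings, since `gate_mode: hidden`); they are not a runtime constraint on what the merged model can answer.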