Commit d6ea606 by qingy2024 (verified) · 1 parent: 8eb0340

Update README.md

Files changed (1)
  1. README.md +39 -1
README.md CHANGED
@@ -12,4 +12,42 @@ This Mixture-of-Experts model is the combination of the following:

  1. [rombodawg/Rombos-LLM-V2.5-Qwen-7b](https://huggingface.co/rombodawg/Rombos-LLM-V2.5-Qwen-7b)
  2. [rombodawg/Rombos-Coder-V2.5-Qwen-7b](https://huggingface.co/rombodawg/Rombos-Coder-V2.5-Qwen-7b)
- 3. [Qwen/Qwen2.5-Math-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Math-7B-Instruct)
+ 3. [Qwen/Qwen2.5-Math-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Math-7B-Instruct)
+
+ It is created using the following `mergekit-moe` config:
+
+ ```
+ base_model: qwen_chat
+ gate_mode: hidden
+ dtype: bfloat16
+ experts:
+   - source_model: qwen_math
+     positive_prompts:
+       - "Solve the equation"
+       - "Derive the formula"
+       - "Given the value x, solve for y"
+       - "Find a function that models this"
+       - "Find the integral of the function"
+       - "Find the first order derivative"
+       - "What is the answer to this math question"
+   - source_model: qwen_code
+     positive_prompts:
+       - "Write a python program"
+       - "Write a java program"
+       - "Write a C++ program"
+       - "Create a quicksort program"
+       - "Implement a library that does"
+       - "How can I do this in Python"
+       - "How can I do this in Java"
+       - "How can I do this in C++"
+       - "How can I do this in Javascript"
+       - "Create a website with HTML"
+ shared_experts:
+   - source_model: qwen_chat
+     positive_prompts:
+       - "Hello, who are you?"
+       - "I need help with"
+       - "Can you explain"
+       - "Help me with this"
+     residual_scale: 0.1 # downweight output from shared expert to prevent overcooking the model
+ ```
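
To make the `residual_scale: 0.1` comment in the config concrete, here is a toy sketch of what down-weighting a shared expert could look like. It is not mergekit code. It assumes `residual_scale` acts as a plain multiplier on the shared expert's output before that output is added to the gate-weighted mix of the routed experts. The function `combine`, the two-element vectors, and the gate weights are all hypothetical values chosen for illustration.

```python
# Toy illustration only: this is NOT how mergekit is implemented.
# Assumption: residual_scale multiplies the shared expert's output.

def combine(routed_outputs, gate_weights, shared_output, residual_scale=0.1):
    """Mix routed expert outputs by gate weight, then add the scaled shared output."""
    dim = len(shared_output)
    mixed = [0.0] * dim
    for output, weight in zip(routed_outputs, gate_weights):
        for i in range(dim):
            mixed[i] += weight * output[i]
    # The shared expert always contributes, but only at residual_scale strength.
    return [m + residual_scale * s for m, s in zip(mixed, shared_output)]


# Hypothetical per-token outputs for the three experts named in the config.
math_out = [1.0, 0.0]    # stands in for qwen_math
code_out = [0.0, 1.0]    # stands in for qwen_code
chat_out = [10.0, 10.0]  # stands in for the shared qwen_chat expert

# Hypothetical gate weights that favour the math expert for this token.
result = combine([math_out, code_out], [0.75, 0.25], chat_out)
print(result)
```

With a scale of 0.1, the large `chat_out` vector adds only 1.0 per component in this toy setup, so the routed experts still shape the result; with a scale of 1.0 it would dominate.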