qingy2024/Utility-19B-MoE


qingy2024 committed on Jan 18

Commit d6ea606 · verified · 1 Parent(s): 8eb0340

Update README.md

Files changed (1)

README.md +39 -1


@@ -12,4 +12,42 @@ This Mixture-of-Experts model is the combination of the following:
 1. [rombodawg/Rombos-LLM-V2.5-Qwen-7b](https://huggingface.co/rombodawg/Rombos-LLM-V2.5-Qwen-7b)
 2. [rombodawg/Rombos-Coder-V2.5-Qwen-7b](https://huggingface.co/rombodawg/Rombos-Coder-V2.5-Qwen-7b)
-3. [Qwen/Qwen2.5-Math-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Math-7B-Instruct)
+3. [Qwen/Qwen2.5-Math-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Math-7B-Instruct)
+It is created using the following `mergekit-moe` config:
+```
+base_model: qwen_chat
+gate_mode: hidden
+dtype: bfloat16
+experts:
+  - source_model: qwen_math
+    positive_prompts:
+      - "Solve the equation"
+      - "Derive the formula"
+      - "Given the value x, solve for y"
+      - "Find a function that models this"
+      - "Find the integral of the function"
+      - "Find the first order derivative"
+      - "What is the answer to this math question"
+  - source_model: qwen_code
+    positive_prompts:
+      - "Write a python program"
+      - "Write a java program"
+      - "Write a C++ program"
+      - "Create a quicksort program"
+      - "Implement a library that does"
+      - "How can I do this in Python"
+      - "How can I do this in Java"
+      - "How can I do this in C++"
+      - "How can I do this in Javascript"
+      - "Create a website with HTML"
+shared_experts:
+  - source_model: qwen_chat
+    positive_prompts:
+      - "Hello, who are you?"
+      - "I need help with"
+      - "Can you explain"
+      - "Help me with this"
+    residual_scale: 0.1 # downweight output from shared expert to prevent overcooking the model
+```
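
As a rough illustration of what `residual_scale: 0.1` in the config above is meant to do, here is a toy sketch in plain Python. It is not mergekit's implementation: the function names (`softmax`, `combine_outputs`), the softmax-over-dot-products gate, and the tiny example vectors are all assumptions made for illustration. It only shows the general idea that routed expert outputs are mixed by gate weights while the shared expert's output is added with a small scale factor.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def combine_outputs(hidden, gate_vectors, expert_outputs, shared_output,
                    residual_scale=0.1):
    """Toy MoE mixing step (hypothetical, for illustration only).

    hidden          -- hidden-state vector for one token
    gate_vectors    -- one gate vector per routed expert (assumed here to be
                       derived from prompt hidden states)
    expert_outputs  -- one output vector per routed expert
    shared_output   -- output vector of the always-on shared expert
    residual_scale  -- factor that downweights the shared expert's output
    """
    # Score each routed expert by the dot product of hidden state and gate.
    scores = [sum(h * g for h, g in zip(hidden, gate)) for gate in gate_vectors]
    weights = softmax(scores)

    # Weighted sum of routed expert outputs, one output dimension at a time.
    routed = [
        sum(w * out[i] for w, out in zip(weights, expert_outputs))
        for i in range(len(shared_output))
    ]

    # Add the shared expert's contribution, scaled down by residual_scale.
    return [r + residual_scale * s for r, s in zip(routed, shared_output)]


if __name__ == "__main__":
    hidden = [1.0, 0.0]
    gates = [[1.0, 0.0], [0.0, 1.0]]   # two routed experts (e.g. math, code)
    experts = [[1.0], [0.0]]           # toy one-dimensional expert outputs
    shared = [10.0]                    # toy shared (chat) expert output
    print(combine_outputs(hidden, gates, experts, shared, residual_scale=0.1))
```

With `residual_scale=0.0` the shared expert drops out entirely, and with `1.0` it is added at full strength. A small value such as 0.1 sits between those extremes, which matches the config comment's stated goal of keeping the shared expert from dominating.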