---
language:
- en
base_model:
- Qwen/Qwen2.5-Math-7B-Instruct
- rombodawg/Rombos-Coder-V2.5-Qwen-7b
- rombodawg/Rombos-LLM-V2.5-Qwen-7b
---
# Utility 19B MoE (3x7B)

This Mixture-of-Experts model combines the following three models:

1. [rombodawg/Rombos-LLM-V2.5-Qwen-7b](https://huggingface.co/rombodawg/Rombos-LLM-V2.5-Qwen-7b)
2. [rombodawg/Rombos-Coder-V2.5-Qwen-7b](https://huggingface.co/rombodawg/Rombos-Coder-V2.5-Qwen-7b)
3. [Qwen/Qwen2.5-Math-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Math-7B-Instruct)

It was created using the following `mergekit-moe` config:

```yaml
base_model: rombodawg/Rombos-LLM-V2.5-Qwen-7b
gate_mode: hidden
dtype: bfloat16
experts:
  - source_model: Qwen/Qwen2.5-Math-7B-Instruct
    positive_prompts:
      - "Solve the equation"
      - "Derive the formula"
      - "Given the value x, solve for y"
      - "Find a function that models this"
      - "Find the integral of the function"
      - "Find the first order derivative"
      - "What is the answer to this math question"
  - source_model: rombodawg/Rombos-Coder-V2.5-Qwen-7b
    positive_prompts:
      - "Write a python program"
      - "Write a java program"
      - "Write a C++ program"
      - "Create a quicksort program"
      - "Implement a library that does"
      - "How can I do this in Python"
      - "How can I do this in Java"
      - "How can I do this in C++"
      - "How can I do this in Javascript"
      - "Create a website with HTML"
shared_experts:
  - source_model: rombodawg/Rombos-LLM-V2.5-Qwen-7b
    positive_prompts:
      - "Hello, who are you?"
      - "I need help with"
      - "Can you explain"
      - "Help me with this"
    residual_scale: 0.1 # down-weight the shared expert's output so it does not drown out the routed experts
```
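
To reproduce the merge, install mergekit and point the `mergekit-moe` entrypoint at the config, e.g. `mergekit-moe config.yaml ./Utility-19B-MoE` (the output path is arbitrary). With `gate_mode: hidden`, mergekit initializes each router from hidden-state representations of that expert's positive prompts, so math-flavored inputs should route to the Math expert and code-flavored inputs to the Coder expert. The merged checkpoint should then load like any other `transformers` causal LM; below is a minimal inference sketch, where `MODEL_ID` is a placeholder for the local merge output or wherever the merged weights are hosted.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "./Utility-19B-MoE"  # placeholder: local merge output or a Hub repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # matches the `dtype` set in the merge config
    device_map="auto",
)

# Routing happens per token, so math- or code-heavy prompts activate the
# corresponding expert with no special handling on the caller's side.
messages = [{"role": "user", "content": "Find the integral of x**2 * sin(x)."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```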