---
datasets:
- davzoku/moecule-finqa
- davzoku/moecule-kyc
- davzoku/moecule-stock-market-outlook
base_model:
- unsloth/Llama-3.2-1B-Instruct
pipeline_tag: question-answering
---

# Moecule 3x1B M6 FKS

<p align="center">
  <img src="https://cdn-uploads.huggingface.co/production/uploads/63c51d0e72db0f638ff1eb82/8BNZvdKBuSComBepbH-QW.png" width="150" height="150" alt="logo"> <br>
</p>

## Model Details

This model is a mixture of experts (MoE) built from several task-specific expert models using the [RhuiDih/moetify](https://github.com/RhuiDih/moetify) library. All relevant expert models, LoRA adapters, and datasets are available at [Moecule Ingredients](https://huggingface.co/collections/davzoku/moecule-ingredients-67dac0e6210eb1d95abc6411).

## Key Features

- **Zero Additional Training:** Combine existing domain-specific / task-specific experts into a powerful MoE model without additional training!

## System Requirements

| Steps            | System Requirements  |
| ---------------- | -------------------- |
| MoE Creation     | > 25.3 GB System RAM |
| Inference (fp16) | GPU with > 7GB VRAM  |
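
As a rough back-of-the-envelope check (an estimate, not a measured requirement), the fp16 VRAM figure follows from the total parameter count reported under Model Parameters below:

```python
# fp16 weight footprint only; activations and the KV cache add further overhead.
total_params = 3_243_509_760     # "MOE total parameters" from the log below
weight_bytes = total_params * 2  # 2 bytes per parameter in fp16
print(f"~{weight_bytes / 1024**3:.1f} GiB of weights")  # ~6.0 GiB, hence > 7 GB VRAM leaves headroom
```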

## MoE Creation

To reproduce this model, run the following commands:

```shell
# clone and install the moetify fork that fixes a dependency issue
git clone -b fix-transformers-4.47.1-FlashA2-dependency --single-branch https://github.com/davzoku/moetify.git
pip install -e ./moetify

python -m moetify.mix \
  --output_dir ./moecule-3x1b-m6-fks \
  --model_path unsloth/llama-3.2-1b-Instruct \
  --modules mlp q_proj \
  --ingredients \
    davzoku/finqa_expert_1b \
    davzoku/kyc_expert_1b \
    davzoku/stock_market_expert_1b
```
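
The `--modules mlp q_proj` argument selects which sub-modules are duplicated per expert (here the MLP and query projections), while the remaining weights stay shared in the stem. As a consistency check (a sketch assuming the stock Llama-3.2-1B shapes of 16 layers, hidden size 2048, and MLP intermediate size 8192), the per-expert count implied by this choice matches the experts total reported below:

```python
# Consistency check, assuming Llama-3.2-1B shapes: 16 layers, hidden 2048, MLP intermediate 8192.
layers, hidden, intermediate, n_experts = 16, 2048, 8192, 3

mlp = 3 * hidden * intermediate   # gate_proj + up_proj + down_proj
q_proj = hidden * hidden          # q_proj only, per --modules mlp q_proj
per_expert = layers * (mlp + q_proj)

print(per_expert)                 # 872,415,232 parameters duplicated per expert
print(per_expert * n_experts)     # 2,617,245,696 == "Experts parameters" in the log below
```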

## Model Parameters

```shell
INFO:root:Stem parameters: 626067456
INFO:root:Experts parameters: 2617245696
INFO:root:Routers parameters: 196608
INFO:root:MOE total parameters (numel): 3243509760
INFO:root:MOE total parameters : 3243509760
INFO:root:MOE active parameters: 2371094528
```
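
The totals above are internally consistent: stem + experts + routers gives the MoE total, and the active-parameter figure matches what you would expect if two of the three experts are routed per token (an inference from the numbers; the moetify configuration is the authoritative source for the top-k value).

```python
# How the reported totals relate (values copied from the log above).
stem, experts, routers = 626_067_456, 2_617_245_696, 196_608
per_expert = experts // 3                 # three ingredient experts

total = stem + experts + routers          # 3,243,509,760 == "MOE total parameters"
active = stem + routers + 2 * per_expert  # 2,371,094,528 == "MOE active parameters" (top-2 routing)
print(total, active)
```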

## Inference

To run inference with this model, use the following code snippet:

```python
# Install the moetify fork that fixes a dependency issue (run in a shell / notebook cell):
#   git clone -b fix-transformers-4.47.1-FlashA2-dependency --single-branch https://github.com/davzoku/moetify.git
#   pip install -e ./moetify

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

model_name = "<model-name>"  # replace with this model's repository id

model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_name)

def format_instruction(row):
    return f"""### Question: {row}"""

# With the default do_sample=False and num_beams=1, decoding is effectively greedy;
# the sampling parameters below only take effect if do_sample=True is set.
greedy_generation_config = GenerationConfig(
    temperature=0.1,
    top_p=0.75,
    top_k=40,
    num_beams=1,
    max_new_tokens=128,
    repetition_penalty=1.2,
)

input_text = "In what ways did Siemens's debt restructuring on March 06, 2024 reflect its strategic priorities?"
formatted_input = format_instruction(input_text)
inputs = tokenizer(formatted_input, return_tensors="pt").to("cuda")

with torch.no_grad():
    outputs = model.generate(
        input_ids=inputs.input_ids,
        attention_mask=inputs.attention_mask,
        generation_config=greedy_generation_config,
    )

generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```

## The Team

- CHOCK Wan Kee
- Farlin Deva Binusha DEVASUGIN MERLISUGITHA
- GOH Bao Sheng
- Jessica LEK Si Jia
- Sinha KHUSHI
- TENG Kok Wai (Walter)

## References

- [Flexible and Effective Mixing of Large Language Models into a Mixture of Domain Experts](https://arxiv.org/abs/2408.17280v2)
- [RhuiDih/moetify](https://github.com/RhuiDih/moetify)