---
datasets:
- davzoku/moecule-finqa
- davzoku/moecule-kyc
- davzoku/moecule-stock-market-outlook
base_model:
- unsloth/Llama-3.2-1B-Instruct
pipeline_tag: question-answering
---

# 🫐 Moecule 3x1B M6 FKS

<p align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/63c51d0e72db0f638ff1eb82/8BNZvdKBuSComBepbH-QW.png" width="150" height="150" alt="logo"> <br>
</p>

## Model Details

This model is a mixture of experts (MoE) built with the [RhuiDih/moetify](https://github.com/RhuiDih/moetify) library. It combines three task-specific experts on a Llama-3.2-1B-Instruct base: financial question answering (FinQA), know-your-customer (KYC), and stock market outlook. All relevant expert models, LoRA adapters, and datasets are available at [Moecule Ingredients](https://huggingface.co/collections/davzoku/moecule-ingredients-67dac0e6210eb1d95abc6411).
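Conceptually, each MoE layer holds one copy of the converted weights per expert plus a small router that scores the experts for every token and mixes the outputs of the highest-scoring ones. The following plain-Python sketch illustrates generic Mixtral-style top-k gating for a single token. `moe_forward`, the toy experts, the router weights and `top_k=2` are illustrative assumptions, not moetify's actual implementation.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x: Vector, router: Sequence[Vector], experts: Sequence[Expert], top_k: int = 2) -> Vector:
    """Generic top-k MoE step for one token (illustrative, not moetify's code)."""
    # One router logit per expert: dot product of the hidden state with that expert's router row
    logits = [sum(w * v for w, v in zip(row, x)) for row in router]
    # Indices of the top_k highest-scoring experts
    chosen = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Gate weights: softmax over the chosen experts' logits only
    gates = softmax([logits[i] for i in chosen])
    # Weighted sum of the chosen experts' outputs (assumes experts preserve dimensionality)
    out = [0.0] * len(x)
    for gate, i in zip(gates, chosen):
        y = experts[i](x)
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out


def make_scaler(k: float) -> Expert:
    """Hypothetical toy expert that multiplies its input by k."""
    return lambda v: [k * vi for vi in v]


# Three toy experts standing in for the FinQA, KYC and stock-market experts
experts = [make_scaler(1.0), make_scaler(2.0), make_scaler(3.0)]
router = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]  # hypothetical router weights
out = moe_forward([1.0, 0.0], router, experts, top_k=2)
print(out)  # the x3 and x2 experts are selected and blended by their gate weights
```

Only the selected experts run for a given token, which is why an MoE's active parameter count is smaller than its total parameter count.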

## Key Features

- Zero additional training: existing domain- and task-specific experts are combined into a single MoE model without any further training.

## System Requirements

| Step | Requirement |
| ---------------- | -------------------- |
| MoE creation | > 25.3 GB system RAM |
| Inference (fp16) | GPU with > 7 GB VRAM |
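As a rough sanity check on the inference requirement, the size of the fp16 weights can be computed from the total parameter count logged during MoE creation. This sketch assumes 2 bytes per parameter and ignores the KV cache, activations and CUDA context, which presumably make up the rest of the > 7 GB figure.

```python
# Rough size of the fp16 weights (assumption: 2 bytes per parameter;
# KV cache, activations and CUDA context are NOT included).
TOTAL_PARAMS = 3_243_509_760  # "MOE total parameters" from the creation log
BYTES_PER_PARAM_FP16 = 2


def fp16_weight_bytes(num_params: int) -> int:
    """Bytes needed to hold num_params weights in fp16."""
    return num_params * BYTES_PER_PARAM_FP16


weight_bytes = fp16_weight_bytes(TOTAL_PARAMS)
print(f"{weight_bytes:,} bytes = {weight_bytes / 1e9:.2f} GB = {weight_bytes / 2**30:.2f} GiB")
# -> 6,487,019,520 bytes = 6.49 GB = 6.04 GiB
```

Because any token may be routed to any expert, all three experts must stay in GPU memory, even though only about 2.37B parameters are active per token.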

## MoE Creation

To reproduce this model, run the following commands. In a Jupyter or Colab notebook, prefix the `git` and `pip` lines with `!`.

```shell
# Clone the moetify fork that fixes a dependency issue, then install it in editable mode
git clone -b fix-transformers-4.47.1-FlashA2-dependency --single-branch https://github.com/davzoku/moetify.git
pip install -e ./moetify

# Mix the three experts into one MoE on the base model;
# --modules selects which submodules (mlp and q_proj) get per-expert copies and a router
python -m moetify.mix \
  --output_dir ./moecule-3x1b-m6-fks \
  --model_path unsloth/Llama-3.2-1B-Instruct \
  --modules mlp q_proj \
  --ingredients \
    davzoku/finqa_expert_1b \
    davzoku/kyc_expert_1b \
    davzoku/stock_market_expert_1b
```

## Model Parameters

Parameter counts logged during MoE creation:

```text
INFO:root:Stem parameters: 626067456
INFO:root:Experts parameters: 2617245696
INFO:root:Routers parameters: 196608
INFO:root:MOE total parameters (numel): 3243509760
INFO:root:MOE total parameters : 3243509760
INFO:root:MOE active parameters: 2371094528
```
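The logged counts can be reproduced from the base model's architecture. This sketch assumes the public Llama-3.2-1B configuration (16 layers, hidden size 2048, MLP size 8192, 8 key/value heads of dimension 64, vocabulary 128,256), one bias-free router per converted module per layer, and the embedding matrix being counted twice (input embeddings and `lm_head`). The top-2 routing used for the active count is inferred from the numbers, not stated elsewhere in this card.

```python
# Assumed Llama-3.2-1B configuration values (from the public base-model config)
HIDDEN = 2048
INTERMEDIATE = 8192
LAYERS = 16
NUM_KV_HEADS = 8
HEAD_DIM = 64
VOCAB = 128_256

NUM_EXPERTS = 3      # finqa, kyc and stock_market ingredients
NUM_MOE_MODULES = 2  # --modules mlp q_proj
TOP_K = 2            # assumption inferred from the logged active count


def expert_params_per_copy() -> int:
    """One expert copy: gated MLP (gate/up/down) plus q_proj in every layer, no biases."""
    mlp = 3 * HIDDEN * INTERMEDIATE
    q_proj = HIDDEN * HIDDEN
    return LAYERS * (mlp + q_proj)


def stem_params() -> int:
    """Shared weights: k/v/o projections, norms, embeddings and lm_head."""
    kv_dim = NUM_KV_HEADS * HEAD_DIM
    attn_rest = 2 * HIDDEN * kv_dim + HIDDEN * HIDDEN  # k_proj, v_proj, o_proj
    norms = 2 * HIDDEN                                  # two RMSNorms per layer
    embeddings_and_head = 2 * VOCAB * HIDDEN            # counted twice (assumption)
    final_norm = HIDDEN
    return LAYERS * (attn_rest + norms) + embeddings_and_head + final_norm


def router_params() -> int:
    """One HIDDEN -> NUM_EXPERTS linear router per converted module per layer."""
    return LAYERS * NUM_MOE_MODULES * HIDDEN * NUM_EXPERTS


stem = stem_params()
experts = NUM_EXPERTS * expert_params_per_copy()
routers = router_params()
total = stem + experts + routers
active = stem + routers + TOP_K * expert_params_per_copy()
print(f"stem={stem:,} experts={experts:,} routers={routers:,} total={total:,} active={active:,}")
```

Each figure printed by the script matches the corresponding line of the creation log.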

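## Inference

To run inference with this model, install the moetify fork and use the following snippet:

```python
# Install the moetify fork that fixes the dependency issue first (shell):
#   git clone -b fix-transformers-4.47.1-FlashA2-dependency --single-branch https://github.com/davzoku/moetify.git
#   pip install -e ./moetify

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

model_name = "davzoku/moecule-3x1b-m6-fks"

# Load in fp16 to match the fp16 system requirement; trust_remote_code=True
# permits custom modeling code from the repository to run
model = AutoModelForCausalLM.from_pretrained(
    model_name, device_map="auto", torch_dtype=torch.float16, trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

def format_instruction(row):
    # Prompt template applied to every question
    return f"""### Question: {row}"""

# do_sample defaults to False, so decoding is greedy and
# temperature, top_p and top_k are ignored
greedy_generation_config = GenerationConfig(
    temperature=0.1,
    top_p=0.75,
    top_k=40,
    num_beams=1,
    max_new_tokens=128,
    repetition_penalty=1.2,
)

input_text = "In what ways did Siemens's debt restructuring on March 06, 2024 reflect its strategic priorities?"
formatted_input = format_instruction(input_text)
# Send inputs to the device chosen by device_map="auto"
inputs = tokenizer(formatted_input, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        input_ids=inputs.input_ids,
        attention_mask=inputs.attention_mask,
        generation_config=greedy_generation_config,
    )

# The decoded text contains the prompt followed by the model's answer
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```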

## The Team

- CHOCK Wan Kee
- Farlin Deva Binusha DEVASUGIN MERLISUGITHA
- GOH Bao Sheng
- Jessica LEK Si Jia
- Sinha KHUSHI
- TENG Kok Wai (Walter)

## References

- [Flexible and Effective Mixing of Large Language Models into a Mixture of Domain Experts](https://arxiv.org/abs/2408.17280v2)
- [RhuiDih/moetify](https://github.com/RhuiDih/moetify)