---
license: mit
---

# **Phi-3.5-moe-mlx-int4**

<b><span style="text-decoration:underline">Note: This is an unofficial version, intended only for testing and development.</span></b>

This is an INT4 quantized model of Phi-3.5-MoE-Instruct, converted with the Apple MLX framework. You can deploy it on Apple Silicon devices (M1, M2, M3).

## Installation

```bash
pip install -U mlx-lm
```
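
If you want to double-check that MLX is installed correctly before converting a large MoE model, a quick device query is enough. This is just an optional sanity check, not part of the original instructions:

```python
# Optional sanity check: MLX should report the Apple Silicon GPU as its default device.
import mlx.core as mx

print(mx.default_device())  # e.g. Device(gpu, 0) on an M1/M2/M3 machine
```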
## Conversion

```bash
python -m mlx_lm.convert --hf-path microsoft/Phi-3.5-MoE-instruct --mlx-path ./phi-3.5-moe-mlx-int4 -q
```
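
The `-q` flag quantizes the weights (4-bit by default) while converting, and `--mlx-path` writes the result to the folder that the sample below loads. If you prefer to drive the conversion from Python, mlx-lm exposes the same functionality as a `convert` function; this is a minimal sketch, and keyword defaults may vary slightly between mlx-lm versions:

```python
# Minimal sketch: convert + quantize from Python instead of the CLI.
from mlx_lm import convert

convert(
    "microsoft/Phi-3.5-MoE-instruct",    # Hugging Face repo to convert
    mlx_path="./phi-3.5-moe-mlx-int4",   # output folder expected by the sample below
    quantize=True,                       # 4-bit quantization is the default when enabled
)
```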
## Samples

```python
from mlx_lm import load, generate

model, tokenizer = load("./phi-3.5-moe-mlx-int4")

sys_msg = """You are a helpful AI assistant. You are an agent capable of using a variety of tools to answer a question. Here are a few of the tools available to you:

- Blog: This tool helps you describe a certain knowledge point and content, and finally write it into Twitter or Facebook style content.
- Translate: This tool helps you translate content into any language, using plain language as required.

To use these tools you must always respond in JSON format containing `"tool_name"` and `"input"` key-value pairs. For example, to answer the request "Build Multi Agents with MoE models" you must use the Blog tool like so:

{
    "tool_name": "Blog",
    "input": "Build Multi Agents with MoE models"
}

Or to translate the question "can you introduce yourself in Chinese" you must respond:

{
    "tool_name": "Translate",
    "input": "can you introduce yourself in Chinese"
}

Remember to output only the final result, in JSON format containing `"agentid"`, `"tool_name"`, `"input"` and `"output"` key-value pairs:

[
    {
        "agentid": "step1",
        "tool_name": "Blog",
        "input": "Build Multi Agents with MoE models",
        "output": "........."
    },
    {
        "agentid": "step2",
        "tool_name": "Translate",
        "input": "can you introduce yourself in Chinese",
        "output": "........."
    },
    {
        "agentid": "final",
        "tool_name": "Result",
        "output": "........."
    }
]

The user's request is as follows.
"""

query = 'Write something about Generative AI with MoE, and translate it to Chinese'

prompt = tokenizer.apply_chat_template(
    [{"role": "system", "content": sys_msg}, {"role": "user", "content": query}],
    tokenize=False,
    add_generation_prompt=True,
)

response = generate(model, tokenizer, prompt=prompt, max_tokens=1024, verbose=True)
```
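
The system prompt above asks the model to reply with a JSON array of agent steps, so it is often convenient to parse the reply programmatically. The snippet below is a minimal sketch built on that assumption; the model is only instructed to emit JSON, so the output is not guaranteed to parse:

```python
# Minimal sketch: try to parse the agent-style JSON array returned above.
import json

try:
    steps = json.loads(response)
except json.JSONDecodeError:
    steps = None

if isinstance(steps, list):
    for step in steps:
        if isinstance(step, dict):
            print(step.get("agentid"), "->", step.get("tool_name"))
            print(step.get("output"))
else:
    # Fall back to the raw text if the model did not return valid JSON.
    print(response)
```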