---
license: mit
---
# **Phi-3.5-moe-mlx-int4**
<b><span style="text-decoration:underline">Note: This is an unofficial build, intended only for testing and development.</span></b>
This is an INT4-quantized version of Phi-3.5-MoE-Instruct built with the Apple MLX framework. You can deploy it on Apple Silicon devices (M1, M2, M3).
## Installation
```bash
pip install -U mlx-lm
```
## Conversion
```bash
python -m mlx_lm.convert --hf-path microsoft/Phi-3.5-MoE-instruct -q --mlx-path ./phi-3.5-moe-mlx-int4
```
## Samples
```python
from mlx_lm import load, generate
model, tokenizer = load("./phi-3.5-moe-mlx-int4")
sys_msg = """You are a helpful AI assistant, you are an agent capable of using a variety of tools to answer a question. Here are a few of the tools available to you:
- Blog: This tool describes a given knowledge point and writes it up as Twitter- or Facebook-style content
- Translate: This tool translates text into any language, using plain language as required
To use these tools you must always respond in JSON format containing `"tool_name"` and `"input"` key-value pairs. For example, to answer the request "Build Multi Agents with MoE models" you must use the Blog tool like so:
{
"tool_name": "Blog",
"input": "Build Multi Agents with MoE models"
}
Or to translate the question "can you introduce yourself in Chinese" you must respond:
{
"tool_name": "Translate",
"input": "can you introduce yourself in Chinese"
}
Remember to output only the final result, in JSON format containing `"agentid"`, `"tool_name"`, `"input"` and `"output"` key-value pairs:
[
{ "agentid": "step1",
"tool_name": "Blog",
"input": "Build Multi Agents with MoE models",
"output": "........."
},
{ "agentid": "step2",
"tool_name": "Translate",
"input": "can you introduce yourself in Chinese",
"output": "........."
},
{
"agentid": "final",
"tool_name": "Result",
"output": "........."
}
]
The user's question is as follows.
"""
query = 'Write something about Generative AI with MoE, translate it to Chinese'
prompt = tokenizer.apply_chat_template(
[{"role": "system", "content": sys_msg},{"role": "user", "content": query}],
tokenize=False,
add_generation_prompt=True,
)
response = generate(model, tokenizer, prompt=prompt, max_tokens=1024, verbose=True)
```
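Because the system prompt instructs the model to emit pure JSON, the generated text can be parsed directly into the agent steps. A minimal sketch; the `response_text` below is a hypothetical stand-in for the model's output, and real generations may need extra cleanup (e.g. stripping surrounding prose) before parsing:

```python
import json

# Hypothetical response in the format the system prompt requests
# (not actual model output).
response_text = """[
  {"agentid": "step1", "tool_name": "Blog", "input": "Generative AI with MoE", "output": "..."},
  {"agentid": "step2", "tool_name": "Translate", "input": "...", "output": "..."},
  {"agentid": "final", "tool_name": "Result", "output": "..."}
]"""

# Parse the JSON array and index each agent step by its id.
steps = json.loads(response_text)
by_id = {step["agentid"]: step for step in steps}

# The final aggregated result lives under the "final" step.
print(by_id["final"]["output"])
```

If parsing fails, the model likely wrapped the JSON in extra text; tightening the system prompt or extracting the first `[...]` span usually helps.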