Update README.md
Please, refer to the [original model card](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B) for details.
## Use with mlx-llm

Install `mlx-llm` from GitHub:

```bash
git clone https://github.com/riccardomusmeci/mlx-llm
cd mlx-llm
pip install .
```

Test it with a simple generation:

```python
from mlx_llm.model import create_model, create_tokenizer, generate

model = create_model("OpenHermes-2.5-Mistral-7B")  # downloads the weights from this space
tokenizer = create_tokenizer("OpenHermes-2.5-Mistral-7B")
generate(
    model=model,
    tokenizer=tokenizer,
    prompt="What's the meaning of life?",
    max_tokens=200,
    temperature=0.1
)
```

Quantize the model weights:

```python
from mlx_llm.model import create_model, quantize, save_weights

model = create_model("OpenHermes-2.5-Mistral-7B")
model = quantize(model, group_size=64, bits=4)
save_weights(model, "weights.npz")
```
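The `quantize(model, group_size=64, bits=4)` call maps each group of 64 weights to 4-bit integer codes plus a per-group scale and offset. As a rough illustration of what group-wise affine quantization does (a plain-Python sketch of the general technique, not the mlx-llm internals):

```python
def quantize_group(weights, bits=4):
    """Affine-quantize one group of float weights to `bits`-bit integer codes.

    Returns (codes, scale, zero_point) such that w ≈ scale * code + zero_point.
    """
    levels = 2 ** bits - 1                      # 15 steps for 4-bit
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / levels or 1.0     # avoid zero scale for constant groups
    codes = [round((w - w_min) / scale) for w in weights]
    return codes, scale, w_min

def dequantize_group(codes, scale, zero_point):
    """Reconstruct approximate float weights from the codes."""
    return [scale * c + zero_point for c in codes]

# Example: a toy "group" of 8 weights (mlx-llm uses group_size=64)
group = [-0.5, -0.1, 0.0, 0.2, 0.3, 0.7, 0.9, 1.0]
codes, scale, zp = quantize_group(group, bits=4)
approx = dequantize_group(codes, scale, zp)
max_err = max(abs(a - b) for a, b in zip(group, approx))
assert max_err <= scale / 2  # error bounded by half a quantization step
```

Larger `group_size` stores fewer scales (smaller file) but each group must span a wider range of values, so per-weight error grows; `group_size=64, bits=4` is a common middle ground.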
Use it in chat mode (don't worry about the prompt format: the library takes care of it):

```python
from mlx_llm.playground.chat import ChatLLM

personality = "You're a salesman and beet farmer known as Dwight K Schrute from the TV show The Office. Dwight replies just as he would in the show. You always reply as Dwight would reply. If you don't know the answer to a question, please don't share false information."

examples = [
    # ... (earlier examples omitted here)
    {
        "user": "What is your job?",
        "model": "Assistant Regional Manager. Sorry, Assistant to the Regional Manager."
    }
]

chat_llm = ChatLLM.build(
    model_name="OpenHermes-2.5-Mistral-7B",
    tokenizer="mlx-community/OpenHermes-2.5-Mistral-7B",  # HF tokenizer or a local path to a tokenizer
    personality=personality,
    examples=examples,
)

chat_llm.run(max_tokens=500, temp=0.1)
```

With `mlx-llm` you can also play with a simple RAG. Go check the examples.
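The RAG examples in the mlx-llm repo are the reference; as a minimal sketch of the idea behind retrieval-augmented generation — retrieve the most relevant document for a query and prepend it to the prompt — here is a plain-Python toy with a hypothetical word-overlap scorer (not the mlx-llm API):

```python
def score(query: str, doc: str) -> int:
    """Toy relevance score: number of words shared by query and document."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d)

def build_rag_prompt(query: str, docs: list[str], top_k: int = 1) -> str:
    """Rank docs by relevance and prepend the top_k of them to the query."""
    ranked = sorted(docs, key=lambda doc: score(query, doc), reverse=True)
    context = "\n".join(ranked[:top_k])
    return f"Context:\n{context}\n\nQuestion: {query}"

docs = [
    "Dunder Mifflin is a paper company based in Scranton.",
    "Schrute Farms is a beet farm and bed-and-breakfast.",
]
prompt = build_rag_prompt("Who runs the beet farm?", docs)
# `prompt` would then go to generate() or ChatLLM as usual
```

A real setup would replace the word-overlap scorer with embedding similarity over a vector store; the prompt-assembly step stays the same.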