Update README.md
Please, refer to the [original model card](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B) for details.
## Use with mlx-llm

Install `mlx-llm` from GitHub:

```bash
git clone https://github.com/riccardomusmeci/mlx-llm
cd mlx-llm
pip install .
```

Test it with a simple generation:

```python
from mlx_llm.model import create_model, create_tokenizer, generate

model = create_model("OpenHermes-2.5-Mistral-7B")  # downloads the weights from this space
tokenizer = create_tokenizer("OpenHermes-2.5-Mistral-7B")
generate(
    model=model,
    tokenizer=tokenizer,
    prompt="What's the meaning of life?",
    max_tokens=200,
    temperature=0.1
)
```

Quantize the model weights:

```python
from mlx_llm.model import create_model, quantize, save_weights

model = create_model("OpenHermes-2.5-Mistral-7B")
model = quantize(model, group_size=64, bits=4)
save_weights(model, "weights.npz")
```
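The `quantize(model, group_size=64, bits=4)` call maps each group of 64 weights to 4-bit integer codes plus a per-group scale and offset. As a rough illustration of what group-wise affine quantization does (a plain-Python sketch of the general technique, not the mlx-llm internals):

```python
def quantize_group(weights, bits=4):
    """Affine-quantize one group of float weights to `bits`-bit integer codes.

    Returns (codes, scale, zero_point) such that w ≈ scale * code + zero_point.
    """
    levels = 2 ** bits - 1                      # 15 steps for 4-bit
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / levels or 1.0     # avoid zero scale for constant groups
    codes = [round((w - w_min) / scale) for w in weights]
    return codes, scale, w_min

def dequantize_group(codes, scale, zero_point):
    """Reconstruct approximate float weights from the codes."""
    return [scale * c + zero_point for c in codes]

# Example: a toy "group" of 8 weights (mlx-llm uses group_size=64)
group = [-0.5, -0.1, 0.0, 0.2, 0.3, 0.7, 0.9, 1.0]
codes, scale, zp = quantize_group(group, bits=4)
approx = dequantize_group(codes, scale, zp)
max_err = max(abs(a - b) for a, b in zip(group, approx))
assert max_err <= scale / 2  # error bounded by half a quantization step
```

Larger `group_size` stores fewer scales (smaller file) but each group must span a wider range of values, so per-weight error grows; `group_size=64, bits=4` is a common middle ground.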
Use it in chat mode (don't worry about the prompt format: the library takes care of it):

```python
from mlx_llm.playground.chat import ChatLLM

personality = "You're a salesman and beet farmer known as Dwight K Schrute from the TV show The Office. Dwight replies just as he would in the show. You always reply as Dwight would reply. If you don't know the answer to a question, please don't share false information."

examples = [
    # ... (earlier examples omitted here)
    {
        "user": "What is your job?",
        "model": "Assistant Regional Manager. Sorry, Assistant to the Regional Manager."
    }
]

chat_llm = ChatLLM.build(
    model_name="OpenHermes-2.5-Mistral-7B",
    tokenizer="mlx-community/OpenHermes-2.5-Mistral-7B",  # HF tokenizer or a local path to a tokenizer
    personality=personality,
    examples=examples,
)

chat_llm.run(max_tokens=500, temp=0.1)
```

With `mlx-llm` you can also play with a simple RAG. Go check the examples.
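The RAG examples in the mlx-llm repo are the reference; as a minimal sketch of the idea behind retrieval-augmented generation — retrieve the most relevant document for a query and prepend it to the prompt — here is a plain-Python toy with a hypothetical word-overlap scorer (not the mlx-llm API):

```python
def score(query: str, doc: str) -> int:
    """Toy relevance score: number of words shared by query and document."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d)

def build_rag_prompt(query: str, docs: list[str], top_k: int = 1) -> str:
    """Rank docs by relevance and prepend the top_k of them to the query."""
    ranked = sorted(docs, key=lambda doc: score(query, doc), reverse=True)
    context = "\n".join(ranked[:top_k])
    return f"Context:\n{context}\n\nQuestion: {query}"

docs = [
    "Dunder Mifflin is a paper company based in Scranton.",
    "Schrute Farms is a beet farm and bed-and-breakfast.",
]
prompt = build_rag_prompt("Who runs the beet farm?", docs)
# `prompt` would then go to generate() or ChatLLM as usual
```

A real setup would replace the word-overlap scorer with embedding similarity over a vector store; the prompt-assembly step stays the same.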