riccardomusmeci committed
Commit e12a622 · verified · 1 Parent(s): 992649b

Update README.md

Files changed (1)
  1. README.md +34 -11
README.md CHANGED
@@ -21,17 +21,42 @@ Please, refer to the [original model card](https://huggingface.co/teknium/OpenHe
 
 ## Use with mlx-llm
 
-Download weights from files section and install mlx-llm from GitHub.
+Install mlx-llm from GitHub.
 ```bash
 git clone https://github.com/riccardomusmeci/mlx-llm
 cd mlx-llm
 pip install .
 ```
 
-Run
+Test it with a simple generation:
 
 ```python
-from mlx_llm.playground import LLM
+from mlx_llm.model import create_model, create_tokenizer, generate
+
+model = create_model("OpenHermes-2.5-Mistral-7B")  # downloads weights from this space
+tokenizer = create_tokenizer("OpenHermes-2.5-Mistral-7B")
+generate(
+    model=model,
+    tokenizer=tokenizer,
+    prompt="What's the meaning of life?",
+    max_tokens=200,
+    temperature=0.1
+)
+```
+
+Quantize the model weights:
+```python
+from mlx_llm.model import create_model, quantize, save_weights
+
+model = create_model("OpenHermes-2.5-Mistral-7B")
+model = quantize(model, group_size=64, bits=4)
+save_weights(model, "weights.npz")
+```
+
+Use it in chat mode (don't worry about the prompt format, the library takes care of it):
+
+```python
+from mlx_llm.playground.chat import ChatLLM
 
 personality = "You're a salesman and beet farmer known as Dwight K Schrute from the TV show The Office. Dwight replies just as he would in the show. You always reply as Dwight would reply. If you don't know the answer to a question, please don't share false information."
 
@@ -43,21 +68,19 @@ examples = [
     },
     {
         "user": "What is your job?",
-        "model": "Assistant Regional Manager. Sorry, Assistant to the Regional Manager.",
+        "model": "Assistant Regional Manager. Sorry, Assistant to the Regional Manager."
     }
 ]
 
-llm = LLM.build(
+chat_llm = ChatLLM.build(
     model_name="OpenHermes-2.5-Mistral-7B",
-    weights_path="path/to/weights.npz",
-    tokenizer="path/to/tokenizer.model",
+    tokenizer="mlx-community/OpenHermes-2.5-Mistral-7B",  # HF tokenizer or a local path to a tokenizer
     personality=personality,
     examples=examples,
 )
-
-llm.chat(max_tokens=500)
+
+chat_llm.run(max_tokens=500, temp=0.1)
 ```
 
-## Prompt Format
-
-mlx-llm takes care of prompt format. Just play!
+With `mlx-llm` you can also play with a simple RAG. Go check the examples.
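
The `quantize(model, group_size=64, bits=4)` call in the updated README applies group-wise 4-bit quantization to the weights. As a rough illustration of what that means for a single flat weight vector, here is a minimal pure-Python sketch of group-wise affine quantization; it is not mlx-llm's actual implementation, just the general technique:

```python
def quantize_group(weights, bits=4):
    """Affine-quantize one group of weights to `bits`-bit integer codes.

    Each group stores its own scale and minimum, so an outlier only
    degrades precision inside its own group.
    """
    levels = 2 ** bits - 1                # 15 codes for 4-bit
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / levels or 1.0     # guard against constant groups
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo


def dequantize_group(codes, scale, lo):
    """Reconstruct approximate weights from codes and group metadata."""
    return [c * scale + lo for c in codes]


def quantize_groupwise(weights, group_size=64, bits=4):
    """Split a flat weight list into groups and quantize each one."""
    return [quantize_group(weights[i:i + group_size], bits)
            for i in range(0, len(weights), group_size)]


# Round-trip demo: codes fit in 4 bits, error stays within scale / 2
w = [0.05 * i - 1.5 for i in range(128)]
groups = quantize_groupwise(w)
codes, scale, lo = groups[0]
w_hat = dequantize_group(codes, scale, lo)
```

Storing 4-bit codes plus one `(scale, lo)` pair per 64 weights is what shrinks the saved `weights.npz` to roughly a quarter of the float16 size.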
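
The chat example above never shows a prompt template because the library builds it internally. OpenHermes-2.5 was trained on the ChatML format, so presumably `ChatLLM` assembles something like the following from `personality` and `examples`; this is a hypothetical sketch (`build_chatml_prompt` is not part of mlx-llm):

```python
def build_chatml_prompt(personality, examples, user_msg):
    """Assemble a ChatML prompt, the format OpenHermes-2.5 was trained on.

    Hypothetical sketch of what the library may build internally;
    not mlx-llm's actual code.
    """
    parts = [f"<|im_start|>system\n{personality}<|im_end|>"]
    for ex in examples:
        parts.append(f"<|im_start|>user\n{ex['user']}<|im_end|>")
        parts.append(f"<|im_start|>assistant\n{ex['model']}<|im_end|>")
    parts.append(f"<|im_start|>user\n{user_msg}<|im_end|>")
    parts.append("<|im_start|>assistant\n")  # the model continues from here
    return "\n".join(parts)


prompt = build_chatml_prompt(
    personality="You are Dwight K Schrute.",
    examples=[{"user": "What is your job?",
               "model": "Assistant to the Regional Manager."}],
    user_msg="Who is the best salesman?",
)
```

Each turn is wrapped in `<|im_start|>role ... <|im_end|>` markers, and the prompt ends with an open `assistant` turn so the model generates the reply.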