Update README.md
README.md
CHANGED
@@ -6,127 +6,16 @@ language:
- en
tags:
- llama-cpp
---

# Llama-3-ELYZA-JP-8B-GGUF

## Model Description

Based on [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct), it has been enhanced for Japanese usage through additional pre-training and instruction tuning. (Built with Meta Llama 3)

For more details, please refer to [our blog post](https://note.com/elyza/n/n360b6084fdbd).

## Quantization

We have prepared two quantized model options: GGUF and AWQ. This is the GGUF (Q4_K_M) model, converted using [llama.cpp](https://github.com/ggerganov/llama.cpp).

The following table shows the performance degradation due to quantization:

| Model | ELYZA-tasks-100 GPT-4 score |
| :-------------------------------- | ---: |
| [Llama-3-ELYZA-JP-8B](https://huggingface.co/elyza/Llama-3-ELYZA-JP-8B) | 3.655 |
| [Llama-3-ELYZA-JP-8B-GGUF (Q4_K_M)](https://huggingface.co/elyza/Llama-3-ELYZA-JP-8B-GGUF) | 3.57 |
| [Llama-3-ELYZA-JP-8B-AWQ](https://huggingface.co/elyza/Llama-3-ELYZA-JP-8B-AWQ) | 3.39 |
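If you prefer to fetch the quantized file programmatically rather than through llama.cpp's download flags, the `huggingface_hub` package can retrieve it; a minimal sketch, assuming `huggingface_hub` is installed (the filename matches the one referenced later in this card):

```python
# Sketch: download the Q4_K_M GGUF file with huggingface_hub (assumed installed).
REPO_ID = "elyza/Llama-3-ELYZA-JP-8B-GGUF"
FILENAME = "Llama-3-ELYZA-JP-8B-q4_k_m.gguf"

def download_gguf(repo_id: str = REPO_ID, filename: str = FILENAME) -> str:
    # Deferred import so the constants above are usable without the dependency.
    from huggingface_hub import hf_hub_download
    return hf_hub_download(repo_id=repo_id, filename=filename)  # local cache path
```

`download_gguf()` returns the local cache path of the model file, which can then be passed to any GGUF-capable runtime.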

## Use with llama.cpp

Install llama.cpp through brew (works on Mac and Linux):

```bash
brew install llama.cpp
```

Invoke the llama.cpp server:

```bash
llama-server \
  --hf-repo elyza/Llama-3-ELYZA-JP-8B-GGUF \
  --hf-file Llama-3-ELYZA-JP-8B-q4_k_m.gguf \
  --port 8080
```

Call the API using curl:

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      { "role": "system", "content": "あなたは誠実で優秀な日本人のアシスタントです。特に指示が無い場合は、常に日本語で回答してください。" },
      { "role": "user", "content": "古代ギリシャを学ぶ上で知っておくべきポイントは?" }
    ],
    "temperature": 0.6,
    "max_tokens": -1,
    "stream": false
  }'
```
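The same request can also be assembled with only the Python standard library; a minimal sketch that builds the exact JSON body from the curl example (it assumes the server above is listening on localhost:8080):

```python
import json
import urllib.request

# The same JSON body as the curl example above.
PAYLOAD = {
    "messages": [
        {"role": "system", "content": "あなたは誠実で優秀な日本人のアシスタントです。特に指示が無い場合は、常に日本語で回答してください。"},
        {"role": "user", "content": "古代ギリシャを学ぶ上で知っておくべきポイントは?"},
    ],
    "temperature": 0.6,
    "max_tokens": -1,
    "stream": False,
}

def build_request(url: str = "http://localhost:8080/v1/chat/completions") -> urllib.request.Request:
    # Build the POST request; send it with urllib.request.urlopen(...) once the server is up.
    return urllib.request.Request(
        url,
        data=json.dumps(PAYLOAD, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
```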

Call the API using Python:

```python
import openai

client = openai.OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="dummy_api_key",
)

completion = client.chat.completions.create(
    model="dummy_model_name",
    messages=[
        {"role": "system", "content": "あなたは誠実で優秀な日本人のアシスタントです。特に指示が無い場合は、常に日本語で回答してください。"},
        {"role": "user", "content": "古代ギリシャを学ぶ上で知っておくべきポイントは?"},
    ],
)

# Print the assistant's reply.
print(completion.choices[0].message.content)
```
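For repeated calls, the snippet above can be wrapped in a small helper; a sketch in which the `client` object is injected (assuming it exposes `chat.completions.create` like the `openai` package does, it also works against any other OpenAI-compatible endpoint):

```python
# Reusable wrapper around the chat call above. The `client` is injected, so any
# object exposing chat.completions.create in the openai-package style works.
DEFAULT_SYSTEM = (
    "あなたは誠実で優秀な日本人のアシスタントです。"
    "特に指示が無い場合は、常に日本語で回答してください。"
)

def ask(client, user_message: str, system: str = DEFAULT_SYSTEM, temperature: float = 0.6) -> str:
    completion = client.chat.completions.create(
        model="dummy_model_name",
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": user_message},
        ],
        temperature=temperature,
    )
    # Return just the assistant's text.
    return completion.choices[0].message.content
```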

## Use with Desktop App

There are various desktop applications that can handle GGUF models; here we introduce how to use the model in the no-code environment [LM Studio](https://lmstudio.ai/).

- **Installation**: Download and install [LM Studio](https://lmstudio.ai/).
- **Downloading the Model**: Search for `elyza/Llama-3-ELYZA-JP-8B-GGUF` in the search bar on the home page 🏠, and download `Llama-3-ELYZA-JP-8B-q4_k_m.gguf`.
- **Start Chatting**: Click 💬 in the sidebar, select `Llama-3-ELYZA-JP-8B-GGUF` from "Select a Model to load" in the header, and load the model. You can now chat freely with the local LLM.
- **Setting Options**: Options are available in the sidebar on the right. Faster inference can be achieved by setting Quick GPU Offload to Max in the GPU Settings.
- **(For Developers) Starting an API Server**: Click `<->` in the left sidebar and go to the Local Server tab. Select the model and click Start Server to launch an OpenAI-compatible API server.


This demo showcases Llama-3-ELYZA-JP-8B-GGUF running smoothly on a MacBook Pro (M1 Pro), achieving an inference speed of approximately 20 tokens per second.

## Developers

Listed in alphabetical order.

- [Masato Hirakawa](https://huggingface.co/m-hirakawa)
- [Shintaro Horie](https://huggingface.co/e-mon)
- [Tomoaki Nakamura](https://huggingface.co/tyoyo)
- [Daisuke Oba](https://huggingface.co/daisuk30ba)
- [Sam Passaglia](https://huggingface.co/passaglia)
- [Akira Sasaki](https://huggingface.co/akirasasaki)
## License

[Meta Llama 3 Community License](https://llama.meta.com/llama3/license/)

## How to Cite

```tex
@misc{elyzallama2024,
      title={elyza/Llama-3-ELYZA-JP-8B},
      url={https://huggingface.co/elyza/Llama-3-ELYZA-JP-8B},
      author={Masato Hirakawa and Shintaro Horie and Tomoaki Nakamura and Daisuke Oba and Sam Passaglia and Akira Sasaki},
      year={2024},
}
```

## Citations

```tex
@article{llama3modelcard,
         title={Llama 3 Model Card},
         author={AI@Meta},
         year={2024},
         url={https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md}
}
```

---
language:
- en
tags:
- llama-cpp
base_model:
- elyza/Llama-3-ELYZA-JP-8B
---

# Llama-3-shindy-jp-8B-GGUF

## Model Description

Based on [elyza/Llama-3-ELYZA-JP-8B-GGUF](https://huggingface.co/elyza/Llama-3-ELYZA-JP-8B-GGUF), it has been enhanced for Japanese usage through additional pre-training and instruction tuning. (Built with Meta Llama 3)

## License

[Meta Llama 3 Community License](https://llama.meta.com/llama3/license/)