shindy-dev committed on
Commit da41aeb · verified · 1 Parent(s): dda2942

Update README.md

Files changed (1): README.md (+5 -116)

README.md CHANGED
@@ -6,127 +6,16 @@ language:
  - en
  tags:
  - llama-cpp
  ---

- # Llama-3-ELYZA-JP-8B-GGUF
-
- ![Llama-3-ELYZA-JP-8B-image](./key_visual.png)

  ## Model Description

- **Llama-3-ELYZA-JP-8B** is a large language model trained by [ELYZA, Inc](https://elyza.ai/).
- Based on [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct), it has been enhanced for Japanese usage through additional pre-training and instruction tuning. (Built with Meta Llama 3)
-
- For more details, please refer to [our blog post](https://note.com/elyza/n/n360b6084fdbd).
-
- ## Quantization
-
- We have prepared two quantized model options, GGUF and AWQ. This is the GGUF (Q4_K_M) model, converted using [llama.cpp](https://github.com/ggerganov/llama.cpp).
-
- The following table shows the performance degradation due to quantization:
-
- | Model | ELYZA-tasks-100 GPT4 score |
- | :-------------------------------- | ---: |
- | [Llama-3-ELYZA-JP-8B](https://huggingface.co/elyza/Llama-3-ELYZA-JP-8B) | 3.655 |
- | [Llama-3-ELYZA-JP-8B-GGUF (Q4_K_M)](https://huggingface.co/elyza/Llama-3-ELYZA-JP-8B-GGUF) | 3.57 |
- | [Llama-3-ELYZA-JP-8B-AWQ](https://huggingface.co/elyza/Llama-3-ELYZA-JP-8B-AWQ) | 3.39 |
-
-
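To put the degradation in perspective, the retention implied by the table can be checked with a few lines of Python (plain arithmetic on the published scores; nothing model-specific):

```python
# Relative score retention of each quantized variant, computed from the
# ELYZA-tasks-100 GPT-4 scores in the table above.
base = 3.655           # Llama-3-ELYZA-JP-8B (unquantized)
variants = {"GGUF (Q4_K_M)": 3.57, "AWQ": 3.39}

for name, score in variants.items():
    retention = score / base * 100
    # GGUF (Q4_K_M) retains ~97.7%, AWQ ~92.7% of the full-precision score
    print(f"{name}: {retention:.1f}% of the full-precision score")
```

In other words, the Q4_K_M quantization costs roughly 2% of the benchmark score while drastically shrinking the model file.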

- ## Use with llama.cpp
-
- Install llama.cpp through brew (works on Mac and Linux):
- ```bash
- brew install llama.cpp
- ```
-
- Invoke the llama.cpp server:
- ```bash
- llama-server \
-     --hf-repo elyza/Llama-3-ELYZA-JP-8B-GGUF \
-     --hf-file Llama-3-ELYZA-JP-8B-q4_k_m.gguf \
-     --port 8080
- ```
-
- Call the API using curl:
- ```bash
- curl http://localhost:8080/v1/chat/completions \
-     -H "Content-Type: application/json" \
-     -d '{
-         "messages": [
-             { "role": "system", "content": "あなたは誠実で優秀な日本人のアシスタントです。特に指示が無い場合は、常に日本語で回答してください。" },
-             { "role": "user", "content": "古代ギリシャを学ぶ上で知っておくべきポイントは?" }
-         ],
-         "temperature": 0.6,
-         "max_tokens": -1,
-         "stream": false
-     }'
- ```
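For scripting, the request body from the curl call above can be built as a plain Python dict and serialized with the standard library (an offline sketch: it only constructs the JSON, it does not contact the server):

```python
import json

# The same request body as the curl example above, as a Python dict.
payload = {
    "messages": [
        {"role": "system", "content": "あなたは誠実で優秀な日本人のアシスタントです。特に指示が無い場合は、常に日本語で回答してください。"},
        {"role": "user", "content": "古代ギリシャを学ぶ上で知っておくべきポイントは?"},
    ],
    "temperature": 0.6,
    "max_tokens": -1,   # -1 asks the server for unbounded generation (until EOS)
    "stream": False,
}

# ensure_ascii=False keeps the Japanese text readable in the serialized body.
body = json.dumps(payload, ensure_ascii=False)
```

This is useful when generating requests programmatically instead of pasting a literal JSON string into the shell.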
-
- Call the API using Python:
- ```python
- import openai
-
- client = openai.OpenAI(
-     base_url="http://localhost:8080/v1",
-     api_key="dummy_api_key"
- )
-
- completion = client.chat.completions.create(
-     model="dummy_model_name",
-     messages=[
-         {"role": "system", "content": "あなたは誠実で優秀な日本人のアシスタントです。特に指示が無い場合は、常に日本語で回答してください。"},
-         {"role": "user", "content": "古代ギリシャを学ぶ上で知っておくべきポイントは?"}
-     ]
- )
- ```
-
- ## Use with Desktop App
-
- There are various desktop applications that can handle GGUF models, but here we will introduce how to use the model in the no-code environment [LM Studio](https://lmstudio.ai/).
-
- - **Installation**: Download and install [LM Studio](https://lmstudio.ai/).
- - **Downloading the Model**: Search for `elyza/Llama-3-ELYZA-JP-8B-GGUF` in the search bar on the home page 🏠, and download `Llama-3-ELYZA-JP-8B-q4_k_m.gguf`.
- - **Start Chatting**: Click on 💬 in the sidebar, select `Llama-3-ELYZA-JP-8B-GGUF` from "Select a Model to load" in the header, and load the model. You can now freely chat with the local LLM.
- - **Setting Options**: You can set options from the sidebar on the right. Faster inference can be achieved by setting Quick GPU Offload to Max in the GPU Settings.
- - **(For Developers) Starting an API Server**: Click `<->` in the left sidebar and move to the Local Server tab. Select the model and click Start Server to launch an OpenAI API-compatible API server.
-
- ![lmstudio-demo](./lmstudio-demo.gif)
-
- This demo showcases Llama-3-ELYZA-JP-8B-GGUF running smoothly on a MacBook Pro (M1 Pro), achieving an inference speed of approximately 20 tokens per second.
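That throughput also gives a quick feel for end-to-end latency (back-of-the-envelope arithmetic; the 300-token response length is an assumption, and real speed varies with hardware and context length):

```python
# Rough latency estimate from the ~20 tokens/s observed in the demo above.
tokens_per_second = 20
response_tokens = 300   # assumed length of a few-paragraph answer

estimated_seconds = response_tokens / tokens_per_second
print(f"~{estimated_seconds:.0f} s for a {response_tokens}-token response")  # ~15 s
```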
-
- ## Developers
-
- Listed in alphabetical order.
-
- - [Masato Hirakawa](https://huggingface.co/m-hirakawa)
- - [Shintaro Horie](https://huggingface.co/e-mon)
- - [Tomoaki Nakamura](https://huggingface.co/tyoyo)
- - [Daisuke Oba](https://huggingface.co/daisuk30ba)
- - [Sam Passaglia](https://huggingface.co/passaglia)
- - [Akira Sasaki](https://huggingface.co/akirasasaki)

  ## License

- [Meta Llama 3 Community License](https://llama.meta.com/llama3/license/)
-
- ## How to Cite
-
- ```tex
- @misc{elyzallama2024,
-   title={elyza/Llama-3-ELYZA-JP-8B},
-   url={https://huggingface.co/elyza/Llama-3-ELYZA-JP-8B},
-   author={Masato Hirakawa and Shintaro Horie and Tomoaki Nakamura and Daisuke Oba and Sam Passaglia and Akira Sasaki},
-   year={2024},
- }
- ```
-
- ## Citations
-
- ```tex
- @article{llama3modelcard,
-   title={Llama 3 Model Card},
-   author={AI@Meta},
-   year={2024},
-   url={https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md}
- }
- ```
 
  - en
  tags:
  - llama-cpp
+ base_model:
+ - elyza/Llama-3-ELYZA-JP-8B
  ---

+ # Llama-3-shindy-jp-8B-GGUF

  ## Model Description

+ Based on [elyza/Llama-3-ELYZA-JP-8B-GGUF](https://huggingface.co/elyza/Llama-3-ELYZA-JP-8B-GGUF), it has been enhanced for Japanese usage through additional pre-training and instruction tuning. (Built with Meta Llama 3)

  ## License

+ [Meta Llama 3 Community License](https://llama.meta.com/llama3/license/)