Suparious committed on
Commit b4ba5f8 · verified · 1 Parent(s): f6ec8cd

Adding model card

Files changed (1): README.md +113 -1
README.md CHANGED
@@ -1,3 +1,115 @@
  ---
- license: apache-2.0
+ language:
+ - en
+ license: cc-by-nc-4.0
+ tags:
+ - text-generation-inference
+ - transformers
+ - quantized
+ - 4-bit
+ - AWQ
+ - text-generation
+ - autotrain_compatible
+ - endpoints_compatible
+ - chatml
+ - mistral
+ model_creator: SanjiWatsuki
+ model_name: Kunoichi-DPO-v2-7B
+ model_type: mistral
+ pipeline_tag: text-generation
+ inference: false
  ---
+ # SanjiWatsuki/Kunoichi-DPO-v2-7B AWQ
+
+ **UPLOAD IN PROGRESS**
+
+ - Model creator: [SanjiWatsuki](https://huggingface.co/SanjiWatsuki)
+ - Original model: [Kunoichi-DPO-v2-7B](https://huggingface.co/SanjiWatsuki/Kunoichi-DPO-v2-7B)
+
+ ## Model Summary
+
+ | Model | Average | AGIEval | GPT4All | TruthfulQA | Bigbench |
+ |---|---:|---:|---:|---:|---:|
+ | **Kunoichi-DPO-7B** | **58.4** | 45.08 | 74 | 66.99 | 47.52 |
+ | **Kunoichi-DPO-v2-7B** | **58.31** | 44.85 | 75.05 | 65.69 | 47.65 |
+ | [Kunoichi-7B](https://huggingface.co/SanjiWatsuki/Kunoichi-7B) | 57.54 | 44.99 | 74.86 | 63.72 | 46.58 |
+ | [OpenPipe/mistral-ft-optimized-1218](https://huggingface.co/OpenPipe/mistral-ft-optimized-1218) | 56.85 | 44.74 | 75.6 | 59.89 | 47.17 |
+ | [Silicon-Maid-7B](https://huggingface.co/SanjiWatsuki/Silicon-Maid-7B) | 56.45 | 44.74 | 74.26 | 61.5 | 45.32 |
+ | [mlabonne/NeuralHermes-2.5-Mistral-7B](https://huggingface.co/mlabonne/NeuralHermes-2.5-Mistral-7B) | 53.51 | 43.67 | 73.24 | 55.37 | 41.76 |
+ | [teknium/OpenHermes-2.5-Mistral-7B](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B) | 52.42 | 42.75 | 72.99 | 52.99 | 40.94 |
+ | [openchat/openchat_3.5](https://huggingface.co/openchat/openchat_3.5) | 51.34 | 42.67 | 72.92 | 47.27 | 42.51 |
+ | [berkeley-nest/Starling-LM-7B-alpha](https://huggingface.co/berkeley-nest/Starling-LM-7B-alpha) | 51.16 | 42.06 | 72.72 | 47.33 | 42.53 |
+ | [HuggingFaceH4/zephyr-7b-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta) | 50.99 | 37.33 | 71.83 | 55.1 | 39.7 |
+
+ ## How to use
+
+ ### Install the necessary packages
+
+ ```bash
+ pip install --upgrade autoawq autoawq-kernels
+ ```
+
+ ### Example Python code
+
+ ```python
+ from awq import AutoAWQForCausalLM
+ from transformers import AutoTokenizer, TextStreamer
+
+ model_path = "solidrust/Kunoichi-DPO-v2-7B-AWQ"
+ system_message = "You are Kunoichi, incarnated as a powerful AI."
+
+ # Load model
+ model = AutoAWQForCausalLM.from_quantized(model_path,
+                                           fuse_layers=True)
+ tokenizer = AutoTokenizer.from_pretrained(model_path,
+                                           trust_remote_code=True)
+ streamer = TextStreamer(tokenizer,
+                         skip_prompt=True,
+                         skip_special_tokens=True)
+
+ # Convert prompt to tokens
+ prompt_template = """\
+ <|im_start|>system
+ {system_message}<|im_end|>
+ <|im_start|>user
+ {prompt}<|im_end|>
+ <|im_start|>assistant"""
+
+ prompt = "You're standing on the surface of the Earth. " \
+          "You walk one mile south, one mile west and one mile north. " \
+          "You end up exactly where you started. Where are you?"
+
+ tokens = tokenizer(prompt_template.format(system_message=system_message, prompt=prompt),
+                    return_tensors='pt').input_ids.cuda()
+
+ # Generate output
+ generation_output = model.generate(tokens,
+                                    streamer=streamer,
+                                    max_new_tokens=512)
+ ```
+
+ ### About AWQ
+
+ AWQ is an efficient, accurate, and blazing-fast low-bit weight quantization method, currently supporting 4-bit quantization. Compared to GPTQ, it offers faster Transformers-based inference with equivalent or better quality than the most commonly used GPTQ settings.
+
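+ As background, producing a 4-bit AWQ checkpoint with AutoAWQ looks roughly like the sketch below. The paths and `quant_config` values here are illustrative assumptions, not a record of how this repo was built:
+
+ ```python
+ from awq import AutoAWQForCausalLM
+ from transformers import AutoTokenizer
+
+ # Illustrative paths and settings - not the exact recipe used for this repo
+ source_path = "SanjiWatsuki/Kunoichi-DPO-v2-7B"
+ quant_path = "Kunoichi-DPO-v2-7B-AWQ"
+
+ # Common AWQ settings: 4-bit weights with a quantization group size of 128
+ quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}
+
+ # Load the full-precision model, quantize it, and save the AWQ checkpoint
+ model = AutoAWQForCausalLM.from_pretrained(source_path)
+ tokenizer = AutoTokenizer.from_pretrained(source_path, trust_remote_code=True)
+ model.quantize(tokenizer, quant_config=quant_config)
+ model.save_quantized(quant_path)
+ tokenizer.save_pretrained(quant_path)
+ ```
+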
+ AWQ models are currently supported on Linux and Windows, with NVIDIA GPUs only. macOS users: please use GGUF models instead.
+
+ It is supported by:
+
+ - [Text Generation Webui](https://github.com/oobabooga/text-generation-webui) - using Loader: AutoAWQ
+ - [vLLM](https://github.com/vllm-project/vllm) - version 0.2.2 or later supports all model types
+ - [Hugging Face Text Generation Inference (TGI)](https://github.com/huggingface/text-generation-inference)
+ - [Transformers](https://huggingface.co/docs/transformers) version 4.35.0 and later, from any code or client that supports Transformers (see the sketch below)
+ - [AutoAWQ](https://github.com/casper-hansen/AutoAWQ) - for use from Python code
+
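+ For example, Transformers 4.35.0 or later (with `autoawq` installed) can load this AWQ checkpoint through the standard `AutoModelForCausalLM` API. A minimal sketch, assuming a CUDA device and `accelerate` available for `device_map`:
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ # Transformers >= 4.35.0 loads AWQ checkpoints directly when autoawq is installed
+ model = AutoModelForCausalLM.from_pretrained(
+     "solidrust/Kunoichi-DPO-v2-7B-AWQ",
+     device_map="auto",
+ )
+ tokenizer = AutoTokenizer.from_pretrained("solidrust/Kunoichi-DPO-v2-7B-AWQ")
+ ```
+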
+ ## Prompt template: ChatML
+
+ ```plaintext
+ <|im_start|>system
+ {system_message}<|im_end|>
+ <|im_start|>user
+ {prompt}<|im_end|>
+ <|im_start|>assistant
+ ```
+
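+ If the tokenizer in this repo ships a ChatML chat template (an assumption worth verifying), `tokenizer.apply_chat_template` can build the prompt above instead of hand-formatting the string. A minimal sketch:
+
+ ```python
+ from transformers import AutoTokenizer
+
+ tokenizer = AutoTokenizer.from_pretrained("solidrust/Kunoichi-DPO-v2-7B-AWQ")
+
+ messages = [
+     {"role": "system", "content": "You are Kunoichi, incarnated as a powerful AI."},
+     {"role": "user", "content": "Where are you?"},
+ ]
+
+ # add_generation_prompt appends the trailing <|im_start|>assistant turn
+ prompt = tokenizer.apply_chat_template(messages,
+                                        tokenize=False,
+                                        add_generation_prompt=True)
+ print(prompt)
+ ```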