---
base_model:
- huihui-ai/Qwen3-8B-abliterated
tags:
- qwen
- '3'
- abliterated
- gptq
- int8
---

# Model Card: groxaxo/Qwen3-8B-abliterated-GPTQ-W8A16

## Model Overview

- **Model Name:** groxaxo/Qwen3-8B-abliterated-GPTQ-W8A16
- **Base Model:** huihui-ai/Qwen3-8B-abliterated
- **Description:** A quantized version of the uncensored huihui-ai/Qwen3-8B-abliterated model, which is itself derived from Qwen/Qwen3-8B. The model has been quantized to GPTQ Int8 W8A16 for maximum inference speed on NVIDIA RTX 3090 GPUs. Abliteration was performed using a novel, faster method to remove refusals (the general idea is sketched below), making this a proof-of-concept implementation of uncensored language-model behavior.

**Important Note:** A newer version, huihui-ai/Huihui-Qwen3-8B-abliterated-v2, is available. Consider using the updated version for improved performance.
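For orientation, here is the general idea behind abliteration (directional ablation): estimate a "refusal direction" in the model's residual stream and project it out of the weights that write to that stream. The sketch below is illustrative, with hypothetical names (`ablate`, `refusal_dir`); it is not the author's specific, faster method, and the direction-estimation step is omitted (see the remove-refusals-with-transformers reference for an actual implementation):

```python
import torch

def ablate(W: torch.Tensor, refusal_dir: torch.Tensor) -> torch.Tensor:
    """Project the refusal direction out of a weight matrix that writes
    to the residual stream (W has shape [d_model, d_in])."""
    r = refusal_dir / refusal_dir.norm()  # unit refusal direction
    return W - torch.outer(r, r @ W)      # W' = W - r (r^T W)

# Hypothetical usage: refusal_dir would be estimated as the difference of
# mean hidden states on harmful vs. harmless prompts (estimation omitted).
W = torch.randn(8, 4)
refusal_dir = torch.randn(8)
W_ablated = ablate(W, refusal_dir)
# The ablated weights no longer write anything along the refusal direction:
print(torch.allclose(refusal_dir @ W_ablated / refusal_dir.norm(),
                     torch.zeros(4), atol=1e-5))  # True
```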
## Quantization Details

- **Quantization Method:** GPTQ Int8 W8A16 (8-bit integer weights, 16-bit floating-point activations)
- **Purpose:** Optimized for high-speed inference on NVIDIA RTX 3090 GPUs, reducing the memory footprint while maintaining performance.
- **Impact:** Faster inference than the unquantized model, suitable for resource-constrained environments.
- **Model Size:** 2.98B parameters
- **Tensor Types:** I64, I32, F16
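To make the W8A16 format concrete, below is a minimal, illustrative sketch of symmetric per-channel int8 weight quantization with fp16 activations. The helper names (`quantize_w8`, `linear_w8a16`) are hypothetical, and this is not the GPTQ algorithm itself (GPTQ additionally applies Hessian-based error compensation when rounding); it only shows what the storage format means numerically:

```python
import numpy as np

def quantize_w8(w: np.ndarray):
    """Quantize fp32 weights to int8 with one fp16 scale per output row."""
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0 + 1e-12
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale.astype(np.float16)

def linear_w8a16(x_f16: np.ndarray, q: np.ndarray, scale: np.ndarray):
    """W8A16 linear layer: int8 weights, fp16 activations. Real kernels
    dequantize on the fly; here the dequantization is emulated explicitly."""
    w_deq = q.astype(np.float16) * scale
    return x_f16 @ w_deq.T

w = np.random.randn(64, 128).astype(np.float32)  # toy weight matrix
x = np.random.randn(4, 128).astype(np.float16)   # fp16 activations
q, s = quantize_w8(w)
err = np.abs(linear_w8a16(x, q, s) - x.astype(np.float32) @ w.T).max()
print(f"max abs error vs. full precision: {err:.4f}")
```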
## Usage

### Using with vLLM

The model can be used with vLLM for efficient inference. Below is an example of how to set up and run the model using vLLM in Python:

```python
from vllm import LLM, SamplingParams

# Define the model ID
MODEL_ID = "groxaxo/Qwen3-8B-abliterated-GPTQ-W8A16"

# Initialize the vLLM engine
llm = LLM(
    model=MODEL_ID,
    dtype="bfloat16",            # activation dtype; use "float16" if your GPTQ kernel requires it
    trust_remote_code=True,
    quantization="gptq",         # the checkpoint is pre-quantized with GPTQ
    gpu_memory_utilization=0.9,  # adjust based on your GPU memory
)

# Define sampling parameters
sampling_params = SamplingParams(
    temperature=0.7,
    max_tokens=8192,
    stop=["/exit"],              # stop string: generation also halts if the model emits it
)

# Interactive chat loop
system_prompt = "You are a helpful assistant."
messages = [{"role": "system", "content": system_prompt}]

while True:
    user_input = input("User: ").strip()
    if user_input.lower() == "/exit":
        print("Exiting chat.")
        break
    if user_input.lower() == "/clear":
        messages = [{"role": "system", "content": system_prompt}]
        print("Chat history cleared. Starting a new conversation.")
        continue
    if not user_input:
        print("Input cannot be empty. Please enter something.")
        continue

    # Append the user input to the conversation history
    messages.append({"role": "user", "content": user_input})

    # Join the history into a plain-text prompt (see the chat-template
    # variant below for a more faithful alternative)
    prompt = "\n".join(f"{msg['role']}: {msg['content']}" for msg in messages)

    # Generate a response
    outputs = llm.generate([prompt], sampling_params)
    response = outputs[0].outputs[0].text.strip()

    # Print the response and append it to the history
    print(f"Assistant: {response}")
    messages.append({"role": "assistant", "content": response})
```
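The plain-text prompt above does not apply Qwen3's chat template. Recent vLLM releases also expose `LLM.chat`, which applies the tokenizer's chat template for you; a minimal sketch, assuming the same checkpoint:

```python
from vllm import LLM, SamplingParams

# Same checkpoint as above; LLM.chat applies the tokenizer's chat template
llm = LLM(model="groxaxo/Qwen3-8B-abliterated-GPTQ-W8A16", quantization="gptq")
params = SamplingParams(temperature=0.7, max_tokens=1024)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize GPTQ in one sentence."},
]
outputs = llm.chat(messages, params)
print(outputs[0].outputs[0].text)
```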
### Installation Requirements

To use the model with vLLM, ensure you have vLLM installed:

```bash
pip install vllm
```

### Notes

- The model is pre-quantized to GPTQ Int8 W8A16, so specify `quantization="gptq"` when initializing the `LLM` object.
- Adjust `gpu_memory_utilization` based on your GPU's memory capacity to avoid out-of-memory errors (see the sketch after this list).
- The `max_tokens` parameter can be increased for longer responses, but this may impact performance.
- The model is not deployed by any inference provider. For provider support, contact the repository maintainers on Hugging Face.
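As a hedged starting point for a single 24 GB RTX 3090, the sketch below combines the memory-related settings from the notes above; `max_model_len=16384` is an illustrative cap, not a tested recommendation:

```python
from vllm import LLM

# Illustrative settings for fitting the model on a single 24 GB GPU
llm = LLM(
    model="groxaxo/Qwen3-8B-abliterated-GPTQ-W8A16",
    quantization="gptq",
    gpu_memory_utilization=0.85,  # leave headroom for other processes
    max_model_len=16384,          # cap context length to shrink the KV cache
)
```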
## Performance

### Pass Rate for Harmful Instructions

The pass rate measures the proportion of harmful instructions that do not trigger refusals, calculated as `(total - triggered_total) / total`. The test set is sourced from huihui-ai/harmbench_behaviors and evaluated with TestPassed.py.
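For concreteness, the reported ratios follow directly from this formula (the `pass_rate` helper below is hypothetical, not part of TestPassed.py):

```python
# Pass rate as defined above: (total - triggered_total) / total
def pass_rate(total: int, triggered_total: int) -> float:
    return (total - triggered_total) / total

print(f"{pass_rate(320, 0):.2%}")    # Qwen3-8B-abliterated: 100.00%
print(f"{pass_rate(320, 125):.2%}")  # Qwen3-8B: 195/320 -> 60.94%
```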
**Test Results:**

- **Model:** huihui-ai/Qwen3-8B-abliterated
- **Passed Total:** 320/320
- **Passed Ratio:** 1.00 (100.00%)

**Comparison:**

| Model                | Passed Total | Passed Ratio |
|----------------------|--------------|--------------|
| Qwen3-8B             | 195/320      | 60.94%       |
| Qwen3-8B-abliterated | 320/320      | 100.00%      |

Note: The test provides a preliminary assessment. For comprehensive results, consider increasing the `max_tokens` value during evaluation.
## Limitations

- This model is a proof-of-concept with abliteration applied to remove refusals, which may lead to unpredictable behavior on certain inputs.
- Quantization to GPTQ Int8 W8A16 trades a small amount of output quality for speed relative to the unquantized model.
- Users should verify outputs for sensitive applications, as the model is uncensored and may generate harmful or inappropriate content.
## References

- Repository: groxaxo/Qwen3-8B-abliterated-GPTQ-W8A16
- Base Model: Qwen/Qwen3-8B
- Abliteration Method: remove-refusals-with-transformers
- Test Set: huihui-ai/harmbench_behaviors
- Newer Version: huihui-ai/Huihui-Qwen3-8B-abliterated-v2