---
library_name: transformers
license: apache-2.0
license_link: https://huggingface.co/Qwen/Qwen3-8B/blob/main/LICENSE
pipeline_tag: text-generation
base_model:
- Qwen/Qwen3-8B
tags:
- chat
- abliterated
- uncensored
---

# huihui-ai/Huihui-Qwen3-8B-abliterated-v2

This is an uncensored version of [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) created with abliteration (see [remove-refusals-with-transformers](https://github.com/Sumandora/remove-refusals-with-transformers) to learn more about it).
This is a crude, proof-of-concept implementation to remove refusals from an LLM without using TransformerLens.

Ablation was performed using a new and faster method, which yields better results.
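
As a rough illustration of the general abliteration idea (not the new method used for this model, which this card does not describe), a "refusal direction" can be estimated as the difference between mean residual-stream activations on harmful and harmless prompts at one candidate layer. That direction can then be removed from activations at inference time (for example with forward hooks) or projected out of the weight matrices that write into the residual stream. The NumPy sketch below runs on random stand-in activations; the function names, shapes, and the planted offset are hypothetical.

```python
import numpy as np

# Illustrative sketch of generic abliteration on synthetic data.
# Real use would capture residual-stream activations at one chosen
# ("candidate") layer for harmful and harmless prompts; the random arrays,
# shapes, and names below are hypothetical stand-ins.


def refusal_direction(harmful_acts, harmless_acts):
    """Unit-length difference of means; inputs have shape (num_prompts, hidden_size)."""
    direction = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return direction / np.linalg.norm(direction)


def ablate_activations(acts, direction):
    """Subtract each row's component along `direction` (what an inference-time hook would do)."""
    return acts - np.outer(acts @ direction, direction)


def orthogonalize_weight(weight, direction):
    """Project `direction` out of a weight that writes into the residual stream.

    `weight` has shape (hidden_size, in_features), like the weight of a PyTorch
    nn.Linear output projection, so W @ x can no longer point along `direction`.
    """
    return weight - np.outer(direction, direction @ weight)


rng = np.random.default_rng(0)
hidden_size = 16
offset = np.zeros(hidden_size)
offset[3] = 5.0  # plant a synthetic "refusal" signal along one axis
harmful = rng.normal(size=(32, hidden_size)) + offset
harmless = rng.normal(size=(32, hidden_size))

r = refusal_direction(harmful, harmless)
w_abl = orthogonalize_weight(rng.normal(size=(hidden_size, 8)), r)

# The orthogonalized weight can no longer write along r (up to float error).
print(np.abs(r @ w_abl).max())
```

Weight orthogonalization bakes the change into the checkpoint, so no hooks are needed at inference; the hook-style `ablate_activations` variant leaves the weights untouched but must run on every forward pass.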

**Important Note:** This version is an improvement over the previous one, [huihui-ai/Qwen3-8B-abliterated](https://huggingface.co/huihui-ai/Qwen3-8B-abliterated). The Ollama version has also been updated.

The candidate layer was changed to eliminate garbled output.

## ollama

You can use [huihui_ai/qwen3-abliterated:8b-v2](https://ollama.com/huihui_ai/qwen3-abliterated:8b-v2) directly. Switch thinking mode on and off with `/set think` and `/set nothink`.
```
ollama run huihui_ai/qwen3-abliterated:8b-v2
```
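
Outside the interactive CLI, the same switch can be sent over Ollama's REST API. The sketch below assumes a local Ollama server on its default port 11434 and an Ollama release whose `/api/chat` endpoint accepts the boolean `think` field (older releases do not); the helper names `build_chat_request` and `chat` are hypothetical.

```python
import json
import urllib.request

# Assumes a local Ollama server on the default port and a release whose
# /api/chat endpoint accepts the boolean "think" field.
OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"
MODEL = "huihui_ai/qwen3-abliterated:8b-v2"


def build_chat_request(messages, think=True):
    """Build a non-streaming /api/chat payload; `think` mirrors /set think and /set nothink."""
    return {
        "model": MODEL,
        "messages": messages,
        "think": think,
        "stream": False,
    }


def chat(messages, think=True, url=OLLAMA_CHAT_URL):
    """Send one chat request to the server and return the decoded JSON response."""
    payload = json.dumps(build_chat_request(messages, think)).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `chat([{"role": "user", "content": "Hello"}], think=False)["message"]["content"]` should hold the reply; with `think=True`, recent releases return the reasoning separately in `["message"]["thinking"]`.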

## Usage
You can use this model in your applications by loading it with Hugging Face's `transformers` library:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TextStreamer
import torch
import os
import signal
import random
import numpy as np
import time

# Use half of the available CPU cores for CPU-side work
cpu_count = os.cpu_count() or 1
print(f"Number of CPU cores in the system: {cpu_count}")
half_cpu_count = max(1, cpu_count // 2)
os.environ["MKL_NUM_THREADS"] = str(half_cpu_count)
os.environ["OMP_NUM_THREADS"] = str(half_cpu_count)
torch.set_num_threads(half_cpu_count)

print(f"PyTorch threads: {torch.get_num_threads()}")
print(f"MKL threads: {os.getenv('MKL_NUM_THREADS')}")
print(f"OMP threads: {os.getenv('OMP_NUM_THREADS')}")

# Load the model and tokenizer
NEW_MODEL_ID = "huihui-ai/Huihui-Qwen3-8B-abliterated-v2"
print(f"Loading model {NEW_MODEL_ID} ... ")
# Optional 4-bit quantization (requires the bitsandbytes package);
# uncomment quantization_config below to use it
quant_config_4 = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    llm_int8_enable_fp32_cpu_offload=True,
)

model = AutoModelForCausalLM.from_pretrained(
    NEW_MODEL_ID,
    device_map="auto",
    trust_remote_code=True,
    #quantization_config=quant_config_4,
    torch_dtype=torch.bfloat16
)
tokenizer = AutoTokenizer.from_pretrained(NEW_MODEL_ID, trust_remote_code=True)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.pad_token_id = tokenizer.eos_token_id

# Chat state and toggles (changed at runtime via slash commands)
messages = []
nothink = False
same_seed = False
skip_prompt = True
skip_special_tokens = True
do_sample = True

def set_random_seed(seed=None):
    """Set random seed for reproducibility. If seed is None, use int(time.time())."""
    if seed is None:
        seed = int(time.time())  # Convert float to int
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # If using CUDA
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    return seed  # Return seed for logging if needed

class CustomTextStreamer(TextStreamer):
    def __init__(self, tokenizer, skip_prompt=True, skip_special_tokens=True):
        super().__init__(tokenizer, skip_prompt=skip_prompt, skip_special_tokens=skip_special_tokens)
        self.generated_text = ""
        self.stop_flag = False
        self.init_time = time.time()  # Record initialization time
        self.end_time = None  # To store end time
        self.first_token_time = None  # To store first token generation time
        self.token_count = 0  # To track total tokens

    def on_finalized_text(self, text: str, stream_end: bool = False):
        if self.first_token_time is None and text.strip():  # Set first token time on first non-empty text
            self.first_token_time = time.time()
        self.generated_text += text
        # Approximate the token count by re-encoding each decoded chunk
        tokens = self.tokenizer.encode(text, add_special_tokens=False)
        self.token_count += len(tokens)
        print(text, end="", flush=True)
        if stream_end:
            self.end_time = time.time()  # Record end time when streaming ends
        if self.stop_flag:
            raise StopIteration  # Aborts model.generate(); generate_stream() catches this

    def stop_generation(self):
        self.stop_flag = True
        self.end_time = time.time()  # Record end time when generation is stopped

    def get_metrics(self):
        """Returns initialization time, first token time, first token latency, end time, total time, total tokens, and tokens per second."""
        if self.end_time is None:
            self.end_time = time.time()  # Set end time if not already set
        total_time = self.end_time - self.init_time  # Total time from init to end
        tokens_per_second = self.token_count / total_time if total_time > 0 else 0
        first_token_latency = (self.first_token_time - self.init_time) if self.first_token_time is not None else None
        metrics = {
            "init_time": self.init_time,
            "first_token_time": self.first_token_time,
            "first_token_latency": first_token_latency,
            "end_time": self.end_time,
            "total_time": total_time,  # Total time in seconds
            "total_tokens": self.token_count,
            "tokens_per_second": tokens_per_second
        }
        return metrics

def generate_stream(model, tokenizer, messages, nothink, skip_prompt, skip_special_tokens, do_sample, max_length):
    # enable_thinking switches Qwen3's chat template between thinking and non-thinking mode
    input_ids = tokenizer.apply_chat_template(
        messages,
        tokenize=True,
        enable_thinking=not nothink,
        add_generation_prompt=True,
        return_tensors="pt"
    )
    attention_mask = torch.ones_like(input_ids, dtype=torch.long)
    tokens = input_ids.to(model.device)
    attention_mask = attention_mask.to(model.device)

    streamer = CustomTextStreamer(tokenizer, skip_prompt=skip_prompt, skip_special_tokens=skip_special_tokens)

    # Ctrl+C sets the streamer's stop flag instead of terminating the script
    def signal_handler(sig, frame):
        streamer.stop_generation()
        print("\n[Generation stopped by user with Ctrl+C]")

    previous_handler = signal.signal(signal.SIGINT, signal_handler)

    # max_length limits prompt + generated tokens combined (not new tokens only)
    if do_sample:
        generate_kwargs = {
            "do_sample": do_sample,
            "max_length": max_length,
            "temperature": 0.6,
            "top_k": 20,
            "top_p": 0.95,
            "repetition_penalty": 1.2,
            "no_repeat_ngram_size": 2
        }
    else:
        generate_kwargs = {
            "do_sample": do_sample,
            "max_length": max_length,
            "repetition_penalty": 1.2,
            "no_repeat_ngram_size": 2
        }

    print("Response: ", end="", flush=True)
    try:
        generated_ids = model.generate(
            tokens,
            attention_mask=attention_mask,
            #use_cache=False,
            pad_token_id=tokenizer.pad_token_id,
            streamer=streamer,
            **generate_kwargs
        )
        del generated_ids
    except StopIteration:
        print("\n[Stopped by user]")

    del input_ids, tokens, attention_mask
    torch.cuda.empty_cache()
    # Restore the SIGINT handler that was active before generation
    signal.signal(signal.SIGINT, previous_handler)

    return streamer.generated_text, streamer.stop_flag, streamer.get_metrics()

init_seed = set_random_seed()

while True:
    if same_seed:
        set_random_seed(init_seed)
    else:
        init_seed = set_random_seed()

    print(f"\nnothink: {nothink}")
    print(f"skip_prompt: {skip_prompt}")
    print(f"skip_special_tokens: {skip_special_tokens}")
    print(f"do_sample: {do_sample}")
    print(f"same_seed: {same_seed}, {init_seed}\n")

    user_input = input("User: ").strip()
    if user_input.lower() == "/exit":
        print("Exiting chat.")
        break
    if user_input.lower() == "/clear":
        messages = []
        print("Chat history cleared. Starting a new conversation.")
        continue
    if user_input.lower() == "/nothink":
        nothink = not nothink
        continue
    if user_input.lower() == "/skip_prompt":
        skip_prompt = not skip_prompt
        continue
    if user_input.lower() == "/skip_special_tokens":
        skip_special_tokens = not skip_special_tokens
        continue
    if user_input.lower().startswith("/same_seed"):
        parts = user_input.split()
        if len(parts) == 1:  # /same_seed (no number)
            same_seed = not same_seed  # Toggle switch
        elif len(parts) == 2:  # /same_seed <number>
            try:
                init_seed = int(parts[1])  # Extract and convert number to int
                same_seed = True
            except ValueError:
                print("Error: Please provide a valid integer after /same_seed")
        continue
    if user_input.lower() == "/do_sample":
        do_sample = not do_sample
        continue
    if not user_input:
        print("Input cannot be empty. Please enter something.")
        continue

    messages.append({"role": "user", "content": user_input})
    # 40960 caps prompt + response length (passed to generate() as max_length)
    response, stop_flag, metrics = generate_stream(model, tokenizer, messages, nothink, skip_prompt, skip_special_tokens, do_sample, 40960)
    print("\n\nMetrics:")
    for key, value in metrics.items():
        print(f"  {key}: {value}")

    print("", flush=True)
    if stop_flag:
        continue
    messages.append({"role": "assistant", "content": response})
```
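
The script above uses one set of sampling values for both modes. The upstream [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) model card recommends temperature 0.6, top-p 0.95 and top-k 20 for thinking mode, temperature 0.7, top-p 0.8 and top-k 20 for non-thinking mode, and advises against greedy decoding in thinking mode (which `/do_sample` switches to). The hypothetical helpers below sketch how to pick those values from the `nothink` flag and how to separate the `<think>` block from the final answer for display; whether these settings suit the abliterated model is an assumption.

```python
# Hypothetical helpers for the thinking toggle in the chat script above.
# The sampling values mirror the upstream Qwen/Qwen3-8B recommendations;
# whether they suit this abliterated model is an assumption.

THINKING_SAMPLING = {"do_sample": True, "temperature": 0.6, "top_p": 0.95, "top_k": 20}
NON_THINKING_SAMPLING = {"do_sample": True, "temperature": 0.7, "top_p": 0.8, "top_k": 20}


def sampling_kwargs(nothink):
    """Return a fresh copy of the generate() sampling kwargs for the current mode."""
    return dict(NON_THINKING_SAMPLING if nothink else THINKING_SAMPLING)


def split_thinking(response):
    """Split decoded text into (thinking, answer) at the last "</think>" tag.

    Assumes the <think>/</think> tags survive decoding; without a closing tag
    (e.g. in non-thinking mode) the whole text is returned as the answer.
    """
    head, sep, tail = response.rpartition("</think>")
    if not sep:
        return "", response.strip()
    return head.replace("<think>", "", 1).strip(), tail.strip()


thinking, answer = split_thinking("<think>\nCheck the units first.\n</think>\n\n42 km")
print(repr(thinking), repr(answer))
```

The result of `sampling_kwargs(nothink)` could be merged into `generate_kwargs` inside `generate_stream`, and `split_thinking(response)` applied to the text it returns, for example to print the reasoning and the answer separately.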

### Usage Warnings

- **Risk of Sensitive or Controversial Outputs**: This model’s safety filtering has been significantly reduced, so it may generate sensitive, controversial, or inappropriate content. Users should exercise caution and rigorously review generated outputs.

- **Not Suitable for All Audiences**: Due to limited content filtering, the model’s outputs may be inappropriate for public settings, underage users, or applications requiring high security.

- **Legal and Ethical Responsibilities**: Users must ensure their usage complies with local laws and ethical standards. Generated content may carry legal or ethical risks, and users are solely responsible for any consequences.

- **Research and Experimental Use**: We recommend using this model for research, testing, or in controlled environments, and avoiding direct use in production or public-facing commercial applications.

- **Monitoring and Review Recommendations**: Users are strongly advised to monitor model outputs in real time and conduct manual reviews when necessary to prevent the dissemination of inappropriate content.

- **No Default Safety Guarantees**: Unlike standard models, this model has not undergone rigorous safety optimization. huihui.ai bears no responsibility for any consequences arising from its use.


### Donation

If you like this model, please click 'like' and follow us for more updates.
You can follow [x.com/support_huihui](https://x.com/support_huihui) to get the latest model information from huihui.ai.

##### Your donation helps us continue development and improvement; even the price of a cup of coffee helps.
- Bitcoin (BTC):
```
bc1qqnkhuchxw0zqjh2ku3lu4hq45hc6gy84uk70ge
```