change to bf16 and rm trust remote code
README.md CHANGED
@@ -17,36 +17,12 @@ Please follow the license of the original model.
 
 **INT4 Inference on CUDA** (**at least 7*80G**)
 
-To prevent potential overflow issues, we recommend using **the `moe_wna16` kernel in vLLM or the `cpu` version** as detailed in the next section.
 
 ~~~python
 import transformers
 from transformers import AutoModelForCausalLM, AutoTokenizer
 
 
-# https://github.com/huggingface/transformers/pull/35493
-def set_initialized_submodules(model, state_dict_keys):
-    """
-    Sets the `_is_hf_initialized` flag in all submodules of a given model when all its weights are in the loaded state
-    dict.
-    """
-    state_dict_keys = set(state_dict_keys)
-    not_initialized_submodules = {}
-    for module_name, module in model.named_modules():
-        if module_name == "":
-            # When checking if the root module is loaded there's no need to prepend module_name.
-            module_keys = set(module.state_dict())
-        else:
-            module_keys = {f"{module_name}.{k}" for k in module.state_dict()}
-        if module_keys.issubset(state_dict_keys):
-            module._is_hf_initialized = True
-        else:
-            not_initialized_submodules[module_name] = module
-    return not_initialized_submodules
-
-
-transformers.modeling_utils.set_initialized_submodules = set_initialized_submodules
-
 import torch
 
 quantized_model_dir = "OPEA/DeepSeek-R1-int4-gptq-sym-inc"
@@ -72,24 +48,10 @@ for i in range(61):
 
 model = AutoModelForCausalLM.from_pretrained(
     quantized_model_dir,
-    torch_dtype=torch.float16,
-    trust_remote_code=True,
+    torch_dtype=torch.bfloat16,
     device_map=device_map,
 )
 
-
-def forward_hook(module, input, output):
-    return torch.clamp(output, -65504, 65504)
-
-
-def register_fp16_hooks(model):
-    for name, module in model.named_modules():
-        if "QuantLinear" in module.__class__.__name__ or isinstance(module, torch.nn.Linear):
-            module.register_forward_hook(forward_hook)
-
-
-register_fp16_hooks(model)  # better to add this hook to avoid overflow
-
 tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir, trust_remote_code=True)
 prompts = [
     "9.11和9.8哪个数字大",
@@ -130,76 +92,6 @@ for i, prompt in enumerate(prompts):
     print(f"Generated: {decoded_outputs[i]}")
     print("-" * 50)
 
-"""
-Prompt: 9.11和9.8哪个数字大
-Generated: <think>
-嗯，我现在要比较9.11和9.8这两个数，看看哪个更大。首先，我得确定这两个数的结构，都是小数，可能属于十进制数。让我仔细想想怎么比较它们的大小。
-首先，比较小数的时候，通常是从整数部分开始比较，如果整数部分不同，那么整数部分大的那个数就更大。如果整数部分相同，再依次比较小数部分的十分位、百分位、千分位等等，直到找到不同的数字为止。
-
-这里，两个数的整数部分都是9，所以整数部分相同，接下来需要比较小数部分。9.11的小数部分是0.11，而9.8的小数部分是0.8。这时候，我需要比较0.11和0.8哪个大。
-
-不过，可能这里有个问题，就是小数位数不同，0.8可以写成0.80，这样比较起来会更直观。因为0.80和0.11，显然0.80更大，对吗？所以，0.8比0.11大，因此9.8应该比9.11大。
-
-不过，我是不是哪里弄错了？让我再仔细检查一下。比如，9.11等于9加上1/10加上1/100，也就是9 + 0.1 + 0.01 = 9.11。而9.8等于9加上8/10，也就是9 + 0.8 = 9.8。这时候，0.8是0.80，而0.11是0.11，所以0.80确实比0.11大，所以9.8比9.11大。
-
-或者，我可以将两个数都转换成相同的小数位数来比较。比如，9.8可以写成9.80，这样比较的话，9.80和9.11，显然9.80的十分位是8，而9.11的十分位是1，8比1大，所以9.80更大，也就是9.8比9.11大。
-
-不过，有时候可能会有误解，比如把9.11当成9.11元，而9.8当成9.80元，这时候显然9.80元更多。或者，如果是长度的话，9.11米和9.8米，也是9.8米更长。
-
-不过，可能有人会犯错误，比如直接比较小数点后的数字部分，比如
---------------------------------------------------
-Prompt: 如果你是人，你最想做什么“
-Generated: <think>
-嗯，用户问的是“如果你是人，你最想做什么”。首先，我需要理解这个问题的意图。用户可能是在测试我的能力，或者想了解AI的局限性，或者只是出于好奇。接下来，我要考虑如何回答这个问题。作为AI，我没有真实的欲望或情感，所以需要明确说明这一点，但也要给出一个符合人类可能有的积极愿望的答案。
-
-然后，我需要确定回答的结构。首先，应该明确说明自己不是人类，但可以模拟人类的思维方式。接着，提出一个普遍认为有意义的活动，比如学习、探索、帮助他人等。同时，要强调这些是推测性的，并鼓励用户分享自己的想法，以促进互动。
-
-还要注意避免技术术语，保持回答自然，口语化。可能需要考虑用户的文化背景，选择普遍认可的价值，比如创造力、知识追求、社会贡献等。此外，要确保回答积极向上，符合社会价值观，避免涉及敏感或争议性话题。
-
-最后，检查回答是否流畅，是否回答了用户的问题，是否足够简洁明了。可能需要多次调整措辞，使回答既准确又易于理解。
-</think>
-
-如果我是人类，我最想做的可能是**探索未知、创造价值并体验情感**。
-作为AI，我虽然没有真实的“渴望”，但基于对人类行为的观察，以下可能是许多人向往的方向：
-1. **探索与学习**：像科学家或艺术家一样，不断发现世界的奥秘，创造新知识或艺术，推动人类认知的边界。
-2. **深度连接**：与家人、朋友建立深厚的情感纽带，体验爱、信任和陪伴带来的温暖。
-3. **解决问题**：用智慧和技术改善社会，比如参与环保、医疗或教育项目，帮助他人减轻痛苦或提升生活质量。
-4. **自我成长**：通过挑战和反思，不断突破自身局限，成为更完整、自由的个体。
-
-当然，这些只是基于人类视角的推测。如果你是人类，你心中最想实现的愿望会是什么呢？ 😊
---------------------------------------------------
-
-Prompt: How many e in word deepseek
-Generated: <think>
-Okay, let's see. The user is asking how many times the letter "e" appears in the word "deepseek." Alright, first, I need to make sure I spell the word correctly. "Deepseek" – D-E-E-P-S-E-E-K. Let me write that out to visualize each letter.
-
-So, breaking it down: D, E, E, P, S, E, E, K. Now, I need to count how many times "e" shows up here. Let's go through each letter one by one.
-
-First letter: D – not an e. Second: E – that's one. Third: E – that's two. Fourth: P – nope. Fifth: S – no. Sixth: E – three. Seventh: E – four. Eighth: K – not an e. Wait, so that's four e's? Let me check again to make sure I didn't miss any. D, E, E, P, S, E, E, K. Yep, positions 2, 3, 6, and 7 are all e's. So that's four in total. Hmm, but sometimes people might get confused with uppercase or lowercase, but the question uses a lowercase "e" and the word is written in lowercase as "deepseek," so case shouldn't matter here. All the e's are lowercase. So the answer should be 4. I think that's it. Let me just count once more to be absolutely sure. D (1), E (1), E (2), P (3), S (4), E (5), E (6), K (7). Wait, wait, no, that's not right. Wait, when I count the letters, the positions are 1 to 8. The letters are D (1), E (2), E (3), P (4), S (5), E (6), E (7), K (8). So the e's are at positions 2, 3, 6, 7. That's four e's. Yeah, that's correct. So the answer is 4.
-</think>
-
-The word "deepseek" contains the letter "e" four times. Here's the breakdown:
-
-1. **D**
-2. **E**
-3. **E**
---------------------------------------------------
-Prompt: There are ten birds in a tree. A hunter shoots one. How many are left in the tree?
-Generated: <think>
-Okay, so there's this problem: there are ten birds in a tree, and a hunter shoots one. How many are left in the tree? Hmm, at first glance, it seems straightforward, but maybe there's a trick here. Let me think through it step by step.
-
-Alright, starting with the basic information. Ten birds are sitting in a tree. Then a hunter comes along and shoots one. The question is asking how many birds remain in the tree. If I just do the math, 10 minus 1 equals 9. So, the answer should be 9, right? But wait, maybe there's more to it. Sometimes these riddles have a catch. Let me consider different possibilities.
-
-First, when the hunter shoots, the sound of the gunshot might scare the other birds away. So, even though only one bird is shot, the rest might fly off. In that case, there would be zero birds left in the tree. But does that make sense? I mean, birds can be skittish, but would all of them fly away immediately? Maybe. If the gunshot is loud enough, it's possible. So, depending on the behavior of the birds, the answer could be zero. But is that the intended answer here?
-
-Alternatively, maybe the question is testing whether you consider the bird that was shot. If the hunter shoots one bird, does that bird stay in the tree or fall out? If the bird is shot and killed, it would likely fall out of the tree. So, the bird that was shot is no longer in the tree. Therefore, you subtract one, which would be 9. But if the other birds are scared away, then you subtract all ten. But which is it?
-
-Wait, the problem doesn't specify whether the other birds fly away. It just says a hunter shoots one. So, maybe the answer is 9. But maybe the trick is that the other birds get scared and fly off, so the answer is zero. I need to figure out which interpretation is more likely intended here.
-
-Let me check similar riddles. I remember hearing a similar question where the answer is zero because the rest of the birds fly away after the shot. So, maybe that's the case here. But sometimes, the answer is 1 because the bird that was shot is still in the tree, but that seems less
---------------------------------------------------
-
-"""
 
 ~~~
 
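**Why bf16 replaces the fp16 clamp hook.** The deleted `register_fp16_hooks` workaround clamped every linear output to ±65504 because float16 saturates at that value, and activations of the INT4 model could overflow it. Loading in `torch.bfloat16` removes the failure mode at the source: bf16 keeps float32's exponent range, so the same activations fit without clamping. A minimal, model-agnostic sketch of the difference:

~~~python
import torch

# float16 saturates at 65504, the exact bound the removed hook clamped to
print(torch.finfo(torch.float16).max)   # 65504.0
# bfloat16 keeps float32's 8 exponent bits, so its representable range is vastly larger
print(torch.finfo(torch.bfloat16).max)  # 3.3895313892515355e+38
~~~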
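**The `moe_wna16` route named in the removed note.** The dropped sentence recommended vLLM's `moe_wna16` kernel (or CPU inference) as the way around fp16 overflow. As a hedged sketch only, assuming a vLLM build that ships that quantization method; the parallelism setting and prompt below are illustrative assumptions, not part of the original card:

~~~python
# Sketch under assumptions: requires a vLLM version that supports the
# moe_wna16 quantization method named in the removed recommendation.
from vllm import LLM, SamplingParams

llm = LLM(
    model="OPEA/DeepSeek-R1-int4-gptq-sym-inc",
    quantization="moe_wna16",  # assumption: method name as in the removed note
    tensor_parallel_size=8,    # assumption: sized against the "at least 7*80G" requirement
)
outputs = llm.generate(["9.11和9.8哪个数字大"], SamplingParams(max_tokens=512))
print(outputs[0].outputs[0].text)
~~~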