Model Details
This GGUF model is an INT4 quantization (group_size 32, symmetric) of deepseek-ai/DeepSeek-R1, generated by the intel/auto-round algorithm.
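For intuition, symmetric quantization with group_size 32 stores one floating-point scale per group of 32 weights and no zero-point, so each weight is reconstructed as q × scale with q in the int4 range. The snippet below is a simplified round-to-nearest sketch of that scheme; it is only an illustration, not the auto-round algorithm itself (which additionally optimizes the rounding via signed gradient descent) nor the exact llama.cpp Q4_0 layout.

import numpy as np

def quantize_int4_symmetric(weights, group_size=32):
    # One scale per group of `group_size` weights; symmetric => no zero-point.
    w = weights.reshape(-1, group_size)
    scales = np.abs(w).max(axis=1, keepdims=True) / 7.0
    scales[scales == 0] = 1.0                            # guard all-zero groups
    q = np.clip(np.round(w / scales), -8, 7).astype(np.int8)
    return q, scales

def dequantize_int4_symmetric(q, scales):
    return (q * scales).reshape(-1)

w = np.random.randn(4096).astype(np.float32)
q, s = quantize_int4_symmetric(w)
print("max abs error:", np.abs(dequantize_int4_symmetric(q, s) - w).max())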
How To Use
Requirements
Please follow the Build llama.cpp locally guide to install the necessary dependencies.
INT4 Inference
>>> text="9.11和9.8哪个数字大"
>>> ./llama-cli -m DeepSeek-R1-bf16-256x20B-Q4_0-00001-of-00089.gguf -p "<|begin▁of▁sentence|>You are a helpful assistant.<|User|>$text\n<|Assistant|>" -n 512 --threads 16 -no-cnv
## Generated:
## <think>
## 嗯,用户问的是9.11和9.8哪个更大。我需要先确认这两个数字的类型。看起来它们都是小数,但写法可能不同。比如,9.11可能代表9又11/100,而9.8可能代表9又8/10。这种情况下,我应该把它们转换成相同的分母来比较,或者直接转换成小数进行比较。
## 首先,9.8可以看作9.80,也就是9又80/100。这时候,9.11是9.11,也就是9又11/100。这样比较的话,9.80比9.11大,因为80分位比11大。所以应该是9.8更大。不过用户可能写的是不同的小数位数,我需要确认是否有误解的可能。例如,是否用户把9.11当成其他形式,比如日期或者其他格式,但通常在数学问题中,都是小数比较。所以应该直接比较数值。
## 另一个方法是将它们转换成分数:9.11等于911/100,而9.8等于98/10。如果比较这两个分数,可以统一分母。98/10等于980/100,911/100和980/100,明显980更大,所以9.8更大。这样答案是正确的。应该没问题。
## 需要确认用户是否可能混淆了小数点后的写法,比如9.11是否指的是9.11版本或其他,但题目中没有提供这样的上下文,所以默认是数值比较。因此正确答案应该是9.8比9.11大。我需要用简单的解释告诉用户,通过转换小数点后的位数,比较大小。
## </think>
## 9.11和9.8相比,9.8更大。
## **分析:**
## - **统一小数位数:** 将9.8改写为9.80,以便与9.11比较。
## - **比较步骤:**
## - 整数部分相同(均为9),比较小数部分:
## - 9.11的小数部分是0.11(即11/100)
## - 9.80的小数部分是0.80(即80/100)
## - 由于80/100 > 11/100,因此9.80 > 9.11。
## **结论:** 尽管9.11的小数位数更多,但在数值上9.8(即9.80)更大。 [end of text]
>>> text="strawberry中有几个r?"
>>> ./llama-cli -m DeepSeek-R1-bf16-256x20B-Q4_0-00001-of-00089.gguf -p "<|begin▁of▁sentence|>You are a helpful assistant.<|User|>$text\n<|Assistant|>" -n 512 --threads 16 -no-cnv
## Generated:
## <think>
## 嗯,用户问的是“strawberry”中有几个r。我需要仔细检查这个词的拼写。首先,我得先把单词拆分来看。strawberry这个词的正确拼写是S-T-R-A-W-B-E-R-R-Y。让我一个一个字母来看:s, t, r, a, w, b, e, r, r, y。中间的两个r是在后面的部分。可能有人会混淆是否中间有两个r。例如,有的人可能把中间部分写成“berry”,里面有一个r,但实际上在“strawberry”里,正确的写法是有两个r。所以,应该是在第3个字母有一个r,然后在后面e后面有两个r。总共三个r吗?让我再检查一遍。拆开的话:s-t-r-a-w-b-e-r-r-y。是的,在e后面有两个连续的r,所以总共有三个r。或者有没有可能,我有没有数错?例如,前面的r是第三个字母,然后后面有两次r?所以第三个字母是r,然后后面是第七和第八字母也是r。那确实是三个r。所以答案是3?或者是不是?可能用户可能觉得中间有两个r,所以总共有两个?或者可能拼写错误。我需要确保单词的正确拼写。例如,正确的拼写是strawberry,而不是strawberrie或者其他变体。是的,正确的拼写是s-t-r-a-w-b-e-r-r-y。所以有三个r。因此,答案应该是3个。
## </think>
##
## “strawberry”中有3个字母r。正确的拼写为:s-t-r-a-w-b-e-r-r-y。第三个字母是第一个r,第七和第八字母是两个连续的r,因此总共有3个r。 [end of text]
>>> text="There is a girl who likes adventure,"
>>> ./llama-cli -m DeepSeek-R1-bf16-256x20B-Q4_0-00001-of-00089.gguf -p "<|begin▁of▁sentence|>You are a helpful assistant.<|User|>$text\n<|Assistant|>" -n 512 --threads 16 -no-cnv
## Generated:
## <think>
## Okay, so the user mentions a girl who likes adventure. I need to come this into a story. Let me brainstorm possible directions. Maybe she goes on a journey, faces challenges, meets characters. Should I make it a fantasy or realistic adventure? Maybe a quest? Perhaps she wants to find something or achieve a goal. Need details like setting, conflict, resolution. Think about plot structure. Maybe she starts in a mundane world, gets a call to adventure, faces trials, achieves a goal. Add some obstacles and character development. Maybe include friends or mentors. Use vivid descriptions. Need to make it engaging. Avoid clichés. Check if there's a specific request in the user's question. They just want a story based on the prompt. Ensure it's creative and flows well. Okay, start outlining the plot points.
## </think>
## Certainly! Here's a creative story based on the prompt:
## ---
## **Title: "The Map of Whispers"**
##
## **Setting:**
## Lila lived in the sleepy village of Pine Hollow, where the most adventurous thing anyone did was gossip about the new bakery’s scone flavors. But Lila craved *real* adventure—storms that tore sails, maps scribbled in riddles, and shadows of undiscovered islands. She spent evenings poring over her grandfather’s journal, which spoke of a "Star-Swallowed Temple" hidden in the mountains. Everyone dismissed it as legend. But when a comet streaked across the sky one night, Lina packed her bag.
##
## **Inciting Incident:**
## A week later, Lila found herself in the moss-crusted ruins of an ancient monastery. Her grandfather’s journal had clues: *"Seek the moon where the fox sleeps."* She deciphered it as a pointer to a lunar-shaped cave. Inside, a mosaic depicted constellations, but one star was missing. She realized—the comet she’d seen! Using a diamond-shaped key from her pack, she slotted it into a crevice in the wall. The ground trembled.
##
## **Conflict & Trials:**
## The cave collapsed! Lila fled through tunnels, chased by falling rocks. She emerged in a valley where the trees were silver and the birds spoke in human voices. A figure appeared—a "warden" of the temple, who tested her with three trials:
## 1. **The River of Faces:** Cross a waterway where reflections showed her deepest fears.
## 2. **The Bridge of Elsewhere:** A plank bridge that only
>>> text="Please give a brief introduction of DeepSeek company."
>>> ./llama-cli -m DeepSeek-R1-bf16-256x20B-Q4_0-00001-of-00089.gguf -p "<|begin▁of▁sentence|>You are a helpful assistant.<|User|>$text\n<|Assistant|>" -n 512 --threads 16 -no-cnv
## Generated:
## <think>
## Okay, so the user wants a brief introduction of DeepSeek. First, I need to figure out what DeepSeek is. I remember that DeepSeek is a Chinese company involved in AI research and development. They might focus on large language models or other AI technologies. Let me check my knowledge base. Yes, DeepSeek is known for developing advanced AI models like DeepSeek-R1. They probably offer solutions for businesses and researchers. Should I mention their areas like NLP, machine learning, etc.? Also, maybe their commitment open-source or collaborations with academia. I should ensure the introduction is concise but covers key aspects like their focus, technologies, products, and maybe their mission or collaborations. Need to avoid technical jargon to keep it accessible. Let me structure this: company name, location, focus areas, key products/services, partnerships, and their vision or mission. Double-check to ensure accuracy and avoid any misinformation.
## </think>
##
## DeepSeek is a China-based artificial intelligence company specializing in the research and development of advanced AI technologies, particularly in the realm of large language models (LLMs) and natural language processing (NLP). Known for its open-source model DeepSeek-R1, the company focuses on creating scalable AI solutions that enhance efficiency and decision-making across industries. DeepSeek collaborates with academic institutions and enterprises, aiming to democratize access to cutting-edge AI tools while maintaining a commitment ethical AI development. Their products cater to both commercial applications and research communities, emphasizing versatility, adaptability, and real-world problem-solving. [end of text]
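The commands above hard-code DeepSeek-R1's chat template into the -p string. Alternatively, the prompt can be rendered programmatically; the following is a minimal sketch assuming the Hugging Face tokenizer for deepseek-ai/DeepSeek-R1 ships a chat template equivalent to the hand-written one:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1", trust_remote_code=True)
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "9.11和9.8哪个数字大"},
]
# Render the conversation to a plain string ending with the assistant tag,
# then pass that string to llama-cli via -p.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)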
Generate the model
Five 80GB GPUs are needed (this could be optimized), along with about 1.4TB of CPU memory.
Tuning
import torch
import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "opensourcerelease/DeepSeek-R1-bf16"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True, torch_dtype="auto")

block = model.model.layers
device_map = {}

# Spread the routed experts of every decoder layer across cuda:1..cuda:4;
# attention, routers, and shared experts stay on cuda:0.
for n, m in block.named_modules():
    if isinstance(m, (torch.nn.Linear, transformers.modeling_utils.Conv1D)):
        if "experts" in n and ("shared_experts" not in n) and int(n.split('.')[-2]) < 63:
            device = "cuda:1"
        elif "experts" in n and ("shared_experts" not in n) and 63 <= int(n.split('.')[-2]) < 128:
            device = "cuda:2"
        elif "experts" in n and ("shared_experts" not in n) and 128 <= int(n.split('.')[-2]) < 192:
            device = "cuda:3"
        elif "experts" in n and ("shared_experts" not in n) and int(n.split('.')[-2]) >= 192:
            device = "cuda:4"
        else:
            device = "cuda:0"
        # Drop the leading "layer_idx." so the key is relative to each decoder block.
        n = n.split('.', 1)[1]
        device_map.update({n: device})

from auto_round import AutoRound

autoround = AutoRound(model=model, tokenizer=tokenizer, device_map=device_map,
                      iters=200, batch_size=8, seqlen=512)
autoround.quantize()
autoround.save_quantized(format="gguf:q4_0", output_dir="tmp_autoround")
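For reference, the expert-to-GPU partitioning above can be sanity-checked in isolation. In the sketch below, the module names are hypothetical examples of the strings block.named_modules() yields for DeepSeek-R1's routed experts:

def expert_device(n: str) -> str:
    # Mirrors the device_map logic: routed experts are split across cuda:1..4.
    if "experts" in n and "shared_experts" not in n:
        idx = int(n.split('.')[-2])   # routed-expert index
        if idx < 63:
            return "cuda:1"
        if idx < 128:
            return "cuda:2"
        if idx < 192:
            return "cuda:3"
        return "cuda:4"
    return "cuda:0"                   # attention, router, shared experts

for name in [
    "0.mlp.experts.5.gate_proj",      # -> cuda:1
    "0.mlp.experts.100.down_proj",    # -> cuda:2
    "0.mlp.experts.150.up_proj",      # -> cuda:3
    "0.mlp.experts.200.gate_proj",    # -> cuda:4
    "0.mlp.shared_experts.gate_proj", # -> cuda:0
]:
    print(name, "->", expert_device(name))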
Ethical Considerations and Limitations
The model can produce factually incorrect output and should not be relied on for factually accurate information. Because of the limitations of the pretrained model and the finetuning datasets, this model may generate lewd, biased, or otherwise offensive outputs.
Therefore, developers should perform safety testing before deploying any applications of the model.
Caveats and Recommendations
Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.
Here is a useful link to learn more about Intel's AI software:
- Intel Neural Compressor
Disclaimer
The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.
Cite
@article{cheng2023optimize,
  title={Optimize weight rounding via signed gradient descent for the quantization of llms},
  author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi},
  journal={arXiv preprint arXiv:2309.05516},
  year={2023}
}