unsloth/DeepSeek-R1-GGUF

#46 opened 8 months ago by

AlekseyStart

MTP weights?

#45 opened 8 months ago by

SzymonOzog

Production ready Deepseek R1 GGUF deployment instructions (with cpu offloading) on AWS (10x cheaper than Bedrock imports)

😎 🔥 3

#44 opened 8 months ago by

samagra14

DeepSeek-R1-UD-Q2_K_XL model inference by llama.cpp can't use flash-attention with n_embd_head_k!=n_embd_head_v

#43 opened 8 months ago by

fuzhenxin

Share a mmlu test result,I use 2.51bit,and compare with ds api, baidu's ds,it seems 2.51bit is very smart at least in mmlu

#42 opened 8 months ago by

tarjintor

RTX 5090 with 600GB of RAM what models？

#40 opened 8 months ago by

frank-mx

Deploying a production ready service with GGUF on AWS account.

❤️ 2

1

#39 opened 8 months ago by

samagra-tensorfuse

How to Convert DeepSeek-R1-UD-IQ1_M GGUF Back to Safetensors?

#38 opened 8 months ago by

Cheryl33990

Perplexity comparsion results (Updated)

🔥 👍 8

#37 opened 8 months ago by

inputout

Q2_K_XL model is the best? IQ2_XXS is better than Q2_K_XL in mmlu-pro benchmark

11

#36 opened 8 months ago by

albertchow

Long-Form input takes too long

#35 opened 9 months ago by

htkim27

Q2_K_XL 好还是 Q4好呢

#34 opened 9 months ago by

jializou

is it uncensored?

#33 opened 9 months ago by

Morrigan-Ship

Cannot Run `unsloth/DeepSeek-R1-GGUF` Model – Missing `configuration_deepseek.py`

👀 👍 4

#32 opened 9 months ago by

syrys4750

When using llama.cpp to deploy the DeepSeek - R1 - Q4_K_M model, garbled characters appear in the server's response.

#31 opened 9 months ago by

KAMING

各种量化版本的模型，在不同测评数据集上面的表现怎么样，有没有具体的测试结果

#29 opened 9 months ago by

huanfa

when using with ollama, does it support kv_cache_type=q4_0 and flash_attention=1?

#28 opened 9 months ago by

leonzy04

如何同时处理多个http请求

#27 opened 9 months ago by

007hao

IQ1_S模型合并后部署于ollama上，推理生成效果差

#26 opened 9 months ago by

gaozj

模型似乎被微调过

#25 opened 9 months ago by

mogazheng

What is the base precision type(FP32/FP16) used in Q2/Q1 quantization?

#23 opened 9 months ago by

ArYuZzz1

any benchmark results?

👍 3

#22 opened 9 months ago by

Wei-Wu

Accuracy of the dynamic quants compared to usual quants?

19

#21 opened 9 months ago by

inputout

8bits quantization

#20 opened 9 months ago by

ramkumarkoppu

New research paper, R1 type reasoning models can be drastically improved in quality

#19 opened 9 months ago by

krustik

md5 / sha256 hashes please

1

#18 opened 9 months ago by

ivanvolosyuk

Is there a model removing non-shared MoE experts?

#17 opened 9 months ago by

ghostplant

A Step-by-step deployment guide with ollama

👍 🔥 3

#16 opened 9 months ago by

snowkylin

No think tokens visible

6

#15 opened 9 months ago by

sudkamath

Over 2 tok/sec agg backed by NVMe SSD on 96GB RAM + 24GB VRAM AM5 rig with llama.cpp

🔥 👍 4

9

#13 opened 9 months ago by

ubergarm

Running the model with vLLM does not actually work

🔥 8

8

#12 opened 9 months ago by

aikitoria

DeepSeek-R1-GGUF on LMStudio not available

#11 opened 9 months ago by

32SkyDive

Where did the BF16 come from?

8

#10 opened 9 months ago by

gshpychka

Inference speed

#9 opened 9 months ago by

Iker

Running this model using vLLM Docker

➕ 4

#8 opened 9 months ago by

moficodes

UD-IQ1_M models for distilled R1 versions?

#6 opened 9 months ago by

SamPurkis

Llama.cpp server chat template

#4 opened 9 months ago by

softwareweaver

Are the Q4 and Q5 models R1 or R1-Zero

18

#2 opened 10 months ago by

gng2info

What is the VRAM requirement to run this ?