Commit 0c462cd · verified · committed by zhyncs · Parent(s): 97efda3

docs: Update README with HuggingFace and SGLang instructions

Files changed (1): README.md (+147 −125)

README.md CHANGED
---
license: mit
library_name: transformers
base_model:
- deepseek-ai/DeepSeek-V3.2-Exp-Base
---
# DeepSeek-V3.2-Exp

<!-- markdownlint-disable first-line-h1 -->
<!-- markdownlint-disable html -->
<!-- markdownlint-disable no-duplicate-header -->

<div align="center">
  <img src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/logo.svg?raw=true" width="60%" alt="DeepSeek-V3" />
</div>
<hr>
<div align="center" style="line-height: 1;">
  <a href="https://www.deepseek.com/" target="_blank" style="margin: 2px;">
    <img alt="Homepage" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/badge.svg?raw=true" style="display: inline-block; vertical-align: middle;"/>
  </a>
  <a href="https://chat.deepseek.com/" target="_blank" style="margin: 2px;">
    <img alt="Chat" src="https://img.shields.io/badge/🤖%20Chat-DeepSeek%20V3-536af5?color=536af5&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
  </a>
  <a href="https://huggingface.co/deepseek-ai" target="_blank" style="margin: 2px;">
    <img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-DeepSeek%20AI-ffc107?color=ffc107&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
  </a>
</div>
<div align="center" style="line-height: 1;">
  <a href="https://discord.gg/Tc7c45Zzu5" target="_blank" style="margin: 2px;">
    <img alt="Discord" src="https://img.shields.io/badge/Discord-DeepSeek%20AI-7289da?logo=discord&logoColor=white&color=7289da" style="display: inline-block; vertical-align: middle;"/>
  </a>
  <a href="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/qr.jpeg?raw=true" target="_blank" style="margin: 2px;">
    <img alt="Wechat" src="https://img.shields.io/badge/WeChat-DeepSeek%20AI-brightgreen?logo=wechat&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
  </a>
  <a href="https://twitter.com/deepseek_ai" target="_blank" style="margin: 2px;">
    <img alt="Twitter Follow" src="https://img.shields.io/badge/Twitter-deepseek_ai-white?logo=x&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
  </a>
</div>
<div align="center" style="line-height: 1;">
  <a href="LICENSE" style="margin: 2px;">
    <img alt="License" src="https://img.shields.io/badge/License-MIT-f5de53?&color=f5de53" style="display: inline-block; vertical-align: middle;"/>
  </a>
</div>

## Introduction

We are excited to announce the official release of DeepSeek-V3.2-Exp, an experimental version of our model. As an intermediate step toward our next-generation architecture, V3.2-Exp builds upon V3.1-Terminus by introducing DeepSeek Sparse Attention, a sparse attention mechanism designed to explore and validate optimizations for training and inference efficiency in long-context scenarios.

This experimental release represents our ongoing research into more efficient transformer architectures, with a particular focus on improving computational efficiency when processing extended text sequences.

<div align="center">
  <img src="assets/cost.png">
</div>

- DeepSeek Sparse Attention (DSA) achieves fine-grained sparse attention for the first time, delivering substantial improvements in long-context training and inference efficiency while maintaining virtually identical model output quality. (A toy sketch of the idea follows the benchmark table below.)

- To rigorously evaluate the impact of introducing sparse attention, we deliberately aligned the training configurations of DeepSeek-V3.2-Exp with those of V3.1-Terminus. Across public benchmarks in various domains, DeepSeek-V3.2-Exp performs on par with V3.1-Terminus.

| Benchmark | DeepSeek-V3.1-Terminus | DeepSeek-V3.2-Exp |
| :--- | :---: | :---: |
| **Reasoning Mode w/o Tool Use** | | |
| MMLU-Pro | 85.0 | 85.0 |
| GPQA-Diamond | 80.7 | 79.9 |
| Humanity's Last Exam | 21.7 | 19.8 |
| LiveCodeBench | 74.9 | 74.1 |
| AIME 2025 | 88.4 | 89.3 |
| HMMT 2025 | 86.1 | 83.6 |
| Codeforces (rating) | 2046 | 2121 |
| Aider-Polyglot | 76.1 | 74.5 |
| **Agentic Tool Use** | | |
| BrowseComp | 38.5 | 40.1 |
| BrowseComp-zh | 45.0 | 47.9 |
| SimpleQA | 96.8 | 97.1 |
| SWE-bench Verified | 68.4 | 67.8 |
| SWE-bench Multilingual | 57.8 | 57.9 |
| Terminal-bench | 36.7 | 37.7 |
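
To make the mechanism concrete, here is a minimal top-k sparse-attention sketch in PyTorch. It illustrates the general idea only and is not DeepSeek's DSA implementation: for clarity it materializes the full score matrix, whereas a real implementation scores keys with a lightweight indexer precisely to avoid that quadratic cost.

```python
# Toy top-k sparse attention: each query attends only to its k highest-scoring keys.
# Illustration only -- not the DSA kernel used by this model.
import torch

def topk_sparse_attention(q, k, v, topk=64):
    scores = q @ k.transpose(-1, -2) / k.shape[-1] ** 0.5  # [T, T] similarity scores
    keep = scores.topk(topk, dim=-1).indices               # top-k key indices per query
    mask = torch.full_like(scores, float("-inf"))
    mask.scatter_(-1, keep, 0.0)                           # 0 where kept, -inf elsewhere
    return torch.softmax(scores + mask, dim=-1) @ v        # attend over selected keys only

q = k = v = torch.randn(128, 16)  # 128 tokens, head dimension 16
out = topk_sparse_attention(q, k, v)
print(out.shape)  # torch.Size([128, 16])
```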

## How to Run Locally

### HuggingFace

We provide updated inference demo code in the [inference](https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Exp/tree/main/inference) folder to help the community quickly get started with our model and understand its architectural details.

First, convert the Hugging Face model weights to the format required by our inference demo. Set `MP` to match your available GPU count:
```bash
cd inference
export EXPERTS=256
# Requires HF_CKPT_PATH, SAVE_PATH, and MP to be set in the environment (see below).
python convert.py --hf-ckpt-path ${HF_CKPT_PATH} --save-path ${SAVE_PATH} --n-experts ${EXPERTS} --model-parallel ${MP}
```
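
For example, on a single node with eight GPUs, the variables might be set as follows (the paths are placeholders, not shipped defaults):

```bash
export HF_CKPT_PATH=/data/DeepSeek-V3.2-Exp    # directory with the downloaded Hugging Face weights
export SAVE_PATH=/data/DeepSeek-V3.2-Exp-mp8   # destination for the converted demo weights
export MP=8                                    # model-parallel degree = number of GPUs
```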

Launch the interactive chat interface and start exploring DeepSeek's capabilities:
```bash
export CONFIG=config_671B_v3.2.json
torchrun --nproc-per-node ${MP} generate.py --ckpt-path ${SAVE_PATH} --config ${CONFIG} --interactive
```
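
The demo shards the model across standard torchrun process groups, so multi-node runs follow torchrun's usual rendezvous flags. A sketch for two 8-GPU nodes, assuming the weights were converted with `--model-parallel 16` (node rank and master address are placeholders):

```bash
# Run on every node: NODE_RANK is 0 on the master node and 1 on the other;
# MASTER_ADDR must resolve to node 0. Assumes MP=16 at conversion time.
torchrun --nnodes 2 --nproc-per-node 8 --node-rank ${NODE_RANK} --master-addr ${MASTER_ADDR} \
    generate.py --ckpt-path ${SAVE_PATH} --config ${CONFIG} --interactive
```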

### SGLang

#### Installation with Docker

```bash
# NVIDIA H200
docker pull lmsysorg/sglang:dsv32

# AMD MI350
docker pull lmsysorg/sglang:dsv32-rocm

# NPUs
docker pull lmsysorg/sglang:dsv32-a2
docker pull lmsysorg/sglang:dsv32-a3
```
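
The images can be started with SGLang's usual Docker flags; a sketch for the H200 image, with all GPUs visible, the Hugging Face cache mounted, and the default server port published (paths are illustrative):

```bash
docker run --gpus all --ipc=host -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    -it lmsysorg/sglang:dsv32 bash
```

The launch command below can then be run inside the container.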

#### Launch Command
```bash
# --tp / --dp set the tensor- and data-parallel degrees; --page-size sets the KV-cache page size.
python -m sglang.launch_server --model deepseek-ai/DeepSeek-V3.2-Exp --tp 8 --dp 8 --page-size 64
```
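
Once the server is up, it speaks SGLang's OpenAI-compatible API; a minimal smoke test, assuming the default port 30000:

```bash
curl -s http://localhost:30000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
          "model": "deepseek-ai/DeepSeek-V3.2-Exp",
          "messages": [{"role": "user", "content": "Hello!"}],
          "max_tokens": 64
        }'
```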

## Open-Source Kernels

For TileLang kernels with **better readability and a research-oriented design**, please refer to [TileLang](https://github.com/tile-ai/tilelang/tree/main/examples/deepseek-v32).

For **high-performance CUDA kernels**, indexer logit kernels (including paged versions) are available in [DeepGEMM](https://github.com/deepseek-ai/DeepGEMM/pull/200). Sparse attention kernels are released in [FlashMLA](https://github.com/deepseek-ai/FlashMLA/pull/98).

## License

This repository and the model weights are licensed under the [MIT License](LICENSE).

## Citation

```bibtex
@misc{deepseekai2024deepseekv32,
      title={DeepSeek-V3.2-Exp: Boosting Long-Context Efficiency with DeepSeek Sparse Attention},
      author={DeepSeek-AI},
      year={2025},
}
```

## Contact

If you have any questions, please raise an issue or contact us at [[email protected]]([email protected]).