Commit 72cacec
Parent: 600c271

add figure

Files changed:
  .gitattributes   +1 -0
  README.md        +2 -2
  assets/cost.png  +3 -0
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+assets/cost.png filter=lfs diff=lfs merge=lfs -text
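The added pattern is exactly what `git lfs track` writes. A minimal sketch of how this change could be reproduced from the repository root, assuming Git LFS is installed and initialized:

    # Register the image with Git LFS; this appends the matching
    # "filter=lfs diff=lfs merge=lfs -text" line to .gitattributes.
    git lfs track "assets/cost.png"

    # Stage the attribute change together with the image, then commit
    # with the same message as this commit.
    git add .gitattributes assets/cost.png
    git commit -m "add figure"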
README.md CHANGED
@@ -2,7 +2,7 @@
 license: mit
 library_name: transformers
 base_model:
-- deepseek-ai/DeepSeek-V3.
+- deepseek-ai/DeepSeek-V3.2-Exp-Base
 ---
 # DeepSeek-V3.2-Exp
 
@@ -50,7 +50,7 @@ We are excited to announce the official release of DeepSeek-V3.2-Exp, an experim
 This experimental release represents our ongoing research into more efficient transformer architectures, particularly focusing on improving computational efficiency when processing extended text sequences.
 
 <div align="center">
-<img src="cost.png" >
+<img src="assets/cost.png" >
 </div>
 
 - DeepSeek Sparse Attention (DSA) achieves fine-grained sparse attention for the first time, delivering substantial improvements in long-context training and inference efficiency while maintaining virtually identical model output quality.
assets/cost.png ADDED

Git LFS Details
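Because the image is tracked by Git LFS, the repository itself stores only a three-line pointer file, which is why the summary above reports assets/cost.png as +3 -0. The pointer follows the standard LFS format sketched below; the actual hash and byte size are not shown on this page, so placeholders stand in for them:

    version https://git-lfs.github.com/spec/v1
    oid sha256:<hash of cost.png, elided here>
    size <size in bytes, elided here>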