arxiv:2508.02193

Seed Diffusion: A Large-Scale Diffusion Language Model with High-Speed Inference

Published on Aug 4
Submitted by yxsong on Aug 6
#1 Paper of the day
Abstract

AI-generated summary

Seed Diffusion Preview, a discrete-state diffusion language model, achieves fast inference speeds through parallel generation, outperforming Mercury and Gemini Diffusion in speed and quality.

We present Seed Diffusion Preview, a large-scale language model based on discrete-state diffusion, offering remarkably fast inference speed. Thanks to non-sequential, parallel generation, discrete diffusion models provide a notable speedup that mitigates the inherent latency of token-by-token decoding, as demonstrated recently (e.g., Mercury Coder, Gemini Diffusion). Seed Diffusion Preview achieves an inference speed of 2,146 tokens/s on H20 GPUs while maintaining competitive performance across a sweep of standard code evaluation benchmarks. This is significantly faster than the contemporary Mercury and Gemini Diffusion models, establishing a new state of the art on the speed-quality Pareto frontier for code models.
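
The speedup the abstract attributes to non-sequential, parallel generation can be illustrated with a toy decoding loop. The sketch below is not the paper's implementation; `toy_logits`, the confidence-based unmasking rule, and all constants are illustrative assumptions. It only contrasts the number of forward passes needed by token-by-token autoregressive decoding with iterative parallel unmasking of a fully masked sequence.

```python
import numpy as np

VOCAB, MASK, LENGTH, STEPS = 100, -1, 16, 4
rng = np.random.default_rng(0)

def toy_logits(tokens):
    """Stand-in for one forward pass of a network: scores every position."""
    return rng.standard_normal((len(tokens), VOCAB))

def autoregressive_decode(length):
    """One forward pass per emitted token: latency grows with sequence length."""
    out = []
    for _ in range(length):
        logits = toy_logits(out + [MASK])     # score the next position
        out.append(int(logits[-1].argmax()))
    return out

def parallel_unmask_decode(length, steps):
    """Start fully masked; each step commits a batch of positions in parallel,
    so only `steps` forward passes are needed instead of `length`."""
    tokens = [MASK] * length
    per_step = length // steps
    for _ in range(steps):
        logits = toy_logits(tokens)
        confidence = logits.max(axis=-1)
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        # Commit the most confident still-masked positions this step.
        for i in sorted(masked, key=lambda i: -confidence[i])[:per_step]:
            tokens[i] = int(logits[i].argmax())
    return tokens

print(autoregressive_decode(LENGTH))          # 16 forward passes
print(parallel_unmask_decode(LENGTH, STEPS))  # 4 forward passes
```

With 16 tokens and 4 denoising steps, the diffusion-style loop calls the model 4 times instead of 16; the real speed-quality trade-off depends on how aggressively positions are committed per step.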

Community


That's so coooooool

On the main graph, what would the tokens per second of Seed-Coder-Instruct be on the same hardware?

It's confusing that there isn't a clear and direct throughput comparison between this model and an autoregressive one.

Paper author

Hi, thanks for your interest. We just did the evaluation of Seed-Coder-Instruct under our deployment settings; the speed is 344 tokens/s. And good suggestion! We will consider updating the main figure : )
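
(Assuming the two measurements are taken under comparable deployment settings, 2,146 tokens/s versus 344 tokens/s works out to roughly a 6.2x throughput advantage for Seed Diffusion Preview over the autoregressive Seed-Coder-Instruct baseline.)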

nice work

nice work

This looks amazing! Any plans to open source this? 👀

Pretty cool! I recently added support for LLaDA and Dream models in llama.cpp, would love to add support if you ever plan to open source the inference code!



Wow, excellent work! May I ask for the links to the LLaDA and Dream support in llama.cpp?


Very cool getting faster results than Gemini! Would love to play around with this if you open source it!

