deepseek-ai/DeepSeek-V2

Text Generation
Transformers
Safetensors
deepseek_v2
conversational
custom_code
text-generation-inference

typo spot: gready->greedy

#10 opened 2 months ago by
Jeol

Exact computations for multi-head latent attention

1
#9 opened 3 months ago by
mseeger

This is by far the best model I have seen until now.

1
2
#8 opened 10 months ago by
ZeroWw

How many tokens per second does DeepSeek-V2 (236B) reach for inference on 8×A100?

1
#7 opened 12 months ago by
harvin-cn

Can DeepSeek-V2 run on two nodes (each with 4 A100s)?

1
1
#5 opened 12 months ago by
jy395

Calculation of _mscale during YaRN RoPE scaling

1
#4 opened 12 months ago by
sszymczyk

KeyError: 'sdpa'

1
#3 opened about 1 year ago by
minglingfeng

Smaller Models

10
1
#2 opened about 1 year ago by
puffy310

KV Cache for compress_kv or key-value states

6
#1 opened about 1 year ago by
House-99