deepseek-ai/DeepSeek-V2

Text Generation

text-generation-inference


typo spot: gready->greedy

#10 opened 7 months ago by

Exact computations for multi-head latent attention

#9 opened 8 months ago by

This is by far the best model I have seen until now.

#8 opened about 1 year ago by

How many tokens per second when using DeepSeek-V2 (236B) as the inference model on 8×A100?

#7 opened over 1 year ago by

Can DeepSeek-V2 run on two nodes (each with 4 A100)?

#5 opened over 1 year ago by

Calculation of _mscale during YARN RoPE scaling

#4 opened over 1 year ago by

KeyError: 'sdpa'

#3 opened over 1 year ago by

Smaller Models

#2 opened over 1 year ago by

KV Cache for compress_kv or key-value states

#1 opened over 1 year ago by