how did you do it?
1
#4 opened about 2 months ago by ehartford

compare to qwen3-8b and qwen3-14b
#3 opened about 2 months ago by decem

Could the same distillation technology be used to create a draft model for DeepSeek R1 0528?
#2 opened about 2 months ago by BernardH

Multilingual?
#1 opened about 2 months ago by AaronFeng753