How did you do it?
1
#4 opened 24 days ago by ehartford

compare to qwen3-8b and qwen3-14b
6
#3 opened 27 days ago by decem

Could the same distillation technique be used to create a draft model for DeepSeek R1 0528?
1
#2 opened 27 days ago by BernardH

Multilingual?
#1 opened 27 days ago by AaronFeng753