Could the same distillation technology be used to create a draft model for DeepSeek R1 0528?
#2 by BernardH
It is amazing that you are able to create such faithful distillations of much larger models.
I was wondering whether the same distillation technology could be used to create a draft model for DeepSeek R1 0528. Because of its size, that model has to run largely from RAM in most local deployments, which often makes it pretty slow.
What do you think?