Do the distilled models also have 128K context?

#4 opened by Troyanovsky

DeepSeek-R1 has 128K context length. Do the distilled models also have this context length or smaller?

This also depends on the base models used for the distillation, but as long as they support it (which is the case with Llama), it should be fine.
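One way to verify is to read the context length advertised in the checkpoint's config. This is a minimal sketch; the repo id is just an example, so substitute whichever distilled checkpoint you are actually using:

```python
from transformers import AutoConfig

# Example checkpoint; swap in the distilled model you care about.
config = AutoConfig.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Llama-8B")

# For Llama/Qwen-style configs, max_position_embeddings is the nominal
# maximum context window the model was configured for.
print(config.max_position_embeddings)
```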

It seems not, from what I can tell, unless I'm missing something. Running this model through LangChain with conversation memory, the context fills up at around 2,048 tokens. Does anyone else see this? Am I missing something?
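A 2,048-token ceiling often comes from the generation pipeline's defaults rather than the model itself. A minimal sketch, assuming the `langchain_huggingface` integration and an example checkpoint, that passes the generation limit explicitly instead of relying on defaults:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
from langchain_huggingface import HuggingFacePipeline

model_id = "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"  # example checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Set max_new_tokens explicitly; small pipeline defaults are a common reason
# the context appears to "fill up" well below the model's configured limit.
gen = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=1024,
    return_full_text=False,
)

llm = HuggingFacePipeline(pipeline=gen)
print(llm.invoke("Summarize the discussion about distilled model context lengths."))
```

This is only a sketch under those assumptions; it may also be worth checking the memory/trimming settings on the LangChain side, since some memory classes truncate history at a fixed token budget.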
