Do the distilled models also have 128K context?

#4 opened by Troyanovsky

DeepSeek-R1 has 128K context length. Do the distilled models also have this context length or smaller?

This also depends on the base models used for the distillation, but as long as they support it (which is the case with Llama), it should be fine.
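One way to verify is to read the context length advertised in the checkpoint's config. This is a minimal sketch; the repo id is just an example, so substitute whichever distilled checkpoint you are actually using:

```python
from transformers import AutoConfig

# Example checkpoint; swap in the distilled model you care about.
config = AutoConfig.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Llama-8B")

# For Llama/Qwen-style configs, max_position_embeddings is the nominal
# maximum context window the model was configured for.
print(config.max_position_embeddings)
```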

It seems not, from what I can tell, unless I'm missing something. Running this model through LangChain with conversation memory, the context fills up at around 2,048 tokens. Does anyone else see this? Am I missing something?
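A 2,048-token ceiling often comes from the generation pipeline's defaults rather than the model itself. A minimal sketch, assuming the `langchain_huggingface` integration and an example checkpoint, that passes the generation limit explicitly instead of relying on defaults:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
from langchain_huggingface import HuggingFacePipeline

model_id = "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"  # example checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Set max_new_tokens explicitly; small pipeline defaults are a common reason
# the context appears to "fill up" well below the model's configured limit.
gen = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=1024,
    return_full_text=False,
)

llm = HuggingFacePipeline(pipeline=gen)
print(llm.invoke("Summarize the discussion about distilled model context lengths."))
```

This is only a sketch under those assumptions; it may also be worth checking the memory/trimming settings on the LangChain side, since some memory classes truncate history at a fixed token budget.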
