Context Size?
I couldn't find the context size of this model mentioned in the readme, just wondering if it's just 64k or if it's actually 128k or 256k?
You can find it under Model Hyper-parameters: Sequence Length
Is it really just 8k?
I saw it said 8k, but I assumed that couldn't be correct and must instead be the maximum it can coherently generate in a single response.
Maybe explaining a few things about building these models will help you understand why it is an 8k model in our first release.
- Attention is quadratic with respect to input length. This means that at some point, it becomes very inefficient to train models with large sequence lengths from the beginning.
- Most naturally occurring web data comes in below 8k tokens with our tokeniser.
This means that 1) it is more efficient to do a context extension step as a continual training step for a model, and 2) we need to generate a lot of synthetic data for the context extension phase.
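To make the quadratic point concrete, here is a rough back-of-the-envelope sketch (illustrative only, not our actual training code; the fp16 scores and the per-head, per-layer framing are just assumptions for the example) of how the naive attention score matrix grows with sequence length:

```python
BYTES_PER_ELEMENT = 2  # assuming fp16 attention scores, ignoring FlashAttention-style tricks

def score_matrix_gib(seq_len: int) -> float:
    """Memory for one naive (seq_len x seq_len) attention score matrix, in GiB."""
    return seq_len * seq_len * BYTES_PER_ELEMENT / 1024**3

for seq_len in (8_192, 32_768, 65_536):
    print(f"{seq_len:>6} tokens -> {score_matrix_gib(seq_len):6.2f} GiB per head per layer")

# Example output:
#   8192 tokens ->   0.12 GiB per head per layer
#  32768 tokens ->   2.00 GiB per head per layer  (4x the length, 16x the memory)
#  65536 tokens ->   8.00 GiB per head per layer  (8x the length, 64x the memory)
```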
We figured that while we were doing these two steps, we could publish and evaluate the model we already had. We will have a model with a maximum sequence length of 32k to 64k, depending on how well things go.
I see you are already quite active in quantising small models. Are you interested in any specific tasks?