Context Size?
I couldn't find the context size of this model mentioned in the readme, just wondering if it's just 64k or if it's actually 128k or 256k?
You can find it under Model Hyper-parameters: Sequence Length
Is it really just 8k?
I saw it said 8k, but I assumed that couldn't be correct and must instead be the maximum it can coherently generate in a single response.
Maybe explaining a few things about building these models will help you understand why it is an 8k model in our first release.
- Attention is quadratic with respect to input length. This means that at some point, it becomes very inefficient to train models with large sequence lengths from the beginning.
- Most naturally occurring web data comes in below 8k tokens with our tokeniser.
This means that 1) it is more efficient to do a context extension step as a continual training step for a model, and 2) we need to generate a lot of synthetic data for the context extension phase.
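To make the quadratic point concrete, here is a rough back-of-the-envelope sketch (illustrative only, not our actual training code; the fp16 scores and the per-head, per-layer framing are just assumptions for the example) of how the naive attention score matrix grows with sequence length:

```python
BYTES_PER_ELEMENT = 2  # assuming fp16 attention scores, ignoring FlashAttention-style tricks

def score_matrix_gib(seq_len: int) -> float:
    """Memory for one naive (seq_len x seq_len) attention score matrix, in GiB."""
    return seq_len * seq_len * BYTES_PER_ELEMENT / 1024**3

for seq_len in (8_192, 32_768, 65_536):
    print(f"{seq_len:>6} tokens -> {score_matrix_gib(seq_len):6.2f} GiB per head per layer")

# Example output:
#   8192 tokens ->   0.12 GiB per head per layer
#  32768 tokens ->   2.00 GiB per head per layer  (4x the length, 16x the memory)
#  65536 tokens ->   8.00 GiB per head per layer  (8x the length, 64x the memory)
```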
We figured that while we were doing these two steps, we could publish and evaluate the model we already had. We will have a model with a maximum sequence length of 32k to 64k, depending on how well things go.
I see you are already quite active in quantising small models. Are you interested in any specific tasks?