
Model Card

This model is a pretrained Based model. Based is strong at recalling information provided in context, despite using a fixed amount of memory during inference.
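The fixed-memory property can be illustrated with a generic linear-attention recurrence. The sketch below is not the Based implementation: the feature map `phi`, the dimensions, and the helper names are hypothetical, chosen only to show that a recurrent state keeps the same size however many tokens are processed, while the key/value cache of softmax attention grows with sequence length.

```python
import math
import random


def phi(x):
    """Hypothetical positive feature map, elu(x) + 1 (an assumption, not Based's own)."""
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


class LinearAttentionState:
    """Recurrent state for causal linear attention.

    S accumulates phi(k) v^T and z accumulates phi(k); both have fixed shape.
    """

    def __init__(self, d_k, d_v):
        self.S = [[0.0] * d_v for _ in range(d_k)]
        self.z = [0.0] * d_k

    def step(self, q, k, v):
        fk, fq = phi(k), phi(q)
        # Fold the new key/value pair into the fixed-size state.
        for i, fki in enumerate(fk):
            self.z[i] += fki
            for j, vj in enumerate(v):
                self.S[i][j] += fki * vj
        # Read out: (phi(q)^T S) / (phi(q)^T z). The denominator is positive
        # because phi returns strictly positive features.
        denom = sum(a * b for a, b in zip(fq, self.z))
        return [
            sum(fq[i] * self.S[i][j] for i in range(len(fq))) / denom
            for j in range(len(v))
        ]

    def num_floats(self):
        return len(self.S) * len(self.S[0]) + len(self.z)


def run(seq_len, d_k=4, d_v=4, seed=0):
    """Return (recurrent state size, KV-cache size) in floats after seq_len tokens."""
    rng = random.Random(seed)
    state = LinearAttentionState(d_k, d_v)
    kv_cache = []  # what softmax attention would have to keep around
    for _ in range(seq_len):
        q = [rng.gauss(0, 1) for _ in range(d_k)]
        k = [rng.gauss(0, 1) for _ in range(d_k)]
        v = [rng.gauss(0, 1) for _ in range(d_v)]
        state.step(q, k, v)
        kv_cache.append((k, v))
    return state.num_floats(), len(kv_cache) * (d_k + d_v)
```

Calling `run(10)` and `run(1000)` returns the same first value (the recurrent state size), while the second value (the cache size) scales linearly with the number of tokens.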

As a quality reference, we include a pretrained Attention (Llama architecture) model, provided here: https://huggingface.co/hazyresearch/attn-1b, and a Mamba model, provided here: https://huggingface.co/hazyresearch/mamba-1b

All three checkpoints are pretrained on 10Bn tokens of the Pile in the exact same data order using next token prediction.
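Next token prediction trains a model to assign high probability to each token given the tokens before it. The following is a minimal sketch of that objective, not the training code from the repository: `logits_fn` and `uniform_logits` are hypothetical stand-ins for a model that maps a token prefix to unnormalized scores over the vocabulary.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of scores."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def next_token_loss(tokens, logits_fn):
    """Mean cross-entropy of predicting tokens[t + 1] from tokens[: t + 1]."""
    total = 0.0
    for t in range(len(tokens) - 1):
        logp = log_softmax(logits_fn(tokens[: t + 1]))
        total -= logp[tokens[t + 1]]
    return total / (len(tokens) - 1)


def uniform_logits(prefix, vocab_size=8):
    # Hypothetical stand-in model: the same score for every vocabulary entry.
    return [0.0] * vocab_size


loss = next_token_loss([3, 1, 4, 1, 5], uniform_logits)
perplexity = math.exp(loss)  # equals vocab_size for a uniform predictor
```

Because all three checkpoints see the same tokens in the same order, differences in this loss can be attributed to the architecture rather than to the data.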

Model Sources

The model implementation and training code that produced the model are provided here: https://github.com/HazyResearch/based

Uses

The purpose of this work is to evaluate the language modeling quality of a new efficient architecture, Based.

We include a series of benchmarks that you can use to evaluate quality:

Citation

Please consider citing this paper if you use our work:

@article{arora2024simple,
  title={Simple linear attention language models balance the recall-throughput tradeoff},
  author={Arora, Simran and Eyuboglu, Sabri and Zhang, Michael and Timalsina, Aman and Alberti, Silas and Zinsley, Dylan and Zou, James and Rudra, Atri and Ré, Christopher},
  journal={arXiv:2402.18668},
  year={2024}
}

Please reach out to [email protected], [email protected], and [email protected] with questions.

Collection including hazyresearch/based-1b

based: These language model checkpoints are trained at the 360M and 1.3Bn parameter scales for up to 50Bn tokens on the Pile corpus, for research purposes.