---
license: apache-2.0
datasets:
- yuchenxie/arlowgpt-pro
language:
- en
library_name: transformers
---

# DUMMY WEIGHTS ONLY !! NOT THE REAL MODEL !!

# Specifications (these will carry over to the main fully released model)

## Architecture: arlow

Not supported by `transformers` right now, but a beta version is out [here](https://github.com/yuchenxie4645/transformers/tree/ArlowRegistration).

## Exact location

[src/transformers/models/arlow](https://github.com/yuchenxie4645/transformers/tree/ArlowRegistration/src/transformers/models/arlow)

## Will feature:

- GQA (grouped-query attention)
- SiLU activation
- FlashAttention varlen + manual QKV projection
- Cross-attention (untrained; included for easy vision-encoder incorporation)
- The model is decoder-only, but the cross-attention weights are present
- Custom RoPE
- and more!

The Arlow architecture isn't officially supported by `transformers`, but my implementation is there if you want to try it.
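To give a feel for the GQA feature listed above: in grouped-query attention, several query heads share each key/value head, cutting the KV-cache size. The following is a minimal NumPy sketch of that idea under assumed shapes; it is purely illustrative and is not the Arlow implementation (which uses FlashAttention varlen kernels, not explicit softmax like this).

```python
import numpy as np

def gqa(q, k, v):
    """Grouped-query attention sketch (illustrative, not the Arlow code).

    q: (n_q_heads, seq, d)    k, v: (n_kv_heads, seq, d)
    n_q_heads must be a multiple of n_kv_heads.
    """
    repeat = q.shape[0] // k.shape[0]
    # Each KV head is shared by `repeat` consecutive query heads.
    k = np.repeat(k, repeat, axis=0)
    v = np.repeat(v, repeat, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    # Numerically stable softmax over the key dimension.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

With 8 query heads and 2 KV heads, each KV head serves 4 query heads; the output shape matches the query tensor.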