DUMMY WEIGHTS ONLY !! NOT THE REAL MODEL !!
Specifications (these will carry over to the final released model)
Architecture: arlow (not currently supported by transformers, but a beta version is out -> here)
Exact location: here
Will feature:
- GQA (grouped-query attention)
- SiLU activation
- FlashAttention varlen + manual QKV projection
- Cross-attention (untrained; included so a vision encoder can be slotted in easily)
- Decoder-only model; the cross-attention weights are nonetheless present
- Custom RoPE
- and more!
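As a rough illustration of the GQA feature listed above: in grouped-query attention, consecutive groups of query heads share a single key/value head, which shrinks the KV cache. The head counts below are made-up examples, not Arlow's actual configuration.

```python
# Minimal sketch of the GQA head-sharing scheme (illustrative only;
# the head counts are assumptions, not Arlow's real config).

def kv_head_for_query_head(q_head: int, num_q_heads: int, num_kv_heads: int) -> int:
    """Return the index of the key/value head that this query head attends with."""
    group_size = num_q_heads // num_kv_heads  # query heads per KV head
    return q_head // group_size

# e.g. 8 query heads sharing 2 KV heads -> two groups of 4
mapping = [kv_head_for_query_head(h, num_q_heads=8, num_kv_heads=2) for h in range(8)]
print(mapping)  # [0, 0, 0, 0, 1, 1, 1, 1]
```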
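For reference, here is what standard RoPE does to a vector: each pair of dimensions is rotated by an angle proportional to the token position. This is a plain-Python sketch of the textbook formulation with the usual base of 10000; Arlow's "custom RoPE" presumably modifies some part of this, so treat the details as assumptions.

```python
import math

def rope_rotate(x, pos, theta_base=10000.0):
    """Apply rotary position embedding to one even-length vector.

    Pairs (x[2i], x[2i+1]) are rotated by pos * theta_base**(-2i/d).
    theta_base=10000.0 is the common default; Arlow's custom variant may differ.
    """
    d = len(x)
    out = []
    for i in range(0, d, 2):
        angle = pos * theta_base ** (-i / d)
        c, s = math.cos(angle), math.sin(angle)
        out.append(x[i] * c - x[i + 1] * s)
        out.append(x[i] * s + x[i + 1] * c)
    return out

v = [1.0, 0.0, 0.5, 0.5]
rotated = rope_rotate(v, pos=3)
# each pair is a pure rotation, so the vector's norm is unchanged
```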
The Arlow architecture isn't officially supported by transformers, but my implementation will be available if you want to try it.