---
license: apache-2.0
datasets:
- yuchenxie/arlowgpt-pro
language:
- en
library_name: transformers
---

# DUMMY WEIGHTS ONLY !! NOT THE REAL MODEL !!

# Specifications (these will carry over to the main fully released model)

## Architecture: arlow

Not supported by `transformers` right now, but a beta version is out [here](https://github.com/yuchenxie4645/transformers/tree/ArlowRegistration).

## Exact location

[src/transformers/models/arlow](https://github.com/yuchenxie4645/transformers/tree/ArlowRegistration/src/transformers/models/arlow)

## Will feature:

- GQA (grouped-query attention)
- SiLU activation
- FlashAttention varlen + manual QKV projection
- Cross-attention (untrained; included for easy vision-encoder incorporation)
- The model is decoder-only, but the cross-attention weights are present
- Custom RoPE
- and more!

The Arlow architecture isn't officially supported by `transformers`, but my implementation is there if you want to try it.
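To give a feel for the GQA feature listed above: in grouped-query attention, several query heads share each key/value head, cutting the KV-cache size. The following is a minimal NumPy sketch of that idea under assumed shapes; it is purely illustrative and is not the Arlow implementation (which uses FlashAttention varlen kernels, not explicit softmax like this).

```python
import numpy as np

def gqa(q, k, v):
    """Grouped-query attention sketch (illustrative, not the Arlow code).

    q: (n_q_heads, seq, d)    k, v: (n_kv_heads, seq, d)
    n_q_heads must be a multiple of n_kv_heads.
    """
    repeat = q.shape[0] // k.shape[0]
    # Each KV head is shared by `repeat` consecutive query heads.
    k = np.repeat(k, repeat, axis=0)
    v = np.repeat(v, repeat, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    # Numerically stable softmax over the key dimension.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

With 8 query heads and 2 KV heads, each KV head serves 4 query heads; the output shape matches the query tensor.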