DUMMY WEIGHTS ONLY! NOT A REAL MODEL!

Specifications (these will carry over to the fully released main model)

Architecture: arlow (not yet supported by transformers, but a beta version is out -> here)

Exact location: here

Will feature:

  • GQA (grouped-query attention)
  • SiLU activation
  • Flash Attention VarLen + manual QKV projection
  • Cross-attention (untrained; included to make it easy to plug in a vision encoder)
  • The model is decoder-only; the cross-attention weights are present anyway.
  • Custom RoPE
  • and more!
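
For anyone unfamiliar with the terms in the list, here is a rough, illustrative sketch of three of them in plain Python. It is not taken from the Arlow implementation: the head counts are made-up example values, the function names are hypothetical, and the RoPE shown is the standard interleaved formulation, not the custom variant this model uses.

```python
import math
from typing import List, Sequence


def silu(x: float) -> float:
    """SiLU activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def kv_head_for_query_head(q_head: int, num_q_heads: int, num_kv_heads: int) -> int:
    """GQA: each group of consecutive query heads shares one key/value head."""
    if num_q_heads % num_kv_heads != 0:
        raise ValueError("num_q_heads must be divisible by num_kv_heads")
    group_size = num_q_heads // num_kv_heads
    return q_head // group_size


def rope_rotate(vec: Sequence[float], position: int, base: float = 10000.0) -> List[float]:
    """Standard RoPE (assumption: NOT Arlow's custom RoPE).

    Rotates each pair (vec[i], vec[i + 1]) by an angle that depends on the
    token position and on the pair's index within the head dimension.
    """
    dim = len(vec)
    if dim % 2 != 0:
        raise ValueError("head dimension must be even")
    out: List[float] = []
    for i in range(0, dim, 2):
        theta = position * base ** (-i / dim)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


# Example values only: 8 query heads sharing 2 key/value heads.
mapping = [kv_head_for_query_head(h, 8, 2) for h in range(8)]
rotated = rope_rotate([1.0, 0.0, 0.0, 1.0], position=3)
```

With 8 query heads and 2 key/value heads, query heads 0 to 3 read from KV head 0 and heads 4 to 7 from KV head 1, which is where GQA saves KV-cache memory compared with full multi-head attention.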

The Arlow architecture isn't officially supported by transformers yet, but my implementation will be available there if you want to try it.

Safetensors · Model size: 368M params · Tensor type: BF16