DUMMY WEIGHTS ONLY !! NOT THE REAL MODEL !!
Specifications (these will carry over to the final released model)
Architecture: arlow (not currently supported by transformers, but a beta version is out -> here)
Exact location: here
Will feature:
- GQA (grouped-query attention)
- SiLU activation
- FlashAttention varlen + manual QKV projection
- Cross-attention (untrained; included so a vision encoder can be slotted in easily)
- Decoder-only model; the cross-attention weights are nonetheless present
- Custom RoPE
- and more!
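As a rough illustration of the GQA feature listed above: in grouped-query attention, consecutive groups of query heads share a single key/value head, which shrinks the KV cache. The head counts below are made-up examples, not Arlow's actual configuration.

```python
# Minimal sketch of the GQA head-sharing scheme (illustrative only;
# the head counts are assumptions, not Arlow's real config).

def kv_head_for_query_head(q_head: int, num_q_heads: int, num_kv_heads: int) -> int:
    """Return the index of the key/value head that this query head attends with."""
    group_size = num_q_heads // num_kv_heads  # query heads per KV head
    return q_head // group_size

# e.g. 8 query heads sharing 2 KV heads -> two groups of 4
mapping = [kv_head_for_query_head(h, num_q_heads=8, num_kv_heads=2) for h in range(8)]
print(mapping)  # [0, 0, 0, 0, 1, 1, 1, 1]
```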
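For reference, here is what standard RoPE does to a vector: each pair of dimensions is rotated by an angle proportional to the token position. This is a plain-Python sketch of the textbook formulation with the usual base of 10000; Arlow's "custom RoPE" presumably modifies some part of this, so treat the details as assumptions.

```python
import math

def rope_rotate(x, pos, theta_base=10000.0):
    """Apply rotary position embedding to one even-length vector.

    Pairs (x[2i], x[2i+1]) are rotated by pos * theta_base**(-2i/d).
    theta_base=10000.0 is the common default; Arlow's custom variant may differ.
    """
    d = len(x)
    out = []
    for i in range(0, d, 2):
        angle = pos * theta_base ** (-i / d)
        c, s = math.cos(angle), math.sin(angle)
        out.append(x[i] * c - x[i + 1] * s)
        out.append(x[i] * s + x[i + 1] * c)
    return out

v = [1.0, 0.0, 0.5, 0.5]
rotated = rope_rotate(v, pos=3)
# each pair is a pure rotation, so the vector's norm is unchanged
```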
The Arlow architecture isn't officially supported by transformers, but my implementation will be available if you want to try it.