Fizz šŸ³ļøā€āš§ļø PRO

Fizzarolli

AI & ML interests

None yet

Recent Activity

liked a Space 1 day ago
CATIE-AQ/FAT5-report
upvoted a collection 2 days ago
NanoBEIR šŸŗ

Organizations

Zeus Labs, Social Post Explorers, dreamgen-preview, ShuttleAI, testing, Alfitaria, Allura, Estrogen, Smol Community

Fizzarolli's activity

replied to tomaarsen's post 2 days ago

In a somewhat similar vein (not really, though): has anyone over there experimented with taking a current encoder arch (i.e. ModernBERT), ripping out the transformer layers, replacing them with something like Mamba2/Griffin temporal mixing layers, then distilling the original model onto it? It seems like it could be a lot less lossy than a straight static embedding layer, but still better complexity-wise than self-attention.

I was trying this earlier, but the first shape error in the forward pass made me give up šŸ˜­
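For anyone curious what that swap-and-distill setup could look like, here is a minimal toy sketch of the idea. It does not touch real ModernBERT internals or a real Mamba2/Griffin implementation; the `TemporalMixing` module is just a hypothetical linear-time stand-in (causal depthwise conv plus a gate), and the distillation objective is plain hidden-state MSE on random inputs. Keeping the stand-in mixer's interface identical to attention, `(batch, seq, d) -> (batch, seq, d)`, is also the easiest way to dodge the shape errors mentioned above.

```python
# Toy sketch: replace self-attention with a linear-time mixing layer and
# distill the attention "teacher" onto the recurrent-style "student".
# All names and dimensions here are illustrative assumptions.
import torch
import torch.nn as nn

d_model, n_heads, seq_len, batch = 256, 4, 128, 8

class TemporalMixing(nn.Module):
    """Stand-in for a Mamba2/Griffin-style mixer: causal depthwise conv + gate."""
    def __init__(self, d):
        super().__init__()
        self.conv = nn.Conv1d(d, d, kernel_size=4, padding=3, groups=d)
        self.gate = nn.Linear(d, d)
        self.proj = nn.Linear(d, d)

    def forward(self, x):                              # x: (batch, seq, d)
        h = self.conv(x.transpose(1, 2))[..., :x.size(1)].transpose(1, 2)
        return self.proj(h * torch.sigmoid(self.gate(x)))

class AttentionMixer(nn.Module):
    """Standard self-attention mixer used by the teacher."""
    def __init__(self, d, heads):
        super().__init__()
        self.attn = nn.MultiheadAttention(d, heads, batch_first=True)

    def forward(self, x):
        out, _ = self.attn(x, x, x, need_weights=False)
        return out

class Block(nn.Module):
    """Pre-norm encoder block; `mixer` is attention (teacher) or TemporalMixing (student)."""
    def __init__(self, mixer):
        super().__init__()
        self.mixer = mixer
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )

    def forward(self, x):
        x = x + self.mixer(self.norm1(x))
        return x + self.mlp(self.norm2(x))

teacher = nn.Sequential(*[Block(AttentionMixer(d_model, n_heads)) for _ in range(4)]).eval()
student = nn.Sequential(*[Block(TemporalMixing(d_model)) for _ in range(4)])

opt = torch.optim.AdamW(student.parameters(), lr=3e-4)
for step in range(5):                                  # toy distillation loop on random data
    x = torch.randn(batch, seq_len, d_model)
    with torch.no_grad():
        target = teacher(x)                            # teacher hidden states
    loss = nn.functional.mse_loss(student(x), target)  # match them with the student
    opt.zero_grad(); loss.backward(); opt.step()
    print(step, loss.item())
```

In practice you'd initialize the student's non-mixing weights (embeddings, MLPs, norms) from the pretrained encoder and distill on real text, but the interface trick is the same.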

upvoted an article 2 days ago

Train 400x faster Static Embedding Models with Sentence Transformers

ā€¢ 98 upvotes