fxmeng's Collections
TransMLA-base
Updated Jun 15
Base Model for TransMLA
TransMLA: Multi-head Latent Attention Is All You Need
Paper • 2502.07864 • Published Feb 11 • 56
fxmeng/TransMLA-llama-2-7b-r64-n512-norm
Text Generation • 6B • Updated Jun 16 • 6
fxmeng/transmla_pretrain_6B_tokens
Viewer • Updated 18 days ago • 5.94M • 150
fxmeng/transmla_pretrain_1B_tokens
Viewer • Updated 18 days ago • 1.14M • 75
fxmeng/transmla_pretrain_100m_tokens
Viewer • Updated 18 days ago • 100k • 63