mengfanxu
fxmeng
AI & ML interests
None yet
Recent Activity
liked
a dataset
13 days ago
nvidia/Llama-Nemotron-VLM-Dataset-v1
authored
a paper
13 days ago
TPLA: Tensor Parallel Latent Attention for Efficient Disaggregated Prefill & Decode Inference
commented on
a paper
14 days ago
TPLA: Tensor Parallel Latent Attention for Efficient Disaggregated Prefill & Decode Inference
Organizations
None yet
TransMLA-base
Base Model for TransMLA
CLOVER-Commonsense-148k
PiSSA-LLaMA-3-8B
Principal Singular Values and Singular Vectors Adaptation
PiSSA-LLaMA-3-70B
Principal Singular Values and Singular Vectors Adaptation
PiSSA-LLaMA-2-7B
Principal Singular Values and Singular Vectors Adaptation
PiSSA-LLaMA-3-8B-Instruct
Principal Singular Values and Singular Vectors Adaptation
PiSSA-Qwen2
PiSSA Datasets
https://arxiv.org/abs/2404.02948
Mixtral-1~8x7B-Instruct-v0.1
Substructure of mistralai/Mixtral-8x7B-Instruct-v0.1