Jackmin108/Moonlight-16B-A3B-Instruct-Fast
Tags: Text Generation, Transformers, Safetensors, deepseek_v3, conversational, custom_code, text-generation-inference
arxiv: 2502.16982
License: mit
Branch: main
Path: Moonlight-16B-A3B-Instruct-Fast/figures (1.1 MB)
1 contributor, History: 1 commit
Latest commit: Jackmin108, "original files" (1805272), about 1 month ago
File                          Size     Storage  Commit message  Last modified
banner.png                    48.8 kB  xet      original files  about 1 month ago
banner_short.png              26.9 kB  xet      original files  about 1 month ago
chinlaw_8k_flops_ratio.png    145 kB   xet      original files  about 1 month ago
fig_MMLU_performance.png      225 kB   xet      original files  about 1 month ago
fig_weight_decay.png          416 kB   xet      original files  about 1 month ago
logo.png (Safe)               13.1 kB  xet      original files  about 1 month ago
megatron.png                  1.99 kB  xet      original files  about 1 month ago
scaling.png                   224 kB   xet      original files  about 1 month ago