Distributed Training: Train BART/T5 for Summarization using 🤗 Transformers and Amazon SageMaker Apr 8, 2021
"Give Me BF16 or Give Me Death"? Accuracy-Performance Trade-Offs in LLM Quantization Paper • 2411.02355 • Published 13 days ago • 44
Pyramidal Flow Matching for Efficient Video Generative Modeling Paper • 2410.05954 • Published Oct 8 • 37
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models Paper • 2409.17146 • Published Sep 25 • 101
Llama 3.2 Collection This collection hosts the transformers and original repos of Llama 3.2 and Llama Guard 3 • 15 items • Updated 24 days ago • 467
Training Language Models to Self-Correct via Reinforcement Learning Paper • 2409.12917 • Published Sep 19 • 134
Qwen2.5 Collection Qwen2.5 language models, including pretrained and instruction-tuned models in 7 sizes: 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B. • 45 items • Updated Sep 18 • 349
Let Me Speak Freely? A Study on the Impact of Format Restrictions on Performance of Large Language Models Paper • 2408.02442 • Published Aug 5 • 21
Generative Verifiers: Reward Modeling as Next-Token Prediction Paper • 2408.15240 • Published Aug 27 • 13
Probably function calling datasets Collection Created using the https://huggingface.co/spaces/librarian-bots/dataset-column-search-api Space. • 39 items • Updated Jul 17 • 36
Llama 3.1 GPTQ, AWQ, and BNB Quants Collection Optimised Quants for high-throughput deployments! Compatible with Transformers, TGI & vLLM 🤗 • 9 items • Updated Sep 26 • 54
NuminaMath Collection Datasets and models for training SOTA math LLMs. See our GitHub for training & inference code: https://github.com/project-numina/aimo-progress-prize • 6 items • Updated Jul 21 • 62
Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients Paper • 2407.08296 • Published Jul 11 • 31