Collections

Collections including paper arxiv:2402.16107

Papers I find interesting

Scaling Instruction-Finetuned Language Models

Paper • 2210.11416 • Published Oct 20, 2022 • 7
Mamba: Linear-Time Sequence Modeling with Selective State Spaces

Paper • 2312.00752 • Published Dec 1, 2023 • 138
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Paper • 2403.05530 • Published Mar 8 • 60
Yi: Open Foundation Models by 01.AI

Paper • 2403.04652 • Published Mar 7 • 62

Finetuned Multimodal Language Models Are High-Quality Image-Text Data Filters

Paper • 2403.02677 • Published Mar 5 • 16
Feast Your Eyes: Mixture-of-Resolution Adaptation for Multimodal Large Language Models

Paper • 2403.03003 • Published Mar 5 • 9
InfiMM-HD: A Leap Forward in High-Resolution Multimodal Understanding

Paper • 2403.01487 • Published Mar 3 • 14
VisionLLaMA: A Unified LLaMA Interface for Vision Tasks

Paper • 2403.00522 • Published Mar 1 • 44

FuseChat: Knowledge Fusion of Chat Models

Paper • 2402.16107 • Published Feb 25 • 36
LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models

Paper • 2403.13372 • Published Mar 20 • 62

MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs

Paper • 2402.15627 • Published Feb 23 • 34
Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts

Paper • 2402.16822 • Published Feb 26 • 15
FuseChat: Knowledge Fusion of Chat Models

Paper • 2402.16107 • Published Feb 25 • 36
Multi-LoRA Composition for Image Generation

Paper • 2402.16843 • Published Feb 26 • 28

FuseChat: Knowledge Fusion of Chat Models

FuseChat: Knowledge Fusion of Chat Models

Paper • 2402.16107 • Published Feb 25 • 36
FuseAI/FuseChat-7B-VaRM

Text Generation • Updated Mar 16 • 734 • 81
FuseAI/FuseChat-7B-Slerp

Text Generation • Updated Mar 16 • 31 • 5
FuseAI/FuseChat-7B-TA

Text Generation • Updated Mar 16 • 12 • 5

Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models

Paper • 2402.19427 • Published Feb 29 • 52
Simple linear attention language models balance the recall-throughput tradeoff

Paper • 2402.18668 • Published Feb 28 • 18
ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition

Paper • 2402.15220 • Published Feb 23 • 19
Linear Transformers are Versatile In-Context Learners

Paper • 2402.14180 • Published Feb 21 • 6

Self-Rewarding Language Models

Paper • 2401.10020 • Published Jan 18 • 144
ReFT: Reasoning with Reinforced Fine-Tuning

Paper • 2401.08967 • Published Jan 17 • 28
Tuning Language Models by Proxy

Paper • 2401.08565 • Published Jan 16 • 21
TrustLLM: Trustworthiness in Large Language Models

Paper • 2401.05561 • Published Jan 10 • 65

LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models

Paper • 2309.12307 • Published Sep 21, 2023 • 87
NEFTune: Noisy Embeddings Improve Instruction Finetuning

Paper • 2310.05914 • Published Oct 9, 2023 • 14
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling

Paper • 2312.15166 • Published Dec 23, 2023 • 56
Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon

Paper • 2401.03462 • Published Jan 7 • 27

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

Paper • 2312.00752 • Published Dec 1, 2023 • 138
SparQ Attention: Bandwidth-Efficient LLM Inference

Paper • 2312.04985 • Published Dec 8, 2023 • 38
Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research

Paper • 2402.00159 • Published Jan 31 • 59
Neural Network Diffusion

Paper • 2402.13144 • Published Feb 20 • 94
