- A Survey on Data Selection for LLM Instruction Tuning
  Paper • 2402.05123 • Published • 3
- Data-Juicer Sandbox: A Comprehensive Suite for Multimodal Data-Model Co-development
  Paper • 2407.11784 • Published • 4
- Data Management For Large Language Models: A Survey
  Paper • 2312.01700 • Published
- Datasets for Large Language Models: A Comprehensive Survey
  Paper • 2402.18041 • Published • 2
Collections including paper arxiv:2402.05123
- A Survey on Data Selection for LLM Instruction Tuning
  Paper • 2402.05123 • Published • 3
- WaveCoder: Widespread And Versatile Enhanced Instruction Tuning with Refined Data Generation
  Paper • 2312.14187 • Published • 49
- Generative Representational Instruction Tuning
  Paper • 2402.09906 • Published • 54
- Instruction-tuned Language Models are Better Knowledge Learners
  Paper • 2402.12847 • Published • 26

- Effective pruning of web-scale datasets based on complexity of concept clusters
  Paper • 2401.04578 • Published
- How to Train Data-Efficient LLMs
  Paper • 2402.09668 • Published • 41
- A Survey on Data Selection for LLM Instruction Tuning
  Paper • 2402.05123 • Published • 3
- LESS: Selecting Influential Data for Targeted Instruction Tuning
  Paper • 2402.04333 • Published • 3

- PockEngine: Sparse and Efficient Fine-tuning in a Pocket
  Paper • 2310.17752 • Published • 12
- S-LoRA: Serving Thousands of Concurrent LoRA Adapters
  Paper • 2311.03285 • Published • 28
- Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization
  Paper • 2311.06243 • Published • 17
- Fine-tuning Language Models for Factuality
  Paper • 2311.08401 • Published • 28