BeyondWeb: Lessons from Scaling Synthetic Data for Trillion-scale Pretraining Paper • 2508.10975 • Published Aug 14 • 59
usdm Collection NeurIPS 2024 Main Track (Github: https://github.com/naver-ai/usdm) • 4 items • Updated Nov 29, 2024 • 4
Unified Speech-Text Pretraining for Spoken Dialog Modeling Paper • 2402.05706 • Published Feb 8, 2024 • 7
Variational Hierarchical Dialog Autoencoder for Dialog State Tracking Data Augmentation Paper • 2001.08604 • Published Jan 23, 2020 • 1
Aligning Large Language Models through Synthetic Feedback Paper • 2305.13735 • Published May 23, 2023 • 1
Self-Guided Contrastive Learning for BERT Sentence Representations Paper • 2106.07345 • Published Jun 3, 2021
What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers Paper • 2109.04650 • Published Sep 10, 2021
Aligning Large Language Models by On-Policy Self-Judgment Paper • 2402.11253 • Published Feb 17, 2024 • 2
KMMLU: Measuring Massive Multitask Language Understanding in Korean Paper • 2402.11548 • Published Feb 18, 2024
Aligning Language Models to Explicitly Handle Ambiguity Paper • 2404.11972 • Published Apr 18, 2024 • 1
Critic-Guided Decoding for Controlled Text Generation Paper • 2212.10938 • Published Dec 21, 2022 • 2