Direct Preference Optimization: Your Language Model is Secretly a Reward Model Paper • 2305.18290 • Published May 29, 2023 • 53
SteerLM: Attribute Conditioned SFT as an (User-Steerable) Alternative to RLHF Paper • 2310.05344 • Published Oct 9, 2023 • 1
Aligning Large Language Models through Synthetic Feedback Paper • 2305.13735 • Published May 23, 2023 • 1
JudgeLM: Fine-tuned Large Language Models are Scalable Judges Paper • 2310.17631 • Published Oct 26, 2023 • 34
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models Paper • 2201.11903 • Published Jan 28, 2022 • 11
Self-Consistency Improves Chain of Thought Reasoning in Language Models Paper • 2203.11171 • Published Mar 21, 2022 • 4
Fine-tuning Language Models with Generative Adversarial Feedback Paper • 2305.06176 • Published May 9, 2023 • 1
UltraFeedback: Boosting Language Models with High-quality Feedback Paper • 2310.01377 • Published Oct 2, 2023 • 5
Verbosity Bias in Preference Labeling by Large Language Models Paper • 2310.10076 • Published Oct 16, 2023 • 2
RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback Paper • 2309.00267 • Published Sep 1, 2023 • 48
Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models' Alignment Paper • 2308.05374 • Published Aug 10, 2023 • 28
Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment Paper • 2308.09662 • Published Aug 18, 2023 • 3
HelpSteer: Multi-attribute Helpfulness Dataset for SteerLM Paper • 2311.09528 • Published Nov 16, 2023 • 2
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models Paper • 2402.03300 • Published Feb 5, 2024 • 107
Reasons to Reject? Aligning Language Models with Judgments Paper • 2312.14591 • Published Dec 22, 2023 • 19
Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification Paper • 2308.07921 • Published Aug 15, 2023 • 23
ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving Paper • 2309.17452 • Published Sep 29, 2023 • 3
LLM Guided Inductive Inference for Solving Compositional Problems Paper • 2309.11688 • Published Sep 20, 2023 • 1
A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications Paper • 2402.07927 • Published Feb 5, 2024 • 1
The What, Why, and How of Context Length Extension Techniques in Large Language Models -- A Detailed Survey Paper • 2401.07872 • Published Jan 15, 2024 • 2
Knowledge Solver: Teaching LLMs to Search for Domain Knowledge from Knowledge Graphs Paper • 2309.03118 • Published Sep 6, 2023 • 2
Fabricator: An Open Source Toolkit for Generating Labeled Training Data with Teacher LLMs Paper • 2309.09582 • Published Sep 18, 2023 • 4
Let's Synthesize Step by Step: Iterative Dataset Synthesis with Large Language Models by Extrapolating Errors from Small Models Paper • 2310.13671 • Published Oct 20, 2023 • 19
Training Generative Question-Answering on Synthetic Data Obtained from an Instruct-tuned Model Paper • 2310.08072 • Published Oct 12, 2023 • 1
Generative Data Augmentation using LLMs improves Distributional Robustness in Question Answering Paper • 2309.06358 • Published Sep 3, 2023 • 1
Empirical Analysis of the Strengths and Weaknesses of PEFT Techniques for LLMs Paper • 2304.14999 • Published Apr 28, 2023 • 2
Comparison between parameter-efficient techniques and full fine-tuning: A case study on multilingual news article classification Paper • 2308.07282 • Published Aug 14, 2023 • 1
Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models Paper • 2402.13064 • Published Feb 20, 2024 • 48
Beyond Scale: the Diversity Coefficient as a Data Quality Metric Demonstrates LLMs are Pre-trained on Formally Diverse Data Paper • 2306.13840 • Published Jun 24, 2023 • 11
ROSCOE: A Suite of Metrics for Scoring Step-by-Step Reasoning Paper • 2212.07919 • Published Dec 15, 2022
Llama 2: Open Foundation and Fine-Tuned Chat Models Paper • 2307.09288 • Published Jul 18, 2023 • 244
Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge Paper • 1803.05457 • Published Mar 14, 2018 • 2
WinoGrande: An Adversarial Winograd Schema Challenge at Scale Paper • 1907.10641 • Published Jul 24, 2019 • 1
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models Paper • 2206.04615 • Published Jun 9, 2022 • 5
Sparks of Artificial General Intelligence: Early experiments with GPT-4 Paper • 2303.12712 • Published Mar 22, 2023 • 2
Deduplicating Training Data Makes Language Models Better Paper • 2107.06499 • Published Jul 14, 2021 • 4
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits Paper • 2402.17764 • Published Feb 27, 2024 • 610
Oasis: Data Curation and Assessment System for Pretraining of Large Language Models Paper • 2311.12537 • Published Nov 21, 2023 • 1
LLM Comparator: Visual Analytics for Side-by-Side Evaluation of Large Language Models Paper • 2402.10524 • Published Feb 16, 2024 • 23
Reflexion: Language Agents with Verbal Reinforcement Learning Paper • 2303.11366 • Published Mar 20, 2023 • 5
Beyond Language Models: Byte Models are Digital World Simulators Paper • 2402.19155 • Published Feb 29, 2024 • 51
Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models Paper • 2404.18796 • Published Apr 29, 2024 • 69
Step Back to Leap Forward: Self-Backtracking for Boosting Reasoning of Language Models Paper • 2502.04404 • Published Feb 6, 2025 • 23