LLMs instead of Human Judges? A Large Scale Empirical Study across 20 NLP Evaluation Tasks Paper • 2406.18403 • Published Jun 26, 2024
The LAMBADA dataset: Word prediction requiring a broad discourse context Paper • 1606.06031 • Published Jun 20, 2016
Interpretable Word Sense Representations via Definition Generation: The Case of Semantic Change Analysis Paper • 2305.11993 • Published May 19, 2023