MMTEB: Massive Multilingual Text Embedding Benchmark Paper • 2502.13595 • Published Feb 19 • 37
From Tools to Teammates: Evaluating LLMs in Multi-Session Coding Interactions Paper • 2502.13791 • Published Feb 19 • 5
Bridging the Data Provenance Gap Across Text, Speech and Video Paper • 2412.17847 • Published Dec 19, 2024 • 9
Surveying the Effects of Quality, Diversity, and Complexity in Synthetic Data From Large Language Models Paper • 2412.02980 • Published Dec 4, 2024 • 15
Consent in Crisis: The Rapid Decline of the AI Data Commons Paper • 2407.14933 • Published Jul 20, 2024 • 12