SWE-Flow: Synthesizing Software Engineering Data in a Test-Driven Manner Paper • 2506.09003 • Published 30 days ago • 19
IPBench: Benchmarking the Knowledge of Large Language Models in Intellectual Property Paper • 2504.15524 • Published Apr 22 • 4
LLMs for Patent Collection Researches (Topic: LLMs4Patent) collection of Qiyao Wang. • 8 items • Updated 7 days ago • 1
IPBench: Benchmarking the Knowledge of Large Language Models in Intellectual Property Paper • 2504.15524 • Published Apr 22 • 4
IPBench: Benchmarking the Knowledge of Large Language Models in Intellectual Property Paper • 2504.15524 • Published Apr 22 • 4 • 2
IV-Bench: A Benchmark for Image-Grounded Video Perception and Reasoning in Multimodal LLMs Paper • 2504.15415 • Published Apr 21 • 22
LLMs for Patent Collection Researches (Topic: LLMs4Patent) collection of Qiyao Wang. • 8 items • Updated 7 days ago • 1
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines Paper • 2502.14739 • Published Feb 20 • 105
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines Paper • 2502.14739 • Published Feb 20 • 105