Harnessing Large Language Models for Scientific Novelty Detection Paper • 2505.24615 • Published 16 days ago • 5
Ghost in the Minecraft: Generally Capable Agents for Open-World Enviroments via Large Language Models with Text-based Knowledge and Memory Paper • 2305.17144 • Published May 25, 2023 • 2
BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision Paper • 2211.10439 • Published Nov 18, 2022
EleGANt: Exquisite and Locally Editable GAN for Makeup Transfer Paper • 2207.09840 • Published Jul 20, 2022
Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning Paper • 2406.07543 • Published Jun 11, 2024
ZeroGUI: Automating Online GUI Learning at Zero Human Cost Paper • 2505.23762 • Published 16 days ago • 45
PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models Paper • 2412.09613 • Published Dec 12, 2024 • 1
MOOSE-Chem2: Exploring LLM Limits in Fine-Grained Scientific Hypothesis Discovery via Hierarchical Search Paper • 2505.19209 • Published 21 days ago • 24
MOOSE-Chem3: Toward Experiment-Guided Hypothesis Ranking via Simulated Experimental Feedback Paper • 2505.17873 • Published 23 days ago • 30
100 Days After DeepSeek-R1: A Survey on Replication Studies and More Directions for Reasoning Language Models Paper • 2505.00551 • Published May 1 • 37
100 Days After DeepSeek-R1: A Survey on Replication Studies and More Directions for Reasoning Language Models Paper • 2505.00551 • Published May 1 • 37
VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models Paper • 2504.15279 • Published Apr 21 • 75
Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources Paper • 2504.00595 • Published Apr 1 • 36
ResearchBench: Benchmarking LLMs in Scientific Discovery via Inspiration-Based Task Decomposition Paper • 2503.21248 • Published Mar 27 • 20
Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy Paper • 2503.19757 • Published Mar 25 • 51
Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy Paper • 2503.19757 • Published Mar 25 • 51
VisualPRM: An Effective Process Reward Model for Multimodal Reasoning Paper • 2503.10291 • Published Mar 13 • 37
MERaLiON-TextLLM: Cross-Lingual Understanding of Large Language Models in Chinese, Indonesian, Malay, and Singlish Paper • 2501.08335 • Published Dec 21, 2024
Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia Paper • 2503.07920 • Published Mar 10 • 98