Daily Papers

by AK and the research community

Sep 24

Submitted by

Hennara

Baseer: A Vision-Language Model for Arabic Document-to-Markdown OCR

·
7 authors

Submitted by

taesiri

Reinforcement Learning on Pre-Training Data

·
36 authors

Submitted by

mhjiang0408

Do You Need Proprioceptive States in Visuomotor Policies?

·
13 authors

Submitted by

taesiri

MiniCPM-V 4.5: Cooking Efficient MLLMs via Architecture, Data, and Training Recipe

·
34 authors

Submitted by

Silin-Chen

SWE-QA: Can Language Models Answer Repository-level Code Questions?

·
6 authors

Submitted by

Two-hot

How Far are VLMs from Visual Spatial Intelligence? A Benchmark-Driven Perspective

·
18 authors

Submitted by

WilliamHuang91

MAPO: Mixed Advantage Policy Optimization

·
14 authors

Submitted by

taesiri

Hyper-Bagel: A Unified Acceleration Framework for Multimodal Understanding and Generation

·
7 authors

Submitted by

lhmd

VolSplat: Rethinking Feed-Forward 3D Gaussian Splatting with Voxel-Aligned Prediction

·
10 authors

Submitted by

taesiri

Lyra: Generative 3D Scene Reconstruction via Video Diffusion Model Self-Distillation

·
13 authors

Submitted by

Yunzhen

What Characterizes Effective Reasoning? Revisiting Length, Review, and Structure of CoT

·
5 authors

Submitted by

NatashaEB

Soft Tokens, Hard Truths

·
5 authors

Submitted by

ZipW

HyRF: Hybrid Radiance Fields for Memory-efficient and High-quality Novel View Synthesis

·
2 authors

Submitted by

MinhDucBui

Large Language Models Discriminate Against Speakers of German Dialects

·
5 authors

Submitted by

ultra7chen

CAR-Flow: Condition-Aware Reparameterization Aligns Source and Target for Better Flow Matching

·
10 authors

Submitted by

emilia-wisnios

OpenGVL - Benchmarking Visual Temporal Progress for Data Curation

·
6 authors

Submitted by

Fictionary

GeoSVR: Taming Sparse Voxels for Geometrically Accurate Surface Reconstruction

·
7 authors

Submitted by

spapi

Better Late Than Never: Evaluation of Latency Metrics for Simultaneous Speech-to-Text Translation

·
4 authors

Submitted by

jbarrow

CommonForms: A Large, Diverse Dataset for Form Field Detection

·
1 author

Submitted by

taesiri

Zero-Shot Multi-Spectral Learning: Reimagining a Generalist Multimodal Gemini 2.5 Model for Remote Sensing Applications

·
7 authors

Submitted by

conan1024hao

VIR-Bench: Evaluating Geospatial and Temporal Understanding of MLLMs via Travel Video Itinerary Reconstruction

·
14 authors

Submitted by

X-iZhang

RadEval: A framework for radiology text evaluation

·
12 authors

Submitted by

abhilekhborah

DRISHTIKON: A Multimodal Multilingual Benchmark for Testing Language Models' Understanding on Indian Culture

·
9 authors

Submitted by

jesbu1

PEEK: Guiding and Minimal Image Representations for Zero-Shot Generalization of Robot Manipulation Policies

·
9 authors