Multimodal Benchmarking IR

university

AI & ML interests

None defined yet.

Recent Activity

zhangysk

authored a paper 2 days ago

BABE: Biology Arena BEnchmark

Paper • 2602.05857 • Published 4 days ago • 9

CongWei1230

authored a paper 2 days ago

Context Forcing: Consistent Autoregressive Video Generation with Long Context

Paper • 2602.06028 • Published 3 days ago • 31

zhangysk

authored 2 papers 2 days ago

Context Forcing: Consistent Autoregressive Video Generation with Long Context

Paper • 2602.06028 • Published 3 days ago • 31

Retrieval-Infused Reasoning Sandbox: A Benchmark for Decoupling Retrieval and Reasoning Capabilities

Paper • 2601.21937 • Published 11 days ago • 18

zhangysk

authored a paper 10 days ago

ConceptMoE: Adaptive Token-to-Concept Compression for Implicit Compute Allocation

Paper • 2601.21420 • Published 11 days ago • 42

zhangysk

authored 6 papers 28 days ago

NL2Repo-Bench: Towards Long-Horizon Repository Generation Evaluation of Coding Agents

Paper • 2512.12730 • Published Dec 14, 2025 • 45

The Molecular Structure of Thought: Mapping the Topology of Long Chain-of-Thought Reasoning

Paper • 2601.06002 • Published about 1 month ago • 52

ychenNLP

authored a paper about 2 months ago

Nemotron-Cascade: Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models

Paper • 2512.13607 • Published Dec 15, 2025 • 34

zhangysk

submitted a paper to Daily Papers about 2 months ago

NL2Repo-Bench: Towards Long-Horizon Repository Generation Evaluation of Coding Agents

Paper • 2512.12730 • Published Dec 14, 2025 • 45

zhangysk

authored a paper 2 months ago

From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence

Paper • 2511.18538 • Published Nov 23, 2025 • 296

zhangysk

authored 6 papers 3 months ago

MME-CC: A Challenging Multi-Modal Evaluation Benchmark of Cognitive Capacity

Paper • 2511.03146 • Published Nov 5, 2025 • 8

RLoop: An Self-Improving Framework for Reinforcement Learning with Iterative Policy Initialization

Paper • 2511.04285 • Published Nov 6, 2025 • 8

MVU-Eval: Towards Multi-Video Understanding Evaluation for Multimodal LLMs

Paper • 2511.07250 • Published Nov 10, 2025 • 18

DiscoX: Benchmarking Discourse-Level Translation task in Expert Domains

Paper • 2511.10984 • Published Nov 14, 2025 • 5

Virtual Width Networks

Paper • 2511.11238 • Published Nov 14, 2025 • 38

IFEvalCode: Controlled Code Generation

Paper • 2507.22462 • Published Jul 30, 2025

Team members 5