Collections

Collections including paper arxiv:2505.14683

Multimodal Reasoning

about 17 hours ago

InfiR : Crafting Effective Small Language Models and Multimodal Small Language Models in Reasoning

Paper • 2502.11573 • Published Feb 17 • 8
Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking

Paper • 2502.02339 • Published Feb 4 • 22
video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model

Paper • 2502.11775 • Published Feb 17 • 8
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search

Paper • 2412.18319 • Published Dec 24, 2024 • 40

ByteDance Papers

ByteDance papers collection

about 13 hours ago

Contrastive Learning for Many-to-many Multilingual Neural Machine Translation

Paper • 2105.09501 • Published May 20, 2021
Cross-modal Contrastive Learning for Speech Translation

Paper • 2205.02444 • Published May 5, 2022
ByteTransformer: A High-Performance Transformer Boosted for Variable-Length Inputs

Paper • 2210.03052 • Published Oct 6, 2022
Diffusion Glancing Transformer for Parallel Sequence to Sequence Learning

Paper • 2212.10240 • Published Dec 20, 2022 • 1

DocLLM: A layout-aware generative language model for multimodal document understanding

Paper • 2401.00908 • Published Dec 31, 2023 • 184
COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training

Paper • 2401.00849 • Published Jan 1, 2024 • 17
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents

Paper • 2311.05437 • Published Nov 9, 2023 • 50
LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing

Paper • 2311.00571 • Published Nov 1, 2023 • 42

about 15 hours ago

Scaling Law for Quantization-Aware Training

Paper • 2505.14302 • Published 3 days ago • 58
Reward Reasoning Model

Paper • 2505.14674 • Published 3 days ago • 27
Qwen3 Technical Report

Paper • 2505.09388 • Published 9 days ago • 141
AdaptThink: Reasoning Models Can Learn When to Think

Paper • 2505.13417 • Published 4 days ago • 69

china open source models

Emerging Properties in Unified Multimodal Pretraining

Paper • 2505.14683 • Published 2 days ago • 104

foundation-model-research

Chain-of-Model Learning for Language Model

Paper • 2505.11820 • Published 6 days ago • 100
Emerging Properties in Unified Multimodal Pretraining

Paper • 2505.14683 • Published 2 days ago • 104

Inference-Time Scaling for Generalist Reward Modeling

Paper • 2504.02495 • Published Apr 3 • 54
NExT-Search: Rebuilding User Feedback Ecosystem for Generative AI Search

Paper • 2505.14680 • Published 2 days ago • 9
Emerging Properties in Unified Multimodal Pretraining

Paper • 2505.14683 • Published 2 days ago • 104

about 4 hours ago

CoRAG: Collaborative Retrieval-Augmented Generation

Paper • 2504.01883 • Published Apr 2 • 10
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning

Paper • 2504.08837 • Published Apr 10 • 42
Mavors: Multi-granularity Video Representation for Multimodal Large Language Model

Paper • 2504.10068 • Published Apr 14 • 30
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations

Paper • 2504.10481 • Published Apr 14 • 84

about 21 hours ago

CoLLM: A Large Language Model for Composed Image Retrieval

Paper • 2503.19910 • Published Mar 25 • 14
LOCATEdit: Graph Laplacian Optimized Cross Attention for Localized Text-Guided Image Editing

Paper • 2503.21541 • Published Mar 27 • 1
HumanDreamer-X: Photorealistic Single-image Human Avatars Reconstruction via Gaussian Restoration

Paper • 2504.03536 • Published Apr 4 • 13
FantasyTalking: Realistic Talking Portrait Generation via Coherent Motion Synthesis

Paper • 2504.04842 • Published Apr 7 • 36

microsoft/Phi-4-multimodal-instruct

Automatic Speech Recognition • Updated 22 days ago • 352k • 1.39k
microsoft/Phi-4-mini-instruct

Text Generation • Updated 22 days ago • 433k • 478
SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion

Paper • 2503.11576 • Published Mar 14 • 108
Emerging Properties in Unified Multimodal Pretraining

Paper • 2505.14683 • Published 2 days ago • 104
