Cuiunbo PRO

Cuiunbo

AI & ML interests

Anything

Recent Activity

upvoted a paper 29 days ago

Liberating LLM Capabilities in Full-Duplex Speech Models

liked a model about 2 months ago

SulphurAI/Sulphur-2-base

liked a model about 2 months ago

openbmb/MiniCPM-V-4.6

upvoted a paper 29 days ago

Liberating LLM Capabilities in Full-Duplex Speech Models

Paper • 2606.07547 • Published May 4 • 8

upvoted a paper about 2 months ago

MiniCPM-o 4.5: Towards Real-Time Full-Duplex Omni-Modal Interaction

Paper • 2604.27393 • Published Apr 30 • 81

upvoted a paper 10 months ago

MiniCPM-V 4.5: Cooking Efficient MLLMs via Architecture, Data, and Training Recipe

Paper • 2509.18154 • Published Sep 16, 2025 • 63

upvoted 3 papers about 1 year ago

LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale

Paper • 2504.16030 • Published Apr 22, 2025 • 38

RLPR: Extrapolating RLVR to General Domains without Verifiers

Paper • 2506.18254 • Published Jun 23, 2025 • 35

Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning

Paper • 2504.17192 • Published Apr 24, 2025 • 124

upvoted an article over 1 year ago

Article

The Large Language Model Course

mlabonne • Jan 16, 2025 • 230

upvoted 2 papers over 1 year ago

Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models

Paper • 2411.04996 • Published Nov 7, 2024 • 51

VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents

Paper • 2410.10594 • Published Oct 14, 2024 • 29

upvoted a paper almost 2 years ago

MiniCPM-V: A GPT-4V Level MLLM on Your Phone

Paper • 2408.01800 • Published Aug 3, 2024 • 96

upvoted a collection almost 2 years ago

UI Agent

Collection

a collection of algorithmic agents for user interfaces/interactions, program synthesis, and robotics • 505 items • Updated 2 days ago • 69

upvoted 2 papers almost 2 years ago

GUICourse: From General Vision Language Models to Versatile GUI Agents

Paper • 2406.11317 • Published Jun 17, 2024 • 2

LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images

Paper • 2403.11703 • Published Mar 18, 2024 • 18

upvoted an article almost 2 years ago

Article

ColPali: Efficient Document Retrieval with Vision Language Models 👀

manu • Jul 5, 2024 • 323

upvoted a paper about 2 years ago

CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs

Paper • 2406.18521 • Published Jun 26, 2024 • 31

upvoted an article about 2 years ago

Article

An Analysis of Chinese LLM Censorship and Bias with Qwen 2 Instruct

leonardlin • Jun 11, 2024 • 69

upvoted a paper about 2 years ago

Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

Paper • 2405.21075 • Published May 31, 2024 • 26

upvoted a collection about 2 years ago

ConvLLaVA

Collection

A collection of ConvLLaVA models. • 10 items • Updated May 28, 2024 • 10

upvoted 2 papers about 2 years ago

Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation

Paper • 2405.14598 • Published May 23, 2024 • 13

RoHM: Robust Human Motion Reconstruction via Diffusion

Paper • 2401.08570 • Published Jan 16, 2024 • 1
