RynnEC: Bringing MLLMs into Embodied World

If our project helps you, please give us a star ⭐ on GitHub to support us. 🙏🙏

📰 News

  • [2025.08.08] 🔥🔥 Released our RynnEC-2B model, RynnEC-Bench, and the training code.

🌟 Introduction

RynnEC is a video multi-modal large language model (MLLM) specifically designed for embodied cognition tasks.

๐Ÿ“Architecture

RynnEC handles a variety of input types, including images, videos, visual prompts, and task instructions. Visual inputs are processed by a vision encoder with an any-resolution strategy, while visual prompts pass through a region encoder that extracts fine-grained features. Textual inputs are tokenized into the same unified token stream. For video segmentation tasks, a mask decoder transforms the output segmentation embeddings into binary masks.
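
For orientation, the sketch below mirrors this data flow in PyTorch. All module names, dimensions, and the mask resolution are illustrative assumptions for readability, not the actual RynnEC implementation; see the released training code for the real components.

```python
# Schematic sketch of the data flow described above (shapes and modules are
# illustrative assumptions, not the actual RynnEC architecture).
import torch
import torch.nn as nn

class RynnECSketch(nn.Module):
    def __init__(self, dim=1536, vocab=151_936):
        super().__init__()
        self.vision_encoder = nn.Linear(1024, dim)   # any-resolution patch features -> LLM space
        self.region_encoder = nn.Linear(1024, dim)   # fine-grained features for visual prompts
        self.embed = nn.Embedding(vocab, dim)        # tokenized text -> embeddings
        self.llm = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, 12, batch_first=True), num_layers=2)
        self.mask_decoder = nn.Linear(dim, 32 * 32)  # segmentation embedding -> mask logits

    def forward(self, patches, region_feats, text_ids):
        # Fuse visual tokens, region tokens, and text tokens into one unified stream.
        tokens = torch.cat([self.vision_encoder(patches),
                            self.region_encoder(region_feats),
                            self.embed(text_ids)], dim=1)
        hidden = self.llm(tokens)
        # For segmentation, decode the last hidden state into a binary mask.
        return self.mask_decoder(hidden[:, -1]).view(-1, 32, 32).sigmoid() > 0.5

model = RynnECSketch()
mask = model(torch.randn(1, 196, 1024),          # video patch features
             torch.randn(1, 4, 1024),            # visual-prompt (region) features
             torch.randint(0, 1000, (1, 32)))    # instruction token ids
print(mask.shape)  # torch.Size([1, 32, 32])
```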

🌎 Model Zoo

| Model | Base Model | HF Link |
|-------|------------|---------|
| RynnEC-2B | Qwen2.5-1.5B | [Alibaba-DAMO-Academy/RynnEC-2B](https://huggingface.co/Alibaba-DAMO-Academy/RynnEC-2B) |
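
To fetch the checkpoint locally, a standard `huggingface_hub` snippet works (this is generic Hub usage, not a RynnEC-specific API; the repo's own inference scripts then consume the downloaded directory):

```python
from huggingface_hub import snapshot_download

# Download all files of the RynnEC-2B checkpoint from the Hugging Face Hub.
local_dir = snapshot_download(repo_id="Alibaba-DAMO-Academy/RynnEC-2B")
print(f"Checkpoint downloaded to: {local_dir}")
```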

📊 Main Results

RynnEC-Bench compares models on object cognition and spatial cognition. With an efficient 2B-parameter architecture, RynnEC-2B achieves state-of-the-art (SOTA) performance on complex spatial cognition tasks.

📑 Citation

If you find RynnEC useful for your research and applications, please cite using this BibTeX:
