Papers
arxiv:2511.15833

EfficientSAM3: Progressive Hierarchical Distillation for Video Concept Segmentation from SAM1, 2, and 3

Published on Nov 19
Authors:
,
,

Abstract

EfficientSAM3 uses Progressive Hierarchical Distillation to enable on-device concept segmentation and tracking with high fidelity, using lightweight models.

AI-generated summary

The Segment Anything Model 3 (SAM3) advances visual understanding with Promptable Concept Segmentation (PCS) across images and videos, but its unified architecture (shared vision backbone, DETR-style detector, dense-memory tracker) remains prohibitive for on-device use. We present EfficientSAM3, a family of efficient models built on Progressive Hierarchical Distillation (PHD) that transfers capability from SAM3 to lightweight students in three stages: (1) Encoder Distillation aligns image features via prompt-in-the-loop training on SA-1B; (2) Temporal Memory Distillation replaces dense memory with a compact Perceiver-based module trained on SA-V to compress and retrieve spatiotemporal features efficiently; and (3) End-to-End Fine-Tuning refines the full pipeline on the official SAM3 PCS data to preserve concept-level performance. PHD yields a spectrum of student variants using RepViT, TinyViT, and EfficientViT backbones, enabling on-device concept segmentation and tracking while maintaining high fidelity to teacher behavior. We benchmark on popular VOS datasets, and compare with varies of releated work, achieing strong performance-efficiency trade-offs.

Community

Sign up or log in to comment

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2511.15833 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2511.15833 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2511.15833 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.