Efficient Vision Encoding for Vision Language Models
Apple
A benchmark for the design of efficient continual learning of image-text models over multiple years.
AIM: Autoregressive Image Models
MobileCLIP2: Mobile-friendly image-text models with SOTA zero-shot capabilities trained on DFNDR-2B
A collection of AIMv2 vision encoders that support a number of resolutions, native resolution, and a distilled checkpoint.

- apple/aimv2-large-patch14-224: Image Feature Extraction • 0.3B • 1.02k • 58
- apple/aimv2-huge-patch14-224: Image Feature Extraction • 0.7B • 223 • 12
- apple/aimv2-1B-patch14-224: Image Feature Extraction • 1B • 74 • 7
- apple/aimv2-3B-patch14-224: Image Feature Extraction • 3B • 30 • 3
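As a rough sketch of how one of these encoders might be used for feature extraction (this usage is an assumption based on the standard `transformers` Auto classes; the checkpoint name comes from the list above, and the placeholder image stands in for real input):

```python
# Minimal sketch: extract patch features with an AIMv2 vision encoder.
# Assumes `transformers`, `torch`, and `pillow` are installed and the
# checkpoint can be fetched from the Hugging Face Hub.
from PIL import Image
import torch
from transformers import AutoImageProcessor, AutoModel

ckpt = "apple/aimv2-large-patch14-224"
processor = AutoImageProcessor.from_pretrained(ckpt)
model = AutoModel.from_pretrained(ckpt, trust_remote_code=True)

image = Image.new("RGB", (224, 224))  # placeholder; use a real image here
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One embedding per image patch: (batch, num_patches, hidden_dim)
features = outputs.last_hidden_state
print(features.shape)
```

The resulting patch embeddings are what a vision-language model would typically consume as visual tokens; the larger checkpoints in the list trade throughput for stronger features.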
MobileCLIP: Mobile-friendly image-text models with SOTA zero-shot capabilities.
DataCompDR: Improved datasets for training SOTA image-text models.
- MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training (Paper • 2311.17049 • 3)
- apple/mobileclip_s0_timm: Image Classification • 522 • 10
- apple/mobileclip_s1_timm: Image Classification • 140 • 2
- apple/mobileclip_s2_timm: Image Classification • 433 • 5
Depth Pro: Sharp Monocular Metric Depth in Less Than a Second
CLIP Models trained using DFN-2B/DFN-5B datasets
DCLM Models + Datasets