TinyLLaVA: A Framework of Small-scale Large Multimodal Models
Baichuan Zhou
bczhou
AI & ML interests
Computer Vision
Recent Activity
upvoted a paper 9 days ago
MoCha: Towards Movie-Grade Talking Character Synthesis
authored a paper 21 days ago
LEGION: Learning to Ground and Explain for Synthetic Image Detection
upvoted a paper 26 days ago
VisualPRM: An Effective Process Reward Model for Multimodal Reasoning
Organizations
Collections 1
Spaces 1
Models 8
bczhou/tiny-llava-v1-hf • Image-Text-to-Text • Updated • 3.01k • 57
bczhou/TinyLLaVA-2.0B • Image-Text-to-Text • Updated • 325 • 6
bczhou/TinyLLaVA-1.5B • Image-Text-to-Text • Updated • 379 • 17
bczhou/TinyLLaVA-3.1B-Pretrain • Text Generation • Updated • 12
bczhou/TinyLLaVA-3.1B • Text Generation • Updated • 157 • 26
bczhou/TinyLLaVA-2.0B-SigLIP • Updated • 578 • 1
bczhou/TinyLLaVA-1.5B-SigLIP • Updated • 52 • 1
bczhou/TinyLLaVA-3.1B-SigLIP • Updated • 69 • 4
Datasets 7
bczhou/UrBench • Updated • 78 • 3
bczhou/LOKI • Preview • Updated • 68 • 1
bczhou/CityBench-SubTasks • Viewer • Updated • 12.8k • 5
bczhou/SyntheticBench-Videos • Viewer • Updated • 264 • 7
bczhou/CityBench-v0.3 • Viewer • Updated • 9.71k • 5
bczhou/CityBench-v0.2 • Viewer • Updated • 9.71k • 6
bczhou/CityVQA-v0.2 • Viewer • Updated • 2.5k • 6 • 1