Andres Marafioti

andito

AI & ML interests

Multimodal models, VLM and TTS

Recent Activity

liked a dataset about 8 hours ago

HuggingFaceFW/finepdfs

liked a Space 4 days ago

HuggingFaceM4/FineVision

liked a dataset 5 days ago

data-agents/jupyter-agent-dataset

published an article about 2 months ago

Article

TimeScope: How Long Can Your Video Large Multimodal Model Go?

and 3 others •

Jul 23

• 40

published an article 2 months ago

Article

Efficient MultiModal Data Pipeline

and 4 others •

Jul 8

• 54

published an article 3 months ago

Article

KV Cache from scratch in nanoVLM

and 4 others •

Jun 4

• 93

published an article 3 months ago

Article

SmolVLA: Efficient Vision-Language-Action Model trained on Lerobot Community Data

and 8 others •

Jun 3

• 242

published an article 4 months ago

Article

nanoVLM: The simplest repository to train your VLM in pure PyTorch

and 6 others •

May 21

• 211

published an article 4 months ago

Article

Vision Language Models (Better, Faster, Stronger)

and 4 others •

May 12

• 522

published an article 7 months ago

Article

SmolVLM2: Bringing Video Understanding to Every Device

and 6 others •

Feb 20

• 300

published an article 8 months ago

Article

SmolVLM Grows Smaller – Introducing the 250M & 500M Models!

and 2 others •

Jan 23

• 183

published an article 10 months ago

Article

SmolVLM - small yet mighty Vision Language Model

and 4 others •

Nov 26, 2024

• 358

published an article 11 months ago

Article

Deploying Speech-to-Speech on Hugging Face

and 3 others •

Oct 22, 2024

• 42

published an article 12 months ago

Article

FineVideo: behind the scenes

and 5 others •

Sep 23, 2024

• 35

published an article about 1 year ago

Article

LAVE: Zero-shot VQA Evaluation on Docmatix with LLMs - Do We Still Need Fine-Tuning?

and 1 other •

Jul 25, 2024

• 17

published an article about 1 year ago

Article

Docmatix - a huge dataset for Document Visual Question Answering

and 1 other •

Jul 18, 2024

• 76

published an article about 1 year ago

Article

Fine-tuning Florence-2 - Microsoft's Cutting-edge Vision Language Models

and 2 others •

Jun 24, 2024

• 201
