Flo Schneider

floschne

https://www.inf.uni-hamburg.de/en/inst/ab/lt/people/florian-schneider.html

AI & ML interests

Large Vision-Language Models, Cross-modal Retrieval

upvoted 2 articles 10 months ago

Article

Train 400x faster Static Embedding Models with Sentence Transformers

Jan 15, 2025 • 228

Article

🪆 Introduction to Matryoshka Embedding Models

Feb 23, 2024 • 205

upvoted a collection 10 months ago

GIMMICK

Datasets of the GIMMICK Benchmark • 3 items • Updated Jun 20, 2025 • 1

upvoted an article 11 months ago

Article

The Transformers Library: standardizing model definitions

May 15, 2025 • 121

upvoted a paper 11 months ago

Emerging Properties in Unified Multimodal Pretraining

Paper • 2505.14683 • Published May 20, 2025 • 134

upvoted 5 papers about 1 year ago

SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features

Paper • 2502.14786 • Published Feb 20, 2025 • 164

MVL-SIB: A Massively Multilingual Vision-Language Benchmark for Cross-Modal Topical Matching

Paper • 2502.12852 • Published Feb 18, 2025 • 3

Qwen2.5-VL Technical Report

Paper • 2502.13923 • Published Feb 19, 2025 • 218

GIMMICK -- Globally Inclusive Multimodal Multitask Cultural Knowledge Benchmarking

Paper • 2502.13766 • Published Feb 19, 2025 • 3

How Much Do LLMs Hallucinate across Languages? On Multilingual Estimation of LLM Hallucination in the Wild

Paper • 2502.12769 • Published Feb 18, 2025 • 3

upvoted a collection about 1 year ago

Qwen2.5-VL

Vision-language model series based on Qwen2.5 • 10 items • Updated Mar 2 • 562

upvoted a collection over 1 year ago

Centurio

Artifacts of the paper "Centurio: On Drivers of Multilingual Ability of Large Vision-Language Model" • 6 items • Updated Feb 4, 2025 • 4

upvoted 2 papers over 1 year ago

Centurio: On Drivers of Multilingual Ability of Large Vision-Language Model

Paper • 2501.05122 • Published Jan 9, 2025 • 19

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling

Paper • 2412.05271 • Published Dec 6, 2024 • 161

upvoted a collection over 1 year ago

Qwen2-VL

Vision-language model series based on Qwen2 • 15 items • Updated Mar 2 • 231

upvoted 4 papers over 1 year ago

LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer

Paper • 2412.13871 • Published Dec 18, 2024 • 18

Qwen2.5 Technical Report

Paper • 2412.15115 • Published Dec 19, 2024 • 379

Progressive Multimodal Reasoning via Active Retrieval

Paper • 2412.14835 • Published Dec 19, 2024 • 73

Aria: An Open Multimodal Native Mixture-of-Experts Model

Paper • 2410.05993 • Published Oct 8, 2024 • 111

upvoted a collection over 1 year ago

LLaVA-Onevision

LLaVa_Onevision models for single-image, multi-image, and video scenarios • 9 items • Updated Sep 18, 2024 • 16