Post
276
this paper has been blowing up
they train an open-source multimodal LLM (InternVL3) that can compete with GPT-4o and Claude 3.5 Sonnet by:
> training text and vision on a single stage
> a novel V2PE positional encoding
> SFT & mixed preference optimization
Paper: InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models (2504.10479)
> test-time scaling
they train an open-source multimodal LLM (InternVL3) that can compete with GPT-4o and Claude 3.5 Sonnet by:
> training text and vision on a single stage
> a novel V2PE positional encoding
> SFT & mixed preference optimization
Paper: InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models (2504.10479)
> test-time scaling