Adina Yakefu

AdinaY

AI & ML interests

None yet

Recent Activity

Organizations

Hugging Face, Hugging Face Chinese Localization, Huggingface Projects, Blog-explorers, ICCV2023, Open LLM Leaderboard, huggingPartyParis, Qwen, Journalists on Hugging Face, Women on Hugging Face, Social Post Explorers, Chinese LLMs on Hugging Face, Hugging Face for Legal, Inference Endpoints Images, LeRobot Worldwide Hackathon

AdinaY's activity

reacted to BestWishYsh's post with 🚀🔥 about 10 hours ago
Introducing our new work: OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for Subject-to-Video Generation 🚀

We tackle the core challenges of Subject-to-Video Generation (S2V) by systematically building the first complete infrastructure for the task, featuring an evaluation benchmark and a million-scale dataset! ✨

🧠 Introducing OpenS2V-Eval, the first fine-grained S2V benchmark, with 180 multi-domain prompts + real/synthetic test pairs. We propose NexusScore, NaturalScore, and GmeScore to precisely quantify model performance across subject consistency, naturalness, and text alignment ✔

📊 Using this framework, we conduct a comprehensive evaluation of 16 leading S2V models, revealing their strengths and weaknesses in complex scenarios!

🔥 The OpenS2V-5M dataset is now available! A collection of 5.4M subject-text-video triplets in 720P HD, enabled by cross-video association segmentation + multi-view synthesis for diverse subjects & high-quality annotations 🚀

All resources are open-sourced: Paper, Code, Data, and Evaluation Tools 📄
Let's advance S2V research together! 💡

🔗 Links:
Paper: OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for Subject-to-Video Generation (2505.20292)
Code: https://github.com/PKU-YuanGroup/OpenS2V-Nexus
Project: https://pku-yuangroup.github.io/OpenS2V-Nexus
Leaderboard: BestWishYsh/OpenS2V-Eval
OpenS2V-Eval: BestWishYsh/OpenS2V-Eval
OpenS2V-5M: BestWishYsh/OpenS2V-5M
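
For anyone who wants to poke at the benchmark programmatically, here is a minimal sketch, assuming OpenS2V-Eval is published as a standard dataset repo on the Hub; the split name is a guess, so check the dataset card for the actual layout.

```python
# Hedged sketch: load the OpenS2V-Eval prompts from the Hub and peek at a few entries.
# Assumes the benchmark lives in a dataset repo with a default config; the split
# name "train" is an assumption, not confirmed by the post.
from datasets import load_dataset

eval_set = load_dataset("BestWishYsh/OpenS2V-Eval", split="train")
for example in eval_set.select(range(3)):
    print(example)  # each entry should pair a prompt with its reference subject image(s)
```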
posted an update about 12 hours ago
🔥 New benchmark & dataset for Subject-to-Video generation

OpenS2V-Nexus by Peking University

✨ Fine-grained evaluation for subject consistency
BestWishYsh/OpenS2V-Eval
✨ 5M-scale dataset:
BestWishYsh/OpenS2V-5M
✨ New metrics – automatic scores for identity, realism, and text match
posted an update about 12 hours ago
HunyuanVideo-Avatar 🔥 another image-to-video model by Tencent Hunyuan

tencent/HunyuanVideo-Avatar

✨Emotion-controlled, high-dynamic avatar videos
✨Multi-character support with separate audio control
✨Works with any style: cartoon, 3D, real face, while keeping identity consistent
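
Not much code is needed to try it: a minimal fetch sketch using the standard huggingface_hub workflow (the repo id comes from the post; the actual inference scripts live in Tencent's release).

```python
# Download the HunyuanVideo-Avatar checkpoint files from the Hub.
# This only fetches weights/configs; inference itself is run with Tencent's released code.
from huggingface_hub import snapshot_download

local_dir = snapshot_download("tencent/HunyuanVideo-Avatar")
print(f"Checkpoint downloaded to: {local_dir}")
```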
posted an update 1 day ago
reacted to fdaudens's post with ❤️ 1 day ago
Just completed the AI Agents course and wow, that capstone project really makes you understand how to build agents that can handle real-world complexity!

The final project uses the GAIA dataset - your agent has to solve tasks like analyzing Excel files, processing audio recordings, answering questions about YouTube videos, and diving into research papers. These aren't toy examples; this is the messy, multimodal stuff agents need to handle in practice.

Whether you’re just getting started with agents or want to go deeper with tools like LangChain, LlamaIndex, and SmolAgents, this course has tons of useful stuff. A few key insights:
- Code agents are incredibly versatile once you get the architecture right
- The sweet spot is finding the right balance of guidance vs autonomy for each use case
- Once the logic clicks, the possibilities really are endless - it's like letting LLMs break free from the chatbox

The course is free and the certification deadline is July 1st, 2025.

The Hugging Face team built something special here. If you're tired of AI that impresses in demos but fails in practice, this is your path to building agents that actually deliver. https://huggingface.co/learn/agents-course/unit0/introduction

Best part? There's the MCP course next!
posted an update 2 days ago
Orsta 🔥 vision-language models trained with V-Triune, a unified reinforcement learning system by MiniMax AI

One-RL-to-See-Them-All/one-rl-to-see-them-all-6833d27abce23898b2f9815a

✨ 7B & 32B with MIT license
✨ Masters 8 visual tasks: math, science QA, charts, puzzles, object detection, grounding, OCR, and counting
✨ Uses Dynamic IoU rewards for better visual understanding (rough sketch below)
✨ Strong performance in visual reasoning and perception
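
The post doesn't spell out the exact reward formulation, so the following is only an illustrative sketch of what an IoU-based reward with a dynamic threshold could look like; the box format and the threshold schedule are assumptions, not the paper's definition.

```python
# Illustrative sketch of an IoU-based reward for detection-style RL rollouts.
# Boxes are (x1, y1, x2, y2); the acceptance threshold tightens as training
# progresses, which is one plausible reading of "Dynamic IoU rewards".
def iou(box_a, box_b):
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def dynamic_iou_reward(pred_box, gt_box, train_progress):
    # Assumed schedule: threshold ramps from 0.5 to 0.75 over training.
    threshold = 0.5 + 0.25 * min(max(train_progress, 0.0), 1.0)
    score = iou(pred_box, gt_box)
    return score if score >= threshold else 0.0

print(dynamic_iou_reward((0, 0, 10, 10), (1, 1, 11, 11), train_progress=0.5))
```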
posted an update 2 days ago
reacted to merve's post with 🔥 2 days ago
what happened in open AI this past week? so many vision LM & omni releases 🔥 merve/releases-23-may-68343cb970bbc359f9b5fb05

multimodal 💬🖼️
> new moondream (VLM) is out: it's a 4-bit quantized (with QAT) version of moondream-2b, runs on 2.5GB VRAM at 184 tps with only a 0.6% drop in accuracy (OS) 🌚
> ByteDance released BAGEL-7B, an omni model that understands and generates both image + text. they also released Dolphin, a document parsing VLM 🐬 (OS)
> Google DeepMind dropped MedGemma at I/O, a VLM that can interpret medical scans, and Gemma 3n, an omni model with competitive LLM performance

> MMaDa is a new 8B diffusion language model that can generate both images and text

LLMs
> Mistral released Devstral, a 24B coding assistant (OS) 👩🏻‍💻
> Fairy R1-32B is a new reasoning model, a distilled version of DeepSeek-R1-Distill-Qwen-32B (OS)
> NVIDIA released AceReason-Nemotron-14B, a new 14B math and code reasoning model
> sarvam-m is a new Indic LM with hybrid thinking mode, based on Mistral Small (OS)
> samhitika-0.0.1 is a new Sanskrit corpus (BookCorpus translated with Gemma3-27B)

image generation 🎨
> MTVCrafter is a new human motion animation generator
posted an update 8 days ago
ByteDance is absolutely cooking lately 🔥

BAGEL 🥯 a 7B-active-parameter open multimodal foundation model by the ByteDance Seed team.

ByteDance-Seed/BAGEL-7B-MoT

✨ Apache 2.0
✨ Outperforms top VLMs (Qwen2.5-VL & InternVL-2.5)
✨ Mixture-of-Transformer-Experts + dual encoders
✨ Trained on trillions of interleaved tokens
posted an update 9 days ago
Dolphin 🔥 a multimodal document image parsing model from ByteDance, built on an analyze-then-parse paradigm.

ByteDance/Dolphin

✨ MIT licensed
✨ Handles text, tables, figures & formulas via:
- Reading-order layout analysis
- Parallel parsing with smart prompts
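
The post doesn't show the model's API, so here is only a schematic sketch of the analyze-then-parse flow it describes; analyze_layout and parse_element are hypothetical stand-ins, not Dolphin's real interface.

```python
# Schematic of an analyze-then-parse document pipeline (not Dolphin's actual API).
# Stage 1 finds elements in reading order; stage 2 parses each element in parallel.
from concurrent.futures import ThreadPoolExecutor

def analyze_layout(page_image):
    # Placeholder for stage 1: the real model returns reading-ordered regions
    # (text blocks, tables, figures, formulas) with their bounding boxes.
    return [{"type": "text", "bbox": (0, 0, 100, 40)},
            {"type": "table", "bbox": (0, 50, 100, 120)}]

def parse_element(page_image, element):
    # Placeholder for stage 2: the real model is prompted per element type
    # ("smart prompts") to transcribe that region.
    return {"type": element["type"], "content": f"<parsed {element['type']}>"}

def parse_document(page_image):
    elements = analyze_layout(page_image)              # stage 1: reading-order layout analysis
    with ThreadPoolExecutor() as pool:                 # stage 2: parallel element parsing
        return list(pool.map(lambda e: parse_element(page_image, e), elements))

print(parse_document(page_image=None))
```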

posted an update 9 days ago
Index-AniSora 🎬 an open anime video model released by Bilibili

👉 https://huggingface.co/IndexTeam/Index-anisora

✨ Apache 2.0
✨ Supports many 2D styles: anime, manga, VTubers, and more
✨ Fine control over characters and actions with smart masking
replied to their post 10 days ago
posted an update 10 days ago
Data quality is the new frontier for LLM performance.

Ultra-FineWeb 📊 a high-quality bilingual dataset released by OpenBMB

openbmb/Ultra-FineWeb

✨ MIT License
✨ 1T English + 120B Chinese tokens
✨ Efficient model-driven filtering
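
To sample the corpus without pulling the whole thing, here is a streaming sketch; the config name "en" and split "train" are assumptions, so check the dataset card for the real layout.

```python
# Stream a handful of English records from Ultra-FineWeb instead of downloading ~1T tokens.
# Config/split names below are assumptions; adjust them per the dataset card.
from datasets import load_dataset

stream = load_dataset("openbmb/Ultra-FineWeb", "en", split="train", streaming=True)
for i, record in enumerate(stream):
    print(record)
    if i >= 2:
        break
```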
reacted to cbensimon's post with 🔥 12 days ago
🚀 ZeroGPU medium size is now available as a power-user feature

Nothing too fancy for now—ZeroGPU Spaces still default to large (70GB VRAM)—but this paves the way for:
- 💰 size-based quotas / pricing (medium will offer significantly more usage than large)
- 🦣 the upcoming xlarge size (141GB VRAM)

You can now control the GPU size via a Space variable. Accepted values:
- auto (future default)
- medium
- large (current default)

The auto mode checks total CUDA tensor size during startup:
- More than 30GB → large
- Otherwise → medium
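
A rough illustration of that auto-mode decision (the 30GB cutoff comes from the post; everything else, including scanning live tensors via gc, is just one way to picture it):

```python
# Illustrative sketch of the "auto" size decision described above: sum the memory
# of all CUDA tensors found at startup and pick "large" above 30GB, else "medium".
import gc
import torch

def pick_zerogpu_size(threshold_gb: float = 30.0) -> str:
    total_bytes = sum(
        t.element_size() * t.nelement()
        for t in gc.get_objects()
        if torch.is_tensor(t) and t.is_cuda
    )
    return "large" if total_bytes > threshold_gb * 1024**3 else "medium"

print(pick_zerogpu_size())
```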
posted an update 12 days ago
reacted to merterbak's post with 🔥 14 days ago
posted an update 14 days ago
posted an update 15 days ago
posted an update 15 days ago