Adina Yakefu

AdinaY

AI & ML interests

None yet

Recent Activity

Organizations

Hugging Face, Hugging Face Chinese Localization, Huggingface Projects, Blog-explorers, ICCV2023, Open LLM Leaderboard, huggingPartyParis, Qwen, Journalists on Hugging Face, Women on Hugging Face, Social Post Explorers, Chinese LLMs on Hugging Face, Hugging Face for Legal, Inference Endpoints Images, LeRobot Worldwide Hackathon

AdinaY's activity

reacted to BestWishYsh's post with 🚀🔥 about 10 hours ago
Introducing our new work: OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for Subject-to-Video Generation 🚀

We tackle the core challenges of Subject-to-Video Generation (S2V) by systematically building the first complete infrastructure for the task, featuring an evaluation benchmark and a million-scale dataset! ✨

🧠 Introducing OpenS2V-Eval, the first fine-grained S2V benchmark, with 180 multi-domain prompts + real/synthetic test pairs. We propose NexusScore, NaturalScore, and GmeScore to precisely quantify model performance across subject consistency, naturalness, and text alignment ✔

📊 Using this framework, we conduct a comprehensive evaluation of 16 leading S2V models, revealing their strengths and weaknesses in complex scenarios!

🔥 The OpenS2V-5M dataset is now available! A collection of 5.4M subject-text-video triplets in 720P HD, enabled by cross-video association segmentation + multi-view synthesis for diverse subjects & high-quality annotations 🚀

All resources are open-sourced: Paper, Code, Data, and Evaluation Tools 📄
Let's advance S2V research together! 💡

🔗 Links:
Paper: OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for Subject-to-Video Generation (2505.20292)
Code: https://github.com/PKU-YuanGroup/OpenS2V-Nexus
Project: https://pku-yuangroup.github.io/OpenS2V-Nexus
Leaderboard: BestWishYsh/OpenS2V-Eval
OpenS2V-Eval: BestWishYsh/OpenS2V-Eval
OpenS2V-5M: BestWishYsh/OpenS2V-5M
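
For anyone who wants to poke at the benchmark programmatically, here is a minimal sketch, assuming OpenS2V-Eval is published as a standard dataset repo on the Hub; the split name is a guess, so check the dataset card for the actual layout.

```python
# Hedged sketch: load the OpenS2V-Eval prompts from the Hub and peek at a few entries.
# Assumes the benchmark lives in a dataset repo with a default config; the split
# name "train" is an assumption, not confirmed by the post.
from datasets import load_dataset

eval_set = load_dataset("BestWishYsh/OpenS2V-Eval", split="train")
for example in eval_set.select(range(3)):
    print(example)  # each entry should pair a prompt with its reference subject image(s)
```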
posted an update about 12 hours ago
🔥 New benchmark & dataset for Subject-to-Video generation

OpenS2V-Nexus by Peking University

✨ Fine-grained evaluation for subject consistency
BestWishYsh/OpenS2V-Eval
✨ 5M-scale dataset:
BestWishYsh/OpenS2V-5M
✨ New metrics – automatic scores for identity, realism, and text match
posted an update about 12 hours ago
HunyuanVideo-Avatar 🔥 another image-to-video model by Tencent Hunyuan

tencent/HunyuanVideo-Avatar

✨Emotion-controlled, high-dynamic avatar videos
✨Multi-character support with separate audio control
✨Works with any style: cartoon, 3D, real face, while keeping identity consistent
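
Not much code is needed to try it: a minimal fetch sketch using the standard huggingface_hub workflow (the repo id comes from the post; the actual inference scripts live in Tencent's release).

```python
# Download the HunyuanVideo-Avatar checkpoint files from the Hub.
# This only fetches weights/configs; inference itself is run with Tencent's released code.
from huggingface_hub import snapshot_download

local_dir = snapshot_download("tencent/HunyuanVideo-Avatar")
print(f"Checkpoint downloaded to: {local_dir}")
```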
posted an update 1 day ago
reacted to fdaudens's post with ❤️ 1 day ago
Just completed the AI Agents course and wow, that capstone project really makes you understand how to build agents that can handle real-world complexity!

The final project uses the GAIA dataset - your agent has to solve tasks like analyzing Excel files, processing audio recordings, answering questions about YouTube videos, and diving into research papers. These aren't toy examples; this is the messy, multimodal stuff agents need to handle in practice.

Whether you’re just getting started with agents or want to go deeper with tools like LangChain, LlamaIndex, and SmolAgents, this course has tons of useful stuff. A few key insights:
- Code agents are incredibly versatile once you get the architecture right
- The sweet spot is finding the right balance of guidance vs autonomy for each use case
- Once the logic clicks, the possibilities really are endless - it's like letting LLMs break free from the chatbox

The course is free and the certification deadline is July 1st, 2025.

The Hugging Face team built something special here. If you're tired of AI that impresses in demos but fails in practice, this is your path to building agents that actually deliver. https://huggingface.co/learn/agents-course/unit0/introduction

Best part? There's the MCP course next!
posted an update 2 days ago
Orsta 🔥 vision-language models trained with V-Triune, a unified reinforcement learning system by MiniMax AI

One-RL-to-See-Them-All/one-rl-to-see-them-all-6833d27abce23898b2f9815a

✨ 7B & 32B with MIT license
✨ Masters 8 visual tasks: math, science QA, charts, puzzles, object detection, grounding, OCR, and counting
✨ Uses Dynamic IoU rewards for better visual understanding (rough sketch below)
✨ Strong performance in visual reasoning and perception
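
The post doesn't spell out the exact reward formulation, so the following is only an illustrative sketch of what an IoU-based reward with a dynamic threshold could look like; the box format and the threshold schedule are assumptions, not the paper's definition.

```python
# Illustrative sketch of an IoU-based reward for detection-style RL rollouts.
# Boxes are (x1, y1, x2, y2); the acceptance threshold tightens as training
# progresses, which is one plausible reading of "Dynamic IoU rewards".
def iou(box_a, box_b):
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def dynamic_iou_reward(pred_box, gt_box, train_progress):
    # Assumed schedule: threshold ramps from 0.5 to 0.75 over training.
    threshold = 0.5 + 0.25 * min(max(train_progress, 0.0), 1.0)
    score = iou(pred_box, gt_box)
    return score if score >= threshold else 0.0

print(dynamic_iou_reward((0, 0, 10, 10), (1, 1, 11, 11), train_progress=0.5))
```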
posted an update 2 days ago
reacted to merve's post with 🔥 2 days ago
what happened in open AI this past week? so many vision LM & omni releases 🔥 merve/releases-23-may-68343cb970bbc359f9b5fb05

multimodal 💬🖼️
> new moondream (VLM) is out: it's a 4-bit quantized (with QAT) version of moondream-2b, runs on 2.5GB VRAM at 184 tps with only a 0.6% drop in accuracy (OS) 🌚
> ByteDance released BAGEL-7B, an omni model that understands and generates both image + text. they also released Dolphin, a document parsing VLM 🐬 (OS)
> Google DeepMind dropped MedGemma at I/O, a VLM that can interpret medical scans, and Gemma 3n, an omni model with competitive LLM performance

> MMaDa is a new 8B diffusion language model that can generate both images and text

LLMs
> Mistral released Devstral, a 24B coding assistant (OS) 👩🏻‍💻
> Fairy R1-32B is a new reasoning model, a distilled version of DeepSeek-R1-Distill-Qwen-32B (OS)
> NVIDIA released AceReason-Nemotron-14B, a new 14B math and code reasoning model
> sarvam-m is a new Indic LM with hybrid thinking mode, based on Mistral Small (OS)
> samhitika-0.0.1 is a new Sanskrit corpus (BookCorpus translated with Gemma3-27B)

image generation 🎨
> MTVCrafter is a new human motion animation generator
posted an update 8 days ago
ByteDance is absolutely cooking lately 🔥

BAGEL 🥯 a 7B-active-parameter open multimodal foundation model by the ByteDance Seed team.

ByteDance-Seed/BAGEL-7B-MoT

✨ Apache 2.0
✨ Outperforms top VLMs (Qwen2.5-VL & InternVL-2.5)
✨ Mixture-of-Transformer-Experts + dual encoders
✨ Trained on trillions of interleaved tokens
posted an update 9 days ago
Dolphin 🔥 a multimodal document image parsing model from ByteDance, built on an analyze-then-parse paradigm.

ByteDance/Dolphin

✨ MIT licensed
✨ Handles text, tables, figures & formulas via:
- Reading-order layout analysis
- Parallel parsing with smart prompts
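
The post doesn't show the model's API, so here is only a schematic sketch of the analyze-then-parse flow it describes; analyze_layout and parse_element are hypothetical stand-ins, not Dolphin's real interface.

```python
# Schematic of an analyze-then-parse document pipeline (not Dolphin's actual API).
# Stage 1 finds elements in reading order; stage 2 parses each element in parallel.
from concurrent.futures import ThreadPoolExecutor

def analyze_layout(page_image):
    # Placeholder for stage 1: the real model returns reading-ordered regions
    # (text blocks, tables, figures, formulas) with their bounding boxes.
    return [{"type": "text", "bbox": (0, 0, 100, 40)},
            {"type": "table", "bbox": (0, 50, 100, 120)}]

def parse_element(page_image, element):
    # Placeholder for stage 2: the real model is prompted per element type
    # ("smart prompts") to transcribe that region.
    return {"type": element["type"], "content": f"<parsed {element['type']}>"}

def parse_document(page_image):
    elements = analyze_layout(page_image)              # stage 1: reading-order layout analysis
    with ThreadPoolExecutor() as pool:                 # stage 2: parallel element parsing
        return list(pool.map(lambda e: parse_element(page_image, e), elements))

print(parse_document(page_image=None))
```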

posted an update 9 days ago
Index-AniSora 🎬 an open anime video model released by Bilibili

👉 https://huggingface.co/IndexTeam/Index-anisora

✨ Apache 2.0
✨ Supports many 2D styles: anime, manga, VTubers, and more
✨ Fine control over characters and actions with smart masking
replied to their post 10 days ago
posted an update 10 days ago
Data quality is the new frontier for LLM performance.

Ultra-FineWeb 📊 a high-quality bilingual dataset released by OpenBMB

openbmb/Ultra-FineWeb

✨ MIT License
✨ 1T English + 120B Chinese tokens
✨ Efficient model-driven filtering
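
To sample the corpus without pulling the whole thing, here is a streaming sketch; the config name "en" and split "train" are assumptions, so check the dataset card for the real layout.

```python
# Stream a handful of English records from Ultra-FineWeb instead of downloading ~1T tokens.
# Config/split names below are assumptions; adjust them per the dataset card.
from datasets import load_dataset

stream = load_dataset("openbmb/Ultra-FineWeb", "en", split="train", streaming=True)
for i, record in enumerate(stream):
    print(record)
    if i >= 2:
        break
```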
reacted to cbensimon's post with 🔥 12 days ago
🚀 ZeroGPU medium size is now available as a power-user feature

Nothing too fancy for now—ZeroGPU Spaces still default to large (70GB VRAM)—but this paves the way for:
- 💰 size-based quotas / pricing (medium will offer significantly more usage than large)
- 🦣 the upcoming xlarge size (141GB VRAM)

You can now control the GPU size via a Space variable. Accepted values:
- auto (future default)
- medium
- large (current default)

The auto mode checks total CUDA tensor size during startup:
- More than 30GB → large
- Otherwise → medium
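
A rough illustration of that auto-mode decision (the 30GB cutoff comes from the post; everything else, including scanning live tensors via gc, is just one way to picture it):

```python
# Illustrative sketch of the "auto" size decision described above: sum the memory
# of all CUDA tensors found at startup and pick "large" above 30GB, else "medium".
import gc
import torch

def pick_zerogpu_size(threshold_gb: float = 30.0) -> str:
    total_bytes = sum(
        t.element_size() * t.nelement()
        for t in gc.get_objects()
        if torch.is_tensor(t) and t.is_cuda
    )
    return "large" if total_bytes > threshold_gb * 1024**3 else "medium"

print(pick_zerogpu_size())
```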
posted an update 12 days ago
reacted to merterbak's post with 🔥 14 days ago
posted an update 14 days ago
posted an update 15 days ago
posted an update 15 days ago