Dev Mode Explorers

community

AI & ML interests

None defined yet.

Recent Activity

dev-mode-explorers's activity

florentgbelidji posted an update about 17 hours ago
๐—ฃ๐—น๐—ฎ๐—ป๐—ป๐—ถ๐—ป๐—ด ๐—ฌ๐—ผ๐˜‚๐—ฟ ๐—ก๐—ฒ๐˜…๐˜ ๐—ฆ๐—ธ๐—ถ ๐—”๐—ฑ๐˜ƒ๐—ฒ๐—ป๐˜๐˜‚๐—ฟ๐—ฒ ๐—๐˜‚๐˜€๐˜ ๐—š๐—ผ๐˜ ๐—ฆ๐—บ๐—ฎ๐—ฟ๐˜๐—ฒ๐—ฟ: ๐—œ๐—ป๐˜๐—ฟ๐—ผ๐—ฑ๐˜‚๐—ฐ๐—ถ๐—ป๐—ด ๐—”๐—น๐—ฝ๐—ถ๐—ป๐—ฒ ๐—”๐—ด๐—ฒ๐—ป๐˜!๐Ÿ”๏ธโ›ท๏ธ

With the big hype around AI agents these days, I couldn't stop thinking about how AI agents could truly enhance real-world activities.
What sort of applications could we build with those AI agents: agentic RAG? Self-correcting text-to-SQL? Nah, boring…

Passionate about the outdoors, I've always dreamed of a tool that could simplify planning mountain trips while accounting for all potential risks. That's why I built Alpine Agent, a smart assistant designed to help you plan safe and enjoyable itineraries in the French Alps and Pyrenees.

Built using Hugging Face's smolagents library, Alpine Agent combines the power of AI with trusted resources like Skitour.fr (https://skitour.fr/) and METEO FRANCE. Whether it's suggesting a route with moderate difficulty or analyzing avalanche risks and weather conditions, this agent dynamically integrates data to deliver personalized recommendations.

In my latest blog post, I share how I developed this project, from defining tools and integrating APIs to selecting the best LLMs like Qwen2.5-Coder-32B-Instruct, Llama-3.3-70B-Instruct, or GPT-4.
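For flavor, here is a minimal smolagents-style sketch of how such a tool-using agent can be wired up (the tool body, model choice and prompt are placeholders of mine, not the actual Alpine Agent code):

```python
from smolagents import CodeAgent, HfApiModel, tool

@tool
def get_avalanche_risk(massif: str) -> str:
    """Return the current avalanche risk bulletin for a mountain massif.

    Args:
        massif: Name of the massif, e.g. "Mont-Blanc".
    """
    # The real app would query METEO FRANCE / skitour.fr here;
    # a hard-coded string keeps this sketch self-contained.
    return f"Avalanche risk for {massif}: 3/5 above 2200 m, moderate below."

agent = CodeAgent(
    tools=[get_avalanche_risk],
    model=HfApiModel("Qwen/Qwen2.5-Coder-32B-Instruct"),
)
print(agent.run("Suggest a moderate ski tour near Chamonix and check the avalanche risk."))
```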

โ›ท๏ธ Curious how AI can enhance adventure planning?โ€จTry the app and share your thoughts: florentgbelidji/alpine-agent

👉 Want to build your own agents? Whether for cooking, sports training, or other passions, the possibilities are endless. Check out the blog post to learn more: https://huggingface.co/blog/florentgbelidji/alpine-agent

Many thanks to @m-ric for helping build this tool with smolagents!
merve posted an update about 19 hours ago
Everything that happened this week in open AI, a recap 🤠 merve/jan-17-releases-678a673a9de4a4675f215bf5

👀 Multimodal
- MiniCPM-o 2.6 is a new sota any-to-any model by OpenBMB (vision, speech and text!)
- VideoChat-Flash-Qwen2.5-2B is part of a new family of video multimodal models by OpenGVLab that comes in 2B & 7B sizes and 224 & 448 resolutions
- ByteDance released a larger SA2VA that comes in at 26B parameters
- Dataset: VRC-Bench is a new diverse benchmark for multimodal LLM reasoning performance

💬 LLMs
- MiniMax-Text-01 is a huge new language model (456B total, 45.9B active params) by MiniMaxAI with a context length of 4M tokens 🤯
- Dataset: Sky-T1-data-17k is a diverse dataset used to train Sky-T1-32B
- kyutai released Helium-1-Preview-2B, a new small multilingual LM
- Wayfarer-12B is a new LLM able to write D&D adventures 🧙🏻‍♂️
- ReaderLM-v2 is a new HTML parsing model by Jina AI
- Dria released Dria-Agent-a-3B, a new agentic coding model (Pythonic function calling) based on Qwen2.5 Coder
- Unsloth released faster, more memory-efficient versions of Phi-4 and Llama 3.3

🖼️ Vision
- MatchAnything is a new foundation model for image matching
- FitDiT is a high-fidelity virtual try-on (VTON) model based on the DiT architecture

🗣️ Audio
- OuteTTS-0.3-1B is a new multilingual text-to-speech model with voice cloning and emotion control capabilities

📖 Retrieval
- lightblue released LB-reranker-0.5B-v1.0, a new reranker based on Qwen2.5 that can handle 95+ languages
- cde-small-v2 is a new sota small retrieval model by @jxm
not-lain posted an update about 20 hours ago
We now have more than 2000 public AI models using ModelHubMixin 🤗
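As a reminder of what the mixin gives you, here is a minimal sketch using the PyTorch flavour of the mixin (the class and repo ID are placeholders of mine):

```python
import torch.nn as nn
from huggingface_hub import PyTorchModelHubMixin

class TinyClassifier(nn.Module, PyTorchModelHubMixin):
    """Any nn.Module gains save/push/from_pretrained by inheriting the mixin."""
    def __init__(self, hidden_size: int = 64, num_classes: int = 2):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(16, hidden_size), nn.ReLU(), nn.Linear(hidden_size, num_classes))

    def forward(self, x):
        return self.net(x)

model = TinyClassifier()
model.save_pretrained("tiny-classifier")                # writes weights + config locally
# model.push_to_hub("your-username/tiny-classifier")    # placeholder repo ID
# reloaded = TinyClassifier.from_pretrained("your-username/tiny-classifier")
```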
merve posted an update 1 day ago
Tonic posted an update 2 days ago
🙋🏻‍♂️ Hey there folks,

Facebook AI just released JASCO models that generate music stems.

You can try it out here: Tonic/audiocraft

Hope you like it!
m-ric posted an update 2 days ago
๐— ๐—ถ๐—ป๐—ถ๐— ๐—ฎ๐˜…'๐˜€ ๐—ป๐—ฒ๐˜„ ๐— ๐—ผ๐—˜ ๐—Ÿ๐—Ÿ๐—  ๐—ฟ๐—ฒ๐—ฎ๐—ฐ๐—ต๐—ฒ๐˜€ ๐—–๐—น๐—ฎ๐˜‚๐—ฑ๐—ฒ-๐—ฆ๐—ผ๐—ป๐—ป๐—ฒ๐˜ ๐—น๐—ฒ๐˜ƒ๐—ฒ๐—น ๐˜„๐—ถ๐˜๐—ต ๐Ÿฐ๐—  ๐˜๐—ผ๐—ธ๐—ฒ๐—ป๐˜€ ๐—ฐ๐—ผ๐—ป๐˜๐—ฒ๐˜…๐˜ ๐—น๐—ฒ๐—ป๐—ด๐˜๐—ต ๐Ÿ’ฅ

This work from Chinese startup @MiniMax-AI introduces a novel architecture that achieves state-of-the-art performance while handling context windows up to 4 million tokens - roughly 20x longer than current models. The key was combining lightning attention, mixture of experts (MoE), and a careful hybrid approach.

๐—ž๐—ฒ๐˜† ๐—ถ๐—ป๐˜€๐—ถ๐—ด๐—ต๐˜๐˜€:

๐Ÿ—๏ธ MoE with novel hybrid attention:
โ€ฃ Mixture of Experts with 456B total parameters (45.9B activated per token)
โ€ฃ Combines Lightning attention (linear complexity) for most layers and traditional softmax attention every 8 layers

๐Ÿ† Outperforms leading models across benchmarks while offering vastly longer context:
โ€ฃ Competitive with GPT-4/Claude-3.5-Sonnet on most tasks
โ€ฃ Can efficiently handle 4M token contexts (vs 256K for most other LLMs)

🔬 Technical innovations enable efficient scaling:
‣ Novel expert parallel and tensor parallel strategies cut communication overhead in half
‣ Improved linear attention sequence parallelism, multi-level padding and other optimizations achieve 75% GPU utilization (that's really high; generally utilization is around 50%)

🎯 Thorough training strategy:
‣ Careful data curation and quality control by using a smaller preliminary version of their LLM as a judge!

Overall, not only is the model impressive, but the technical paper is also really interesting! 📝
It has lots of insights, including a great comparison showing how a 2B-active MoE (24B total params) far outperforms a dense 7B model for the same amount of FLOPs.
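To make the hybrid layout concrete, here is a toy sketch of interleaving linear-attention and softmax-attention blocks every 8 layers (the block implementations are simplistic stand-ins of mine, not MiniMax's actual architecture):

```python
import torch
import torch.nn as nn

class LightningBlock(nn.Module):
    """Stand-in for a linear-complexity ('lightning') attention block."""
    def __init__(self, d_model: int):
        super().__init__()
        self.proj = nn.Linear(d_model, d_model)
    def forward(self, x):
        return x + self.proj(x)

class SoftmaxBlock(nn.Module):
    """Stand-in for a standard softmax-attention block."""
    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
    def forward(self, x):
        out, _ = self.attn(x, x, x)
        return x + out

def build_hybrid_stack(n_layers: int = 32, d_model: int = 512) -> nn.ModuleList:
    # Softmax attention on every 8th layer, lightning attention everywhere else.
    return nn.ModuleList(
        SoftmaxBlock(d_model) if (i + 1) % 8 == 0 else LightningBlock(d_model)
        for i in range(n_layers)
    )

stack = build_hybrid_stack()
x = torch.randn(2, 128, 512)
for block in stack:
    x = block(x)
print(sum(isinstance(b, SoftmaxBlock) for b in stack), "softmax layers out of", len(stack))  # 4 out of 32
```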

Read it in full here 👉 MiniMax-01: Scaling Foundation Models with Lightning Attention (2501.08313)
Model here (the license allows commercial use below 100M monthly active users) 👉 MiniMaxAI/MiniMax-Text-01
m-ric posted an update 3 days ago
๐—ช๐—ฒ'๐˜ƒ๐—ฒ ๐—ท๐˜‚๐˜€๐˜ ๐—ฟ๐—ฒ๐—น๐—ฒ๐—ฎ๐˜€๐—ฒ๐—ฑ ๐˜€๐—บ๐—ผ๐—น๐—ฎ๐—ด๐—ฒ๐—ป๐˜๐˜€ ๐˜ƒ๐Ÿญ.๐Ÿฏ.๐Ÿฌ ๐Ÿš€, and it comes with a major feature: you can now log agent runs using OpenTelemetry to inspect them afterwards! ๐Ÿ“Š

This interactive format is IMO much easier for inspecting big multi-step runs than scrolling through endless console logs.

The setup is very easy: just a few lines of code.

Find a tutorial here 👉 https://huggingface.co/docs/smolagents/tutorials/inspect_runs
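The gist, roughly following the linked tutorial (the OTLP endpoint here assumes a locally running Phoenix collector; adjust it for your setup):

```python
# pip install opentelemetry-sdk opentelemetry-exporter-otlp openinference-instrumentation-smolagents
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from openinference.instrumentation.smolagents import SmolagentsInstrumentor

# Send spans to a local collector (e.g. Phoenix) so every agent step gets recorded.
trace_provider = TracerProvider()
trace_provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter("http://0.0.0.0:6006/v1/traces")))
SmolagentsInstrumentor().instrument(tracer_provider=trace_provider)

# Any agent run after this point is traced and can be inspected in the collector UI.
```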
fdaudens posted an update 3 days ago
AI agents are coming. But who's in control?

@meg, one of the best researchers in AI ethics, makes a critical point about autonomy: fully autonomous systems carry unknowable risks because they operate on computer logic rather than human logic.

The solution? Build systems that support & assist rather than override human decisions.

I highly recommend reading the blog post written by Meg, @evijit, @sasha and @giadap. They define different levels of agent autonomy & provide a values-based analysis of risks, benefits, and uses of AI agents to help you make better decisions.

👉 https://huggingface.co/blog/ethics-soc-7

MoritzLaurer posted an update 3 days ago
Microsoft's rStar-Math paper claims that ~7B models can match the math skills of o1 using clever train- and test-time techniques. You can now download their prompt templates from Hugging Face!

๐Ÿ“ The paper introduces rStar-Math, which claims to rival OpenAI o1's math reasoning capabilities by integrating Monte Carlo Tree Search (MCTS) with step-by-step verified reasoning trajectories.
๐Ÿค– A Process Preference Model (PPM) enables fine-grained evaluation of intermediate steps, improving training data quality.
๐Ÿงช The system underwent four rounds of self-evolution, progressively refining both the policy and reward models to tackle Olympiad-level math problemsโ€”without GPT-4-based data distillation.
๐Ÿ’พ While we wait for the release of code and datasets, you can already download the prompts they used from the HF Hub!

Details and links here 👇
Prompt-templates docs: https://moritzlaurer.github.io/prompt_templates/
Templates on the hub: MoritzLaurer/rstar-math-prompts
Prompt-templates collection: MoritzLaurer/prompt-templates-6776aa0b0b8a923957920bb4
Paper: https://arxiv.org/pdf/2501.04519
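If you'd rather pull the templates programmatically, a minimal sketch with huggingface_hub (I'm assuming the repo stores templates as YAML files, as the prompt-templates library does; listing the repo avoids guessing filenames):

```python
import yaml
from huggingface_hub import hf_hub_download, list_repo_files

repo_id = "MoritzLaurer/rstar-math-prompts"

# List the repo to find the available template files instead of hard-coding names.
yaml_files = [f for f in list_repo_files(repo_id) if f.endswith((".yaml", ".yml"))]
print(yaml_files)

local_path = hf_hub_download(repo_id=repo_id, filename=yaml_files[0])
with open(local_path) as f:
    template = yaml.safe_load(f)
print(template)
```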
Tonic posted an update 4 days ago
🙋🏻‍♂️ Hey there folks, Open LLM Europe just released the Lucie 7B-Instruct model, a bilingual instruct model trained on open data! You can check out my unofficial demo here while we wait for the official inference API from the group: Tonic/Lucie-7B. Hope you like it 🚀
fdaudens posted an update 4 days ago
🔥 The AI Agent hype is real! This blog post dives deep into everything you need to know before deploying them: from key definitions to practical recommendations. A must-read for anyone building the future of autonomous systems.

📊 Key insight: a clear table breaking down the 5 levels of AI agents, from simple processors to fully autonomous systems. An essential framework for understanding where your agent stands on the autonomy spectrum.

⚖️ Deep analysis of 15 core values reveals critical trade-offs: accuracy, privacy, safety, equity & more. The same features that make agents powerful can make them risky. Understanding these trade-offs is crucial for responsible deployment.

🎯 6 key recommendations for the road ahead:
- Create rigorous evaluation protocols
- Study societal effects
- Understand ripple effects
- Improve transparency
- Open source can make a positive difference
- Monitor base model evolution

Read the blog post: https://huggingface.co/blog/ethics-soc-7 Brilliant work by @meg @evijit @sasha @giadap
merve posted an update 5 days ago
there's a new multimodal retrieval model in town 🤠
LlamaIndex released vdr-2b-multi-v1
> uses 70% fewer image tokens, yet outperforms other dse-qwen2-based models
> 3x faster inference with less VRAM 💨
> shrinkable with matryoshka 🪆
> can do cross-lingual retrieval!
Collection: llamaindex/visual-document-retrieval-678151d19d2758f78ce910e1 (with models and datasets)
Demo: llamaindex/multimodal_vdr_demo
Learn more from their blog post here: https://huggingface.co/blog/vdr-2b-multilingual 📖
not-lain posted an update 6 days ago
Published a new blog post 📖
In this blog post I go through the transformer architecture, emphasizing how tensor shapes propagate through each layer.
🔗 https://huggingface.co/blog/not-lain/tensor-dims
Some interesting takeaways:
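As a toy illustration of the shape-propagation idea (my own sketch with an arbitrary tiny config, not an excerpt from the blog post):

```python
import torch
import torch.nn as nn

batch, seq_len, hidden = 2, 16, 64                     # tiny, arbitrary config
x = torch.randn(batch, seq_len, hidden)                 # (batch, seq_len, hidden)

attn = nn.MultiheadAttention(hidden, num_heads=8, batch_first=True)
ffn = nn.Sequential(nn.Linear(hidden, 4 * hidden), nn.GELU(), nn.Linear(4 * hidden, hidden))

attn_out, attn_weights = attn(x, x, x)
print(attn_out.shape)       # torch.Size([2, 16, 64])  -> attention preserves (batch, seq_len, hidden)
print(attn_weights.shape)   # torch.Size([2, 16, 16])  -> scores are (batch, seq_len, seq_len)
print(ffn(attn_out).shape)  # torch.Size([2, 16, 64])  -> FFN expands to 4*hidden inside, then back
```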
m-ric posted an update 6 days ago
๐—ข๐—ฆ-๐—š๐—ฒ๐—ป๐—ฒ๐˜€๐—ถ๐˜€: ๐—ป๐—ฒ๐˜„ ๐—ฟ๐—ฒ๐˜€๐—ฒ๐—ฎ๐—ฟ๐—ฐ๐—ต ๐—ฝ๐—ฎ๐—ฝ๐—ฒ๐—ฟ ๐—ฝ๐—ฟ๐—ผ๐—ฝ๐—ผ๐˜€๐—ฒ๐˜€ ๐—ฎ ๐—ป๐—ผ๐˜ƒ๐—ฒ๐—น ๐˜๐—ฟ๐—ฎ๐—ถ๐—ป๐—ถ๐—ป๐—ด ๐—ฑ๐—ฎ๐˜๐—ฎ ๐—ด๐—ฒ๐—ป๐—ฒ๐—ฟ๐—ฎ๐˜๐—ถ๐—ผ๐—ป ๐—บ๐—ฒ๐˜๐—ต๐—ผ๐—ฑ ๐—ณ๐—ผ๐—ฟ ๐—–๐—น๐—ฎ๐˜‚๐—ฑ๐—ฒ-๐—–๐—ผ๐—บ๐—ฝ๐˜‚๐˜๐—ฒ๐—ฟ-๐—จ๐˜€๐—ฒ-๐—น๐—ถ๐—ธ๐—ฒ ๐—ฎ๐—ด๐—ฒ๐—ป๐˜๐˜€, ๐˜„๐—ถ๐˜๐—ต ๐—ถ๐—บ๐—ฝ๐—ฟ๐—ฒ๐˜€๐˜€๐—ถ๐˜ƒ๐—ฒ ๐—ฟ๐—ฒ๐˜€๐˜‚๐—น๐˜๐˜€! ๐Ÿ”ฅ

The main bottleneck in building GUI agents is finding training data.
GUI agent trajectories are not easy to come by. Crowdsourcing trajectories and then manually annotating them could be an option, but it's hard to do at scale.

You could use synthetic data generation (ask thousands of small existing GUI agents to solve tasks, keep only the successful runs), but then it's hard to come up with many high-level tasks.

➡️ Well, a novel technique was just published that creates a new promising paradigm for synthetic data generation: Shanghai AI Lab researchers propose OS-Genesis, a novel way to create training data for GUI agents that flips the traditional approach on its head. Instead of starting with predefined tasks and having humans or machines execute them, OS-Genesis first explores the interface naturally, then derives meaningful tasks from those interactions.

๐Ÿ” Exploration-driven vs task-driven approach:
โ€ฃ Instead of starting with tasks, OS-Genesis first explores GUIs by clicking and interacting
โ€ฃ It then reverse-engineers high-level tasks from successful interaction patterns
โ€ฃ This leads to more natural and diverse training data than predefined tasks

🎯 Novel reward model for trajectory quality:
‣ Rather than discarding incomplete trajectories, OS-Genesis scores them based on coherence and completion
‣ This preserves valuable partial successes that would otherwise be wasted

๐Ÿ† Superior results across environments:
โ€ฃ Nearly doubles performance on AndroidWorld (9.8% โ†’ 17.4%)

By the way, this field of GUI agents is still in its infancy, so you can still make a difference with "low-cost" setups: their paper gets SOTA results with only 8x A100s!

Read the paper here 👉 OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis (2412.19723)
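For intuition, a schematic, runnable toy sketch of the explore-first, derive-tasks-later loop described above (all class names and the scoring rule are placeholders of mine, not the authors' code):

```python
import random

class ToyExplorer:
    def interact(self, screens):
        # 1. Explore the GUI freely (no predefined task), recording low-level actions.
        return [{"screen": s, "action": random.choice(["click", "type", "scroll"])} for s in screens]

class ToyTaskSynthesizer:
    def derive_task(self, trajectory):
        # 2. Reverse-engineer a high-level instruction from the observed actions.
        return "Perform: " + " -> ".join(step["action"] for step in trajectory)

class ToyRewardModel:
    def score(self, task, trajectory):
        # 3. Score coherence/completion instead of discarding incomplete trajectories.
        return min(1.0, len(trajectory) / 5)

def collect(screens, explorer, synthesizer, reward_model, n_episodes=10, threshold=0.5):
    data = []
    for _ in range(n_episodes):
        trajectory = explorer.interact(screens)
        task = synthesizer.derive_task(trajectory)
        score = reward_model.score(task, trajectory)
        if score >= threshold:
            data.append({"task": task, "trajectory": trajectory, "score": score})
    return data

dataset = collect(["home", "settings", "search"], ToyExplorer(), ToyTaskSynthesizer(), ToyRewardModel())
print(len(dataset), "synthetic training examples")
```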
MoritzLaurer posted an update 7 days ago
FACTS is a great paper from @GoogleDeepMind on measuring the factuality of LLM outputs. You can now download their prompt templates from @huggingface to improve LLM-based fact-checking yourself!

๐Ÿ“ The paper introduces the FACTS Grounding benchmark for evaluating the factuality of LLM outputs.

๐Ÿค– Fact-checking is automated by an ensemble of LLM judges that verify if a response is fully grounded in a factual reference document.

๐Ÿงช The authors tested different prompt templates on held-out data to ensure their generalization.

๐Ÿ“š It's highly educational to read these templates to learn how frontier labs design prompts and understand their limitations.

๐Ÿ’พ You can now download and reuse these prompt templates via the prompt-templates library!

๐Ÿ”„ The library simplifies sharing prompt templates on the HF hub or locally via standardized YAML files. Letโ€™s make LLM work more transparent and reproducible by sharing more templates like this!

Links ๐Ÿ‘‡
- prompt-templates docs: https://moritzlaurer.github.io/prompt_templates/
- all templates on the HF Hub: MoritzLaurer/facts-grounding-prompts
- FACTS paper: https://storage.googleapis.com/deepmind-media/FACTS/FACTS_grounding_paper.pdf
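To illustrate the judging setup, a simplified grounding-judge sketch of my own (the actual FACTS templates live in MoritzLaurer/facts-grounding-prompts; the model choice and prompt wording here are assumptions):

```python
from huggingface_hub import InferenceClient

client = InferenceClient("meta-llama/Llama-3.3-70B-Instruct")  # any capable instruct model works

reference = "The Eiffel Tower is 330 metres tall and stands in Paris."
response_to_check = "The Eiffel Tower, located in Paris, is about 330 metres tall."

judge_prompt = (
    "You are a grounding judge. Answer GROUNDED if every claim in the response "
    "is supported by the reference document, otherwise answer UNGROUNDED, then justify briefly.\n\n"
    f"Reference document:\n{reference}\n\nResponse:\n{response_to_check}"
)

out = client.chat_completion(messages=[{"role": "user", "content": judge_prompt}], max_tokens=128)
print(out.choices[0].message.content)
```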
merve posted an update 8 days ago
What a beginning to this year in open ML 🤠
Let's unwrap! merve/jan-10-releases-677fe34177759de0edfc9714

Multimodal 🖼️
> ByteDance released SA2VA: a family of vision LMs that can take image, video, text and visual prompts
> moondream2 is out with new capabilities like outputting structured data and gaze detection!
> Dataset: Alibaba DAMO lab released a multimodal textbook: 22k hours' worth of samples from instruction videos 🤯
> Dataset: the SciCap benchmark dataset for captioning scientific documents is released along with its challenge!

LLMs 💬
> Microsoft released Phi-4, a sota open-source 14B language model 🔥
> Dolphin is back with Dolphin 3.0 Llama 3.1 8B 🐬🐬
> Prime-RL released Eurus-2-7B-PRIME, a new language model trained using PRIME alignment
> SmallThinker-3B is a new small reasoning LM based on Qwen2.5-3B-Instruct 💭
> Dataset: QWQ-LONGCOT-500K is the dataset used to train SmallThinker, generated using QwQ-32B-preview 📕
> Dataset: @cfahlgren1 released React Code Instructions: a dataset of instruction-code pairs 📕
> Dataset: the Qwen team is on a roll, they just released CodeElo, a dataset of code preferences 👩🏻‍💻

Embeddings 🔖
> @MoritzLaurer released a zero-shot version of ModernBERT large
> KaLM is a new family of performant multilingual embedding models with an MIT license, built using Qwen2-0.5B

Image/Video Generation ⏯️
> NVIDIA released Cosmos, a new family of diffusion/autoregressive World Foundation Models generating worlds from images, videos and texts 🔥
> Adobe released TransPixar: a new text-to-video model that can generate assets with transparent backgrounds (a first!)
> Dataset: fal released cosmos-openvid-1m, a Cosmos-tokenized version of OpenVid-1M

Others
> Prior Labs released TabPFNv2, the best tabular transformer for classification and regression
> Metagene-1 is a new RNA language model that can be used for pathogen detection, zero-shot embedding and genome understanding
BrigitteTousi posted an update 9 days ago
Community fine-tuned models are more carbon efficient than the models they are derived from! 🥳🌿

@alozowski @clefourrier @SaylorTwift @albertvillanova evaluated CO₂ emissions associated with model inference for over 3000 models on the Open LLM Leaderboard. Interesting trends and new insights emerged... 👀

Blog Post: https://huggingface.co/blog/leaderboard-emissions-analysis

Leaderboard: open-llm-leaderboard/open_llm_leaderboard
MoritzLaurer posted an update 9 days ago
The TRL v0.13 release is 🔥! My highlights are the new process reward trainer, for training models similar to o1, and tool call support:

🧠 Process reward trainer: Enables training of Process-supervised Reward Models (PRMs), which reward the quality of intermediate steps, promoting structured reasoning. Perfect for tasks like stepwise reasoning.

🔀 Model merging: A new callback leverages mergekit to merge models during training, improving performance by blending reference and policy models - optionally pushing merged models to the Hugging Face Hub.

🛠️ Tool call support: TRL preprocessing now supports tool integration, laying the groundwork for agent fine-tuning with examples like dynamic temperature fetching in prompts.

⚖️ Mixture of judges: The new AllTrueJudge combines decisions from multiple binary judges for more nuanced evaluation.

Read the release notes and other resources here 👇
Release: https://github.com/huggingface/trl/releases/tag/v0.13.0
Mergekit: https://github.com/arcee-ai/mergekit
Mixture of judges paper: The Perfect Blend: Redefining RLHF with Mixture of Judges (2409.20370)
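A minimal sketch of the process reward trainer mentioned above, closely following the pattern in the TRL docs (the backbone model and the stepwise-labelled dataset here are assumptions; any dataset in the same prompt/completions/labels format should work):

```python
from datasets import load_dataset
from transformers import AutoModelForTokenClassification, AutoTokenizer
from trl import PRMConfig, PRMTrainer

model_id = "Qwen/Qwen2-0.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id, num_labels=2)

# Stepwise-supervised data: each row has a prompt, a list of completion steps,
# and one correctness label per step.
train_dataset = load_dataset("trl-lib/math_shepherd", split="train[:1%]")

trainer = PRMTrainer(
    model=model,
    args=PRMConfig(output_dir="prm-demo", per_device_train_batch_size=2),
    train_dataset=train_dataset,
    processing_class=tokenizer,
)
trainer.train()
```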
merve posted an update 9 days ago
ByteDance just dropped SA2VA: a new family of vision LMs combining Qwen2VL/InternVL and SAM2, with an MIT license 💗 ByteDance/sa2va-model-zoo-677e3084d71b5f108d00e093

> The models are capable of tasks involving vision-language understanding and visual referrals (referring segmentation), both for images and videos ⏯️

> The models come in 1B, 4B and 8B sizes and are based on InternVL2.5 for the base architecture, with Qwen2, Qwen2.5 or InternLM2 for the language model part (depending on the checkpoint)

> The model is very interesting: it has a different encoder for each modality (visual prompt, text prompt, image and video), then concatenates these to feed into the LLM 💬

> The output segmentation tokens are passed to SAM2, to sort of match text (captions or semantic classes) to masks ⤵️

> Their annotation pipeline is also interesting: they seem to use two open large vision LMs to refine the annotations, and have different levels of description to provide consistency.