
Prithiv Sakthi

prithivMLmods

AI & ML interests

computer vision, multimodality, adapters @strangerzonehf @strangerguardhf


Organizations

Stanford AI, DataScienceEngineering, AI FILMS, Samsung Electronics, MISATO-dataset, GEM benchmark, OpenGVLab, MusicAI, BigScience Biomedical Datasets, OpenVINO Toolkit, LLMs, ONNXConfig for all, Gradio-Themes-Party, scikit-learn, Open-Source AI Meetup, lora concepts library, Platzi Community, Kornia AI, Tune a video concepts library, Université Dauphine-PSL, Keras Dreambooth Event, Stable Diffusion Dreambooth Concepts Library, The Waifu Research Department, Musika, Blog-explorers, OpenSky, AI Tamil Nadu, OpenLLM France, huggingPartyParis, Team Tonic, That Time I got Reincarnated as a Hugging Face Organization, LocalLLaMA, Major TOM, MLX Community, C4AI Community, M4-ai, Chinese LLMs on Hugging Face, ONNX Community, Dataset Tools, Nerdy Face, Stranger Zone, open/ acc, Data Is Better Together Contributor, None yet, Doge Face, Stranger Guard

prithivMLmods's activity

reacted to lysandre's post with 🔥❤️❤️ about 1 hour ago
SmolVLM-2 and SigLIP-2 are now part of transformers in dedicated releases!

They're added on top of the v4.49.0 release, and can be installed from the following tags: v4.49.0-SmolVLM-2 and v4.49.0-SigLIP-2.

This marks a new beginning for the release process of transformers. For the past five years, we've been doing monthly releases featuring many models (v4.49.0, the latest release, features 9 new architectures).

Starting with SmolVLM-2 & SigLIP-2, we'll now additionally release tags supporting new models on a stable branch. These models are therefore directly available for use by installing from the tag itself. These tags will continue to be updated with fixes applied to these models.

Going forward, continue expecting software releases following semantic versioning: v4.50.0 will have ~10 new architectures compared to v4.49.0, as well as a myriad of new features, improvements and bug fixes. Accompanying these software releases, we'll release tags offering brand new models as fast as possible, to make them accessible to all immediately.
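For reference, installing from one of these model-specific tags is just a pip install against the corresponding git tag (a sketch; swap in the v4.49.0-SigLIP-2 tag for SigLIP-2 instead):

pip install git+https://github.com/huggingface/transformers.git@v4.49.0-SmolVLM-2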
reacted to JingzeShi's post with 🚀 about 6 hours ago
reacted to DmitryRyumin's post with 🔥 about 7 hours ago
🚀🎭🌟 New Research Alert - WACV 2025 (Avatars Collection)! 🌟🎭🚀
📄 Title: EmoVOCA: Speech-Driven Emotional 3D Talking Heads

📝 Description: EmoVOCA is a data-driven method for generating emotional 3D talking heads by combining speech-driven lip movements with expressive facial dynamics. The method was developed to overcome the limitations of existing corpora and to achieve state-of-the-art animation quality.

👥 Authors: @FedeNoce, Claudio Ferrari, and Stefano Berretti

📅 Conference: WACV, 28 Feb – 4 Mar 2025 | Arizona, USA 🇺🇸

📄 Paper: https://arxiv.org/abs/2403.12886

🌐 GitHub Page: https://fedenoce.github.io/emovoca/
📁 Repository: https://github.com/miccunifi/EmoVOCA

🚀 CVPR-2023-24-Papers: https://github.com/DmitryRyumin/CVPR-2023-24-Papers

🚀 WACV-2024-Papers: https://github.com/DmitryRyumin/WACV-2024-Papers

🚀 ICCV-2023-Papers: https://github.com/DmitryRyumin/ICCV-2023-Papers

📚 More Papers: more cutting-edge research presented at other conferences in DmitryRyumin/NewEraAI-Papers, curated by @DmitryRyumin

🚀 Added to the Avatars Collection: DmitryRyumin/avatars-65df37cdf81fec13d4dbac36

🔍 Keywords: #EmoVOCA #3DAnimation #TalkingHeads #SpeechDriven #FacialExpressions #MachineLearning #ComputerVision #ComputerGraphics #DeepLearning #AI #WACV2025
reacted to tegridydev's post with 🤗 about 7 hours ago
Open Source AI Agents | Github/Repo List | [2025]

https://huggingface.co/blog/tegridydev/open-source-ai-agents-directory

Check out the article, and follow, bookmark, or save the tab, as I will be updating it <3
(I'm using it as my own notepad and figured I might as well keep it up to date by posting it here, instead of making a 15th version and losing it on my desktop under a name I can't remember, lol)
reacted to jsulz's post with ❤️ about 8 hours ago
Time flies!

Six months after joining Hugging Face, the Xet team is kicking off the first migrations from LFS to our storage for a number of repositories on the Hub.

More on the nitty-gritty details behind the migration soon, but here are the big takeaways:

🤖 We've successfully completed the first migrations from LFS -> Xet to test the infrastructure and prepare for a wider release

✅ No action on your part needed - you can work with a Xet-backed repo like any other repo on the Hub (for now - major improvements on their way!)

👀 Keep an eye out for the Xet logo to see if a repo you know is on our infra! See the screenshots below to spot the difference 👇

⏩ ⏩ ⏩ Blazing uploads and downloads coming soon. We're gearing up for a full integration with the Hub's Python library that will make building on the Hub faster than ever - special thanks to @celinah and @Wauplin for their assistance.

🎉 Want early access? If you're curious and want to test out the bleeding edge that will power the development experience on the Hub, we'd love to partner with you. Let me know!

This is the culmination of a lot of effort from the entire team. Big round of applause to @sirahd @brianronan @jgodlewski @hoytak @seanses @assafvayner @znation @saba9 @rajatarya @port8080 @yuchenglow
reacted to davanstrien's post with 🧠 about 22 hours ago
Hacked together a way to log trl GRPO training completions to a 🤗 dataset repo. This allows you to:

- Track rewards from multiple reward functions
- Treat the completion and rewards from training as a "proper" dataset and do EDA
- Share results for open science

The implementation is super hacky, but I'm curious if people would find this useful.

To push completions to the Hub, you just need two extra parameters:

log_completions=True
log_completions_hub_repo='your-username/repo-name'

Example dataset: davanstrien/test-logs
Colab: https://colab.research.google.com/drive/1wzBFPVthRYYTp-mEYlznLg_e_0Za1M3g
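As a rough sketch of what this looks like in practice (this assumes davanstrien's patched build of trl, since log_completions_hub_repo is not a standard GRPOConfig argument; the model, dataset, and reward function below are placeholders):

from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

def reward_len(completions, **kwargs):
    # toy reward: prefer completions close to 50 characters
    return [-abs(50 - len(c)) for c in completions]

dataset = load_dataset("trl-lib/tldr", split="train")  # any dataset with a "prompt" column

training_args = GRPOConfig(
    output_dir="grpo-completion-logs",
    log_completions=True,                                # standard trl flag: log sampled completions and rewards
    log_completions_hub_repo="your-username/repo-name",  # extra argument assumed from the patched fork: Hub dataset repo to push logs to
)

trainer = GRPOTrainer(
    model="Qwen/Qwen2-0.5B-Instruct",
    reward_funcs=reward_len,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()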

reacted to merve's post with 🧠🧠 1 day ago
Google just released PaliGemma 2 Mix: new versatile instruction vision language models 🔥

> Three new models: 3B, 10B, and 28B, at 224 and 448 resolution 💙
> Can do vision language tasks with open-ended prompts, understand documents, and segment or detect anything 🤯

Read more https://huggingface.co/blog/paligemma2mix
Try the demo google/paligemma2-10b-mix
All models are here google/paligemma-2-mix-67ac6a251aaf3ee73679dcc4
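A rough usage sketch with transformers (the checkpoint id below is assumed to be the 3B / 224-resolution mix model, and "describe en" is a PaliGemma-style task prefix; check the model card for the exact repo ids and prompt formats):

from PIL import Image
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

model_id = "google/paligemma2-3b-mix-224"  # assumed repo id, for illustration only
processor = AutoProcessor.from_pretrained(model_id)
model = PaliGemmaForConditionalGeneration.from_pretrained(model_id, device_map="auto")

image = Image.open("some_image.jpg")  # any local image
inputs = processor(text="describe en", images=image, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(processor.decode(output[0], skip_special_tokens=True))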
reacted to burtenshaw's post with 🚀 2 days ago
AGENTS + FINETUNING! This week Hugging Face Learn has a whole pathway on fine-tuning for agentic applications. You can follow these two courses to level up your agent game beyond prompts:

1️⃣ New supervised fine-tuning unit in the NLP Course: https://huggingface.co/learn/nlp-course/en/chapter11/1
2️⃣ New fine-tuning for agents bonus module in the Agents Course: https://huggingface.co/learn/agents-course/bonus-unit1/introduction

Fine-tuning will squeeze more out of your model for your specific use case than any prompt can.
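To give a flavor of what the SFT unit covers, a minimal trl supervised fine-tuning run might look like this (a sketch; the model and dataset here are placeholder choices, not the course's exact setup):

from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("trl-lib/Capybara", split="train")  # placeholder chat dataset

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",              # placeholder small base model
    train_dataset=dataset,
    args=SFTConfig(output_dir="sft-demo"),
)
trainer.train()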
reacted to AdinaY's post with ❤️ 2 days ago
🚀 StepFun (阶跃星辰) is making BIG open moves!

Last year, their GOT-OCR 2.0 took the community by storm 🔥, but many didn't know they were also building some amazing models. Now they've just dropped something huge on the Hub!

📺 Step-Video-T2V: a 30B bilingual open video model that generates 204 frames (8-10 s) at 540p resolution with high information density and consistency.
stepfun-ai/stepvideo-t2v

🔊 Step-Audio-TTS-3B: a TTS model trained with the LLM-Chat paradigm on a large synthetic dataset, capable of generating rap and humming.
stepfun-ai/step-audio-67b33accf45735bb21131b0b
posted an update 3 days ago
Dino: The Minimalist Multipurpose Chat System 🌠
Agent-Dino : prithivMLmods/Agent-Dino
Github: https://github.com/PRITHIVSAKTHIUR/Agent-Dino

By default, it performs the following tasks:
{Text-to-Text Generation}, {Image-Text-Text Generation}
@image: Generates an image using Stable Diffusion XL.
@3d: Generates a 3D mesh.
@web: Web search agents.
@rAgent: Initiates a reasoning chain using Llama mode for coding explanations.
@tts1-♀, @tts2-♂: Voice generation (female and male voices).
@yolo: Object detection (a rough routing sketch follows below).
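As a purely illustrative sketch (not the actual Agent-Dino code), this kind of @-prefixed command routing can be wired as a small dispatch table:

from typing import Callable, Dict

# placeholder handlers standing in for the real image / web-search / detection backends
def generate_image(prompt: str) -> str:
    return f"[image generated for: {prompt}]"

def web_search(query: str) -> str:
    return f"[web results for: {query}]"

def detect_objects(prompt: str) -> str:
    return f"[objects detected in: {prompt}]"

def chat(prompt: str) -> str:
    return f"[chat reply to: {prompt}]"

HANDLERS: Dict[str, Callable[[str], str]] = {
    "@image": generate_image,
    "@web": web_search,
    "@yolo": detect_objects,
}

def route(message: str) -> str:
    # dispatch on a leading @command; fall back to plain chat otherwise
    for prefix, handler in HANDLERS.items():
        if message.startswith(prefix):
            return handler(message[len(prefix):].strip())
    return chat(message)

print(route("@image a minimalist dino logo"))
print(route("hello there"))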
reacted to ZennyKenny's post with 🤗 3 days ago
Really excited to start contributing to the SWE Arena project: https://swe-arena.com/

Led by IBM PhD fellow @terryyz, the project aims to advance research in code generation and app development with frontier LLMs.

reacted to sayakpaul's post with 🔥 3 days ago
Inference-time scaling meets Flux.1-Dev (and others) 🔥

Presenting a simple re-implementation of "Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps" by Ma et al.

I did the simplest random search strategy, but results can potentially be improved with better-guided search methods.

Supports Gemini 2 Flash & Qwen2.5 as verifiers for "LLMGrading" 🤗

The steps are simple:

For each round:

1> Start by sampling 2 starting noises with different seeds.
2> Score the generations w.r.t. a metric.
3> Obtain the best generation from the current round.

If you have more compute budget, go to the next search round. Scale the noise pool (2 ** search_round) and repeat steps 1-3.

This constitutes the random search method as done in the paper by Google DeepMind.

Code, more results, and a bunch of other stuff are in the repository. Check it out here: https://github.com/sayakpaul/tt-scale-flux/ 🤗
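A schematic of that random-search loop (not the tt-scale-flux code itself; generate and score below are stand-ins for the diffusion sampler and the verifier):

import random

def generate(seed: int, prompt: str) -> str:
    # stand-in for a diffusion call, e.g. Flux.1-Dev sampled from a seeded noise
    return f"image(prompt={prompt!r}, seed={seed})"

def score(image: str) -> float:
    # stand-in for a verifier, e.g. an LLM grader (Gemini 2 Flash / Qwen2.5)
    return random.random()

def random_search(prompt: str, num_rounds: int = 3):
    best_image, best_score = None, float("-inf")
    for search_round in range(1, num_rounds + 1):
        pool_size = 2 ** search_round  # round 1 starts with 2 noises; the pool doubles each round
        for _ in range(pool_size):
            seed = random.randrange(2**32)
            candidate = generate(seed, prompt)
            s = score(candidate)
            if s > best_score:
                best_image, best_score = candidate, s
    return best_image, best_score

print(random_search("a photo of a red panda reading a book"))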
posted an update 5 days ago
The last week of Impression Craft Arts and sketches from strangerzonehf 🎨🧑🏻‍🎨

- Collection : strangerzonehf/Flux-Ultimate-LoRA-Collection

Adapters:
+ Ld-Art : strangerzonehf/Ld-Art
+ Animeopix-Flux : strangerzonehf/Animeopix-Flux
+ Flux-Super-Paint-LoRA : strangerzonehf/Flux-Super-Paint-LoRA
+ CinematicShot-Pics-Flux : strangerzonehf/cinematicShot-Pics-Flux
+ Oil-Wall-Art-Flux : strangerzonehf/Oil-Wall-Art-Flux
+ Pixelo-Flux : strangerzonehf/Pixelo-Flux
+ Abstract-Shattered : strangerzonehf/Abstract-Shattered
+ Neon-Impressionism-Flux : strangerzonehf/Neon-Impressionism-Flux
+ NewG-Art : strangerzonehf/NewG-Art

🪧 Demo : prithivMLmods/FLUX-LoRA-DLC
🤗 Page : https://huggingface.co/strangerzonehf
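For anyone who wants to try one of these adapters locally, a rough diffusers sketch might look like this (it assumes the FLUX.1-dev base model, a CUDA GPU with enough VRAM, and that the adapter's trigger word is included in the prompt; check each adapter's model card for its trigger word):

import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16).to("cuda")
pipe.load_lora_weights("strangerzonehf/Ld-Art")  # any adapter repo from the list above

image = pipe(
    "Ld-Art, a serene mountain lake at dawn",  # assumed trigger word + prompt
    num_inference_steps=30,
    guidance_scale=3.5,
).images[0]
image.save("ld_art_sample.png")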
reacted to louisbrulenaudet's post with 🤗 5 days ago
I am pleased to introduce my first project built upon Hugging Face's smolagents framework, integrated with Alpaca for financial market analysis automation 🦙🤗

The project implements technical indicators such as the Relative Strength Index (RSI) and Bollinger Bands to provide momentum and volatility analysis. Market data is retrieved through the Alpaca API, enabling access to historical price information across various timeframes.

AI-powered insights are generated using Hugging Face's inference API, facilitating the analysis of market trends through natural language processing, with DuckDuckGo search integration for real-time sentiment analysis based on financial news 🦆

Link to the GitHub project: https://github.com/louisbrulenaudet/agentic-market-tool
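For reference, the two indicators mentioned can be computed in a few lines of pandas (an illustrative sketch, not the agentic-market-tool code; it uses the simple moving-average RSI variant and 20-period, 2-sigma Bollinger Bands):

import pandas as pd

def rsi(close: pd.Series, period: int = 14) -> pd.Series:
    # RSI from average gains vs. average losses over the lookback window
    delta = close.diff()
    gain = delta.clip(lower=0).rolling(period).mean()
    loss = (-delta.clip(upper=0)).rolling(period).mean()
    return 100 - 100 / (1 + gain / loss)

def bollinger_bands(close: pd.Series, period: int = 20, num_std: float = 2.0):
    # middle band = rolling mean; upper/lower bands = mean +/- num_std rolling standard deviations
    mid = close.rolling(period).mean()
    std = close.rolling(period).std()
    return mid - num_std * std, mid, mid + num_std * std

prices = pd.Series([100, 101, 99, 102, 103, 101, 104, 105, 103, 106] * 3, dtype=float)
print(rsi(prices).tail())
print(bollinger_bands(prices)[1].tail())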

reacted to davanstrien's post with 🔥 5 days ago
reacted to nicolay-r's post with 🤝 6 days ago
📢 For those considering quick, in-place annotation of entities in JSON / CSV tabular data, I have good news: I've just released the latest version of bulk-ner, which does these things for you:
🌟 https://github.com/nicolay-r/bulk-ner/releases/tag/0.25.2

bulk-ner is a no-strings wrapper over NER services built on popular frameworks like DeepPavlov, spaCy, and Flair.

What's new? The latest 0.25.2 version has the following key features:
🔧 Fixed 🐛: the output ignored other content in the input (#31)
🔥 Schema support: you can annotate various columns, combining them as you wish, and map them onto other output columns (see 📸 below) (#28)

Below is a screenshot showing how to quickly get started using it with spaCy models.

🌌 List of other providers @ nlp-thirdgate:
https://github.com/nicolay-r/nlp-thirdgate/tree/master/ner
reacted to davanstrien's post with ❤️ 6 days ago
How do you make 1M+ Hugging Face models & datasets more discoverable?

davanstrien/Smol-Hub-tldr!

I fine-tuned HuggingFaceTB/SmolLM2-360M to generate one-line summaries from a model or dataset README.

Its own self-description?
"A model for generating concise summaries of model & dataset cards from the Hugging Face Hub"

The goal? Make it easier to find the right models and datasets for your specific needs. It's already powering a semantic search for datasets Space.

It's still a WIP, but thanks to @loubnabnl, @anton-l, @eliebak et al. for cooking up such a nice base model for fine-tuning small, efficient models for specific domains and tasks. 🙏
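A possible loading sketch with transformers (the summarizer's exact prompt format is not shown in the post; it may rely on a chat template or special tokens described on the model card, so this is only a rough illustration):

from transformers import pipeline

summarizer = pipeline("text-generation", model="davanstrien/Smol-Hub-tldr")
card_text = "..."  # paste the README text of a model or dataset card here
print(summarizer(card_text, max_new_tokens=64)[0]["generated_text"])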