Prithiv Sakthi PRO

prithivMLmods

AI & ML interests

computer vision, multimodality, realism engine adapters @strangerzonehf

Recent Activity

Articles

Organizations

Stanford AI, DataScienceEngineering, AI FILMS, Samsung Electronics, MISATO-dataset, GEM benchmark, OpenGVLab, MusicAI, BigScience Biomedical Datasets, OpenVINO Toolkit, LLMs, ONNXConfig for all, Gradio-Themes-Party, scikit-learn, lora concepts library, Open-Source AI Meetup, Kornia AI, Université Dauphine-PSL, Platzi Community, Tune a video concepts library, Keras Dreambooth Event, Stable Diffusion Dreambooth Concepts Library, The Waifu Research Department, Musika, Blog-explorers, OpenSky, AI Tamil Nadu, OpenLLM France, huggingPartyParis, Team Tonic, That Time I got Reincarnated as a Hugging Face Organization, LocalLLaMA, Major TOM, MLX Community, C4AI Community, M4-ai, Chinese LLMs on Hugging Face, Dataset Tools, Nerdy Face, Stranger Zone, open/ acc, Data Is Better Together Contributor

prithivMLmods's activity

reacted to mkurman's post with 👍 about 3 hours ago
reacted to merve's post with ❤️ about 11 hours ago
Everything that happened this week in open AI, a recap 🤠 merve/jan-17-releases-678a673a9de4a4675f215bf5

👀 Multimodal
- MiniCPM-o 2.6 is a new SOTA any-to-any model by OpenBMB (vision, speech and text!)
- VideoChat-Flash-Qwen2.5-2B is a new video multimodal model by OpenGVLab; the family comes in 2B & 7B sizes and 224 & 448 resolutions
- ByteDance released a larger SA2VA that comes in at 26B parameters
- Dataset: VRC-Bench is a new diverse benchmark for multimodal LLM reasoning performance

💬 LLMs
- MiniMax-Text-01 is a new huge language model (456B total, 45.9B active params) by MiniMaxAI with a context length of 4M tokens 🤯
- Dataset: Sky-T1-data-17k is a diverse dataset used to train Sky-T1-32B
- kyutai released Helium-1-Preview-2B, a new small multilingual LM
- Wayfarer-12B is a new LLM able to write D&D 🧙🏻‍♂️
- ReaderLM-v2 is a new HTML parsing model by Jina AI
- Dria released Dria-Agent-a-3B, a new agentic coding model (Pythonic function calling) based on Qwen2.5 Coder
- Unsloth released Phi-4, plus faster and more memory-efficient Llama 3.3

๐Ÿ–ผ๏ธ Vision
- MatchAnything is a new foundation model for matching
- FitDit is a high-fidelity VTON model based on DiT architecture

๐Ÿ—ฃ๏ธ Audio
- OuteTTS-0.3-1B is a new multilingual text-to-speech model with voice cloning and emotion control capabilities

📖 Retrieval
- lightblue released LB-reranker-0.5B-v1.0, a new reranker based on Qwen2.5 that can handle 95+ languages
- cde-small-v2 is a new SOTA small retrieval model by @jxm
reacted to hlarcher's post with ❤️ 1 day ago
We are introducing multi-backend support in Hugging Face Text Generation Inference!
With the new TGI architecture we are now able to plug in new modeling backends to get the best performance for the selected model and available hardware. This first step will very soon be followed by the integration of new backends (TRT-LLM, llama.cpp, vLLM, Neuron and TPU).

We are polishing the TensorRT-LLM backend, which achieves impressive performance on NVIDIA GPUs. Stay tuned 🤗!

Check out the details: https://huggingface.co/blog/tgi-multi-backend
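
A nice property for users: whichever backend ends up serving the model, TGI's HTTP API stays the same, so existing clients keep working. A minimal client-side sketch using huggingface_hub's InferenceClient, assuming a TGI server is already running locally on port 8080 (the endpoint URL and prompt are illustrative):

```python
from huggingface_hub import InferenceClient

# Point the client at a locally running TGI server; the request looks the
# same whichever backend (default, TRT-LLM, ...) is serving the model.
client = InferenceClient("http://localhost:8080")

reply = client.text_generation(
    "Explain multi-backend inference in one sentence.",
    max_new_tokens=64,
)
print(reply)
```
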
posted an update 1 day ago
ChemQwen-vL [ Qwen for Chem Vision ] 🧑🏻‍🔬

🧪Model : prithivMLmods/ChemQwen-vL

๐Ÿ“ChemQwen-vL is a vision-language model fine-tuned based on the Qwen2VL-2B Instruct model. It has been trained using the International Chemical Identifier (InChI) format for chemical compounds and is optimized for chemical compound identification. The model excels at generating the InChI and providing descriptions of chemical compounds based on their images. Its architecture operates within a multi-modal framework, combining image-text-text capabilities. It has been fine-tuned using datasets from: https://iupac.org/projects/

📒Colab Demo: https://tinyurl.com/2pn8x6u7, Collection : https://tinyurl.com/2mt5bjju

Exporting inference results as PDF documentation is possible with the help of the ReportLab library. https://pypi.org/project/reportlab/
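
For instance, a small illustrative sketch (not the Colab code; layout values and the example compound are arbitrary) that writes a prediction into a one-page PDF with ReportLab:

```python
from reportlab.lib.pagesizes import A4
from reportlab.pdfgen import canvas

def write_report(compound: str, inchi: str, path: str = "chem_report.pdf"):
    """Render a single-compound prediction as a simple PDF page."""
    c = canvas.Canvas(path, pagesize=A4)
    _, height = A4
    c.setFont("Helvetica-Bold", 14)
    c.drawString(72, height - 72, "ChemQwen-vL Prediction")
    c.setFont("Helvetica", 11)
    c.drawString(72, height - 100, f"Compound: {compound}")
    c.drawString(72, height - 118, f"InChI: {inchi}")
    c.save()

# Caffeine, as an example prediction:
write_report("Caffeine",
             "InChI=1S/C8H10N4O2/c1-10-4-9-6-5(10)7(13)12(3)8(14)11(6)2/h4H,1-3H3")
```
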

🤗: @prithivMLmods
reacted to merve's post with 🔥 1 day ago
reacted to davidberenstein1957's post with 👀 4 days ago
reacted to hexgrad's post with 🔥 6 days ago
📣 Looking for labeled, high-quality synthetic audio/TTS data 📣 Have you been or are you currently calling API endpoints from OpenAI, ElevenLabs, etc.? Do you have labeled audio data sitting around gathering dust? Let's talk! Join https://discord.gg/QuGxSWBfQy or comment down below.

If your data exceeds quantity & quality thresholds and is approved into the next hexgrad/Kokoro-82M training mix, and you permissively DM me the data under an effective Apache license, then I will DM back the corresponding voicepacks for YOUR data if/when the next Apache-licensed Kokoro base model drops.

What does this mean? If you've been calling closed-source TTS or audio API endpoints to:
- Build voice agents
- Make long-form audio, like audiobooks or podcasts
- Handle customer support, etc.
Then YOU can contribute to the training mix and get useful artifacts in return. ❤️

More details at hexgrad/Kokoro-82M#21
reacted to dylanebert's post with 🔥 7 days ago
🟦 New Image-to-3D model from Stability AI

stabilityai/stable-point-aware-3d

here's how it looks, with TRELLIS for comparison
reacted to AlexBodner's post with 👍 7 days ago
Just published a post explaining Monte Carlo Tree Search: the magic behind AlphaZero, now also used to tackle reasoning benchmarks with LLMs. Check it out; it's a must-know nowadays!

https://x.com/AlexBodner_/status/1877789879398244382
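
As a refresher, MCTS repeats four phases: selection, expansion, simulation, backpropagation. Here's a minimal generic UCT sketch (mine, not from the post; the game-specific callbacks legal_moves, apply_move, rollout and is_terminal are assumptions supplied by the caller):

```python
import math
import random

class Node:
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0

def ucb1(node, c=1.4):
    # Unvisited nodes get infinite score so each child is tried once.
    if node.visits == 0:
        return float("inf")
    # Exploitation (mean value) + exploration (visit-count bonus).
    return node.value / node.visits + c * math.sqrt(
        math.log(node.parent.visits) / node.visits
    )

def mcts(root, n_iters, legal_moves, apply_move, rollout, is_terminal):
    # Single-agent form; two-player games also alternate the reward sign per ply.
    for _ in range(n_iters):
        # 1) Selection: walk down the tree by UCB1 until a leaf.
        node = root
        while node.children:
            node = max(node.children, key=ucb1)
        # 2) Expansion: grow the leaf by one level of legal moves.
        if not is_terminal(node.state):
            node.children = [
                Node(apply_move(node.state, m), parent=node)
                for m in legal_moves(node.state)
            ]
            node = random.choice(node.children)
        # 3) Simulation: random playout; rollout returns the final reward.
        reward = rollout(node.state)
        # 4) Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    # The most-visited child of the root is the chosen move.
    return max(root.children, key=lambda n: n.visits)
```
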
reacted to merve's post with ❤️ 8 days ago
What a beginning to this year in open ML 🤠
Let's unwrap! merve/jan-10-releases-677fe34177759de0edfc9714

Multimodal 🖼️
> ByteDance released SA2VA: a family of vision LMs that can take image, video, text and visual prompts
> moondream2 is out with new capabilities like outputting structured data and gaze detection!
> Dataset: Alibaba DAMO lab released a multimodal textbook with 22k hours worth of samples from instruction videos 🤯
> Dataset: SciCap, a benchmark dataset for captioning on scientific documents, is released along with the challenge!

LLMs 💬
> Microsoft released Phi-4, a SOTA open-source 14B language model 🔥
> Dolphin is back with Dolphin 3.0 Llama 3.1 8B 🐬🐬
> Prime-RL released Eurus-2-7B-PRIME, a new language model trained using PRIME alignment
> SmallThinker-3B is a new small reasoning LM based on Qwen2.5-3B-Instruct 💭
> Dataset: QWQ-LONGCOT-500K is the dataset used to train SmallThinker, generated using QwQ-32B-preview 📕
> Dataset: @cfahlgren1 released React Code Instructions: a dataset of code instruction-code pairs 📕
> Dataset: the Qwen team is on a roll; they just released CodeElo, a dataset of code preferences 👩🏻‍💻

Embeddings 🔖
> @MoritzLaurer released a zero-shot version of ModernBERT large 👍
> KaLM is a new family of performant multilingual embedding models with an MIT license, built using Qwen2-0.5B

Image/Video Generation ⏯️
> NVIDIA released Cosmos, a new family of diffusion/autoregressive World Foundation Models generating worlds from images, videos and texts 🔥
> Adobe released TransPixar: a new text-to-video model that can generate assets with transparent backgrounds (a first!)
> Dataset: fal released cosmos-openvid-1m, the Cosmos-tokenized OpenVid-1M with samples from OpenVid-1M

Others
> Prior Labs released TabPFNv2, the best tabular transformer is out for classification and regression
> Metagene-1 is a new RNA language model that can be used for pathogen detection, zero-shot embedding and genome understanding
reacted to kz919's post with 👍 8 days ago
replied to their post 8 days ago

I'll never take that as harsh words; I'll take it as my responsibility! My words are my words to look after, and your words are your words to look after.

Thank you!
@JLouisBiz

replied to their post 8 days ago

@JLouisBiz

I appreciate your offer to help and would welcome any guidance you can provide as we navigate this process.

Thank you!

replied to their post 9 days ago

@JLouisBiz

I have already faced issues, but I don't have a solution that I can submit to the repository end. Now, it seems something has changed. There is nothing misleading about promoting open source, but I lack insight regarding it. I will fix these soon. Thank you for the details you shared; I appreciate it.

Thank you!

reacted to MoritzLaurer's post with 🔥 9 days ago
The TRL v0.13 release is 🔥! My highlights are the new process reward trainer, to train models similar to o1, and tool call support:

🧠 Process reward trainer: Enables training of Process-supervised Reward Models (PRMs), which reward the quality of intermediate steps, promoting structured reasoning. Perfect for tasks like stepwise reasoning.

🔀 Model merging: A new callback leverages mergekit to merge models during training, improving performance by blending reference and policy models - optionally pushing merged models to the Hugging Face Hub.

๐Ÿ› ๏ธ Tool call support: TRL preprocessing now supports tool integration, laying the groundwork for agent fine-tuning with examples like dynamic temperature fetching in prompts.

โš–๏ธ Mixture of judges: The new AllTrueJudge combines decisions from multiple binary judges for more nuanced evaluation.

Read the release notes and other resources here 👇
Release: https://github.com/huggingface/trl/releases/tag/v0.13.0
Mergekit: https://github.com/arcee-ai/mergekit
Mixture of judges paper: The Perfect Blend: Redefining RLHF with Mixture of Judges (2409.20370)
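
For a feel of the new trainer, a compact training sketch assuming the v0.13 PRMConfig/PRMTrainer API and the stepwise-labeled trl-lib/math_shepherd dataset pattern from the docs (backbone model, dataset slice and output dir are illustrative):

```python
from datasets import load_dataset
from transformers import AutoModelForTokenClassification, AutoTokenizer
from trl import PRMConfig, PRMTrainer

# A PRM scores each intermediate reasoning step, so the backbone is a
# token-classification model with two labels (step correct / incorrect).
model = AutoModelForTokenClassification.from_pretrained(
    "Qwen/Qwen2-0.5B-Instruct", num_labels=2
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-0.5B-Instruct")

# Stepwise-labeled math reasoning data.
train_dataset = load_dataset("trl-lib/math_shepherd", split="train[:10%]")

args = PRMConfig(output_dir="qwen2-0.5b-prm", logging_steps=25)
trainer = PRMTrainer(
    model=model,
    args=args,
    processing_class=tokenizer,
    train_dataset=train_dataset,
)
trainer.train()
```
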
posted an update 9 days ago
200+ followers 🤗 on Stranger Zone! [ https://huggingface.co/strangerzonehf ]

โค๏ธโ€๐Ÿ”ฅStranger Zone's MidJourney Mix Model Adapter is trending on the Very Model Page, with over 45,000+ downloads. Additionally, the Super Realism Model Adapter has over 52,000+ downloads, remains the top two adapter on Stranger Zone!
strangerzonehf/Flux-Midjourney-Mix2-LoRA, strangerzonehf/Flux-Super-Realism-LoRA
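
Trying an adapter takes a few lines with diffusers; a minimal sketch assuming the usual FLUX.1-dev + LoRA workflow (the prompt is illustrative; check the adapter's model card for its exact trigger words):

```python
import torch
from diffusers import FluxPipeline

# Load the FLUX.1-dev base model, then attach the Super Realism adapter.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")
pipe.load_lora_weights("strangerzonehf/Flux-Super-Realism-LoRA")

# Prompt is illustrative; see the adapter's model card for trigger words.
image = pipe(
    "Super Realism, close-up portrait, natural window light",
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("super_realism.png")
```
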

👽Try Demo: prithivMLmods/FLUX-LoRA-DLC

📦Most Recent Adapters to Check Out:
+ Ctoon : strangerzonehf/Ctoon-Plus-Plus
+ Cardboard : strangerzonehf/Flux-Cardboard-Art-LoRA
+ Claude Art : strangerzonehf/Flux-Claude-Art
+ Flat Lay : strangerzonehf/Flux-FlatLay-LoRA
+ Smiley Portrait : strangerzonehf/Flux-Smiley-Portrait-LoRA

🤗Thanks to the community & OPEN SOURCE!!
reacted to merve's post with 🔥 9 days ago
ByteDance just dropped SA2VA: a new family of vision LMs combining Qwen2VL/InternVL and SAM2, with MIT license 💗 ByteDance/sa2va-model-zoo-677e3084d71b5f108d00e093

> The models are capable of vision-language understanding and visual referrals (referring segmentation), both for images and videos ⏯️

> The models come in 1B, 4B and 8B, based on InternVL2.5 for the base architecture and Qwen2, Qwen2.5 or InternLM2 for the language model part (depending on the checkpoint)

> The model is very interesting: it has a separate encoder for each modality (visual prompt, text prompt, image and video), then concatenates these to feed into the LLM 💬

> The output segmentation tokens are passed to SAM2 to match text (captions or semantic classes) to masks ⤵️

> Their annotation pipeline is also interesting: they seem to use two open large vision LMs to refine the annotations, with different levels of descriptions to provide consistency.
reacted to roseking's post with ❤️ 9 days ago
🎉 Major Updates to HF Daily Paper Newsletter Bot

I'm excited to announce significant improvements to my HF Daily Paper Newsletter Bot! Here are the key updates:

๐Ÿ–ผ๏ธ Enhanced Poster Generation
- Implemented dynamic height adjustment for daily paper posters
- Added support for displaying complete paper content without truncation
- Improved Chinese font rendering and text layout
- Integrated Hugging Face logo for better branding
- Enhanced visual aesthetics with optimized card layouts

๐Ÿ“ Content Improvements
- Removed paper count limitations (previously capped at 5 papers)
- Enhanced title and summary extraction algorithms
- Improved text wrapping and spacing for better readability
- Added proper handling of long content with automatic layout adjustments

๐Ÿ› ๏ธ Technical Enhancements
- Implemented better font loading mechanism with fallback options
- Added support for multiple Chinese font paths
- Improved error handling and logging
- Enhanced memory management for image processing
- Added detailed debugging information

🌟 Visual Design Updates
- Refined color scheme with HF brand colors
- Improved card spacing and padding
- Enhanced typography with better font sizing
- Added smooth transitions between paper cards
- Optimized overall layout for better visual hierarchy

🔧 Infrastructure Updates
- Improved GitHub Actions workflow reliability
- Enhanced error notification system
- Added automatic retries for API calls (see the sketch after this list)
- Improved logging and debugging capabilities
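
The bot's source isn't shown here, but "automatic retries for API calls" usually amounts to an exponential-backoff wrapper along these lines (names and parameters are illustrative):

```python
import logging
import time

def with_retries(call, max_attempts=3, base_delay=1.0):
    """Run call(), retrying failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception as exc:
            if attempt == max_attempts:
                raise  # out of attempts, surface the error
            delay = base_delay * 2 ** (attempt - 1)
            logging.warning("attempt %d failed (%s); retrying in %.1fs",
                            attempt, exc, delay)
            time.sleep(delay)

# Usage: papers = with_retries(lambda: fetch_daily_papers())  # hypothetical fetcher
```
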

The bot now generates more professional and visually appealing daily paper summaries while ensuring complete content display. These updates make the newsletter more readable and informative for our users.

Try it out and let me know what you think! Your feedback helps me make continuous improvements to better serve the AI research community.

#HuggingFace #AI #MachineLearning #ResearchPapers #OpenSource


reacted to mitkox's post with ➕ 10 days ago
"Can it run DeepSeek V3 671B" is the new "can it run Doom".

How minimalistic can I go with on-device AI for behemoth models? Here I'm running the DeepSeek V3 MoE on a single A6000 GPU.

Not great, not terrible, for this minimalistic setup. I love Mixture of Experts architectures. Typically I'm running my core LLM distributed over the 4 GPUs.

Make sure you own your AI. AI in the cloud is not aligned with you; it's aligned with the company that owns it.
reacted to m-ric's post with 🚀 11 days ago
Since I published it on GitHub a few days ago, Hugging Face's new agentic library 𝘀𝗺𝗼𝗹𝗮𝗴𝗲𝗻𝘁𝘀 has gathered nearly 4k stars 🤯

โžก๏ธ But we are just getting started on agents: so we are hiring an ML Engineer to join me and double down on this effort!

The plan is to build GUI agents: agents that can act on your computer with mouse & keyboard, like Claude Computer Use.

We will make it work better, and fully open. ✨

Sounds like something you'd like to do? Apply here 👉 https://apply.workable.com/huggingface/j/AF1D4E3FEB/