Keras Dreambooth Event

community

AI & ML interests

None defined yet.

Recent Activity

keras-dreambooth's activity

prithivMLmods 
posted an update about 5 hours ago
view post
Post
303
Dropping Downstream tasks using newly initialized parameters and weights ([classifier.bias & weights]) support domain-specific 𝗶𝗺𝗮𝗴𝗲 𝗰𝗹𝗮𝘀𝘀𝗶𝗳𝗶𝗰𝗮𝘁𝗶𝗼𝗻. Based on siglip2-base-patch16-224 and DomainNet (single-domain, multi-source adaptation), with Fashion-MNIST for experimental testing. 🧤☄️

Fashion-Mnist : prithivMLmods/Fashion-Mnist-SigLIP2
Multisource-121 : prithivMLmods/Multisource-121-DomainNet
Painting-126 : prithivMLmods/Painting-126-DomainNet
Sketch-126 : prithivMLmods/Sketch-126-DomainNet
Clipart-126 : prithivMLmods/Clipart-126-DomainNet

Models are trained with different parameter settings for experimental purposes only, with the intent of further development. Refer to the model page below for instructions on running it with Transformers 🤗.

Collection : prithivMLmods/domainnet-exp-67e0e3c934c03cc40c6c8782

Citations : SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features https://arxiv.org/pdf/2502.14786 & Moment Matching for Multi-Source Domain Adaptation : https://arxiv.org/pdf/1812.01754

merve 
posted an update 2 days ago
view post
Post
2203
So many open releases at Hugging Face past week 🤯 recapping all here ⤵️ merve/march-21-releases-67dbe10e185f199e656140ae

👀 Multimodal
> Mistral AI released a 24B vision LM, both base and instruction FT versions, sota 🔥 (OS)
> with IBM we released SmolDocling, a sota 256M document parser with Apache 2.0 license (OS)
> SpatialLM is a new vision LM that outputs 3D bounding boxes, comes with 0.5B (QwenVL based) and 1B (Llama based) variants
> SkyWork released SkyWork-R1V-38B, new vision reasoning model (OS)

💬 LLMs
> NVIDIA released new Nemotron models in 49B and 8B with their post-training dataset
> LG released EXAONE, new reasoning models in 2.4B, 7.8B and 32B
> Dataset: Glaive AI released a new reasoning dataset of 22M+ examples
> Dataset: NVIDIA released new helpfulness dataset HelpSteer3
> Dataset: OpenManusRL is a new agent dataset based on ReAct framework (OS)
> Open-R1 team released OlympicCoder, new competitive coder model in 7B and 32B
> Dataset: GeneralThought-430K is a new reasoning dataset (OS)

🖼️ Image Generation/Computer Vision
> Roboflow released RF-DETR, new real-time sota object detector (OS) 🔥
> YOLOE is a new real-time zero-shot object detector with text and visual prompts 🥹
> Stability AI released Stable Virtual Camera, a new novel view synthesis model
> Tencent released Hunyuan3D-2mini, new small and fast 3D asset generation model
> ByteDance released InfiniteYou, new realistic photo generation model
> StarVector is a new 8B model that generates svg from images
> FlexWorld is a new model that expands 3D views (OS)

🎤 Audio
> Sesame released CSM-1B new speech generation model (OS)

🤖 Robotics
> NVIDIA released GR00T, new robotics model for generalized reasoning and skills, along with the dataset

*OS ones have Apache 2.0 or MIT license
prithivMLmods 
posted an update 4 days ago
view post
Post
2139
Play with Orpheus TTS, a Llama-based Speech-LLM designed for high-quality, empathetic text-to-speech generation. This model has been fine-tuned to deliver human-level speech synthesis 🔥🗣️

👉GitHub: https://github.com/PRITHIVSAKTHIUR/Orpheus-TTS-Edge

Demo supporting both text-to-speech and text-to-llm responses in speech.

> voice: tara, dan, emma, josh
> emotion: <laugh>, <chuckle>, <sigh>, <cough>, <sniffle>, <groan>, <yawn>, <gasp>.

🥠Orpheus-3b-0.1-ft
Model Page: canopylabs/orpheus-3b-0.1-ft

🥠Orpheus-3b-0.1-ft
Colab Inference Notebook: https://colab.research.google.com/drive/1KhXT56UePPUHhqitJNUxq63k-pQomz3N?usp=sharing

🥠Finetune [ orpheus-3b-0.1-pretrained ]
Resource: https://github.com/canopyai/Orpheus-TTS/tree/main/finetune

🥠Model-releases:
https://canopylabs.ai/model-releases
  • 1 reply
·
prithivMLmods 
posted an update 10 days ago
view post
Post
917
Hey Guys! One Small Announcement 🤗
Stranger Zone now accepts LoRA requests!

✍️Request : strangerzonehf/Request-LoRA [ or ] strangerzonehf/Request-LoRA#1

Page : https://huggingface.co/strangerzonehf

Describe the artistic properties by posting sample images or links to similar images in the request discussion. If the adapters you're asking for are truly creative and safe for work, I'll train and upload the LoRA to the Stranger Zone repo!

Thank you!
prithivMLmods 
posted an update 12 days ago
view post
Post
2469
Gemma-3-4B : Image and Video Inference 🖼️🎥

🧤Space: prithivMLmods/Gemma-3-Multimodal
🥠Git : https://github.com/PRITHIVSAKTHIUR/Gemma-3-Multimodal

@gemma3 : {Tag + Space_+ 'prompt'}
@video-infer : {Tag + Space_+ 'prompt'}

+ Gemma3-4B : google/gemma-3-4b-it
+ By default, it runs : prithivMLmods/Qwen2-VL-OCR-2B-Instruct

Gemma 3 Technical Report : https://storage.googleapis.com/deepmind-media/gemma/Gemma3Report.pdf
  • 1 reply
·
not-lain 
posted an update 12 days ago
prithivMLmods 
posted an update 13 days ago
prithivMLmods 
posted an update 19 days ago
prithivMLmods 
posted an update 27 days ago
view post
Post
5863
Dropping some of the custom fine-tunes based on SigLIP2,
with a single/multi label classification problem type! 🌀🧤

- AI vs Deepfake vs Real : prithivMLmods/AI-vs-Deepfake-vs-Real-Siglip2
- Deepfake Detect : prithivMLmods/Deepfake-Detect-Siglip2
- Fire Detection : prithivMLmods/Fire-Detection-Siglip2
- Deepfake Quality Assess : prithivMLmods/Deepfake-Quality-Assess-Siglip2
- Guard Against Unsafe Content : prithivMLmods/Guard-Against-Unsafe-Content-Siglip2

🌠Collection : prithivMLmods/siglip2-custom-67bcdb2de8fe96b99fb4e19e
prithivMLmods 
posted an update about 1 month ago
view post
Post
5835
It's really interesting about the deployment of a new state of matter in Majorana 1: the world’s first quantum processor powered by topological qubits. If you missed this news this week, here are some links for you:

🅱️Topological qubit arrays: https://arxiv.org/pdf/2502.12252

⚛️ Quantum Blog: https://azure.microsoft.com/en-us/blog/quantum/2025/02/19/microsoft-unveils-majorana-1-the-worlds-first-quantum-processor-powered-by-topological-qubits/

📖 Read the story: https://news.microsoft.com/source/features/innovation/microsofts-majorana-1-chip-carves-new-path-for-quantum-computing/

📝 Majorana 1 Intro: https://youtu.be/Q4xCR20Dh1E?si=Z51DbEYnZFp_88Xp

🌀The Path to a Million Qubits: https://youtu.be/wSHmygPQukQ?si=TS80EhI62oWiMSHK
·
merve 
posted an update about 1 month ago
view post
Post
6414
Google just released PaliGemma 2 Mix: new versatile instruction vision language models 🔥

> Three new models: 3B, 10B, 28B with res 224, 448 💙
> Can do vision language tasks with open-ended prompts, understand documents, and segment or detect anything 🤯

Read more https://huggingface.co/blog/paligemma2mix
Try the demo google/paligemma2-10b-mix
All models are here google/paligemma-2-mix-67ac6a251aaf3ee73679dcc4
prithivMLmods 
posted an update about 1 month ago
view post
Post
3928
Dino: The Minimalist Multipurpose Chat System 🌠
Agent-Dino : prithivMLmods/Agent-Dino
Github: https://github.com/PRITHIVSAKTHIUR/Agent-Dino

By default, it performs the following tasks:
{Text-to-Text Generation}, {Image-Text-Text Generation}
@image: Generates an image using Stable Diffusion xL.
@3d: Generates a 3D mesh.
@web: Web search agents.
@rAgent: Initiates a reasoning chain using Llama mode for coding explanations.
@tts1-♀, @tts2-♂: Voice generation (Female and Male voices).
@yolo : Object Detection
sayakpaul 
posted an update about 1 month ago
view post
Post
3236
Inference-time scaling meets Flux.1-Dev (and others) 🔥

Presenting a simple re-implementation of "Inference-time scaling diffusion models beyond denoising steps" by Ma et al.

I did the simplest random search strategy, but results can potentially be improved with better-guided search methods.

Supports Gemini 2 Flash & Qwen2.5 as verifiers for "LLMGrading" 🤗

The steps are simple:

For each round:

1> Starting by sampling 2 starting noises with different seeds.
2> Score the generations w.r.t a metric.
3> Obtain the best generation from the current round.

If you have more compute budget, go to the next search round. Scale the noise pool (2 ** search_round) and repeat 1 - 3.

This constitutes the random search method as done in the paper by Google DeepMind.

Code, more results, and a bunch of other stuff are in the repository. Check it out here: https://github.com/sayakpaul/tt-scale-flux/ 🤗
prithivMLmods 
posted an update about 1 month ago
view post
Post
4498
The last week of Impression Craft Arts and sketches from strangerzonehf🎨🧑🏻‍🎨

- Collection : strangerzonehf/Flux-Ultimate-LoRA-Collection

Adapters:
+ Ld-Art : strangerzonehf/Ld-Art
+ Animeopix-Flux : strangerzonehf/Animeopix-Flux
+ Flux-Super-Paint-LoRA : strangerzonehf/Flux-Super-Paint-LoRA
+ CinematicShot-Pics-Flux : strangerzonehf/cinematicShot-Pics-Flux
+ Oil-Wall-Art-Flux : strangerzonehf/Oil-Wall-Art-Flux
+ Pixelo-Flux : strangerzonehf/Pixelo-Flux
+ Abstract-Shattered : strangerzonehf/Abstract-Shattered
+ Neon-Impressionism-Flux : strangerzonehf/Neon-Impressionism-Flux
+ NewG-Art : strangerzonehf/NewG-Art

🪧Demo : prithivMLmods/FLUX-LoRA-DLC
🤗Page : https://huggingface.co/strangerzonehf
merve 
posted an update about 1 month ago
view post
Post
4756
Your weekly recap of open AI is here, and it's packed with models! merve/feb-14-releases-67af876b404cc27c6d837767

👀 Multimodal
> OpenGVLab released InternVideo 2.5 Chat models, new video LMs with long context
> AIDC released Ovis2 model family along with Ovis dataset, new vision LMs in different sizes (1B, 2B, 4B, 8B, 16B, 34B), with video and OCR support
> ColQwenStella-2b is a multilingual visual retrieval model that is sota in it's size
> Hoags-2B-Exp is a new multilingual vision LM with contextual reasoning, long context video understanding

💬 LLMs
A lot of math models!
> Open-R1 team released OpenR1-Math-220k large scale math reasoning dataset, along with Qwen2.5-220K-Math fine-tuned on the dataset, OpenR1-Qwen-7B
> Nomic AI released new Nomic Embed multilingual retrieval model, a MoE with 500 params with 305M active params, outperforming other models
> DeepScaleR-1.5B-Preview is a new DeepSeek-R1-Distill fine-tune using distributed RL on math
> LIMO is a new fine-tune of Qwen2.5-32B-Instruct on Math

🗣️ Audio
> Zonos-v0.1 is a new family of speech recognition models, which contains the model itself and embeddings

🖼️ Vision and Image Generation
> We have ported DepthPro of Apple to transformers for your convenience!
> illustrious-xl-v1.0 is a new illustration generation model
·
kadirnar 
posted an update about 1 month ago
view post
Post
3983
Researchers developed Sonic AI enabling precise facial animation from speech cues 🎧 Decouples head/expression control via audio tone analysis + time-aware fusion for natural long-form synthesis
  • 1 reply
·
prithivMLmods 
posted an update about 1 month ago
view post
Post
4285
QwQ Edge Gets a Small Update..! 💬
try now: prithivMLmods/QwQ-Edge

🚀Now, you can use the following commands for different tasks:

🖼️ @image 'prompt...' → Generates an image
🔉@tts1 'prompt...' → Generates speech in a female voice
🔉 @tts2 'prompt...' → Generates speech in a male voice
🅰️@text 'prompt...' → Enables textual conversation (If not specified, text-to-text generation is the default mode)

💬Multimodality Support : prithivMLmods/Qwen2-VL-OCR-2B-Instruct
💬For text generation, the FastThink-0.5B model ensures quick and efficient responses, prithivMLmods/FastThink-0.5B-Tiny
💬Image Generation: sdxl lightning model, SG161222/RealVisXL_V4.0_Lightning

Github: https://github.com/PRITHIVSAKTHIUR/QwQ-Edge

graph TD
    A[User Interface] --> B[Chat Logic]
    B --> C{Command Type}
    C -->|Text| D[FastThink-0.5B]
    C -->|Image| E[Qwen2-VL-OCR-2B]
    C -->|@image| F[Stable Diffusion XL]
    C -->|@tts| G[Edge TTS]
    D --> H[Response]
    E --> H
    F --> H
    G --> H
lucifertrj 
posted an update about 1 month ago
view post
Post
556
Bhagavad Gita GPT assistant - Build fast RAG pipeline to index 1000+ pages using Binary Optimization

DeepSeek R-1 and Qdrant Binary Quantization

Check out the latest tutorial where we build a Bhagavad Gita GPT assistant—covering:
- DeepSeek R1 vs OpenAI O1
- Using Qdrant client with Binary Quantization
- Building the RAG pipeline with LlamaIndex
- Running inference with DeepSeek R1 Distill model on Groq
- Develop Streamlit app for the chatbot inference

Watch the full implementation here: https://www.youtube.com/watch?v=NK1wp3YVY4Q
  • 1 reply
·
merve 
posted an update about 2 months ago
view post
Post
3138
Interesting releases in open AI this week, let's recap 🤠 merve/feb-7-releases-67a5f7d7f172d8bfe0dd66f4

🤖 Robotics
> Pi0, first open-source foundation vision-language action model was released in Le Robot (Apache 2.0)

💬 LLMs
> Groundbreaking: s1 is simpler approach to test-time scaling, the release comes with small s1K dataset of 1k question-reasoning trace pairs (from Gemini-Thinking Exp) they fine-tune Qwen2.5-32B-Instruct to get s1-32B, outperforming o1-preview on math 🤯 s1-32B and s1K is out!
> Adyen released DABstep, a new benchmark along with it's leaderboard demo for agents doing data analysis
> Krutrim released Krutrim-2 instruct, new 12B model based on NeMo12B trained and aligned on Indic languages, a new multilingual sentence embedding model (based on STSB-XLM-R), and a translation model for Indic languages

👀 Multimodal
> PKU released Align-DS-V, a model aligned using their new technique called LLF for all modalities (image-text-audio), along with the dataset Align Anything
> OLA-7B is a new any-to-any model by Tencent that can take text, image, video, audio data with context window of 32k tokens and output text and speech in English and Chinese
> Krutrim released Chitrarth, a new vision language model for Indic languages and English

🖼️ Vision
> BiRefNet_HR is a new higher resolution BiRefNet for background removal

🗣️ Audio
> kyutai released Hibiki, it's a real-time speech-to-speech translation model 🤯 it's available for French-English translation
> Krutrim released Dhwani, a new STT model for Indic languages
> They also release a new dataset for STT-TTS

🖼️ Image Generation
> Lumina released Lumina-Image-2.0, a 2B parameter-flow based DiT for text to image generation
> Tencent released Hunyuan3D-2, a 3D asset generation model based on DiT and Hunyuan3D-Paint
> boreal-hl-v1 is a new boring photorealistic image generation LoRA based on Hunyuan