Tune-A-Video-library (Tune a video concepts library)

posted an update about 5 hours ago

Dropping downstream fine-tunes in which a newly initialized classifier head (classifier.weight and classifier.bias) supports domain-specific image classification. Based on siglip2-base-patch16-224 and DomainNet (single-domain, multi-source adaptation), with Fashion-MNIST for experimental testing. 🧤☄️

Fashion-Mnist : prithivMLmods/Fashion-Mnist-SigLIP2
Multisource-121 : prithivMLmods/Multisource-121-DomainNet
Painting-126 : prithivMLmods/Painting-126-DomainNet
Sketch-126 : prithivMLmods/Sketch-126-DomainNet
Clipart-126 : prithivMLmods/Clipart-126-DomainNet

Models are trained with different parameter settings for experimental purposes only, with the intent of further development. Refer to the model page below for instructions on running it with Transformers 🤗.

Collection : prithivMLmods/domainnet-exp-67e0e3c934c03cc40c6c8782

Citations : SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features https://arxiv.org/pdf/2502.14786 & Moment Matching for Multi-Source Domain Adaptation : https://arxiv.org/pdf/1812.01754
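
To make the "newly initialized classifier head" idea concrete, here is a minimal pure-Python sketch. It assumes a frozen backbone that yields a pooled embedding and a fresh linear layer (weight and bias) that maps it to class logits. The names `init_head` and `classify`, the 4-dimensional embedding, and the label list are illustrative assumptions, not the code behind the models above.

```python
import math
import random


def init_head(embed_dim, num_classes, seed=0):
    """Create a freshly initialized linear head: small random weights, zero bias."""
    rng = random.Random(seed)
    weight = [[rng.gauss(0.0, 0.02) for _ in range(embed_dim)]
              for _ in range(num_classes)]
    bias = [0.0] * num_classes
    return weight, bias


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(embedding, weight, bias, labels):
    """Apply the linear head to a pooled embedding; return (label, probabilities)."""
    logits = [sum(w * x for w, x in zip(row, embedding)) + b
              for row, b in zip(weight, bias)]
    probs = softmax(logits)
    best = max(range(len(labels)), key=probs.__getitem__)
    return labels[best], probs


# Hypothetical embedding and domain labels, for illustration only.
labels = ["clipart", "painting", "sketch"]
weight, bias = init_head(embed_dim=4, num_classes=len(labels))
label, probs = classify([0.1, -0.3, 0.7, 0.2], weight, bias, labels)
```

An untrained head like this carries no task knowledge yet; fine-tuning on the target domain is what makes its predictions meaningful.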

prithivMLmods

posted an update 4 days ago

Play with Orpheus TTS, a Llama-based Speech-LLM designed for high-quality, empathetic text-to-speech generation. This model has been fine-tuned to deliver human-level speech synthesis 🔥🗣️

👉GitHub: https://github.com/PRITHIVSAKTHIUR/Orpheus-TTS-Edge

The demo supports both direct text-to-speech and LLM responses rendered as speech.

> voice: tara, dan, emma, josh
> emotion: <laugh>, <chuckle>, <sigh>, <cough>, <sniffle>, <groan>, <yawn>, <gasp>.
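
A small sketch of how a prompt could be assembled from the voices and emotion tags listed above. The helper `build_prompt` is hypothetical, and the `name: text` prefix format is an assumption; check the model page for the exact prompt format the model expects.

```python
import re

VOICES = {"tara", "dan", "emma", "josh"}
EMOTION_TAGS = {"laugh", "chuckle", "sigh", "cough",
                "sniffle", "groan", "yawn", "gasp"}


def build_prompt(voice, text):
    """Validate the voice and any <tag> markers, then return 'voice: text'."""
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice!r}")
    for tag in re.findall(r"<([^<>]+)>", text):
        if tag not in EMOTION_TAGS:
            raise ValueError(f"unsupported emotion tag: <{tag}>")
    # Assumption: the voice name is passed as a 'name: ' prefix.
    return f"{voice}: {text}"


prompt = build_prompt("tara", "I can't believe it worked <laugh> on the first try.")
```

Validating tags up front avoids sending markers the model was not trained on, which would likely be read aloud as literal text.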

🥠Orpheus-3b-0.1-ft
Model Page: canopylabs/orpheus-3b-0.1-ft

🥠Orpheus-3b-0.1-ft
Colab Inference Notebook: https://colab.research.google.com/drive/1KhXT56UePPUHhqitJNUxq63k-pQomz3N?usp=sharing

🥠Finetune [ orpheus-3b-0.1-pretrained ]
Resource: https://github.com/canopyai/Orpheus-TTS/tree/main/finetune

🥠Model-releases:
https://canopylabs.ai/model-releases

ginipick

posted an update 5 days ago

🌈✨ FLUX 'Every Text Imaginator'
Multilingual Text-Driven Image Generation and Editing

Demo: ginigen/Every-Text

📝 What is FLUX Text Imaginator?
FLUX Text Imaginator is an innovative tool that leverages cutting-edge FLUX diffusion models to create and edit images with perfectly integrated multilingual text. Unlike other image generation models, FLUX possesses exceptional capability to naturally incorporate text in various languages including Korean, English, Chinese, Japanese, Russian, French, Spanish and more into images!

✨ FLUX's Multilingual Text Processing Strengths

🔤 Superior Multilingual Text Rendering: FLUX renders text with amazing accuracy, including non-English languages and special characters
🇰🇷 Perfect Korean Language Support: Accurately represents complex Korean combined characters
🈶 Excellent East Asian Language Handling: Naturally expresses complex Chinese characters and Japanese text
🔍 Sophisticated Text Placement: Precise text positioning using <text1>, <text2>, <text3> placeholders
🎭 Diverse Text Styles: Text representation in various styles including handwriting, neon, signage, billboards, and more
🔄 Automatic Translation Feature: Korean prompts are automatically translated to English for optimal results

🚀 How It Works

Text Generation Mode:

1. Enter your prompt (with optional text placeholders)
2. Specify your desired text in any language
3. Generate high-quality images with naturally integrated text using FLUX's powerful multilingual processing capabilities
4. Get two different versions of your image for each generation
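
The placeholder step can be sketched as a simple substitution. This is only an illustration of the `<text1>`, `<text2>`, `<text3>` convention; the helper `fill_placeholders` and the choice to wrap each text in quotes are assumptions, not the Space's actual code.

```python
import re


def fill_placeholders(prompt, texts):
    """Replace <text1>, <text2>, ... with quoted entries from `texts` (1-indexed)."""
    def substitute(match):
        index = int(match.group(1)) - 1
        if not 0 <= index < len(texts):
            raise ValueError(f"no text supplied for {match.group(0)}")
        return f'"{texts[index]}"'

    return re.sub(r"<text(\d+)>", substitute, prompt)


filled = fill_placeholders(
    "A neon sign reading <text1> above a cafe door labeled <text2>",
    ["안녕하세요", "OPEN"],
)
```

Raising on a missing text keeps an unfilled placeholder from leaking into the final prompt.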

Image Editing Mode:

1. Upload any image
2. Add editing instructions
3. Specify new text to add or replace (multilingual support)
4. Create naturally edited images with FLUX's sophisticated text processing abilities

💻 Technical Details
FLUX's Core Technologies:
- Text-Aware Diffusion Model
- Multilingual Processing Engine
- Korean-English Translation Pipeline
- Optimized Pipeline

AtAndDev

posted an update 8 days ago

There seem to be multiple paid apps shared here that are based on models on hf, but some ppl sell their wrappers as "products" and promote them here. For a long time, hf was the best and only platform to do oss model stuff, but with the recent AI website builders anyone can create a product (really crappy ones btw) and try to sell it with no contribution to oss stuff. Please don't do this, or at least try finetuning the models you use...
Sorry for filling yall feed with this bs but yk...

prithivMLmods

posted an update 10 days ago

Hey Guys! One Small Announcement 🤗
Stranger Zone now accepts LoRA requests!

✍️Request : strangerzonehf/Request-LoRA [ or ] strangerzonehf/Request-LoRA#1

Page : https://huggingface.co/strangerzonehf

Describe the artistic properties by posting sample images or links to similar images in the request discussion. If the adapters you're asking for are truly creative and safe for work, I'll train and upload the LoRA to the Stranger Zone repo!

Thank you!

ginipick

posted an update 12 days ago

🌐 GraphMind: Phi-3 Instruct Graph Explorer

✨ Extract and visualize knowledge graphs from any text in multiple languages!

GraphMind is a powerful tool that leverages the capabilities of Phi-3 to transform unstructured text into structured knowledge graphs, helping you understand complex relationships within any content.

ginigen/Graph-Mind

🚀 Key Features

Multi-language Support 🌍: Process text in English, Korean, and many other languages
Instant Visualization 🧩: See extracted entities and relationships in an interactive graph
Entity Recognition 🏷️: Automatically identifies and categorizes named entities
Optimized Performance ⚡: Uses caching to deliver faster results for common examples
Intuitive Interface 👆: Simple design makes complex graph extraction accessible to everyone
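
The caching idea mentioned under Optimized Performance can be sketched with the standard library. The function `extract_graph` below is a hypothetical stand-in for an expensive model call, and its capitalized-word logic is placeholder behavior, not how GraphMind extracts entities.

```python
from functools import lru_cache

call_count = 0  # counts how often the "expensive" body actually runs


@lru_cache(maxsize=128)
def extract_graph(text):
    """Stand-in for a costly extraction; returns an immutable tuple of pairs."""
    global call_count
    call_count += 1
    # Placeholder logic: link each capitalized word to the next capitalized word.
    caps = [w.strip(".,") for w in text.split() if w[:1].isupper()]
    return tuple(zip(caps, caps[1:]))


first = extract_graph("Ada Lovelace worked with Charles Babbage.")
second = extract_graph("Ada Lovelace worked with Charles Babbage.")  # cache hit
```

Because the cache is keyed on the exact input string, it helps most for fixed example prompts, which matches the "common examples" use described above.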

💡 Use Cases

Content Analysis: Extract key entities and relationships from articles or documents
Research Assistance: Quickly visualize connections between concepts in research papers
Educational Tool: Help students understand the structure of complex texts
Multilingual Processing: Extract knowledge from content in various languages

🔧 How It Works

1. Enter any text in the input field
2. Select a model from the dropdown
3. Click "Extract & Visualize"
4. Explore the interactive knowledge graph and entity recognition results
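
As a rough picture of what sits behind the visualization step, extracted relationships can be held as (subject, relation, object) triples and grouped into an adjacency mapping. The triples and the helper names `build_graph` and `entities` are hypothetical; the output format GraphMind actually uses is not described here.

```python
from collections import defaultdict


def build_graph(triples):
    """Group (subject, relation, object) triples into an adjacency mapping."""
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
    return dict(graph)


def entities(triples):
    """Return the sorted set of all entities mentioned in the triples."""
    return sorted({t[0] for t in triples} | {t[2] for t in triples})


# Hypothetical triples such as a language model might extract from text.
triples = [
    ("Marie Curie", "discovered", "polonium"),
    ("Marie Curie", "won", "Nobel Prize"),
    ("polonium", "named_after", "Poland"),
]
graph = build_graph(triples)
nodes = entities(triples)
```

Each key in `graph` becomes a node with outgoing labeled edges, which is the shape most graph-drawing libraries accept.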

GraphMind bridges the gap between raw text and structured knowledge, making it easier to identify patterns, extract insights, and understand relationships within any content. Try it now and transform how you interact with textual information!
#NLP #KnowledgeGraph #TextAnalysis #Visualization #Phi3 #MultilingualAI

AtAndDev

posted an update 12 days ago

Gemma 3 seems to be really good at human preference. Just waiting for ppl to see it.

prithivMLmods

posted an update 12 days ago

Gemma-3-4B : Image and Video Inference 🖼️🎥

🧤Space: prithivMLmods/Gemma-3-Multimodal
🥠Git : https://github.com/PRITHIVSAKTHIUR/Gemma-3-Multimodal

@gemma3 : {Tag + Space_+ 'prompt'}
@video-infer : {Tag + Space_+ 'prompt'}

+ Gemma3-4B : google/gemma-3-4b-it
+ By default, it runs : prithivMLmods/Qwen2-VL-OCR-2B-Instruct

Gemma 3 Technical Report : https://storage.googleapis.com/deepmind-media/gemma/Gemma3Report.pdf

not-lain

posted an update 12 days ago

🚀AraClip is now fully integrated with Hugging Face 🤗

AraClip is a specialized CLIP model that was created by @pain and optimized for Arabic text-image retrieval tasks🔥

🔗 Try it out 🔗
🤖 model: Arabic-Clip/araclip
🧩 Gradio demo: Arabic-Clip/Araclip-Simplified
🌐 website: https://arabic-clip.github.io/Arabic-CLIP/

prithivMLmods

posted an update 13 days ago

Variable Demo for Two Image-to-Text-to-Text Multimodals 🌠

📜Space: prithivMLmods/Multimodal-OCR

By default, it will use:
prithivMLmods/Qwen2-VL-OCR-2B-Instruct or
prithivMLmods/Qwen2-VL-OCR2-2B-Instruct

To trigger Aya-Vision's 8B by @aya-vision , use the prompt:
CohereForAI/aya-vision-8b

prithivMLmods

posted an update 19 days ago

SigLIP2 Image Classification 🧤

> https://huggingface.co/blog/prithivMLmods/siglip2-finetune-image-classification

jayw

authored 2 papers 20 days ago

Cosmos World Foundation Model Platform for Physical AI

Paper • 2501.03575 • Published Jan 7 • 74

Difix3D+: Improving 3D Reconstructions with Single-Step Diffusion Models

Paper • 2503.01774 • Published 21 days ago • 41

ginipick

posted an update 26 days ago

🚀 Introducing MOUSE: Space Research Thinking on HuggingFace Spaces

🚀 How to Get Started
ginipick/spaces-research-think

Welcome to **MOUSE: Space Research Thinking** – an innovative HuggingFace Spaces project designed to transform how you analyze and interact with Python code. Whether you're a developer, researcher, or simply passionate about coding, this tool provides state-of-the-art analysis, summarization, and usage guidance, all powered by advanced AI.

---

## 🌟 Key Features

- **Real-Time Code Analysis** 🔍
Instantly dissect your Python code to reveal its structure, functionality, and potential applications. Our tool delivers:
  - **Background & Necessity**: Understand the context behind the code.
  - **Functional Utility & Value**: Highlight core functionalities and benefits.
  - **Distinctive Features**: Discover what sets the project apart.
  - **Target Audience & Applications**: Identify who can benefit and how.
  - **Expected Impact**: Envision the improvements and innovations the code can drive.

- **Visual File Structure Overview**
Navigate your project with ease! A dynamic tree-view displays your file hierarchy in a clear, intuitive format, allowing you to explore directories and files effortlessly. 🌲

- **Interactive Usage Guide**
Receive step-by-step instructions and practical tips on using the tool effectively. Our AI assistant explains everything in an engaging, user-friendly manner, ensuring a smooth learning curve. 💡

- **AI-Powered Code Chat**
Engage in real-time conversations with our AI. Ask questions, request detailed explanations, or dive deeper into code specifics with a chat interface that makes complex topics accessible. 🤖💬

- **Customizable Experience**
Tailor the analysis to your needs with adjustable parameters like token limits and response temperatures, enabling both concise summaries and in-depth explorations. ⚙️
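
To show what a response temperature does in general, here is a minimal sketch of temperature-scaled softmax over next-token scores. This illustrates the standard technique only; the function name and the example logits are assumptions, and the Space's own sampling code is not shown in this post.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Turn logits into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens
sharp = softmax_with_temperature(logits, temperature=0.5)
flat = softmax_with_temperature(logits, temperature=2.0)
```

A low temperature concentrates probability on the top candidate, which suits concise summaries; a higher one spreads it out, which gives more varied, exploratory output.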

prithivMLmods

posted an update 27 days ago

Dropping some of the custom fine-tunes based on SigLIP2,
with a single/multi label classification problem type! 🌀🧤

- AI vs Deepfake vs Real : prithivMLmods/AI-vs-Deepfake-vs-Real-Siglip2
- Deepfake Detect : prithivMLmods/Deepfake-Detect-Siglip2
- Fire Detection : prithivMLmods/Fire-Detection-Siglip2
- Deepfake Quality Assess : prithivMLmods/Deepfake-Quality-Assess-Siglip2
- Guard Against Unsafe Content : prithivMLmods/Guard-Against-Unsafe-Content-Siglip2

🌠Collection : prithivMLmods/siglip2-custom-67bcdb2de8fe96b99fb4e19e

ehristoforu

posted an update 28 days ago

Introducing our first standalone model – FluentlyLM Prinum

Introducing the first standalone model from Project Fluently LM! We worked on it for several months, used different approaches and eventually found the optimal one.

General characteristics:
- Model type: Causal language models (QwenForCausalLM, LM Transformer)
- Number of parameters: 32.5B
- Number of parameters (non-embedding): 31.0B
- Number of layers: 64
- Context: 131,072 tokens
- Language(s) (NLP): English, French, Spanish, Russian, Chinese, Japanese, Persian (officially supported)
- License: MIT

Creation strategy:
The basis of the strategy is shown in Pic. 2.
We used Axolotl & Unsloth for SFT fine-tuning with PEFT LoRA (rank=64, alpha=64) and Mergekit for SLERP and TIES merges.
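
For readers unfamiliar with SLERP, here is a minimal sketch of spherical linear interpolation between two weight vectors. It illustrates the general formula only, assuming non-zero flat vectors; it is not Mergekit's implementation and not the exact recipe used for this model.

```python
import math


def slerp(t, v0, v1, eps=1e-8):
    """Spherically interpolate between non-zero vectors v0 (t=0) and v1 (t=1)."""
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    # Cosine of the angle between the two vectors, clamped for acos.
    dot = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    dot = max(-1.0, min(1.0, dot))
    omega = math.acos(dot)
    if math.sin(omega) < eps:
        # Nearly (anti)parallel vectors: fall back to plain linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    s0 = math.sin((1 - t) * omega) / math.sin(omega)
    s1 = math.sin(t * omega) / math.sin(omega)
    return [s0 * a + s1 * b for a, b in zip(v0, v1)]


midpoint = slerp(0.5, [1.0, 0.0], [0.0, 1.0])
```

Unlike a plain average, the midpoint of two unit vectors stays on the unit circle. Separately, with rank=64 and alpha=64 the conventional LoRA scaling factor alpha/rank works out to 1.0.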

Evaluation:
🏆 12th place on the Open LLM Leaderboard (open-llm-leaderboard/open_llm_leaderboard) (21.02.2025)

Detailed results and comparisons are presented in Pic. 3.

Links:
- Model: fluently-lm/FluentlyLM-Prinum
- GGUF version: mradermacher/FluentlyLM-Prinum-GGUF
- Demo on ZeroGPU: ehristoforu/FluentlyLM-Prinum-demo

prithivMLmods

posted an update about 1 month ago

The use of a new state of matter in Majorana 1, the world’s first quantum processor powered by topological qubits, is really interesting. If you missed the news this week, here are some links for you:

🅱️Topological qubit arrays: https://arxiv.org/pdf/2502.12252

⚛️ Quantum Blog: https://azure.microsoft.com/en-us/blog/quantum/2025/02/19/microsoft-unveils-majorana-1-the-worlds-first-quantum-processor-powered-by-topological-qubits/

📖 Read the story: https://news.microsoft.com/source/features/innovation/microsofts-majorana-1-chip-carves-new-path-for-quantum-computing/

📝 Majorana 1 Intro: https://youtu.be/Q4xCR20Dh1E?si=Z51DbEYnZFp_88Xp

🌀The Path to a Million Qubits: https://youtu.be/wSHmygPQukQ?si=TS80EhI62oWiMSHK

prithivMLmods

posted an update about 1 month ago

Dino: The Minimalist Multipurpose Chat System 🌠
Agent-Dino : prithivMLmods/Agent-Dino
Github: https://github.com/PRITHIVSAKTHIUR/Agent-Dino

By default, it performs the following tasks:
{Text-to-Text Generation}, {Image-Text-Text Generation}
@image: Generates an image using Stable Diffusion XL.
@3d: Generates a 3D mesh.
@web: Web search agents.
@rAgent: Initiates a reasoning chain using Llama mode for coding explanations.
@tts1-♀, @tts2-♂: Voice generation (Female and Male voices).
@yolo : Object Detection
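
The tag-based routing above can be sketched as a small parser that splits a leading `@tag` from the rest of the message. The function `parse_command`, the `"chat"` default, and the reduced tag set are assumptions for illustration; Agent-Dino's real dispatch code lives in the linked repository.

```python
def parse_command(message, known_tags, default="chat"):
    """Split '@tag prompt' into (tag, prompt); untagged text goes to the default."""
    stripped = message.strip()
    if stripped.startswith("@"):
        head, _, rest = stripped.partition(" ")
        tag = head[1:]
        if tag in known_tags:
            return tag, rest.strip()
    # No recognized tag: treat the whole message as a normal chat prompt.
    return default, stripped


# Subset of the tags listed above; the @tts tags are omitted for brevity.
KNOWN_TAGS = {"image", "3d", "web", "rAgent", "yolo"}
routed = parse_command("@image a red panda reading a book", KNOWN_TAGS)
fallback = parse_command("What is a topological qubit?", KNOWN_TAGS)
```

Falling back to the default for unknown tags means a stray `@mention` in ordinary text is not silently dropped.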

ginipick

posted an update about 1 month ago

🚀 FLUX Workflow Canvas

Welcome to Workflow Canvas, your ultimate AI-driven platform for crafting stunning design concepts and intricate workflow diagrams that empower your business! 🤖✨

ginigen/Workflow-Canvas

Features
Product Design 🛠️
Transform your ideas into reality with sleek, industrial product designs that blend modern aesthetics with advanced technology.

Mindmap 🧠
Generate vibrant, educational mind maps that outline your strategies and processes in a clear, visually engaging layout.

Mockup 📱
Quickly prototype intuitive app interfaces and web designs using clean, hand-drawn wireframes that capture your vision.

Infographic 📊
Build polished, data-rich infographics that communicate complex corporate metrics and trends with style and clarity.

Diagram 📈
Illustrate comprehensive, end-to-end business workflows—from market analysis to implementation—with detailed and organized diagrams.

Flowchart 🔄
Design easy-to-follow, hand-drawn style flowcharts that map out your operational processes using vibrant colors and minimalistic icons.

How It Works
Set Your Parameters:
Customize your creative process by adjusting the seed, dimensions, inference steps, and guidance scale through the intuitive sidebar.

Choose Your Visual Style:
Explore our diverse range of tabs—from Product Design and Mindmap to Flowchart—each tailored to a unique creative output.

Get Inspired:
Dive into our rich library of example prompts featuring detailed lists and tree structures to instantly populate your design ideas.

Generate Your Masterpiece:
Click the “Generate” button and watch as your ideas come to life in beautifully rendered images! 🎨
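
The sidebar parameters can be pictured as a small settings object. The sketch below is hypothetical: the `GenerationSettings` class, its validation rules, and the example values are assumptions, and the list of random numbers merely stands in for the starting noise of a diffusion run to show why a fixed seed makes results repeatable.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class GenerationSettings:
    """Hypothetical container for the sidebar parameters."""
    seed: int
    width: int
    height: int
    steps: int
    guidance_scale: float

    def __post_init__(self):
        if self.width <= 0 or self.height <= 0:
            raise ValueError("width and height must be positive")
        if self.steps < 1:
            raise ValueError("steps must be at least 1")


def initial_noise(settings, count=4):
    """Draw a few noise values from a generator seeded by the settings."""
    rng = random.Random(settings.seed)
    return [rng.gauss(0.0, 1.0) for _ in range(count)]


a = GenerationSettings(seed=7, width=1024, height=768, steps=8, guidance_scale=3.5)
b = GenerationSettings(seed=7, width=1024, height=768, steps=8, guidance_scale=3.5)
c = GenerationSettings(seed=8, width=1024, height=768, steps=8, guidance_scale=3.5)
```

With identical settings, `initial_noise(a)` and `initial_noise(b)` match, so the same prompt can be regenerated; changing only the seed, as in `c`, gives a different starting point and therefore a different image.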

Experience the fusion of art and technology with Workflow Canvas – where your business ideas transform into dynamic, visual masterpieces. Get started today and revolutionize the way you design! 🚀