AI & ML interests

Make all Hub models available for conversion to the ONNX format.
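A minimal sketch of what one such conversion looks like with the `optimum` exporter (assumptions: `optimum[exporters]` is installed, and the model id below is an arbitrary public checkpoint used only as an example):

```python
def onnx_export_command(model_id: str, output_dir: str) -> list[str]:
    """Build the optimum-cli invocation that exports a Hub model to ONNX."""
    return ["optimum-cli", "export", "onnx", "--model", model_id, output_dir]


# Example: export a small text-classification checkpoint (id is illustrative).
cmd = onnx_export_command("distilbert-base-uncased-finetuned-sst-2-english", "onnx_out/")
print(" ".join(cmd))
# Run it with subprocess.run(cmd, check=True); it writes onnx_out/model.onnx,
# which loads back via optimum.onnxruntime.ORTModelForSequenceClassification.
```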

Recent Activity

prithivMLmods posted an update 3 days ago
A demo covering DREX-062225-exp (Document Retrieval and Extraction eXpert, experimental), typhoon-ocr-3b (a bilingual document-parsing model built specifically for real-world documents), VIREX-062225-exp (Video Information Retrieval and Extraction eXpert, experimental), and olmOCR-7B-0225-preview (a document-parsing model based on Qwen2VL) is now live. 🤗

✦ Demo : prithivMLmods/Doc-VLMs-OCR (with .md canvas)

⤷ DREX-062225-exp : prithivMLmods/DREX-062225-exp
⤷ typhoon-ocr-3b : scb10x/typhoon-ocr-3b
⤷ VIREX-062225-exp : prithivMLmods/VIREX-062225-exp
⤷ olmOCR-7B-0225-preview : allenai/olmOCR-7B-0225-preview

⤷ Collection : prithivMLmods/doc-vl-685839064a863e1cd23be3f1
⤷ Multimodal Implementations : prithivMLmods/multimodal-implementations-67c9982ea04b39f0608badb0

To learn more, visit the respective model cards.
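Since olmOCR-7B-0225-preview is based on Qwen2VL, running it locally follows the usual Qwen2-VL chat-template flow. A sketch (the instruction wording and image path are illustrative, and the generation step is elided; check the model card for exact usage):

```python
def build_ocr_messages(image_path: str, instruction: str) -> list[dict]:
    """Assemble a one-turn chat payload with a single page image and an instruction."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_path},
                {"type": "text", "text": instruction},
            ],
        }
    ]


messages = build_ocr_messages("page_1.png", "Extract the document text as markdown.")
print(messages[0]["content"][1]["text"])

# With transformers installed, this payload feeds the checkpoint roughly as:
#   processor = AutoProcessor.from_pretrained("allenai/olmOCR-7B-0225-preview")
#   prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
#   ...then processor(images=..., text=prompt) and model.generate(...)
```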
prithivMLmods posted an update 4 days ago
Updated docscopeOCR-7B-050425-exp to DREX-062225-exp, with improved precision in table structure and line spacing in the markdown rendering of the document page. Though still experimental, it is expected to perform well in the defined DREX use cases (Document Retrieval and Extraction eXpert, experimental OCR). 💻

⤷ Model : prithivMLmods/DREX-062225-exp
⤷ Demo : prithivMLmods/Doc-VLMs-OCR

⤷ Collection : prithivMLmods/doc-vl-685839064a863e1cd23be3f1
⤷ Multimodal Implementations : prithivMLmods/multimodal-implementations-67c9982ea04b39f0608badb0
⤷ Git : https://github.com/PRITHIVSAKTHIUR/DREX.git

To learn more, visit the respective model cards.
prithivMLmods posted an update 7 days ago
A single space demoing the document-OCR capabilities of several newly released multimodal VLMs: SmolDocling, Nanonets OCR, Typhoon OCR, and Monkey OCR. If you're running OCR on long document images, kindly use the SmolDocling-256M preview (SmolDocling is back in the demo). 🤗

✦ Try the demo here : prithivMLmods/Multimodal-OCR2

⤷ MonkeyOCR Recognition : echo840/MonkeyOCR
⤷ Nanonets-OCR-s : nanonets/Nanonets-OCR-s
⤷ SmolDocling-256M-preview : ds4sd/SmolDocling-256M-preview
⤷ typhoon-ocr-7b : scb10x/typhoon-ocr-7b

⤷ Multimodal Implementations : prithivMLmods/multimodal-implementations-67c9982ea04b39f0608badb0

⤷ GitHub : https://github.com/PRITHIVSAKTHIUR/Multimodal-OCR2


The community GPU grant was given by Hugging Face; special thanks to them. 🤗🚀



To learn more, visit the respective model cards.
louisbrulenaudet posted an update 7 days ago
๐ŸŒ Clinical Trials Dataset now available on Hugging Face! ๐Ÿงฌ

I've just released a comprehensive, ML-ready dataset featuring 500,000+ clinical trial records sourced directly from ClinicalTrials.gov, for biomedical NLP, healthcare analytics, and clinical research applications. 🤗

I wanted to produce the most complete and up-to-date dump, with all raw data partially flattened to simplify extraction, self-querying, and processing.

Do you have any ideas about what we can do with it? Using descriptions to enhance specialized embedding models?

louisbrulenaudet/clinical-trials
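For anyone who wants to poke at it, a sketch of filtering streamed records with the `datasets` library (the `conditions` field name is an assumption; check the dataset card for the actual schema):

```python
def mentions_condition(record: dict, keyword: str) -> bool:
    """Case-insensitive check of a record's (assumed) 'conditions' field."""
    return keyword.lower() in str(record.get("conditions", "")).lower()


# Offline demo with a dummy record shaped like a flattened trial row.
sample = {"nct_id": "NCT00000000", "conditions": "Type 2 Diabetes"}
print(mentions_condition(sample, "diabetes"))  # prints True

# Streaming the real dataset (network required):
#   from datasets import load_dataset
#   ds = load_dataset("louisbrulenaudet/clinical-trials",
#                     split="train", streaming=True)
#   diabetes_trials = (r for r in ds if mentions_condition(r, "diabetes"))
```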
prithivMLmods posted an update 10 days ago
A demo for the MonkeyOCR Recognition model, which adopts a Structure-Recognition-Relation (SRR) triplet paradigm, and Nanonets-OCR-s, a powerful state-of-the-art image-to-markdown OCR model that goes far beyond traditional text extraction, combined with other experimental document-OCR models in a single space.

✦ Try the demo here : prithivMLmods/core-OCR
✦ Try the Nanonets-OCR-s demo here : prithivMLmods/Multimodal-OCR

⤷ MonkeyOCR Recognition : echo840/MonkeyOCR
⤷ docscopeOCR-7B-050425-exp : prithivMLmods/docscopeOCR-7B-050425-exp
⤷ coreOCR-7B-050325-preview : prithivMLmods/coreOCR-7B-050325-preview
⤷ Nanonets-OCR-s : nanonets/Nanonets-OCR-s

⤷ Multimodal Implementations : prithivMLmods/multimodal-implementations-67c9982ea04b39f0608badb0

Also included: a sample OCR test using the VisionOCR-3B-061125 model and the Qwen2-VL-OCR-2B-Instruct model.
⤷ Blog : https://huggingface.co/blog/prithivMLmods/visionocr-3b-061125-vs-qwen2-vl-ocr-2b-instruct

To learn more, visit the respective model cards.
prithivMLmods posted an update 27 days ago
OpenAI, Google, Hugging Face, and Anthropic have released guides and courses on building agents, prompting techniques, scaling AI use cases, and more. Below are 10+ minimalistic guides and courses that may help you make progress. 📖

⤷ Agents Companion : https://www.kaggle.com/whitepaper-agent-companion
⤷ Building Effective Agents : https://www.anthropic.com/engineering/building-effective-agents
⤷ Guide to building agents by OpenAI : https://cdn.openai.com/business-guides-and-resources/a-practical-guide-to-building-agents.pdf
⤷ Prompt engineering by Google : https://www.kaggle.com/whitepaper-prompt-engineering
⤷ Google: 601 real-world gen AI use cases : https://cloud.google.com/transform/101-real-world-generative-ai-use-cases-from-industry-leaders
⤷ Prompt engineering by IBM : https://www.ibm.com/think/topics/prompt-engineering-guide
⤷ Prompt Engineering by Anthropic : https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/overview
⤷ Scaling AI use cases : https://cdn.openai.com/business-guides-and-resources/identifying-and-scaling-ai-use-cases.pdf
⤷ Prompting Guide 101 : https://services.google.com/fh/files/misc/gemini-for-google-workspace-prompting-guide-101.pdf
⤷ AI in the Enterprise by OpenAI : https://cdn.openai.com/business-guides-and-resources/ai-in-the-enterprise.pdf

by HF 🤗 :
⤷ AI Agents Course by Hugging Face : https://huggingface.co/learn/agents-course/unit0/introduction
⤷ Smol-agents Docs : https://huggingface.co/docs/smolagents/en/tutorials/building_good_agents
⤷ MCP Course by Hugging Face : https://huggingface.co/learn/mcp-course/unit0/introduction
⤷ Other courses (LLM, Computer Vision, Deep RL, Audio, Diffusion, Cookbooks, etc.) : https://huggingface.co/learn
prithivMLmods posted an update 29 days ago
Just made a demo for Cosmos-Reason1, a physical AI model that understands physical common sense and generates appropriate embodied decisions in natural language through long chain-of-thought reasoning. Also added video-understanding support to it. 🤗🚀

✦ Try the demo here : prithivMLmods/DocScope-R1

⤷ Cosmos-Reason1-7B : nvidia/Cosmos-Reason1-7B
⤷ docscopeOCR-7B-050425-exp : prithivMLmods/docscopeOCR-7B-050425-exp
⤷ Captioner-Relaxed : Ertugrul/Qwen2.5-VL-7B-Captioner-Relaxed

⤷ Multimodal Implementations : prithivMLmods/multimodal-implementations-67c9982ea04b39f0608badb0

⤷ GitHub :
• https://github.com/PRITHIVSAKTHIUR/Cosmos-x-DocScope
• https://github.com/PRITHIVSAKTHIUR/Nvidia-Cosmos-Reason1-Demo

To learn more, visit the respective model cards.
AtAndDev posted an update 29 days ago
deepseek-ai/DeepSeek-R1-0528

This is the end
prithivMLmods posted an update about 1 month ago
Got access to Google's all-new Gemini Diffusion, a state-of-the-art text diffusion model. It delivers the performance of Gemini 2.0 Flash-Lite at 5x the speed, generating over 1000 tokens in a fraction of a second and producing impressive results. Below are some initial outputs generated using the model. ♊🔥

✦ Gemini Diffusion Playground : https://deepmind.google.com/frontiers/gemini-diffusion

Get Access Here : https://docs.google.com/forms/d/1aLm6J13tAkq4v4qwGR3z35W2qWy7mHiiA0wGEpecooo/viewform?edit_requested=true

🔗 To know more, visit: https://deepmind.google/models/gemini-diffusion/
prithivMLmods posted an update about 1 month ago
More-optimized explicit-content filters, with lightweight guard models trained on the SigLIP2 patch16-512 and ViT patch16-224 backbones for illustration and explicit-content classification, enabling content moderation in social media, forums, and parental controls for safer browsing environments. This version fixes the issues in the previous release, which lacked sufficient resources. 🚀

⤷ Models :
→ siglip2 mini explicit content : prithivMLmods/siglip2-mini-explicit-content [recommended]
→ vit mini explicit content : prithivMLmods/vit-mini-explicit-content

⤷ Building image safety-guard models : strangerguardhf

⤷ Datasets :
→ nsfw multidomain classification : strangerguardhf/NSFW-MultiDomain-Classification
→ nsfw multidomain classification v2.0 : strangerguardhf/NSFW-MultiDomain-Classification-v2.0

⤷ Collection :
→ Updated Versions [05192025] : prithivMLmods/explicit-content-filters-682aaa4733e378561925ca2b
→ Previous Versions : prithivMLmods/siglip2-content-filters-042025-final-680fe4aa1a9d589bf2c915ff

Find more collections inside the collection. 👆

To learn more, visit the respective model cards.
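A sketch of wiring one of these guard models into a moderation check via the transformers `image-classification` pipeline (the label names and the 0.5 threshold are assumptions; read the real labels off the model card and tune the threshold for your policy):

```python
def is_flagged(scores: list[dict], unsafe_labels: set[str], threshold: float = 0.5) -> bool:
    """Flag an image when any unsafe label clears the score threshold."""
    return any(s["label"] in unsafe_labels and s["score"] >= threshold for s in scores)


# Offline demo with mocked pipeline output.
mock_scores = [{"label": "safe", "score": 0.92}, {"label": "explicit", "score": 0.08}]
print(is_flagged(mock_scores, {"explicit"}))  # prints False

# Real usage (downloads the checkpoint):
#   from transformers import pipeline
#   clf = pipeline("image-classification",
#                  model="prithivMLmods/siglip2-mini-explicit-content")
#   flagged = is_flagged(clf("photo.jpg"), {"explicit"})
```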
Aurelien-Morgan posted an update about 1 month ago
prithivMLmods posted an update about 1 month ago
Models for detecting images generated by diffusion models (Flux.1, SDXL, etc.), trained or fine-tuned as image-classification models for content moderation, using datasets available on the Hub. For identifying AI-generated images or moderating visual content, the recommended model is OpenSDI-Flux.1-SigLIP2. 😺🧨

Models :
⤷ prithivMLmods/OpenSDI-Flux.1-SigLIP2 [best approach for AI (diffusion-generated) vs. real image classification]
⤷ prithivMLmods/OpenSDI-SD2.1-SigLIP2
⤷ prithivMLmods/OpenSDI-SD3-SigLIP2
⤷ prithivMLmods/OpenSDI-SD1.5-SigLIP2
⤷ prithivMLmods/OpenSDI-SDXL-SigLIP2

Datasets :
⤷ nebula/OpenSDI_test
⤷ madebyollin/megalith-10m

Collection : prithivMLmods/opensdi-diffusion-generated-image-classification-682488a3a3e5be7083db3383

Find more collections inside the collection. 👆

To learn more, visit the respective model cards.
prithivMLmods posted an update about 1 month ago
Dropping some image-classification models for content moderation, plus classifiers trained with datasets available on the Hub. All are fine-tuned on the SigLIP2 backbone (competitions: AIOrNot, Imagenette, and Driver-Drowsiness). Models and datasets are listed below:

🤗 Models :
AI or Not : prithivMLmods/AIorNot-SigLIP2
Driver Drowsiness Detection : prithivMLmods/DOZE-GUARD-RLDD
Subset 10 ImageNet : prithivMLmods/IMAGENETTE

🥊 Datasets :
+ competitions/aiornot
+ akahana/Driver-Drowsiness-Dataset
+ frgfm/imagenette

🔗 Collection :
[The previous collection of models is also listed in the same collection, so you can find more models focused on image classification tasks.]

- prithivMLmods/multiclass-image-classification-05142025-68234c8010a9350a4d6739b5

Find more collections inside the collection. 🤪👆

To learn more, visit the respective model cards.
prithivMLmods posted an update about 2 months ago
Dropping some image-classification models for content moderation, balancers, and classifiers trained on synthetic datasets, along with others based on datasets available on the Hub. Also loaded a few low-rank datasets for realistic gender portrait classification and document-type classifiers, all fine-tuned on the SigLIP2 patch16-224 backbone. Models and datasets are listed below:

🤗 Models & Datasets :

Realistic Gender Classification : prithivMLmods/Realistic-Gender-Classification
⎙ prithivMLmods/Realistic-Portrait-Gender-1024px
Document Type Detection : prithivMLmods/Document-Type-Detection
⎙ prithivMLmods/Document-Type-Detection
Face Mask Detection : prithivMLmods/Face-Mask-Detection
⎙ DamarJati/Face-Mask-Detection
Alzheimer Stage Classifier : prithivMLmods/Alzheimer-Stage-Classifier
⎙ SilpaCS/Augmented_alzheimer
Bone Fracture Detection : prithivMLmods/Bone-Fracture-Detection
⎙ Hemg/bone-fracture-detection
GiD Land Cover Classification : prithivMLmods/GiD-Land-Cover-Classification
⎙ jonathan-roberts1/GID

🤗 Collection : prithivMLmods/siglip2-05102025-681c2b0e406f0740a993fc1c

To learn more, visit the respective model cards.
Nymbo posted an update about 2 months ago
Haven't seen this posted anywhere - Llama-3.3-8B-Instruct is available on the new Llama API. Is this a new model or did someone mislabel Llama-3.1-8B?
prithivMLmods posted an update about 2 months ago
Well, here's the updated version with a 20,000+ entry sampled dataset for Watermark Filter content-moderation models, incl. Food25, Weather, Watermark, and Marathi/Hindi Sign Language Detection, post-trained from the base SigLIP2 patch16-224 model, now with mixed aspect ratios for better performance and reduced misclassification. 🔥

Models :
➮ Watermark-Detection : prithivMLmods/Watermark-Detection-SigLIP2
⌨︎ Watermark Detection & Batch Image Processing Experimentals, Colab Notebook : https://colab.research.google.com/drive/1mlQrSsSjkGimUt0VyRi3SoWMv8OMyvw3?usp=drive_link
➮ Weather-Image-Classification : prithivMLmods/Weather-Image-Classification
➮ TurkishFoods-25 : prithivMLmods/TurkishFoods-25
➮ Marathi-Sign-Language-Detection : prithivMLmods/Marathi-Sign-Language-Detection
➮ Hindi-Sign-Language-Detection : prithivMLmods/Hindi-Sign-Language-Detection

Datasets :
Watermark : qwertyforce/scenery_watermarks
Weather : prithivMLmods/WeatherNet-05-18039
Turkish Foods 25 : yunusserhat/TurkishFoods-25
Marathi Sign Language : VinayHajare/Marathi-Sign-Language
Hindi Sign Language : Vedant3907/Hindi-Sign-Language-Dataset

Collection : prithivMLmods/content-filters-siglip2-vit-68197e3357d4de18fb3b4d2b
prithivMLmods posted an update about 2 months ago
The new versions of the Midjourney Mix adapters have dropped in Stranger Zone HF. These adapters excel at studio-lighting portraits and painterly styles, trained in the style of strangerzonehf/Flux-Midjourney-Mix2-LoRA. They leverage 24-bit colored synthetic images generated from Midjourney v6 to achieve high-quality image reproducibility, support adaptable aspect ratios, and use Flux.1 as the base model. 🥳

Models [ ⌗ ]

> Flux-Midjourney-Painterly-LoRA : strangerzonehf/Flux-Midjourney-Painterly-LoRA
> Flux-Midjourney-Studio-LoRA : strangerzonehf/Flux-Midjourney-Studio-LoRA

> Collection : strangerzonehf/midjourney-mix-3-ft-flux1-dev-68165d58a2a08025852d63f3

> Space : prithivMLmods/FLUX-LoRA-DLC2

The best dimensions and inference settings: a resolution of 1280 x 832 (3:2 aspect ratio) is recommended for the best quality, while 1024 x 1024 (1:1) serves as the default option. For inference, 30 to 35 steps is recommended for optimal output.
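Those settings translate directly into a diffusers call. A sketch, assuming `diffusers` with Flux support and a CUDA GPU (the prompt is illustrative; check the adapter card for trigger words):

```python
# Recommended settings from the post: 1280x832 (3:2) for best quality,
# 1024x1024 (1:1) as the default, and 30-35 inference steps.
RECOMMENDED_DIMS = {"3:2": (1280, 832), "1:1": (1024, 1024)}


def generation_kwargs(aspect: str = "3:2", steps: int = 32) -> dict:
    """Bundle the post's recommended resolution and step count for pipe(...)."""
    width, height = RECOMMENDED_DIMS[aspect]
    assert 30 <= steps <= 35, "post recommends 30-35 steps"
    return {"width": width, "height": height, "num_inference_steps": steps}


print(generation_kwargs())  # {'width': 1280, 'height': 832, 'num_inference_steps': 32}

# Real usage (needs a GPU and the Flux.1-dev weights):
#   import torch
#   from diffusers import FluxPipeline
#   pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev",
#                                       torch_dtype=torch.bfloat16).to("cuda")
#   pipe.load_lora_weights("strangerzonehf/Flux-Midjourney-Studio-LoRA")
#   image = pipe("studio lighting portrait", **generation_kwargs()).images[0]
```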