Hello everyone. I've built a medical AI assistant application.

pranavupadhyaya52/MediWiki_Medical_Assistant

It is a multimodal chatbot that accepts text, radiology images, prescriptions and lab reports (currently only one image per chat), and audio files (WAV and MP3).

It is built on a fine-tuned Llama 3.2 11B Vision Instruct model. It also uses 41,000 medical question-answer pairs, stored as ChromaDB embeddings, for Retrieval-Augmented Generation (RAG).
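For readers new to RAG, here is a minimal, dependency-free sketch of the retrieval step. It is not the project's code: the app stores real embeddings in ChromaDB, while this sketch swaps in a toy bag-of-words vector with cosine similarity so it runs without downloading an embedding model. The QA pairs and the `embed`, `retrieve`, and `build_prompt` helpers are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical QA pairs standing in for the 41,000-pair medical knowledge base.
QA_PAIRS = [
    ("What is hypertension?", "Hypertension is persistently elevated blood pressure."),
    ("What are symptoms of anemia?", "Common symptoms include fatigue and pale skin."),
    ("What does an HbA1c test measure?", "HbA1c reflects average blood glucose over about three months."),
]

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Index the questions once, as a vector store would at ingestion time.
INDEX = [(embed(q), q, a) for q, a in QA_PAIRS]

def retrieve(query: str, k: int = 2) -> list[tuple[str, str]]:
    """Return the k QA pairs whose questions are most similar to the query."""
    qv = embed(query)
    ranked = sorted(INDEX, key=lambda item: cosine(qv, item[0]), reverse=True)
    return [(q, a) for _, q, a in ranked[:k]]

def build_prompt(query: str, k: int = 2) -> str:
    """Prepend the retrieved QA pairs as context for the language model."""
    context = "\n".join(f"Q: {q}\nA: {a}" for q, a in retrieve(query, k))
    return f"Use the context to answer.\n\n{context}\n\nQuestion: {query}"

if __name__ == "__main__":
    print(build_prompt("What does an HbA1c test measure?", k=1))
```

With ChromaDB, the ranking step would be replaced by a call such as `collection.query(query_texts=[query], n_results=k)`; the prompt-assembly step stays the same.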

Please let me know your thoughts on my project and how I can improve it further. Thank you.

liked a model 22 days ago

facebook/KernelLLM

reacted to AdinaY's post with 🔥 22 days ago

Dolphin 🔥 A multimodal document image parsing model from ByteDance, built on an analyze-then-parse paradigm.

ByteDance/Dolphin

✨ MIT licensed
✨ Handles text, tables, figures & formulas via:
- Reading-order layout analysis
- Parallel parsing with smart prompts

liked a model 28 days ago

Skywork/Skywork-VL-Reward-7B

reacted to AdinaY's post with 🔥 28 days ago

Skywork-VL Reward 🔥 A multimodal reward model for both understanding & reasoning tasks, released by Skywork 昆仑万物-天工

Paper: Skywork-VL Reward: An Effective Reward Model for Multimodal Understanding and Reasoning (2505.07263)
Model: Skywork/Skywork-VL-Reward-7B

✨ 7B
✨ Trained on large-scale, high-quality preference data
✨ SOTA on VL-RewardBench + boosts reasoning via Mixed Preference Optimization (MPO)
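Reward models trained on preference data are commonly fit with a pairwise Bradley-Terry objective. The post does not say which loss Skywork-VL Reward uses, so treat this as a generic illustration of that standard formulation, not the model's training code; `pairwise_preference_loss` and `batch_loss` are hypothetical names.

```python
import math

def pairwise_preference_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry style loss -log(sigmoid(r_chosen - r_rejected)), computed stably."""
    margin = r_chosen - r_rejected
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    # For negative margins, rewrite as -margin + log(1 + exp(margin)) to avoid overflow.
    return -margin + math.log1p(math.exp(margin))

def batch_loss(pairs: list[tuple[float, float]]) -> float:
    """Mean loss over (chosen_reward, rejected_reward) pairs."""
    return sum(pairwise_preference_loss(c, r) for c, r in pairs) / len(pairs)

if __name__ == "__main__":
    print(batch_loss([(2.0, 0.5), (0.1, 0.3)]))
```

Minimizing this loss pushes the reward of the preferred response above the rejected one; when the two rewards tie, the loss equals log 2.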

liked a model 28 days ago

pytorch/Phi-4-mini-instruct-float8dq

reacted to jeffboudier's post with 🚀 28 days ago

Transcribing 1 hour of audio for less than $0.01 🤯

@mfuntowicz cooked with 8x faster Whisper speech recognition - whisper-large-v3-turbo transcribes at 100x real time on a $0.80/hr L4 GPU!

How they did it: https://huggingface.co/blog/fast-whisper-endpoints

1-click deploy with HF Inference Endpoints: https://endpoints.huggingface.co/new?repository=openai%2Fwhisper-large-v3-turbo&vendor=aws&region=us-east&accelerator=gpu&instance_id=aws-us-east-1-nvidia-l4-x1&task=automatic-speech-recognition&no_suggested_compute=true
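The headline number follows from simple arithmetic: at 100x real time, one hour of audio needs 36 seconds of GPU time, which at $0.80/hr comes to about $0.008. The sketch below assumes per-second billing and ignores cold starts and idle time, both of which would raise the real cost; `transcription_cost` is a hypothetical helper.

```python
def transcription_cost(audio_hours: float, speedup: float, gpu_price_per_hour: float) -> float:
    """Cost of transcribing `audio_hours` of audio at `speedup`x real time,
    assuming the GPU is billed only for the time actually used."""
    gpu_hours = audio_hours / speedup
    return gpu_hours * gpu_price_per_hour

gpu_seconds = 1.0 / 100 * 3600
cost = transcription_cost(audio_hours=1.0, speedup=100, gpu_price_per_hour=0.80)
print(f"{gpu_seconds:.0f} s of GPU time, ${cost:.4f}")  # prints: 36 s of GPU time, $0.0080
```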

liked a model 30 days ago

microsoft/Phi-4-mini-reasoning

reacted to DawnC's post with 🔥 about 1 month ago

PawMatchAI 🐾: The Complete Dog Breed Platform

PawMatchAI offers a comprehensive suite of features designed for dog enthusiasts and prospective owners alike. This all-in-one platform delivers five essential tools to enhance your canine experience:

1. 🔍 Breed Detection: Upload any dog photo and the AI identifies breeds from an extensive database of 124+ different dog breeds. The system detects dogs in the image and provides breed identification results with confidence scores.

2. 📊 Breed Information: Access detailed profiles for each breed covering exercise requirements, typical lifespan, grooming needs, health considerations, and noise behavior, giving you a complete understanding of any breed's characteristics.

3. 📋 Breed Comparison: Compare any two breeds side-by-side with intuitive visualizations highlighting differences in care requirements, personality traits, health factors, and more, helping you make informed decisions.

4. 💡 Breed Recommendation: Receive personalized breed suggestions based on your lifestyle preferences. The matching system evaluates compatibility across multiple factors, including living space, exercise capacity, experience level, and family situation.

5. 🎨 Style Transfer: Transform your dog photos into artwork in five distinct styles: Japanese Anime, Classic Cartoon, Oil Painting, Watercolor, and Cyberpunk, adding a creative dimension to your pet photography.
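Multi-factor matching of this kind is often implemented as a weighted score. The post does not describe PawMatchAI's actual scoring, so everything below is hypothetical: the factor names, the 1-5 scale, the weights, and the placeholder breeds with made-up trait values exist only to illustrate the idea.

```python
# Hypothetical factors on a 1-5 scale and hypothetical weights (they sum to 1).
WEIGHTS = {"space": 0.3, "exercise": 0.3, "experience": 0.2, "family": 0.2}
SCALE_RANGE = 4  # largest possible gap between two values on a 1-5 scale

def compatibility(user: dict[str, int], breed: dict[str, int]) -> float:
    """Weighted score in [0, 1]; 1.0 means the breed matches the user's profile on every factor."""
    score = 0.0
    for factor, weight in WEIGHTS.items():
        gap = abs(user[factor] - breed[factor]) / SCALE_RANGE
        score += weight * (1.0 - gap)
    return score

def recommend(user: dict[str, int], breeds: dict[str, dict[str, int]], top_k: int = 3) -> list[str]:
    """Rank breeds by compatibility and return the top_k names."""
    ranked = sorted(breeds, key=lambda name: compatibility(user, breeds[name]), reverse=True)
    return ranked[:top_k]

# Placeholder breeds with made-up trait values, for illustration only.
user = {"space": 2, "exercise": 2, "experience": 1, "family": 5}
breeds = {
    "Breed A": {"space": 2, "exercise": 2, "experience": 1, "family": 5},
    "Breed B": {"space": 5, "exercise": 5, "experience": 4, "family": 2},
    "Breed C": {"space": 3, "exercise": 2, "experience": 2, "family": 4},
}
print(recommend(user, breeds, top_k=2))  # prints: ['Breed A', 'Breed C']
```

A linear weighted score keeps each factor's contribution easy to explain to users; a production system might instead learn the weights or add hard constraints (for example, excluding breeds that need a yard).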

👋 Explore PawMatchAI today:
DawnC/PawMatchAI

If you enjoy this project or find it valuable for your canine companions, I'd greatly appreciate your support with a Like ❤️.

#ArtificialIntelligence #MachineLearning #ComputerVision #PetTech #TechForLife

reacted to merve's post with 🔥 about 1 month ago

A ton of impactful models and datasets landed in open AI this past week, so let's summarize the best 🤩 merve/releases-apr-21-and-may-2-6819dcc84da4190620f448a3

💬 Qwen made it rain! They released Qwen3: new dense and MoE models ranging from 0.6B to 235B 🤯, as well as Qwen2.5-Omni, an any-to-any model in 3B and 7B!
> Microsoft AI released Phi-4 reasoning models (which also come in mini and plus sizes)
> NVIDIA released new CoT reasoning datasets
🖼️ ByteDance released UI-TARS-1.5, a native multimodal agentic model for UI parsing
> Meta released EdgeTAM, an on-device object tracking model (a SAM2 variant)
🗣️ NVIDIA released parakeet-tdt-0.6b-v2, a smol 600M automatic speech recognition model
> Nari released Dia, a 1.6B text-to-speech model
> Moonshot AI released Kimi Audio, a new audio understanding, generation, and conversation model
👩🏻‍💻 JetBrains released Mellum models in base and SFT versions for coding
> Tesslate released UIGEN-T2-7B, a new text-to-frontend-code model 🤩

reacted to clem's post with ❤️ about 1 month ago

What are you using to evaluate models or AI systems? So far we're building lighteval & leaderboards on the hub, but it still feels early & there's a lot more to build. What would be useful to you?

liked a dataset about 1 month ago

eltorio/ROCOv2-radiology

updated a model about 1 month ago

Dcas89/Aurea

upvoted an article about 1 month ago

I trained a Language Model to schedule events with GRPO!

Apr 29

liked a Space about 1 month ago

Tile Upscaler

Enhance images with high-resolution quality and HDR effects

reacted to eaddario's post with 👍 about 1 month ago

Until recently, watt-ai/watt-tool-70B was the best-performing model on the Berkeley Function-Calling Leaderboard (https://gorilla.cs.berkeley.edu/leaderboard.html), which evaluates LLMs' ability to call functions (tools) accurately. The top spot now belongs to Salesforce/Llama-xLAM-2-70b-fc-r, by quite a wide margin!

Layer-wise quantized versions of the 8B variants of both models are available at eaddario/Llama-xLAM-2-8b-fc-r-GGUF and eaddario/Watt-Tool-8B-GGUF

reacted to Kseniase's post with 👍 about 1 month ago

6 Free resources on Reinforcement Learning (RL)

RL is now where the real action is: it's the engine behind autonomous tech, robots, and the next wave of AI that thinks, moves, and solves problems on its own. To help you stay up to date with what’s happening in RL, we've gathered some fresh materials on it:

1. "Reinforcement Learning from Human Feedback" by Nathan Lambert -> https://rlhfbook.com/
It's a short introduction to RLHF, explaining instruction tuning, reward modeling, alignment methods, synthetic data, evaluation, and more

2. "A Course in Reinforcement Learning (2nd Edition)" by Dimitri P. Bertsekas -> https://www.mit.edu/~dimitrib/RLbook.html
Explains dynamic programming (DP) and RL, diving into rollout algorithms, neural networks, policy learning, etc. It’s packed with solved exercises and real-world examples

3. "Mathematical Foundations of Reinforcement Learning" video course by Shiyu Zhao -> https://www.youtube.com/playlist?list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8
Offers a mathematical yet friendly introduction to RL, covering Bellman Equation, value iteration, Monte Carlo learning, approximation, policy gradient, actor-critic methods, etc.
+ Check out the repo for more: https://github.com/MathFoundationRL/Book-Mathematical-Foundation-of-Reinforcement-Learning

4. "Multi-Agent Reinforcement Learning" by Stefano V. Albrecht, Filippos Christianos, and Lukas Schäfer -> https://www.marl-book.com/
Covers models, core ideas of multi-agent RL (MARL) and modern approaches to combining it with deep learning

5. "Reinforcement Learning: A Comprehensive Overview" by Kevin P. Murphy -> https://arxiv.org/pdf/2412.05265
Explains RL and sequential decision making, covering value-based, policy-gradient, model-based, and multi-agent RL methods, RL+LLMs, RL+inference, and other topics

6. Our collection of free courses and books on RL -> https://huggingface.co/posts/Kseniase/884818121094439

If you liked this, also subscribe to The Turing Post: https://www.turingpost.com/subscribe

reacted to DawnC's post with 🔥 about 2 months ago

I'm excited to introduce VisionScout, an interactive vision tool that makes computer vision both accessible and powerful! 👀🔍

What can VisionScout do right now?
🖼️ Upload any image and detect 80 different object types using YOLOv8.
🔄 Instantly switch between Nano, Medium, and XLarge models depending on your speed vs. accuracy needs.
🎯 Filter specific classes (people, vehicles, animals, etc.) to focus only on what matters to you.
📊 View detailed statistics about detected objects, confidence levels, and spatial distribution.
🎨 Enjoy a clean, intuitive interface with responsive design and enhanced visualizations.
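Class filtering and per-class statistics like those listed above can be computed directly from a detector's output. This is a hypothetical post-processing sketch, not VisionScout's code: the `Detection` record, the `summarize` helper, and the default confidence threshold of 0.25 are assumptions, and a real pipeline would build the detections from YOLOv8 results.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

@dataclass
class Detection:
    label: str
    confidence: float
    box: tuple  # (x0, y0, x1, y1) in pixels

def summarize(detections: list[Detection], keep: Optional[set[str]] = None,
              min_conf: float = 0.25) -> dict[str, dict]:
    """Filter detections by class and confidence, then report per-class
    count, mean confidence, and the centroid of box centers (spatial distribution)."""
    groups: dict[str, list[Detection]] = defaultdict(list)
    for det in detections:
        if det.confidence < min_conf:
            continue  # drop low-confidence detections
        if keep is not None and det.label not in keep:
            continue  # class filter, e.g. {"person", "car"}
        groups[det.label].append(det)

    stats = {}
    for label, dets in groups.items():
        centers = [((d.box[0] + d.box[2]) / 2, (d.box[1] + d.box[3]) / 2) for d in dets]
        stats[label] = {
            "count": len(dets),
            "mean_conf": sum(d.confidence for d in dets) / len(dets),
            "centroid": (sum(x for x, _ in centers) / len(centers),
                         sum(y for _, y in centers) / len(centers)),
        }
    return stats

if __name__ == "__main__":
    dets = [
        Detection("person", 0.9, (0, 0, 10, 10)),
        Detection("person", 0.7, (10, 10, 30, 30)),
        Detection("car", 0.8, (0, 0, 4, 2)),
        Detection("dog", 0.2, (0, 0, 1, 1)),
    ]
    print(summarize(dets, keep={"person", "car", "dog"}))
```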

What's next?
I'm working on exciting updates:
- Support for more models
- Video processing and object tracking across frames
- Faster real-time detection
- Improved mobile responsiveness

The goal is to build a complete but user-friendly vision toolkit for both beginners and advanced users.

Try it yourself! 🚀
DawnC/VisionScout

I'd love to hear your feedback: what features would you find most useful? Are there any specific use cases you'd like to see supported?

Give it a try and let me know your thoughts in the comments! Stay tuned for future updates.

#ComputerVision #ObjectDetection #YOLO #MachineLearning #TechForLife

reacted to nicolay-r's post with 🔥 about 2 months ago

🚀 Delighted to share a major milestone in adapting reasoning techniques for data collection augmentation!
Introducing bulk-chain 1.0.0, the first major release of a no-string API for applying Chain-of-Thought-style LLM reasoning to records with a large number of parameters across large datasets.

⭐ Check it out: https://github.com/nicolay-r/bulk-chain

What’s new and why it matters:
📦 Fully no-string API for easy client deployment
🔥 Demos are now standalone projects:
📺 bash / shell (dispatched): https://github.com/nicolay-r/bulk-chain-shell
📺 tksheet: https://github.com/nicolay-r/bulk-chain-tksheet-client

Using nlp-thirdgate to host the supported providers:
🌌 LLM providers: https://github.com/nicolay-r/nlp-thirdgate