Post 143
An open-source AI agent is achieving state-of-the-art results on two different Android AI agent benchmarks.
Yesterday, I finished evaluating my Android agent model, deki, on two separate benchmarks: Android Control and Android World. For both benchmarks I used a subset of the dataset without fine-tuning. The results show that image description models like deki enable large LLMs (such as GPT-4o, GPT-4.1, and Gemini 2.5) to reach state-of-the-art performance on Android AI agent benchmarks using only vision capabilities, without relying on Accessibility Trees, on both single-step and multi-step tasks.
All the information is available on GitHub: https://github.com/RasulOs/deki
I have also uploaded the model to Hugging Face:
Space: orasul/deki
(Check the analyze-and-get-yolo endpoint; a quick call sketch is below)
Model: orasul/deki-yolo
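
If you want to try the Space programmatically, here is a minimal sketch using gradio_client. The exact input/output signature of the analyze-and-get-yolo endpoint is an assumption on my part, so check the Space's API tab for the real parameter names:

```python
# Minimal sketch (not an official example): calling the deki Space's
# analyze-and-get-yolo endpoint via gradio_client. The argument list is
# assumed; consult the Space's API page for the exact signature.
from gradio_client import Client, handle_file

client = Client("orasul/deki")  # the Hugging Face Space mentioned above

# Assumed input: a single Android screenshot; assumed output: the image
# description / YOLO detections the agent consumes.
result = client.predict(
    handle_file("screenshot.png"),  # path to an Android screenshot
    api_name="/analyze-and-get-yolo",
)
print(result)
```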