Thanks for the great work so far.
It is nice to see someone using more basic ML models to work more efficiently instead of just relying on big models.
I believe one good use case would be the automatic conversion of diagram images (engineering or software diagrams) to Mermaid diagrams, for example by providing both the raw text and the JSON outputs to a good coding LLM.
Let me know if you are interested in something like that.
This could be a good project, with a possible business application: helping enterprises make their existing documentation AI-ready through this conversion.
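To make the idea concrete, here is a minimal sketch of how the two outputs could be combined into a single prompt for a coding LLM. The function name `build_mermaid_prompt`, the JSON field names (`id`, `type`, `text`, `bbox`) and the sample data are all hypothetical; they are not taken from the project's actual output format.

```python
import json


def build_mermaid_prompt(raw_text: str, detections: list[dict]) -> str:
    """Combine extracted text and structured detections into one LLM prompt.

    Both inputs are assumed to come from an earlier image-analysis step;
    the field names used in `detections` are illustrative only.
    """
    return "\n\n".join([
        "Convert the diagram described below into a Mermaid flowchart.",
        "Reply with Mermaid code only, starting with 'flowchart TD'.",
        "Raw text extracted from the image:\n" + raw_text.strip(),
        "Detected elements (JSON):\n" + json.dumps(detections, indent=2),
    ])


# Hypothetical sample data for a three-box diagram.
sample_text = "Start\nValidate input\nEnd"
sample_detections = [
    {"id": 0, "type": "box", "text": "Start", "bbox": [40, 20, 160, 60]},
    {"id": 1, "type": "box", "text": "Validate input", "bbox": [40, 100, 160, 140]},
    {"id": 2, "type": "box", "text": "End", "bbox": [40, 180, 160, 220]},
]

prompt = build_mermaid_prompt(sample_text, sample_detections)
print(prompt)
```

The resulting string can be sent to whichever coding LLM is preferred; keeping the bounding boxes in the JSON gives the model a chance to infer the layout order of the nodes.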
Oussema Harbi
Harbous



AI & ML interests
None yet
Recent Activity
upvoted
a
collection
about 1 month ago
DIRA - Diraya Arabic Reasoning AI
Organizations
None yet
Harbous's activity
reacted to
orasul's
post
2 days ago
Post
1889
hi, it is deki, and now I am open sourced.
An Android AI agent powered by the open-source ML model deki has been fully open-sourced.
It understands what's on your screen and can perform tasks based on your voice or text commands.
Some examples:
* "Write my friend "some_name" in WhatsApp that I'll be 15 minutes late"
* "Open Twitter in the browser and write a post about something"
* "Read my latest notifications"
* "Write a linkedin post about something"
Currently, it works only on Android, but support for other operating systems is planned.
The ML and backend code has also been fully open-sourced.
Video prompt example:
"Open linkedin, tap post and write: hi, it is deki, and now I am open sourced. But don't send, just return"
License: GPLv3
You can find other AI agent demos and usage examples, such as code generation or object detection, on GitHub.
GitHub: https://github.com/RasulOs/deki
reacted to
chansung's
post
3 months ago
Post
1740
New look for the AI-powered reviews of papers from the Hugging Face Daily Papers list (managed by @akhaliq).
Bookmark the webpage, check the comprehensive reviews written by Google DeepMind's Gemini 1.5, and listen to the audio podcasts made with the same tech used in NotebookLM.
Link: https://deep-diver.github.io/ai-paper-reviewer/
This is not an official service by Hugging Face. It is just a service developed by an individual developer using his own money :)
reacted to
singhsidhukuldeep's
post
4 months ago
Post
3234
Groundbreaking Research Alert: Rethinking RAG with Cache-Augmented Generation (CAG)
Researchers from National Chengchi University and Academia Sinica have introduced a paradigm-shifting approach that challenges the conventional wisdom of Retrieval-Augmented Generation (RAG).
Instead of the traditional retrieve-then-generate pipeline, their innovative Cache-Augmented Generation (CAG) framework preloads documents and precomputes key-value caches, eliminating the need for real-time retrieval during inference.
Technical Deep Dive:
- CAG preloads external knowledge and precomputes KV caches, storing them for future use
- The system processes documents only once, regardless of subsequent query volume
- During inference, it loads the precomputed cache alongside user queries, enabling rapid response generation
- The cache reset mechanism allows efficient handling of multiple inference sessions through strategic token truncation
Performance Highlights:
- Achieved superior BERTScore metrics compared to both sparse and dense retrieval RAG systems
- Demonstrated up to 40x faster generation times compared to traditional approaches
- Particularly effective with both SQuAD and HotPotQA datasets, showing robust performance across different knowledge tasks
Why This Matters:
The approach significantly reduces system complexity, eliminates retrieval latency, and mitigates common RAG pipeline errors. As LLMs continue evolving with expanded context windows, this methodology becomes increasingly relevant for knowledge-intensive applications.
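The preload, reuse, and reset steps described in this post can be sketched with a toy model. This is not the researchers' implementation: a real system would store transformer key-value tensors, whereas here the "cache" is only a list of tokens, and the class name `ToyCAGSession` and its word-overlap "answer" are hypothetical stand-ins chosen to keep the control flow visible.

```python
class ToyCAGSession:
    """Toy stand-in for Cache-Augmented Generation (illustrative only)."""

    def __init__(self, documents: list[str]) -> None:
        self.cache: list[str] = []
        self.encode_calls = 0
        for doc in documents:
            # Each document is processed exactly once, at preload time.
            self.cache.extend(self._encode(doc))
        # Remember where the preloaded knowledge ends.
        self.preloaded_len = len(self.cache)

    def _encode(self, text: str) -> list[str]:
        """Pretend to compute cache entries for a document."""
        self.encode_calls += 1
        return text.lower().split()

    def answer(self, query: str) -> str:
        """Append the query after the preloaded cache, 'generate', then reset."""
        query_tokens = query.lower().split()
        self.cache.extend(query_tokens)
        context = self.cache[: self.preloaded_len]
        # Toy generation: return the query tokens found in the preloaded context.
        hits = [token for token in query_tokens if token in context]
        self.reset()
        return " ".join(hits)

    def reset(self) -> None:
        """Truncate the cache back to the preloaded documents only."""
        del self.cache[self.preloaded_len:]


session = ToyCAGSession([
    "Paris is the capital of France",
    "Berlin is the capital of Germany",
])
first = session.answer("capital of France")
second = session.answer("Where is Berlin")
print(first, "|", second, "|", session.encode_calls)
```

However many queries are answered, `encode_calls` stays at the number of preloaded documents, and the truncation in `reset` is what lets one cache serve several sessions.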
ideas about automatic summarization of qur'an-tafseer
4
#2 opened 4 months ago
by
rhyssh

reacted to
csabakecskemeti's
post
4 months ago
Post
4623
The AMD Instinct MI50 (~$110) is surprisingly fast for inference with quantized models.
This Space runs Llama 3.1 8B Q8 with llama.cpp:
https://huggingface.co/spaces/DevQuasar/Mi50
A short blog post about the hardware:
http://devquasar.com/uncategorized/amd-radeon-instinct-mi50-cheap-inference/
reacted to
freddyaboulton's
post
5 months ago
Post
1185
Just created a cookbook of real-time audio/video Spaces built with Gradio and WebRTC ⚡️
Use this and the [docs](https://freddyaboulton.github.io/gradio-webrtc/) to get started building the next gen of AI apps!
freddyaboulton/gradio-webrtc-cookbook-6758ba7745aeca7b1be7de0f
reacted to
etemiz's
post
5 months ago
Post
429
Apparently you can't count on centralized AI to perform consistently: some days it is great, some days it is bad. They may be distilling or doing other things to dumb it down and make it cost-effective. But you can count on open-source LLMs that you run locally to perform at the same level every day.
So you always have to watch centralized AI but you never have to watch the local LLM.
reacted to
MohamedRashad's
post with ❤️
5 months ago
Post
1705
A while back I shared the model MohamedRashad/arabic-small-nougat, a fine-tune of facebook/nougat-small for the Arabic language.
Today this humble project has been scaled up with new models, new datasets, a new Space, and a new paper.
Check everything through this collection:
MohamedRashad/arabic-nougat-673a3f540bd92904c9b92a8e
reacted to
singhsidhukuldeep's
post with ❤️
5 months ago
Post
1910
It's not every day you see the No. 1 ranked paper of the day open-sourcing a very powerful image editing app!
Fascinating to see MagicQuill - a groundbreaking interactive image editing system that makes precise photo editing effortless through advanced AI!
The system's architecture features three sophisticated components:
1. Editing Processor:
- Implements a dual-branch architecture integrated into a latent diffusion framework
- Utilizes PiDiNet for edge map extraction and content-aware per-pixel inpainting
- Features a specialized UNet architecture with zero-convolution layers for feature insertion
- Employs denoising score matching for training the control branch
- Processes both structural modifications via scribble guidance and color manipulation through downsampled color blocks
- Maintains pixel-level control through VAE-based latent space operations
2. Painting Assistor:
- Powered by a fine-tuned LLaVA multimodal LLM using Low-Rank Adaptation (LoRA)
- Trained on a custom dataset derived from Densely Captioned Images (DCI)
- Processes user brushstrokes through specialized Q&A tasks for add/subtract/color operations
- Features bounding box coordinate normalization for precise stroke localization
- Implements streamlined single-word/phrase outputs for real-time performance
3. Idea Collector:
- Built as a modular ReactJS component library
- Supports cross-platform deployment via HTTP protocols
- Compatible with Gradio and ComfyUI frameworks
- Features comprehensive layer management and parameter adjustment capabilities
- Implements real-time canvas updates and preview generation
The system outperforms existing solutions like SmartEdit and BrushNet in edge alignment and color fidelity while maintaining seamless integration with popular AI frameworks.
What are your thoughts on AI-powered creative tools?
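As a small illustration of the bounding box coordinate normalization mentioned for the Painting Assistor, the sketch below computes a brushstroke's bounding box and rescales it by the image size. The post does not describe MagicQuill's actual convention, so the `[0, 1]` range, the `(x_min, y_min, x_max, y_max)` layout, and both function names are assumptions.

```python
def stroke_bbox(points: list[tuple[float, float]]) -> tuple[float, float, float, float]:
    """Axis-aligned bounding box (x_min, y_min, x_max, y_max) of a stroke in pixels."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def normalize_bbox(
    bbox: tuple[float, float, float, float],
    image_width: int,
    image_height: int,
) -> tuple[float, float, float, float]:
    """Rescale pixel coordinates to the [0, 1] range (assumed convention)."""
    if image_width <= 0 or image_height <= 0:
        raise ValueError("image dimensions must be positive")
    x_min, y_min, x_max, y_max = bbox
    return (
        x_min / image_width,
        y_min / image_height,
        x_max / image_width,
        y_max / image_height,
    )


# Hypothetical stroke drawn on a 512 x 256 canvas.
stroke = [(128, 64), (256, 192), (192, 128)]
normalized = normalize_bbox(stroke_bbox(stroke), image_width=512, image_height=256)
print(normalized)  # (0.25, 0.25, 0.5, 0.75)
```

Normalized coordinates let the same stroke description be reused regardless of the canvas resolution, which is one plausible reason a system would normalize before passing stroke locations to a language model.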