
Oussema Harbi

Harbous

oharbi

AI & ML interests

None yet

Recent Activity

replied to orasul's post 2 days ago


reacted to orasul's post with 👍 2 days ago

upvoted a collection about 1 month ago

DIRA – Diraya Arabic Reasoning AI


Organizations

None yet

Harbous's activity

replied to orasul's post 2 days ago

Thanks for the great work so far.
It is nice to see someone using simpler ML models to work more efficiently instead of relying only on big models.
I believe one good use case would be automatically converting diagram images (for engineering or software) into Mermaid diagrams, for example by giving both the raw text and the JSON outputs to a good coding LLM.
Let me know if you are interested in something like that.
This could be a good project, with possible business applications, letting enterprises make their existing documentation AI-ready through this conversion.
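To make the target format of that idea concrete, here is a minimal deterministic baseline that turns a JSON description of detected boxes and arrows into Mermaid flowchart text. The JSON schema (`nodes` with `id`/`text`, `edges` with `from`/`to`) and the helper names `safe_id` and `to_mermaid` are hypothetical, not deki's actual output format; the proposal above would hand this step to a coding LLM, which could also handle layouts this sketch ignores.

```python
import json
import re


def safe_id(raw: str) -> str:
    # Mermaid node ids should be plain tokens, so replace non-word characters.
    return re.sub(r"\W", "_", raw)


def to_mermaid(payload: str) -> str:
    """Convert a hypothetical diagram-detection JSON payload to a Mermaid flowchart.

    Assumed (illustrative) schema, not deki's real output:
      {"nodes": [{"id": "...", "text": "..."}], "edges": [{"from": "...", "to": "..."}]}
    """
    data = json.loads(payload)
    lines = ["flowchart TD"]
    for node in data["nodes"]:
        # Quote the label; swap double quotes so they cannot end the label early.
        label = node["text"].replace('"', "'")
        lines.append(f'    {safe_id(node["id"])}["{label}"]')
    for edge in data["edges"]:
        lines.append(f"    {safe_id(edge['from'])} --> {safe_id(edge['to'])}")
    return "\n".join(lines)


if __name__ == "__main__":
    example = json.dumps({
        "nodes": [{"id": "req", "text": "HTTP request"},
                  {"id": "api-gw", "text": "API gateway"}],
        "edges": [{"from": "req", "to": "api-gw"}],
    })
    print(to_mermaid(example))
```

A pipeline like the one suggested could feed this skeleton, together with the OCR text, to the LLM as a starting point rather than asking it to invent the graph structure from scratch.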

reacted to orasul's post with 👍 2 days ago

Post


hi, it is deki, and now I am open sourced.

𝗱𝗲𝗸𝗶, an Android AI agent powered by an open-source ML model, has been fully open-sourced.

It understands what’s on your screen and can perform tasks based on your voice or text commands.

Some examples:
* "Write my friend "some_name" in WhatsApp that I'll be 15 minutes late"
* "Open Twitter in the browser and write a post about something"
* "Read my latest notifications"
* "Write a linkedin post about something"

Currently, it works only on Android, but support for other operating systems is planned.

The ML and backend code has also been fully open-sourced.

Video prompt example:

"Open linkedin, tap post and write: hi, it is deki, and now I am open sourced. But don't send, just return"

License: GPLv3

Other AI agent demos and usage examples, such as code generation and object detection, are available on GitHub.

GitHub: https://github.com/RasulOs/deki


upvoted a collection about 1 month ago

DIRA – Diraya Arabic Reasoning AI

Collection

This is an Arabic Reasoning LLM Collection designed for advanced logical inference and instruction-based reasoning in Arabic via datasets and models. • 5 items • Updated Mar 23 • 5

reacted to chansung's post with 👍 3 months ago

Post


New look for AI-powered reviews of papers from the Hugging Face Daily Papers list (managed by @akhaliq).

Bookmark the webpage, read comprehensive reviews generated by Google DeepMind's Gemini 1.5, and listen to audio podcasts made with the same technology used in NotebookLM.

Link: https://deep-diver.github.io/ai-paper-reviewer/

This is not an official service by Hugging Face. It is just a service developed by an individual developer using his own money :)

liked a model 3 months ago

openbmb/MiniCPM-o-2_6

Any-to-Any • Updated 29 days ago • 239k • 1.12k

liked a Space 4 months ago

Open Universal Arabic Asr Leaderboard


A benchmark for open-source multi-dialect Arabic ASR models

reacted to singhsidhukuldeep's post with 👍 4 months ago

Post


Groundbreaking Research Alert: Rethinking RAG with Cache-Augmented Generation (CAG)

Researchers from National Chengchi University and Academia Sinica have introduced a paradigm-shifting approach that challenges the conventional wisdom of Retrieval-Augmented Generation (RAG).

Instead of the traditional retrieve-then-generate pipeline, their innovative Cache-Augmented Generation (CAG) framework preloads documents and precomputes key-value caches, eliminating the need for real-time retrieval during inference.

Technical Deep Dive:
- CAG preloads external knowledge and precomputes KV caches, storing them for future use
- The system processes documents only once, regardless of subsequent query volume
- During inference, it loads the precomputed cache alongside user queries, enabling rapid response generation
- A cache reset mechanism handles multiple inference sessions efficiently by truncating the tokens appended during a session, which restores the preloaded cache without recomputing it
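The preload-once, reset-by-truncation flow described in these bullets can be illustrated with a toy simulation. This is not the researchers' code and uses no real model: `ToyKVCache` and `ToyCAGSession` are hypothetical names, each "cache entry" is a placeholder string pair rather than per-layer key/value tensors, and whitespace splitting stands in for a tokenizer.

```python
from dataclasses import dataclass, field


@dataclass
class ToyKVCache:
    """Stand-in for a transformer KV cache: one (key, value) placeholder per token."""
    entries: list = field(default_factory=list)

    def append(self, tokens):
        # A real model would store per-layer key/value tensors for each token.
        self.entries.extend((f"k:{t}", f"v:{t}") for t in tokens)

    def truncate(self, length):
        # Drop everything appended after the first `length` entries.
        del self.entries[length:]


class ToyCAGSession:
    """Preload knowledge once, then serve many queries from the same cache."""

    def __init__(self, documents):
        self.cache = ToyKVCache()
        self.documents_encoded = 0
        for doc in documents:
            self.cache.append(doc.split())  # whitespace split stands in for a tokenizer
            self.documents_encoded += 1
        self.prefix_len = len(self.cache.entries)

    def answer(self, query):
        # Cache reset: truncate back to the preloaded prefix before each query.
        self.cache.truncate(self.prefix_len)
        self.cache.append(query.split())
        # A real system would now decode an answer; we only report context size.
        return len(self.cache.entries)


if __name__ == "__main__":
    session = ToyCAGSession(["Paris is the capital of France",
                             "The Seine flows through Paris"])
    for q in ["What is the capital of France?", "Which river flows through Paris?"]:
        print(q, "->", session.answer(q), "cached entries")
    print("documents encoded:", session.documents_encoded)
```

The point of the design is visible in the counters: documents are encoded once in the constructor, and each query only pays for its own tokens because the truncation discards the previous query's entries instead of rebuilding the prefix.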

Performance Highlights:
- Achieved superior BERTScore metrics compared to both sparse and dense retrieval RAG systems
- Demonstrated up to 40x faster generation times compared to traditional approaches
- Particularly effective with both SQuAD and HotPotQA datasets, showing robust performance across different knowledge tasks

Why This Matters:
The approach significantly reduces system complexity, eliminates retrieval latency, and mitigates common RAG pipeline errors. As LLMs continue evolving with expanded context windows, this methodology becomes increasingly relevant for knowledge-intensive applications.

updated a model 4 months ago

Harbous/SmolLM2-360-finetuned-sql-instruct

Updated Jan 4

liked a model 4 months ago

PowerInfer/SmallThinker-3B-Preview

Text Generation • Updated Jan 16 • 39.5k • 394

reacted to hexgrad's post with ❤️ 4 months ago

Post


Merry Christmas! 🎄 Open sourced a small TTS model at hexgrad/Kokoro-82M


liked a dataset 4 months ago

MohamedRashad/Quran-Tafseer

Viewer • Updated Sep 13, 2024 • 219k • 100 • 41

New activity in MohamedRashad/Quran-Tafseer 4 months ago

ideas about automatic summarization of qur'an-tafseer

#2 opened 4 months ago by rhyssh

reacted to csabakecskemeti's post with 👍 4 months ago

Post


The AMD Instinct MI50 (~$110) is surprisingly fast for inference with quantized models.

This Space runs Llama 3.1 8B (Q8) with llama.cpp:
https://huggingface.co/spaces/DevQuasar/Mi50

A short blog post about the hardware:
http://devquasar.com/uncategorized/amd-radeon-instinct-mi50-cheap-inference/
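The post does not say how much memory the model needs. As a back-of-the-envelope check, the sketch below assumes "Q8" refers to llama.cpp's Q8_0 format (blocks of 32 int8 weights sharing one fp16 scale, so 34 bytes per block) and roughly 8.0e9 parameters for an "8B" model. It estimates weight storage only; real GGUF files keep a few small tensors at higher precision, and the KV cache and activations need memory on top.

```python
# Rough weight-memory estimate for a Q8_0-quantized model.
# Assumption: every weight uses the Q8_0 block layout described below.

BLOCK_WEIGHTS = 32          # weights per Q8_0 block
BLOCK_BYTES = 32 * 1 + 2    # 32 int8 quants + one fp16 scale = 34 bytes


def q8_0_bits_per_weight() -> float:
    """Effective storage cost per weight, including the shared scale."""
    return BLOCK_BYTES * 8 / BLOCK_WEIGHTS


def q8_0_weight_gib(n_params: float) -> float:
    """Approximate weight storage in GiB for n_params weights stored as Q8_0."""
    return n_params / BLOCK_WEIGHTS * BLOCK_BYTES / 2**30


if __name__ == "__main__":
    n = 8.0e9  # approximate parameter count for an "8B" model
    print(f"{q8_0_bits_per_weight():.2f} bits/weight")
    print(f"~{q8_0_weight_gib(n):.2f} GiB of weights")
```

Under these assumptions Q8_0 costs 8.5 bits per weight, so the weights alone come to roughly 7.9 GiB; whichever GPU runs the model must hold that plus the KV cache for the chosen context length.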

reacted to freddyaboulton's post with 👍 5 months ago

Post


Just created a cookbook of real time audio/video spaces created using Gradio and WebRTC ⚡️

Use this and the [docs](https://freddyaboulton.github.io/gradio-webrtc/) to get started building the next gen of AI apps!

freddyaboulton/gradio-webrtc-cookbook-6758ba7745aeca7b1be7de0f


reacted to etemiz's post with ➕ 5 months ago

Post


Apparently you can't count on centralized AI to perform consistently: some days it is great, some days it is bad. They may be distilling or doing other things to dumb it down and make it cost-effective. But you can count on open-source LLMs that you run locally to perform at the same level every day.

So you always have to watch centralized AI but you never have to watch the local LLM.

liked a model 5 months ago

MohamedRashad/arabic-large-nougat

Image-to-Text • Updated Nov 28, 2024 • 502 • 10

reacted to MohamedRashad's post with ❤️ 5 months ago

Post


A while back I shared MohamedRashad/arabic-small-nougat, a fine-tune of facebook/nougat-small for the Arabic language.

Today this humble project has been scaled up with new models, new datasets, a new Space, and a new paper.

Check everything out through this collection:
MohamedRashad/arabic-nougat-673a3f540bd92904c9b92a8e


reacted to singhsidhukuldeep's post with ❤️ 5 months ago

Post


It's not every day you see the No. 1 ranked paper of the day open-sourcing a very powerful image editing app!

Fascinating to see MagicQuill - a groundbreaking interactive image editing system that makes precise photo editing effortless through advanced AI!

The system's architecture features three sophisticated components:

1. Editing Processor:
- Implements a dual-branch architecture integrated into a latent diffusion framework
- Utilizes PiDiNet for edge map extraction and content-aware per-pixel inpainting
- Features a specialized UNet architecture with zero-convolution layers for feature insertion
- Employs denoising score matching for training the control branch
- Processes both structural modifications via scribble guidance and color manipulation through downsampled color blocks
- Maintains pixel-level control through VAE-based latent space operations

2. Painting Assistor:
- Powered by a fine-tuned LLaVA multimodal LLM using Low-Rank Adaptation (LoRA)
- Trained on a custom dataset derived from Densely Captioned Images (DCI)
- Processes user brushstrokes through specialized Q&A tasks for add/subtract/color operations
- Features bounding box coordinate normalization for precise stroke localization
- Implements streamlined single-word/phrase outputs for real-time performance

3. Idea Collector:
- Built as a modular ReactJS component library
- Supports cross-platform deployment via HTTP protocols
- Compatible with Gradio and ComfyUI frameworks
- Features comprehensive layer management and parameter adjustment capabilities
- Implements real-time canvas updates and preview generation
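The Painting Assistor item mentions bounding box coordinate normalization for stroke localization but does not give the exact scheme. A minimal sketch, assuming the box is taken as the tightest axis-aligned rectangle around the brushstroke's pixel points and normalized to the [0, 1] range relative to image width and height (rounded to 3 decimals); the function names and the prompt wording are hypothetical, not MagicQuill's implementation.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]
BBox = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def stroke_bbox(points: Iterable[Point]) -> BBox:
    """Tightest axis-aligned box around a brushstroke given as (x, y) pixel points."""
    pts = list(points)
    if not pts:
        raise ValueError("stroke has no points")
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    return (min(xs), min(ys), max(xs), max(ys))


def normalize_bbox(box: BBox, width: int, height: int, ndigits: int = 3) -> BBox:
    """Scale pixel coordinates into [0, 1] so they no longer depend on image size."""
    x_min, y_min, x_max, y_max = box
    if not (0 <= x_min <= x_max <= width and 0 <= y_min <= y_max <= height):
        raise ValueError("box must lie inside the image")
    return (round(x_min / width, ndigits), round(y_min / height, ndigits),
            round(x_max / width, ndigits), round(y_max / height, ndigits))


if __name__ == "__main__":
    stroke = [(120, 80), (180, 95), (200, 150), (100, 140)]
    box = normalize_bbox(stroke_bbox(stroke), width=400, height=500)
    # Hypothetical prompt fragment a multimodal LLM could receive with the image.
    print(f"What is drawn inside the box {list(box)}?")
```

Normalizing makes the same stroke produce the same coordinates regardless of the canvas resolution, which keeps the text the LLM sees consistent between training and use.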

The system outperforms existing solutions like SmartEdit and BrushNet in edge alignment and color fidelity while maintaining seamless integration with popular AI frameworks.

What are your thoughts on AI-powered creative tools?

liked 2 models 6 months ago

funasr/fsmn-vad

Voice Activity Detection • Updated Feb 1, 2024 • 111 • 17

deepseek-ai/Janus-1.3B

Any-to-Any • Updated Jan 27 • 12.4k • 588