
Mert Erbak PRO

merterbak

AI & ML interests

Currently NLP and Image Processing

Recent Activity

liked a Space about 11 hours ago
gradio/theme-gallery

Organizations

MLX Community · Social Post Explorers · Hugging Face Discord Community · AI Starter Pack

merterbak's activity

reacted to mmhamdy's post with 🔥 about 9 hours ago
🎉 We're excited to introduce MemoryCode, a novel synthetic dataset designed to rigorously evaluate LLMs' ability to track and execute coding instructions across multiple sessions. MemoryCode simulates realistic workplace scenarios where a mentee (the LLM) receives coding instructions from a mentor amidst a stream of both relevant and irrelevant information.

💡 But what makes MemoryCode unique?! The combination of the following:

✅ Multi-Session Dialogue Histories: MemoryCode consists of chronological sequences of dialogues between a mentor and a mentee, mirroring real-world interactions between coworkers.

✅ Interspersed Irrelevant Information: Critical instructions are deliberately interspersed with unrelated content, replicating the information overload common in office environments.

✅ Instruction Updates: Coding rules and conventions can be updated multiple times throughout the dialogue history, requiring LLMs to track and apply the most recent information.

✅ Prospective Memory: Unlike previous datasets that cue information retrieval, MemoryCode requires LLMs to spontaneously recall and apply relevant instructions without explicit prompts.

✅ Practical Task Execution: LLMs are evaluated on their ability to use the retrieved information to perform practical coding tasks, bridging the gap between information recall and real-world application.

📌 Our Findings

1️⃣ While even small models can handle isolated coding instructions, the performance of top-tier models like GPT-4o dramatically deteriorates when instructions are spread across multiple sessions.

2️⃣ This performance drop isn't simply due to the length of the context. Our analysis indicates that LLMs struggle to reason compositionally over sequences of instructions and updates. They have difficulty keeping track of which instructions are current and how to apply them.

🔗 Paper: From Tools to Teammates: Evaluating LLMs in Multi-Session Coding Interactions (2502.13791)
📦 Code: https://github.com/for-ai/MemoryCode
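The "instruction updates" setting above is the crux: only the latest version of each rule should be applied, regardless of intervening chatter. A minimal sketch of that bookkeeping (my own toy illustration, not the MemoryCode codebase):

```python
# Toy illustration (not the MemoryCode code): track coding instructions
# across sessions, where later updates override earlier ones and
# irrelevant chatter must be ignored.

def current_rules(sessions):
    """Replay sessions in order; the latest update to each rule wins."""
    rules = {}
    for session in sessions:
        for kind, topic, text in session:
            if kind == "instruction":  # chatter entries are skipped
                rules[topic] = text
    return rules

sessions = [
    [("instruction", "naming", "use camelCase"),
     ("chatter", None, "how was your weekend?")],
    [("instruction", "naming", "use snake_case")],  # update overrides
]

print(current_rules(sessions)["naming"])  # snake_case
```

The benchmark's finding is that models fail at exactly this kind of "latest rule wins" tracking once the sessions pile up.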
reacted to lysandre's post with ❤️ about 10 hours ago
SmolVLM-2 and SigLIP-2 are now part of transformers in dedicated releases!

They're added on top of the v4.49.0 release, and can be installed from the following tags: v4.49.0-SmolVLM-2 and v4.49.0-SigLIP-2.

This marks a new beginning for the release process of transformers. For the past five years, we've been doing monthly releases featuring many models (v4.49.0, the latest release, features 9 new architectures).

Starting with SmolVLM-2 & SigLIP-2, we'll now additionally release tags supporting new models on a stable branch. These models are therefore directly available for use by installing from the tag itself. These tags will continue to be updated with fixes applied to these models.

Going forward, continue expecting software releases following semantic versioning: v4.50.0 will have ~10 new architectures compared to v4.49.0, as well as a myriad of new features, improvements and bug fixes. Accompanying these software releases, we'll release tags offering brand new models as fast as possible, to make them accessible to all immediately.
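Installing from one of these model tags follows the standard pip-from-git pattern (tag names taken from the post above; a sketch, not official install docs):

```shell
# Install transformers pinned to the SmolVLM-2 release tag
pip install "git+https://github.com/huggingface/transformers@v4.49.0-SmolVLM-2"

# Or pinned to the SigLIP-2 tag
pip install "git+https://github.com/huggingface/transformers@v4.49.0-SigLIP-2"
```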
reacted to onekq's post with 👀 about 23 hours ago
Still waiting for 👽Grok👽 3 API ⌛😞😫
reacted to their post with 🚀 1 day ago
🔥 Meet Muse: a generative AI model that can create game environments from visuals or players' controller actions. It was developed by Microsoft Research in collaboration with Ninja Theory (the Hellblade developer). It's built on the World and Human Action Model (WHAM-1.6B). They trained it on 7 years of Bleeding Edge gameplay, and it can generate 2-minute-long 3D game sequences with consistent physics and character behaviors from just a second of input. They've open-sourced it too: open weights, the WHAM Demonstrator, and sample data on Azure AI Foundry for anyone to play with. Hopefully it lands on Hugging Face soon 🤗.

📄 Paper: https://www.nature.com/articles/s41586-025-08600-3
Blog Post: https://www.microsoft.com/en-us/research/blog/introducing-muse-our-first-generative-ai-model-designed-for-gameplay-ideation/

replied to their post 2 days ago
reacted to merve's post with 🚀 2 days ago
Google just released PaliGemma 2 Mix: new versatile instruction vision language models 🔥

> Three new models: 3B, 10B, 28B with res 224, 448 💙
> Can do vision language tasks with open-ended prompts, understand documents, and segment or detect anything 🤯

Read more https://huggingface.co/blog/paligemma2mix
Try the demo google/paligemma2-10b-mix
All models are here google/paligemma-2-mix-67ac6a251aaf3ee73679dcc4
reacted to burtenshaw's post with 🚀 3 days ago
AGENTS + FINETUNING! This week Hugging Face learn has a whole pathway on finetuning for agentic applications. You can follow these two courses to get knowledge on levelling up your agent game beyond prompts:

1๏ธโƒฃ New Supervised Fine-tuning unit in the NLP Course https://huggingface.co/learn/nlp-course/en/chapter11/1
2๏ธโƒฃNew Finetuning for agents bonus module in the Agents Course https://huggingface.co/learn/agents-course/bonus-unit1/introduction

Fine-tuning will squeeze everything out of your model for how youโ€™re using it, more than any prompt.
posted an update 3 days ago
reacted to fdaudens's post with ❤️ 3 days ago
🎯 Perplexity drops their FIRST open-weight model on Hugging Face: a decensored DeepSeek-R1 with full reasoning capabilities. Tested on 1000+ examples for unbiased responses.

Check it out: perplexity-ai/r1-1776
Blog post: https://perplexity.ai/hub/blog/open-sourcing-r1-1776
reacted to clem's post with ❤️ 4 days ago
We crossed 1B+ tokens routed to the inference provider partners on HF that we announced just a few days ago.

Just getting started of course, but early users seem to like it, and we're always happy to partner with cool startups in the ecosystem.

Have you been using any integration and how can we make it better?

https://huggingface.co/blog/inference-providers
reacted to jasoncorkill's post with ❤️ 10 days ago
Runway Gen-3 Alpha: The Style and Coherence Champion

Runway's latest video generation model, Gen-3 Alpha, is something special. It ranks #3 overall on our text-to-video human preference benchmark, but in terms of style and coherence, it outperforms even OpenAI Sora.

However, it struggles with alignment, making it less predictable for controlled outputs.

We've released a new dataset with human evaluations of Runway Gen-3 Alpha: Rapidata's text-2-video human preferences dataset. If you're working on video generation and want to see how your model compares to the biggest players, we can benchmark it for you.

🚀 DM us if you're interested!

Dataset: Rapidata/text-2-video-human-preferences-runway-alpha
reacted to ginipick's post with 🔥 13 days ago
🌟 3D Llama Studio - AI 3D Generation Platform

📝 Project Overview
3D Llama Studio is an all-in-one AI platform that generates high-quality 3D models and stylized images from text or image inputs.

✨ Key Features

Text/Image to 3D Conversion 🎯

Generate 3D models from detailed text descriptions or reference images
Intuitive user interface

Text to Styled Image Generation 🎨

Customizable image generation settings
Adjustable resolution, generation steps, and guidance scale
Supports both English and Korean prompts

🛠️ Technical Features

Gradio-based web interface
Dark theme UI/UX
Real-time image generation and 3D modeling

💫 Highlights

User-friendly interface
Real-time preview
Random seed generation
High-resolution output support (up to 2048x2048)

🎯 Applications

Product design
Game asset creation
Architectural visualization
Educational 3D content

🔗 Try It Now!
Experience 3D Llama Studio:

ginigen/3D-LLAMA

#AI #3DGeneration #MachineLearning #ComputerVision #DeepLearning
reacted to KnutJaegersberg's post with 👀 13 days ago
A Brief Survey of Associations Between Meta-Learning and General AI

The paper titled "A Brief Survey of Associations Between Meta-Learning and General AI" explores how meta-learning techniques can contribute to the development of Artificial General Intelligence (AGI). Here are the key points summarized:

1. General AI (AGI) and Meta-Learning:
- AGI aims to develop algorithms that can handle a wide variety of tasks, similar to human intelligence. Current AI systems excel at specific tasks but struggle with generalization to unseen tasks.
- Meta-learning or "learning to learn" improves model adaptation and generalization, allowing AI systems to tackle new tasks efficiently using prior experiences.

2. Neural Network Design in Meta-Learning:
- Techniques like Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks enable self-improvement and adaptability for deep models, supporting generalization across tasks.
- Highway networks and ResNet-style models use shortcuts for efficient backpropagation, allowing deeper models that can be used in meta-learning frameworks.

3. Coevolution:
- Coevolution involves the mutual evolution of multiple components, such as learners or task-solvers, to improve overall performance.
- Coevolution between learners enhances collaboration and competition within AI systems, while coevolution between tasks and solvers (e.g., POWERPLAY and AI-GA frameworks) pushes solvers to adapt to increasingly complex tasks.

4. Curiosity in Meta-Learning:
- Curiosity-based exploration encourages AI systems to discover new, diverse features of the environment, avoiding local optima.
- Curiosity-based objectives can be combined with performance-based objectives to ensure efficient exploration and adaptation in complex tasks.

5. Forgetting Mechanisms:
- Forgetting is crucial to avoid memory overload in AI systems

https://arxiv.org/abs/2101.04283
reacted to singhsidhukuldeep's post with 🚀 22 days ago
Exciting breakthrough in AI: AirRAG - A Novel Approach to Retrieval Augmented Generation!

Researchers from Alibaba Cloud have developed a groundbreaking framework that significantly improves how AI systems reason and retrieve information. AirRAG introduces five fundamental reasoning actions that work together to create more accurate and comprehensive responses.

>> Key Technical Innovations:
- Implements Monte Carlo Tree Search (MCTS) for exploring diverse reasoning paths
- Utilizes five core actions: System Analysis, Direct Answer, Retrieval-Answer, Query Transformation, and Summary-Answer
- Features self-consistency verification and process-supervised reward modeling
- Achieves superior performance across complex QA datasets like HotpotQA, MuSiQue, and 2WikiMultiHopQA

>> Under the Hood:
The system expands solution spaces through tree-based search, allowing for multiple reasoning paths to be explored simultaneously. The framework implements computationally optimal strategies, applying more resources to key actions while maintaining efficiency.
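To make the "expanded solution space" concrete, here is a toy enumeration of reasoning paths over the five named actions (my own illustration, not the AirRAG code; the actual framework uses MCTS to score and selectively expand promising nodes rather than enumerating exhaustively):

```python
# Hypothetical sketch: the five AirRAG reasoning actions viewed as a
# branching factor for tree-based search over reasoning paths.
ACTIONS = ["system_analysis", "direct_answer", "retrieval_answer",
           "query_transformation", "summary_answer"]

def expand(path, depth):
    """Enumerate all candidate reasoning paths up to a fixed depth."""
    if depth == 0:
        return [path]
    paths = []
    for action in ACTIONS:
        paths.extend(expand(path + [action], depth - 1))
    return paths

paths = expand([], 2)
print(len(paths))  # 25 two-step paths (5**2)
```

Even at depth 2 the space has 25 paths, which is why a search policy (MCTS with reward modeling, in the paper) is needed instead of brute force.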

>> Results Speak Volumes:
- Outperforms existing RAG methods by over 10% on average
- Shows remarkable scalability with increasing inference computation
- Demonstrates exceptional flexibility in integrating with other advanced technologies

This research represents a significant step forward in making AI systems more capable of complex reasoning tasks. The team's innovative approach combines human-like reasoning with advanced computational techniques, setting new benchmarks in the field.
reacted to AdinaY's post with 🔥 about 1 month ago
reacted to StephenGenusa's post with 👀 about 1 month ago
I have a Pro account and I am logged in. I duplicated a Space due to the error "You have exceeded your GPU quota". I am showing 0 GPU use, yet I am still unable to use it: "You have exceeded your GPU quota (60s requested vs. 44s left). Create a free account to get more daily usage quota." And "Expert Support" is a pitch for consulting.
reacted to openfree's post with 🔥 about 2 months ago
# 🧬 Protein Genesis AI: Design Proteins with Just a Prompt

## 🤔 Current Challenges in Protein Design

Traditional protein design faces critical barriers:
- 💰 High costs ($1M - $10M+) & long development cycles (2-3 years)
- 🔬 Complex equipment and expert knowledge required
- 📉 Low success rates (<10%)
- ⏰ Time-consuming experimental validation

## ✨ Our Solution: Protein Genesis AI

Transform protein design through simple natural language input:
"Design a protein that targets cancer cells"
"Create an enzyme that breaks down plastic"


### Key Features
- 🤖 AI-powered automated design
- 📊 Real-time analysis & optimization
- 🔬 Instant 3D visualization
- 💾 Immediate PDB file generation

## 🎯 Applications

### Medical & Industrial
- 🏥 Drug development
- 💉 Antibody design
- 🏭 Industrial enzymes
- ♻️ Environmental solutions

### Research & Education
- 🔬 Basic research
- 📚 Educational tools
- 🧫 Experimental design
- 📈 Data analysis

## 💫 Key Advantages

- 👨‍💻 No coding or technical expertise needed
- ⚡ Results in minutes (vs. years)
- 💰 90% cost reduction
- 🌍 Accessible anywhere

## 🎓 Who Needs This?
- 🏢 Biotech companies
- 🏥 Pharmaceutical research
- 🎓 Academic institutions
- 🧪 Research laboratories

## 🌟 Why It Matters
Protein Genesis AI democratizes protein design by transforming complex processes into simple text prompts. This breakthrough accelerates scientific discovery, potentially leading to faster drug development and innovative biotechnology solutions. The future of protein design starts with a simple prompt! 🚀

openfree/ProteinGenesis
reacted to singhsidhukuldeep's post with 🚀 about 2 months ago
Groundbreaking Research Alert: Rethinking RAG with Cache-Augmented Generation (CAG)

Researchers from National Chengchi University and Academia Sinica have introduced a paradigm-shifting approach that challenges the conventional wisdom of Retrieval-Augmented Generation (RAG).

Instead of the traditional retrieve-then-generate pipeline, their innovative Cache-Augmented Generation (CAG) framework preloads documents and precomputes key-value caches, eliminating the need for real-time retrieval during inference.

Technical Deep Dive:
- CAG preloads external knowledge and precomputes KV caches, storing them for future use
- The system processes documents only once, regardless of subsequent query volume
- During inference, it loads the precomputed cache alongside user queries, enabling rapid response generation
- The cache reset mechanism allows efficient handling of multiple inference sessions through strategic token truncation
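Conceptually, the process-once/query-many pattern can be sketched in a few lines (a pure-Python analogy of my own, not the actual KV-cache implementation, which operates on transformer attention states):

```python
# Toy analogy of Cache-Augmented Generation: encode documents exactly
# once up front, then reuse the cached encoding for every query, so no
# retrieval or re-encoding happens at inference time.

class CAGSession:
    def __init__(self, documents):
        # Expensive step, done once (stands in for KV-cache precompute).
        self.cache = [doc.lower().split() for doc in documents]
        self.encode_calls = 1

    def answer(self, query):
        # Inference only reads the cache; nothing is re-encoded.
        query_words = query.lower().split()
        return [" ".join(doc) for doc in self.cache
                if any(word in doc for word in query_words)]

session = CAGSession(["Paris is the capital of France",
                      "The Nile flows through Egypt"])
print(session.answer("capital France"))  # matches the first document
print(session.encode_calls)              # still 1 after any number of queries
```

The paper's cache-reset-by-truncation trick would correspond here to trimming appended query state off the cache between sessions, rather than rebuilding it.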

Performance Highlights:
- Achieved superior BERTScore metrics compared to both sparse and dense retrieval RAG systems
- Demonstrated up to 40x faster generation times compared to traditional approaches
- Particularly effective with both SQuAD and HotPotQA datasets, showing robust performance across different knowledge tasks

Why This Matters:
The approach significantly reduces system complexity, eliminates retrieval latency, and mitigates common RAG pipeline errors. As LLMs continue evolving with expanded context windows, this methodology becomes increasingly relevant for knowledge-intensive applications.
reacted to MonsterMMORPG's post with ❤️ about 2 months ago
SANA: Ultra HD Fast Text to Image Model from NVIDIA - Step by Step Tutorial on Windows, Cloud & Kaggle - Generate 2048x2048 Images

Below is the YouTube link for the step-by-step tutorial, plus a 1-click installer with a very advanced Gradio app to use the newest text-to-image SANA model on your Windows PC locally, and also on cloud services such as Massed Compute, RunPod and free Kaggle.

https://youtu.be/KW-MHmoNcqo

The tutorial above covers the newest SANA 2K model, and I predict a SANA 4K model will be published as well. The SANA 2K model generates about 4 megapixels, so it handles the following aspect ratios and resolutions very well:

"1:1": (2048, 2048), "4:3": (2304, 1792), "3:4": (1792, 2304),
"3:2": (2432, 1664), "2:3": (1664, 2432), "16:9": (2688, 1536),
"9:16": (1536, 2688), "21:9": (3072, 1280), "9:21": (1280, 3072),
"4:5": (1792, 2240), "5:4": (2240, 1792)
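A quick sanity check (my own snippet, not from the tutorial) that every resolution in the table above really lands near 4 megapixels:

```python
# Verify the SANA 2K aspect-ratio table: each resolution is ~4 MP.
resolutions = {
    "1:1": (2048, 2048), "4:3": (2304, 1792), "3:4": (1792, 2304),
    "3:2": (2432, 1664), "2:3": (1664, 2432), "16:9": (2688, 1536),
    "9:16": (1536, 2688), "21:9": (3072, 1280), "9:21": (1280, 3072),
    "4:5": (1792, 2240), "5:4": (2240, 1792),
}
for ratio, (w, h) in resolutions.items():
    megapixels = w * h / 1e6
    assert 3.9 <= megapixels <= 4.2, (ratio, megapixels)
print("all resolutions are ~4 MP")
```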

I have developed an amazing Gradio app with many new features:

VAE auto-offloading to reduce VRAM usage significantly, which does not exist in the official pipeline

Gradio app built upon the official pipeline with improvements, so it works perfectly

Batch size works perfectly

Number of images works perfectly

Multi-line prompting works perfectly

Aspect ratios for both the 1K and 2K models work perfectly

Randomized seed works perfectly

1-click installers for Windows (using Python 3.10 and an isolated venv), RunPod, Massed Compute and even a free Kaggle account notebook

With the proper latest libraries, it runs at full speed on Windows too

Automatically saves every generated image into the correct folder
🔗 Full Instructions, Configs, Installers, Information and Links Shared Post (the one used in the tutorial) ⤵️
▶️ https://www.patreon.com/posts/click-to-open-post-used-in-tutorial-116474081

🔗 SECourses Official Discord 9500+ Members ⤵️
▶️ https://discord.com/servers/software-engineering-courses-secourses-772774097734074388

reacted to m-ric's post with 🚀 2 months ago
💥 Google releases Gemini 2.0, starting with a Flash model that steamrolls GPT-4o and Claude-3.6 Sonnet! And they start a huge effort on agentic capabilities.

🚀 The performance improvements are crazy for such a fast model:
‣ Gemini 2.0 Flash outperforms the previous 1.5 Pro model at twice the speed
‣ Now supports both input AND output of images, video, audio and text
‣ Can natively use tools like Google Search and execute code

➡️ If the price is on par with the previous Flash iteration ($0.30 / M tokens, to compare with GPT-4o's $1.25) the competition will have a big problem with this 4x cheaper model that gets better benchmarks 🤯

🤖 What about the agentic capabilities?

‣ Project Astra: A universal AI assistant that can use Google Search, Lens and Maps
‣ Project Mariner: A Chrome extension that can complete complex web tasks (83.5% success rate on the WebVoyager benchmark, this is really impressive!)
‣ Jules: An AI coding agent that integrates with GitHub workflows

I'll be eagerly awaiting further news from Google!

Read their blogpost here 👉 https://blog.google/technology/google-deepmind/google-gemini-ai-update-december-2024/