Mert Erbak's picture

Mert Erbak PRO

merterbak

AI & ML interests

Currently NLP and Image Processing

Recent Activity

reacted to mmhamdy's post with ๐Ÿ”ฅ about 8 hours ago
๐ŸŽ‰ We're excited to introduce MemoryCode, a novel synthetic dataset designed to rigorously evaluate LLMs' ability to track and execute coding instructions across multiple sessions. MemoryCode simulates realistic workplace scenarios where a mentee (the LLM) receives coding instructions from a mentor amidst a stream of both relevant and irrelevant information. ๐Ÿ’ก But what makes MemoryCode unique?! The combination of the following: โœ… Multi-Session Dialogue Histories: MemoryCode consists of chronological sequences of dialogues between a mentor and a mentee, mirroring real-world interactions between coworkers. โœ… Interspersed Irrelevant Information: Critical instructions are deliberately interspersed with unrelated content, replicating the information overload common in office environments. โœ… Instruction Updates: Coding rules and conventions can be updated multiple times throughout the dialogue history, requiring LLMs to track and apply the most recent information. โœ… Prospective Memory: Unlike previous datasets that cue information retrieval, MemoryCode requires LLMs to spontaneously recall and apply relevant instructions without explicit prompts. โœ… Practical Task Execution: LLMs are evaluated on their ability to use the retrieved information to perform practical coding tasks, bridging the gap between information recall and real-world application. ๐Ÿ“Œ Our Findings 1๏ธโƒฃ While even small models can handle isolated coding instructions, the performance of top-tier models like GPT-4o dramatically deteriorates when instructions are spread across multiple sessions. 2๏ธโƒฃ This performance drop isn't simply due to the length of the context. Our analysis indicates that LLMs struggle to reason compositionally over sequences of instructions and updates. They have difficulty keeping track of which instructions are current and how to apply them. ๐Ÿ”— Paper: https://huggingface.co/papers/2502.13791 ๐Ÿ“ฆ Code: https://github.com/for-ai/MemoryCode
liked a Space about 10 hours ago
gradio/theme-gallery
View all activity

Organizations

MLX Community's profile picture Social Post Explorers's profile picture Hugging Face Discord Community's profile picture AI Starter Pack's profile picture

merterbak's activity

reacted to mmhamdy's post with ๐Ÿ”ฅ about 8 hours ago
view post
Post
299
๐ŸŽ‰ We're excited to introduce MemoryCode, a novel synthetic dataset designed to rigorously evaluate LLMs' ability to track and execute coding instructions across multiple sessions. MemoryCode simulates realistic workplace scenarios where a mentee (the LLM) receives coding instructions from a mentor amidst a stream of both relevant and irrelevant information.

๐Ÿ’ก But what makes MemoryCode unique?! The combination of the following:

โœ… Multi-Session Dialogue Histories: MemoryCode consists of chronological sequences of dialogues between a mentor and a mentee, mirroring real-world interactions between coworkers.

โœ… Interspersed Irrelevant Information: Critical instructions are deliberately interspersed with unrelated content, replicating the information overload common in office environments.

โœ… Instruction Updates: Coding rules and conventions can be updated multiple times throughout the dialogue history, requiring LLMs to track and apply the most recent information.

โœ… Prospective Memory: Unlike previous datasets that cue information retrieval, MemoryCode requires LLMs to spontaneously recall and apply relevant instructions without explicit prompts.

โœ… Practical Task Execution: LLMs are evaluated on their ability to use the retrieved information to perform practical coding tasks, bridging the gap between information recall and real-world application.

๐Ÿ“Œ Our Findings

1๏ธโƒฃ While even small models can handle isolated coding instructions, the performance of top-tier models like GPT-4o dramatically deteriorates when instructions are spread across multiple sessions.

2๏ธโƒฃ This performance drop isn't simply due to the length of the context. Our analysis indicates that LLMs struggle to reason compositionally over sequences of instructions and updates. They have difficulty keeping track of which instructions are current and how to apply them.

๐Ÿ”— Paper: From Tools to Teammates: Evaluating LLMs in Multi-Session Coding Interactions (2502.13791)
๐Ÿ“ฆ Code: https://github.com/for-ai/MemoryCode
reacted to lysandre's post with โค๏ธ about 9 hours ago
view post
Post
1446
SmolVLM-2 and SigLIP-2 are now part of transformers in dedicated releases!

They're added on top of the v4.49.0 release, and can be installed from the following tags: v4.49.0-SmolVLM-2 and v4.49.0-SigLIP-2.

This marks a new beginning for the release process of transformers. For the past five years, we've been doing monthly releases featuring many models (v4.49.0, the latest release, features 9 new architectures).

Starting with SmolVLM-2 & SigLIP2, we'll now additionally release tags supporting new models on a stable branch. These models are therefore directly available for use by installing from the tag itself. These tags will continue to be updated with fixes applied to these models.

Going forward, continue expecting software releases following semantic versioning: v4.50.0 will have ~10 new architectures compared to v4.49.0, as well as a myriad of new features, improvements and bug fixes. Accompanying these software releases, we'll release tags offering brand new models as fast as possible, to make them accessible to all immediately.
  • 1 reply
ยท
liked a Space about 10 hours ago
reacted to onekq's post with ๐Ÿ‘€ about 23 hours ago
view post
Post
1121
Still waiting for ๐Ÿ‘ฝGrok๐Ÿ‘ฝ 3 API โŒ›๐Ÿ˜ž๐Ÿ˜ซ
reacted to their post with ๐Ÿš€ 1 day ago
view post
Post
3381
๐Ÿ”ฅ Meet Muse: that can generate a game environment based on visuals or playersโ€™ controller actions. It was developed by Microsoft Research in collaboration with Ninja Theory (Hellblade developer). Itโ€™s built on something called the World and Human Action Model (WHAM-1.6B model). They trained on 7 years of Bleeding Edge gameplay and it can generate 2 minute long 3D game sequences with consistent physics and character behaviors all from just a second of input. Theyโ€™ve gone and open-sourced it too. Open weights, the WHAM Demonstrator, and sample data on Azure AI Foundry for anyone to play with. Hope so soon on Hugging Face ๐Ÿค—.

๐Ÿ“„ Paper: https://www.nature.com/articles/s41586-025-08600-3
Blog Post: https://www.microsoft.com/en-us/research/blog/introducing-muse-our-first-generative-ai-model-designed-for-gameplay-ideation/

  • 1 reply
ยท
replied to their post 2 days ago
reacted to merve's post with ๐Ÿš€ 2 days ago
view post
Post
4491
Google just released PaliGemma 2 Mix: new versatile instruction vision language models ๐Ÿ”ฅ

> Three new models: 3B, 10B, 28B with res 224, 448 ๐Ÿ’™
> Can do vision language tasks with open-ended prompts, understand documents, and segment or detect anything ๐Ÿคฏ

Read more https://huggingface.co/blog/paligemma2mix
Try the demo google/paligemma2-10b-mix
All models are here google/paligemma-2-mix-67ac6a251aaf3ee73679dcc4
reacted to burtenshaw's post with ๐Ÿš€ 3 days ago
view post
Post
6118
AGENTS + FINETUNING! This week Hugging Face learn has a whole pathway on finetuning for agentic applications. You can follow these two courses to get knowledge on levelling up your agent game beyond prompts:

1๏ธโƒฃ New Supervised Fine-tuning unit in the NLP Course https://huggingface.co/learn/nlp-course/en/chapter11/1
2๏ธโƒฃNew Finetuning for agents bonus module in the Agents Course https://huggingface.co/learn/agents-course/bonus-unit1/introduction

Fine-tuning will squeeze everything out of your model for how youโ€™re using it, more than any prompt.
  • 2 replies
ยท
posted an update 3 days ago
view post
Post
3381
๐Ÿ”ฅ Meet Muse: that can generate a game environment based on visuals or playersโ€™ controller actions. It was developed by Microsoft Research in collaboration with Ninja Theory (Hellblade developer). Itโ€™s built on something called the World and Human Action Model (WHAM-1.6B model). They trained on 7 years of Bleeding Edge gameplay and it can generate 2 minute long 3D game sequences with consistent physics and character behaviors all from just a second of input. Theyโ€™ve gone and open-sourced it too. Open weights, the WHAM Demonstrator, and sample data on Azure AI Foundry for anyone to play with. Hope so soon on Hugging Face ๐Ÿค—.

๐Ÿ“„ Paper: https://www.nature.com/articles/s41586-025-08600-3
Blog Post: https://www.microsoft.com/en-us/research/blog/introducing-muse-our-first-generative-ai-model-designed-for-gameplay-ideation/

  • 1 reply
ยท
reacted to fdaudens's post with โค๏ธ 3 days ago
view post
Post
5589
๐ŸŽฏ Perplexity drops their FIRST open-weight model on Hugging Face: A decensored DeepSeek-R1 with full reasoning capabilities. Tested on 1000+ examples for unbiased responses.

Check it out: perplexity-ai/r1-1776
Blog post: https://perplexity.ai/hub/blog/open-sourcing-r1-1776
  • 1 reply
ยท