Mert Erbak PRO

merterbak

AI & ML interests

Currently NLP and Image Processing

Recent Activity

reacted to mmhamdy's post with 🔥 about 9 hours ago
🎉 We're excited to introduce MemoryCode, a novel synthetic dataset designed to rigorously evaluate LLMs' ability to track and execute coding instructions across multiple sessions. MemoryCode simulates realistic workplace scenarios where a mentee (the LLM) receives coding instructions from a mentor amidst a stream of both relevant and irrelevant information.

💡 But what makes MemoryCode unique?! The combination of the following:

✅ Multi-Session Dialogue Histories: MemoryCode consists of chronological sequences of dialogues between a mentor and a mentee, mirroring real-world interactions between coworkers.
✅ Interspersed Irrelevant Information: Critical instructions are deliberately interspersed with unrelated content, replicating the information overload common in office environments.
✅ Instruction Updates: Coding rules and conventions can be updated multiple times throughout the dialogue history, requiring LLMs to track and apply the most recent information.
✅ Prospective Memory: Unlike previous datasets that cue information retrieval, MemoryCode requires LLMs to spontaneously recall and apply relevant instructions without explicit prompts.
✅ Practical Task Execution: LLMs are evaluated on their ability to use the retrieved information to perform practical coding tasks, bridging the gap between information recall and real-world application.

📌 Our Findings

1️⃣ While even small models can handle isolated coding instructions, the performance of top-tier models like GPT-4o dramatically deteriorates when instructions are spread across multiple sessions.
2️⃣ This performance drop isn't simply due to the length of the context. Our analysis indicates that LLMs struggle to reason compositionally over sequences of instructions and updates. They have difficulty keeping track of which instructions are current and how to apply them.

🔗 Paper: https://huggingface.co/papers/2502.13791
📦 Code: https://github.com/for-ai/MemoryCode
liked a Space about 11 hours ago
gradio/theme-gallery

Organizations

MLX Community, Social Post Explorers, Hugging Face Discord Community, AI Starter Pack

merterbak's activity

New activity in xai-org/grok-1 11 months ago

Fun Mode

1
#2 opened 11 months ago by merterbak
