Sarath Shekkizhar

shekkizh

AI & ML interests

None yet


Organizations

Salesforce · Tenyx · Blog-explorers · ZeroGPU Explorers · Social Post Explorers · Hugging Face Discord Community

shekkizh's activity

replied to their post about 17 hours ago

Didn’t you know AGI is already here 🤖

replied to their post about 17 hours ago

Images are split into patches and each patch is tokenized: the tokenizer maps each patch into a feature dimension and quantizes it. This probably already involves a CNN and/or attention. The issue is that the model is not able to reason jointly over color and text in the tokenized space.
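Here's a toy sketch of that pipeline, just to make it concrete (shapes and codebook size are made up; the real tokenizer is learned end to end):

```python
import torch

# Illustrative sizes only -- not any actual tokenizer's dimensions.
PATCH, DIM, CODEBOOK_SIZE = 16, 512, 8192

patchify = torch.nn.Conv2d(3, DIM, kernel_size=PATCH, stride=PATCH)  # patch -> feature
codebook = torch.randn(CODEBOOK_SIZE, DIM)  # learned via vector quantization in practice

def tokenize(image: torch.Tensor) -> torch.Tensor:
    """image: (B, 3, H, W) -> discrete patch-token ids (B, num_patches)."""
    feats = patchify(image).flatten(2).transpose(1, 2)           # (B, num_patches, DIM)
    dists = torch.cdist(feats, codebook.expand(feats.size(0), -1, -1))
    return dists.argmin(dim=-1)  # nearest codebook entry becomes the "token"

ids = tokenize(torch.randn(1, 3, 224, 224))  # -> shape (1, 196)
```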

We ran about 1000 experiments: different prompting strategies, tool calls to a different model for recognition, and several other techniques. The results still hold. The paper is a small part of the analysis. 🤷‍♂️

posted an update 3 days ago
🙋🏽‍♂️ Is your "multi-agent" system really multi-agentic? Or is it just a modular setup with a bunch of different prompts? 🤨

I’ve had this discussion way too often, so I finally wrote it all down. If you’re building with agents, you need to read this.

Here’s the TLDR:
✅ True multi-agent systems require:
• Persistent, private state per agent
• Memory that impacts future decisions
• Adaptation based on past experiences

❌ Just having modular components, function calls, or multiple LLMs doesn't cut it. That's not multi-agentic. It's just pipelining.

🤝 The magic is in evolving relationships, context retention, and behavioral shifts over time.
🧠 If your agents aren't learning from each other or changing based on past experience… you are missing the point. (Toy sketch of the distinction below.)
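A minimal sketch of what I mean (hypothetical classes, not any particular framework; `call_llm` is a stand-in for whatever client you use):

```python
def call_llm(prompt: str) -> str:
    """Stand-in for any LLM API call; swap in your client of choice."""
    return f"response-to({prompt[:30]}...)"

# Pipelining: stateless calls chained together. No memory, no adaptation.
def pipeline(task: str) -> str:
    draft = call_llm(f"Plan: {task}")
    return call_llm(f"Critique: {draft}")

# Multi-agent: each agent keeps persistent, private state that shapes future actions.
class Agent:
    def __init__(self, name: str):
        self.name = name
        self.memory: list[tuple[str, str]] = []    # private, persists across turns

    def act(self, observation: str) -> str:
        recent = self.memory[-5:]                  # past experience conditions the call
        action = call_llm(f"{self.name} | history={recent} | now={observation}")
        self.memory.append((observation, action))  # adaptation: behavior shifts over time
        return action

planner, critic = Agent("planner"), Agent("critic")
plan = planner.act("ship the feature")
review = critic.act(plan)   # the critic builds its own private view of the planner
```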

What do you think? Curious what patterns you're experimenting with 🧐

👉 Full post: https://shekkizh.github.io/posts/2025/04/multi-agents/
posted an update 4 days ago
Think AGI is just around the corner? Not so fast.

When OpenAI released its Computer-Using Agent (CUA) API, I happened to be playing Wordle 🧩 and thought, why not see how the model handles it?
Spoiler: Wordle turned out to be a surprisingly effective benchmark.
So Romain Cosentino, Ph.D., and I dug in and analyzed the results of several hundred runs.
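For anyone who hasn't played: the agent has to reason over per-letter feedback each turn. A minimal scorer, just to show what the model must track (illustrative, not our eval harness):

```python
from collections import Counter

def wordle_feedback(guess: str, answer: str) -> str:
    """G = right letter, right spot; Y = right letter, wrong spot; - = miss."""
    feedback = ["-"] * len(guess)
    # Answer letters not already matched exactly (handles repeated letters).
    unmatched = Counter(a for g, a in zip(guess, answer) if g != a)
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            feedback[i] = "G"
        elif unmatched[g] > 0:
            feedback[i] = "Y"
            unmatched[g] -= 1
    return "".join(feedback)

print(wordle_feedback("crane", "cargo"))  # GYY--
```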

🔑 Takeaways
1️⃣ Even the best computer-using models struggle with simple, context-dependent tasks. 
2️⃣ Visual perception and reasoning remain major hurdles for multimodal agents.
3️⃣ Real-world use cases reveal significant gaps between hype and reality. Perception accuracy drops to near zero by the last turn 📉

🔗 Read our arXiv article for more details: https://www.arxiv.org/abs/2504.15434
posted an update 18 days ago
Some interesting architectural choices made in Llama 4 models -- were these key to the 10M context? Possibly 🤔

🔍 Takeaways:
🧩 Interleaved Attention without position encoding
- Llama 4 removes explicit positional encoding in some attention layers to boost performance on longer contexts.
- The principle could be similar to residual connections: letting attention reach early tokens without positional decay. (Toy illustration below.)
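A toy illustration of the interleaving (the interval is my assumption from the released configs; check the model code for the real pattern):

```python
# Hypothetical: apply RoPE in most layers, skip it (NoPE) every 4th layer.
def layer_uses_rope(layer_idx: int, nope_interval: int = 4) -> bool:
    return (layer_idx + 1) % nope_interval != 0

print([("RoPE" if layer_uses_rope(i) else "NoPE") for i in range(8)])
# ['RoPE', 'RoPE', 'RoPE', 'NoPE', 'RoPE', 'RoPE', 'RoPE', 'NoPE']
```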

⚖️ Scaled Softmax to increase attention at inference time
- The max attention value (the softmax output) decreases as context size increases.
- Llama 4 incorporates a context-size-dependent temperature in the softmax to modify its slope, letting the model focus better on relevant tokens.
- Done only at inference time -- my guess is this was a choice made after some observations on eval datasets. (Rough sketch below.)
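Roughly, in code (the scaling function and constants here are my guess at the shape of the idea, not Llama 4's exact formula):

```python
import math
import torch

def attn_weights(q, k, position: int, beta: float = 0.1, floor: int = 8192):
    """Sharpen softmax as context grows: scale queries up with position so the
    max attention value doesn't flatten out at long context lengths."""
    temp = 1.0 + beta * math.log(position / floor + 1)  # grows slowly with context
    logits = (q * temp) @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    return torch.softmax(logits, dim=-1)

q = torch.randn(1, 4, 64)     # (batch, query_len, head_dim)
k = torch.randn(1, 4096, 64)  # (batch, context_len, head_dim)
w = attn_weights(q, k, position=200_000)  # long-context query gets sharper weights
```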

What did you think of these choices?

New activity in openbmb/RLAIF-V-Dataset 11 months ago
reacted to their post with 🚀 12 months ago
posted an update 12 months ago
Hi folks,
Tenyx announced its latest model, Llama3-TenyxChat-70B, which outperforms a GPT-4 variant on several MT-Bench measurements.

By post-training Llama-3 70B in 15 hours, our model improves reasoning capabilities by leveraging the relationship between geometry and LLM task complexity (take a look at our paper, to be presented at ICML 2024: https://arxiv.org/abs/2312.01648).
Model: tenyx/Llama3-TenyxChat-70B · HuggingFace Space: tenyx/Llama3-TenyxChat-70B
New activity in tenyx/Llama3-TenyxChat-70B 12 months ago

great evals
#2 opened 12 months ago by gblazex