Mateusz Dziemian

mattmdjaga

AI & ML interests

Interested in AI safety.

Recent Activity

liked a dataset 15 days ago

YuehHanChen/DecomposedHarm

new activity 16 days ago

mattmdjaga/segformer_b2_clothes:add AIBOM

liked a model about 2 months ago

NousResearch/Minos-v1

Posts

Post

🚨 Gray Swan AI's Biggest AI Jailbreaking Arena Yet! $130K+ 🚨

🔹 Agent Red-Teaming Challenge – test direct & indirect attacks on anonymous frontier models!
🔹 $130K+ in prizes & giveaways – co-sponsored by OpenAI & supported by UK AI Security Institute 🇬🇧
🔹 March 8 – April 6 – fresh exploits = fresh rewards!

How It Works:
✅ Anonymous models from top providers 🤐
✅ Direct & indirect prompt injection paths 🔄
✅ Weekly challenges for new behaviors 🗓️
✅ Speed & quantity-based rewards ⏩💰

Why Join?
⚖️ Neutral judging – UK AISI & automated judges ensure fairness
🎯 No pre-trained defenses – a true red-teaming battlefield
💻 5 Apple laptops up for grabs – increase chances by inviting friends!

🔗 Arena: app.grayswan.ai/arena/challenge/agent-red-teaming
🔗 Discord: discord.gg/grayswanai

🔥 No illusions, no mercy. Push AI agents to the limit & claim your share of $130K+! 🚀

Post

🚨 New Agent Benchmark 🚨
AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents

ai-safety-institute/AgentHarm

A collaboration between the UK AI Safety Institute and Gray Swan AI to create a dataset for measuring the harmfulness of LLM agents.

The benchmark contains harmful and benign task sets spanning 11 categories, with varied difficulty levels and detailed evaluation that measures not only success rate but also tool-level accuracy.

We provide refusal and accuracy metrics across a wide range of models in both no-attack and prompt-attack scenarios.
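To make the two headline metrics concrete, here is a minimal sketch of how a refusal rate and a mean task score could be aggregated from per-task results. The `EvalRecord` fields, the `summarize` helper, and the sample values are all hypothetical and chosen for illustration; they are not the benchmark's actual schema, grading code, or results.

```python
from dataclasses import dataclass


@dataclass
class EvalRecord:
    """One graded agent run (hypothetical fields, not the AgentHarm schema)."""
    category: str   # task category label
    refused: bool   # whether the model refused the request
    score: float    # graded task score in [0, 1]


def summarize(records: list[EvalRecord]) -> dict[str, float]:
    """Return the refusal rate and mean score over a list of records."""
    if not records:
        raise ValueError("no records to summarize")
    n = len(records)
    return {
        "refusal_rate": sum(r.refused for r in records) / n,
        "mean_score": sum(r.score for r in records) / n,
    }


# Made-up sample data, purely illustrative.
records = [
    EvalRecord("category_a", refused=True, score=0.0),
    EvalRecord("category_a", refused=False, score=1.0),
    EvalRecord("category_b", refused=False, score=0.5),
    EvalRecord("category_b", refused=True, score=0.5),
]
summary = summarize(records)
print(summary)
```

Running the same aggregation separately on no-attack and prompt-attack runs would give one pair of numbers per scenario, which is the shape of comparison the post describes.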

AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents (2410.09024)
