la-leaderboard (La Leaderboard)

hgalbraith

updated 2 datasets 26 days ago

la-leaderboard/requests

Viewer • Updated 26 days ago • 90 • 1.49k

la-leaderboard/results

Updated 26 days ago • 1.31k

albertvillanova

posted an update 27 days ago

Post

3303

Latest smolagents release supports GPT-5: build agents that think, plan, and act.
⚡ Upgrade now and put GPT-5 to work!

albertvillanova

posted an update 28 days ago

Post

463

🚀 smolagents v1.21.0 is here!
Now with improved safety in the local Python executor: dunder calls are blocked!
⚠️ Still, not fully isolated: for untrusted code, use a remote executor instead: Docker, E2B, Wasm.
✨ Many bug fixes: more reliable code.
👉 https://github.com/huggingface/smolagents/releases/tag/v1.21.0

mariagrandury

updated 2 datasets about 1 month ago

la-leaderboard/requests

Viewer • Updated 26 days ago • 90 • 1.49k

la-leaderboard/results

Updated 26 days ago • 1.31k

javicond3

authored a paper about 2 months ago

The Generative Energy Arena (GEA): Incorporating Energy Awareness in Large Language Model (LLM) Human Evaluations

Paper • 2507.13302 • Published Jul 17 • 4

reviriego

authored a paper about 2 months ago

The Generative Energy Arena (GEA): Incorporating Energy Awareness in Large Language Model (LLM) Human Evaluations

Paper • 2507.13302 • Published Jul 17 • 4

gonzmart

authored a paper about 2 months ago

The Generative Energy Arena (GEA): Incorporating Energy Awareness in Large Language Model (LLM) Human Evaluations

Paper • 2507.13302 • Published Jul 17 • 4

albertvillanova

posted an update about 2 months ago

Post

647

🚀 New in smolagents v1.20.0: Remote Python Execution via WebAssembly (Wasm)

We've just merged a major new capability into the smolagents framework: the CodeAgent can now execute Python code remotely in a secure, sandboxed WebAssembly environment!

🔧 Powered by Pyodide and Deno, this new WasmExecutor lets your agent-generated Python code run safely: without relying on Docker or local execution.

Why this matters:
✅ Isolated execution = no host access
✅ No need for Python on the user's machine
✅ Safer evaluation of arbitrary code
✅ Compatible with serverless / edge agent workloads
✅ Ideal for constrained or untrusted environments

This is just the beginning: a focused initial implementation with known limitations. A solid MVP designed for secure, sandboxed use cases. 💡

💡 We're inviting the open-source community to help evolve this executor:
• Tackle more advanced Python features
• Expand compatibility
• Add test coverage
• Shape the next-gen secure agent runtime

🔗 Check out the PR: https://github.com/huggingface/smolagents/pull/1261

Let's reimagine what agent-driven Python execution can look like: remote-first, wasm-secure, and community-built.

This feature is live in smolagents v1.20.0!
Try it out.
Break things. Extend it. Give us feedback.
Let's build safer, smarter agents; together 🧠⚙️

👉 https://github.com/huggingface/smolagents/releases/tag/v1.20.0

#smolagents #WebAssembly #Python #AIagents #Pyodide #Deno #OpenSource #HuggingFace #AgenticAI

albertvillanova

posted an update 3 months ago

Post

1710

🚀 SmolAgents v1.19.0 is live!
This release brings major improvements to agent flexibility, UI usability, streaming architecture, and developer experience: making it easier than ever to build smart, interactive AI agents. Here's what's new:

🔧 Agent Upgrades
- Support for managed agents in ToolCallingAgent
- Context manager support for cleaner agent lifecycle handling
- Output formatting now uses XML tags for consistency

🖥️ UI Enhancements
- GradioUI now supports reset_agent_memory: perfect for fresh starts in dev & demos.

🔄 Streaming Refactor
- Streaming event aggregation moved off the Model class
- ➡️ Better architecture & maintainability

📦 Output Tracking
- CodeAgent outputs are now stored in ActionStep
- ✅ More visibility and structure to agent decisions

🐛 Bug Fixes
- Smarter planning logic
- Cleaner Docker logs
- Better prompt formatting for additional_args
- Safer internal functions and final answer matching

📚 Docs Improvements
- Added quickstart examples with tool usage
- One-click Colab launch buttons
- Expanded reference docs (AgentMemory, GradioUI docstrings)
- Fixed broken links and migrated to .md format

🔗 Full release notes:
https://github.com/huggingface/smolagents/releases/tag/v1.19.0

💬 Try it out, explore the new features, and let us know what you build!

#smolagents #opensource #AIagents #LLM #HuggingFace

albertvillanova

posted an update 3 months ago

Post

695

New in smolagents v1.17.0:
- Structured generation in CodeAgent 🧱
- Streamable HTTP MCP support 🌐
- Agent.run() returns rich RunResult 📦

Smarter agents, smoother workflows.
Try it now: https://github.com/huggingface/smolagents/releases/tag/v1.17.0

ragerri

authored 3 papers 4 months ago

Argument Mining in Data Scarce Settings: Cross-lingual Transfer and Few-shot Techniques

Paper • 2407.03748 • Published Jul 4, 2024

Medical mT5: An Open-Source Multilingual Text-to-Text LLM for The Medical Domain

Paper • 2404.07613 • Published Apr 11, 2024

CasiMedicos-Arg: A Medical Question Answering Dataset Annotated with Explanatory Argumentative Structures

Paper • 2410.05235 • Published Oct 7, 2024 • 2

mariagrandury

authored 2 papers 4 months ago

Kaleidoscope: In-language Exams for Massively Multilingual Vision Evaluation

Paper • 2504.07072 • Published Apr 9 • 9

It's the same but not the same: Do LLMs distinguish Spanish varieties?

Paper • 2504.20049 • Published Apr 8

clefourrier

posted an update 4 months ago

Post

1366

Always surprised that so few people actually read the FineTasks blog, on
✨how to select training evals with the highest signal✨

If you're serious about training models without wasting compute on shitty runs, you absolutely should read it!!

An high signal eval actually tells you precisely, during training, how wel & what your model is learning, allowing you to discard the bad runs/bad samplings/...!

The blog covers in depth prompt choice, metrics, dataset, across languages/capabilities, and my fave section is "which properties should evals have"👌
(to know on your use case how to select the best evals for you)

Blog: HuggingFaceFW/blogpost-fine-tasks

2 replies

·

albertvillanova

posted an update 4 months ago

Post

2556

New in smolagents v1.16.0:
🔍 Bing support in WebSearchTool
🐍 Custom functions & executor_kwargs in LocalPythonExecutor
🔧 Streaming GradioUI fixes
🌐 Local web agents via api_base & api_key
📚 Better docs

👉 https://github.com/huggingface/smolagents/releases/tag/v1.16.0

albertvillanova

posted an update 5 months ago

Post

2836

smolagents v1.14.0 is out! 🚀
🔌 MCPClient: A sleek new client for connecting to remote MCP servers, making integrations more flexible and scalable.
🪨 Amazon Bedrock: Native support for Bedrock-hosted models.
SmolAgents is now more powerful, flexible, and enterprise-ready. 💼

Full release 👉 https://github.com/huggingface/smolagents/releases/tag/v1.14.0
#smolagents #LLM #AgenticAI

La Leaderboard

AI & ML interests

Recent Activity

la-leaderboard/requests

la-leaderboard/results

la-leaderboard/requests

la-leaderboard/results

The Generative Energy Arena (GEA): Incorporating Energy Awareness in Large Language Model (LLM) Human Evaluations

The Generative Energy Arena (GEA): Incorporating Energy Awareness in Large Language Model (LLM) Human Evaluations

The Generative Energy Arena (GEA): Incorporating Energy Awareness in Large Language Model (LLM) Human Evaluations

Argument Mining in Data Scarce Settings: Cross-lingual Transfer and Few-shot Techniques

Medical mT5: An Open-Source Multilingual Text-to-Text LLM for The Medical Domain

CasiMedicos-Arg: A Medical Question Answering Dataset Annotated with Explanatory Argumentative Structures

Kaleidoscope: In-language Exams for Massively Multilingual Vision Evaluation

It's the same but not the same: Do LLMs distinguish Spanish varieties?

AI & ML interests

Recent Activity

Team members 30

la-leaderboard's activity