Hi everyone, we’ve got big news! Starting today, all Langfuse product features are available as free OSS (MIT license).
You can now upgrade your self-hosted Langfuse to access features like:
- Managed LLM-as-a-Judge evaluations
- Annotation queues
- Prompt experiments
- LLM playground
We’re incredibly grateful for the support of this amazing community and can’t wait to hear your feedback on the new features!
In this unit, you'll learn:
- Offline Evaluation – Benchmark and iterate on your agent using datasets.
- Online Evaluation – Continuously track key metrics such as latency, cost, and user feedback.
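To make the offline side concrete, here's a minimal sketch of a dataset-based benchmark run. It assumes the Langfuse Python SDK v2 with credentials set in the environment; the dataset name "my-benchmark", the run name, and `run_my_agent` are placeholders for illustration, not part of the course material:

```python
# Minimal offline-evaluation sketch (assumes the Langfuse Python SDK v2 and
# LANGFUSE_PUBLIC_KEY / LANGFUSE_SECRET_KEY / LANGFUSE_HOST set in the environment).
from langfuse import Langfuse

langfuse = Langfuse()

def run_my_agent(question):
    # Hypothetical placeholder for your agent or LLM app.
    # The input shape depends on how your dataset items were created.
    return "42"

# Fetch a dataset previously created in Langfuse ("my-benchmark" is an assumed name).
dataset = langfuse.get_dataset("my-benchmark")

for item in dataset.items:
    # Create a trace for this run and link it to the dataset item,
    # so the results show up as an experiment run in the Langfuse UI.
    trace = langfuse.trace(name="offline-eval", input=item.input)
    output = run_my_agent(item.input)
    trace.update(output=output)
    item.link(trace, run_name="baseline-v1")

    # Attach a simple exact-match score; a judge-based score could go here instead.
    langfuse.score(
        trace_id=trace.id,
        name="exact_match",
        value=1.0 if output == item.expected_output else 0.0,
    )

langfuse.flush()
```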
I've published an article showing five ways to use 🪢 Langfuse with 🤗 Hugging Face.
My personal favorite is Method #4: Using Hugging Face Datasets for Langfuse Dataset Experiments. This lets you benchmark your LLM app or AI agent with a dataset hosted on Hugging Face. In this example, I chose the GSM8K dataset (openai/gsm8k) to test the mathematical reasoning capabilities of my smolagent :)
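Roughly, the flow of Method #4 looks like the sketch below, assuming the 🤗 `datasets` library and the Langfuse Python SDK v2; the dataset name "gsm8k-benchmark" and the 20-item slice are arbitrary choices for illustration. Once the items are uploaded, experiment runs against this dataset can be compared in the Langfuse UI.

```python
# Sketch: mirror a Hugging Face dataset into a Langfuse dataset for experiments.
# Assumes `pip install datasets langfuse` and Langfuse credentials in the environment.
from datasets import load_dataset
from langfuse import Langfuse

langfuse = Langfuse()

# Pull the GSM8K benchmark from the Hugging Face Hub ("main" config, test split).
gsm8k = load_dataset("openai/gsm8k", "main", split="test")

# Create a Langfuse dataset; the name here is an arbitrary choice.
langfuse.create_dataset(name="gsm8k-benchmark")

# Upload a small slice as dataset items: question as input, solution as expected output.
for row in gsm8k.select(range(20)):
    langfuse.create_dataset_item(
        dataset_name="gsm8k-benchmark",
        input={"question": row["question"]},
        expected_output=row["answer"],
    )

langfuse.flush()
```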
Agents seem to be everywhere, and this collection is a deep dive into their theory and practice:
1. "Agents" Google's whitepaper by Julia Wiesinger, Patrick Marlow and Vladimir Vuskovic -> https://www.kaggle.com/whitepaper-agents Covers agents, their functions, tool use and how they differ from models
3. "AI Engineer Summit 2025: Agent Engineering" 8-hour video -> https://www.youtube.com/watch?v=D7BzTxVVMuw Experts' talks that share insights on the freshest Agent Engineering advancements, such as Google Deep Research, scaling tips and more
5. "Artificial Intelligence: Foundations of Computational Agents", 3rd Edition, book by David L. Poole and Alan K. Mackworth -> https://artint.info/3e/html/ArtInt3e.html Agents' architectures, how they learn, reason, plan and act with certainty and uncertainty
7. The Turing Post articles "AI Agents and Agentic Workflows" on Hugging Face -> @Kseniase We explore agentic workflows in detail and agents' building blocks, such as memory and knowledge
🚀 Supercharge your LLM apps with Langfuse on Hugging Face Spaces!
Langfuse brings end-to-end observability and tooling to accelerate your dev workflow from experiments through production
Now available as a Docker Space directly on the HF Hub! 🤗
🔍 Trace everything: monitor LLM calls, retrieval, and agent actions with popular frameworks
1⃣ One-click deployment: deploy on Spaces with persistent storage and integrated OAuth
🛠 Simple Prompt Management: version, edit, and update prompts without redeployment
✅ Intuitive Evals: collect user feedback, run model/prompt evaluations, and improve quality
📊 Dataset Creation: build datasets directly from production data to enhance future performance
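For a taste of the tracing part, here's a hedged sketch of pointing the Langfuse Python SDK (v2, `@observe` decorator) at a Langfuse Space; the Space URL and keys below are placeholders you'd replace with your own:

```python
# Sketch: send traces to a Langfuse instance running as a Hugging Face Space.
# Assumes the Langfuse Python SDK v2; URL and keys below are placeholders.
import os

os.environ["LANGFUSE_HOST"] = "https://your-space-name.hf.space"  # hypothetical Space URL
os.environ["LANGFUSE_PUBLIC_KEY"] = "pk-lf-..."
os.environ["LANGFUSE_SECRET_KEY"] = "sk-lf-..."

from langfuse.decorators import observe, langfuse_context

@observe()  # every call becomes a trace in the Langfuse UI
def answer(question: str) -> str:
    # Hypothetical placeholder for an LLM or agent call.
    return f"You asked: {question}"

answer("What does Langfuse trace?")
langfuse_context.flush()  # make sure the trace is sent before the script exits
```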
Kudos to the Langfuse team for this collab and the awesome, open-first product they’re building! 👏 @marcklingen @Clemo @MJannik