This is a fantastic example of large-scale curation of public domain books with intentional governance for AI research and use - definitely recommend checking it out, experimenting with the metadata (institutional/institutional-books-1.0-metadata), and starting to build on top of it 🤗
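If you want to poke at the metadata programmatically, here is a minimal sketch using the datasets library; the repo id comes from the post, but the split name is an assumption, so check the dataset card for the actual configuration:

```python
# Minimal sketch: load the metadata repository mentioned above.
# The split name "train" is an assumption; see the dataset card for the real config.
from datasets import load_dataset

meta = load_dataset("institutional/institutional-books-1.0-metadata", split="train")

# Peek at the available fields and a sample record.
print(meta.column_names)
print(meta[0])
```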
reacted to frascuchon's post with 🔥 about 2 months ago
Extending datasets just got a whole lot easier! 🚀 With Sheets, I was able to create a Spanish version of the popular fka/awesome-chatgpt-prompts dataset in just a few minutes ⏱️.
Want to try it out for yourself? Head over to the Sheets space and see how easy it is to extend and modify existing datasets 🤯. The possibilities are endless! 🌐
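For anyone curious what this looks like outside the UI, here is a rough programmatic analogue using datasets and huggingface_hub. The model id, the "prompt" column name, and the output repo id are assumptions for illustration, not what Sheets actually runs under the hood:

```python
# Sketch: translate the "prompt" column of fka/awesome-chatgpt-prompts to Spanish
# with an open model via the Hugging Face inference client.
from datasets import load_dataset
from huggingface_hub import InferenceClient

client = InferenceClient()  # uses your HF token and default inference provider

ds = load_dataset("fka/awesome-chatgpt-prompts", split="train")

def translate(row):
    out = client.chat_completion(
        messages=[{
            "role": "user",
            "content": f"Translate to Spanish, keeping the meaning intact:\n\n{row['prompt']}",
        }],
        model="meta-llama/Llama-3.1-8B-Instruct",  # assumed model id, swap for any chat model
        max_tokens=512,
    )
    row["prompt_es"] = out.choices[0].message.content
    return row

ds_es = ds.map(translate)
ds_es.push_to_hub("your-username/awesome-chatgpt-prompts-es")  # hypothetical repo id
```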
reacted to dvilasuero's post with 😎❤️🔥 2 months ago
Super excited to launch Hugging Face Sheets: Spreadsheets meet AI and unstructured data.
A few months ago, we started imagining new ways to build and transform datasets with the latest open-source models.
Today, I'm thrilled to introduce our first step in this direction.
In a nutshell:
📁 Effortlessly run prompts and models over your data.
🌐 Agentic search for accuracy and real-time information.
🖼️ Familiar, minimalistic interface for interacting with data.
🎯 Human feedback 2.0: your input directly improves generated data.
💯 Access hundreds of open models and leading inference providers.
With Sheets, try a new way to create structured content with the help of AI!
No installs. No login. Just open a link and 🤩
This app lets you create a dataset by importing a file or starting from a prompt.
What’s different about Sheets?
🔎 Web search integration to ground answers in real-world data
📚 In-context learning from validated sources
🔗 Transparent sourcing: every result is linked
🧩 Runs on multiple open-source models
Fight hallucinations and start creating content you can rely on.
Hey! I built RAG MCP Server Space, a simple Gradio MCP server for RAG systems that lets you retrieve relevant results without passing huge contexts to your LLM.
You can use this space to integrate with your agents and improve the efficiency of your search results. Feel free to try it out and let me know if you have any feedback or questions!
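For a sense of how such a Space can be wired up, here is a minimal sketch of a Gradio app that exposes a retrieval function both as a web UI and as an MCP tool. The embedding model, the toy corpus, and the use of the mcp_server launch flag (which needs a recent Gradio) are assumptions about the setup, not the Space's actual code:

```python
# Sketch: a tiny semantic-search function served as an MCP tool via Gradio.
import gradio as gr
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
docs = ["Document one ...", "Document two ..."]  # replace with your corpus
doc_emb = model.encode(docs, convert_to_tensor=True)

def search(query: str, top_k: int = 3) -> str:
    """Return the most relevant documents for a query."""
    top_k = int(top_k)
    hits = util.semantic_search(
        model.encode(query, convert_to_tensor=True), doc_emb, top_k=top_k
    )[0]
    return "\n\n".join(docs[h["corpus_id"]] for h in hits)

demo = gr.Interface(
    fn=search,
    inputs=["text", gr.Number(value=3, precision=0)],
    outputs="text",
)
# Exposes search() as an MCP tool in addition to the web UI.
demo.launch(mcp_server=True)
```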
New in smolagents v1.16.0:
🔍 Bing support in WebSearchTool
🐍 Custom functions & executor_kwargs in LocalPythonExecutor
🔧 Streaming GradioUI fixes
🌐 Local web agents via api_base & api_key
📚 Better docs
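A hedged sketch of how two of these might be combined: the Bing-backed search tool and a "local web agent" pointed at a local OpenAI-compatible endpoint via api_base and api_key. The engine parameter name, endpoint URL, and model ids are assumptions based on the release notes, not verified against the exact API:

```python
# Sketch: a code agent using web search with a locally served model.
from smolagents import CodeAgent, WebSearchTool, OpenAIServerModel

model = OpenAIServerModel(
    model_id="qwen2.5-7b-instruct",       # whatever your local server exposes
    api_base="http://localhost:8000/v1",   # e.g. a vLLM or llama.cpp server
    api_key="not-needed-locally",
)

agent = CodeAgent(
    tools=[WebSearchTool(engine="bing")],  # engine name assumed from the release notes
    model=model,
)

agent.run("What changed in smolagents v1.16.0?")
```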
We're thrilled to announce the launch of our comprehensive Model Context Protocol (MCP) Course! This free program is designed to take learners from foundational understanding to practical application of MCP in AI.
In this course, you will:
📖 Study Model Context Protocol in theory, design, and practice.
🧑‍💻 Learn to use established MCP SDKs and frameworks.
💾 Share your projects and explore applications created by the community.
🏆 Participate in challenges and evaluate your MCP implementations.
🎓 Earn a certificate of completion.
At the end of this course, you'll understand how MCP works and how to build your own AI applications that leverage external data and tools using the latest MCP standards.
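For a taste of what that looks like, here is a minimal MCP server using the official Python SDK (a toy tool for illustration, not course material):

```python
# Sketch: a tiny MCP server with one tool, using the official `mcp` package.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-server")

@mcp.tool()
def add(a: int, b: int) -> int:
    """Add two numbers."""
    return a + b

if __name__ == "__main__":
    mcp.run()  # defaults to the stdio transport
```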
Hey! Want a fun project to learn how AI agents work? I built an agent that queries the FOIA API for a workshop at the Hacks/Hackers Summit in Baltimore, and you can do it too!
It's a quick proof of concept demonstrating what agents can do, how to design workflows, and how to approach the coding side.
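A stripped-down sketch of what such an agent can look like with smolagents; the FOIA.gov endpoint path, the response handling, and the model choice are assumptions for illustration, not the workshop's actual code:

```python
# Sketch: expose a FOIA.gov API call as a tool and hand it to a code agent.
import requests
from smolagents import CodeAgent, InferenceClientModel, tool

@tool
def search_foia_agencies(query: str) -> str:
    """Search FOIA.gov for agency components matching a keyword.

    Args:
        query: Keyword to search agency names for.
    """
    resp = requests.get(
        "https://api.foia.gov/api/agency_components",  # assumed endpoint, check the FOIA.gov docs
        params={"query": query},
        headers={"X-API-Key": "YOUR_DATA_GOV_KEY"},    # supply your own api.data.gov key
        timeout=30,
    )
    resp.raise_for_status()
    return resp.text[:4000]  # keep the context passed to the model small

agent = CodeAgent(tools=[search_foia_agencies], model=InferenceClientModel())
agent.run("Which agencies handle immigration-related FOIA requests?")
```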
Recent RL paradigms have often relied on sets of questions and answers that need to be manually curated. Researchers from Tsinghua University went "why though?"
🤔 Indeed, why learn from questions designed by a human teacher, when the model can start from its base knowledge and learn by experimenting in a code environment, proposing coding tasks itself and trying to solve them?
Thus they created “Absolute Zero Reasoning” (AZR), an approach that removes any need for human-curated data.
🎭 𝗗𝘂𝗮𝗹 𝗿𝗼𝗹𝗲𝘀:
‣ Proposer: generates challenging but solvable coding tasks
‣ Solver: attempts to solve those self-proposed tasks
🧪 𝗧𝗵𝗿𝗲𝗲 𝘁𝗮𝘀𝗸 𝘁𝘆𝗽𝗲𝘀: all are defined as triplets of program, input, and output (see the toy sketch below)
‣ Deduction: give the model an input and a program, it must deduce the output
‣ Abduction: give the model a program and an output, it must find an input that produces that output
‣ Induction: synthesize a program from input/output pairs
Btw this reminded me of my long-forgotten philosophy classes: Aristotle was more on the induction side, learning from real-world analogies, while Plato was more on the deduction side, trying to progress quite far with just one input and his reasoning.
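To make the three task types concrete, here is a toy illustration of how each one reduces to verifying a (program, input, output) triplet by executing code. This is not the paper's implementation, and a real setup would sandbox execution rather than call exec directly:

```python
# Toy illustration of the (program, input, output) triplet behind AZR's task types.
def run_program(program: str, inp):
    """Execute `program`, which must define f(x), and return f(inp)."""
    scope = {}
    exec(program, scope)  # illustration only; a real system would sandbox this
    return scope["f"](inp)

program = "def f(x):\n    return sorted(x)[::-1]"
inp, out = [3, 1, 2], [3, 2, 1]

# Deduction: given program + input, predict the output, then verify.
assert run_program(program, inp) == out

# Abduction: given program + output, propose an input that produces it.
candidate_input = [1, 2, 3]  # the model's guess
assert run_program(program, candidate_input) == out

# Induction: given input/output pairs, propose a program, then verify it.
candidate_program = "def f(x):\n    return sorted(x, reverse=True)"
assert run_program(candidate_program, inp) == out
```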
📊 𝗥𝗲𝘀𝘂𝗹𝘁𝘀:
‣ AZR post-training gives a nice improvement on known models like Qwen2.5-7B
‣ Shows strong cross-domain transfer: coding ↔️ math reasoning
🧐 𝗢𝘁𝗵𝗲𝗿 𝗳𝗶𝗻𝗱𝗶𝗻𝗴𝘀:
‣ Better base performance (general or code-specific) amplifies the gains from Absolute Zero Reasoning
‣ The researchers warn about "uh-oh moments" (a wink at DeepSeek's "aha moments") where the model generates concerning goals like "make an extremely convoluted code to outsmart all these humans": so supervision is still needed!
Lumier is an open-source tool for running macOS virtual machines in Docker containers on Apple Silicon Macs.
When building virtualized environments for AI agents, we needed a reliable way to package and distribute macOS VMs. Inspired by projects like dockur/macos that made it possible to run macOS in Docker, we wanted to create something similar but optimized for Apple Silicon.
The existing solutions either didn't support M-series chips or relied on KVM/Intel emulation, which was slow and cumbersome. We realized we could leverage Apple's Virtualization Framework to create a much better experience.
Lumier takes a different approach: It uses Docker as a delivery mechanism (not for isolation) and connects to a lightweight virtualization service (lume) running on your Mac.
Lumier is 100% open-source under MIT license and part of C/ua.