AI & ML interests

Evaluating Evaluations: We are a researcher community developing scientifically grounded research outputs and robust deployment infrastructure for broader impact evaluations.

Recent Activity

meg
posted an update 9 days ago
meg
posted an update 14 days ago
🤖 ICYMI: Yesterday, Hugging Face and OpenAI partnered to bring open source GPT to the public. This is a Big Deal in "AI world".

0. Common ground setting: OpenAI is the ChatGPT people. An “open source” model is one whose weights are available; that means the model can be “yours”.
1. You don't have to interact with the company directly, nor give them your interactions, to use the system. The company can't "surveil" you.
2. You can evaluate the unique contributions of their SOTA model much more rigorously than you can when there are collections of models+code behind a closed API. You can find out specifically what the model can and can't do.
3. And you can directly customize it for whatever you'd like. Fine-tuning, wherein you give the model data that's tailored to your use cases and train it some more on that data, is trivial* when you have the model weights.
*Provided you have the compute.
4. You can directly benchmark whatever you'd like. Biases? Energy usage? Strengths/weaknesses? Go for it. You wants it you gots it--this transparency helps people understand SOTA *in general*, not just for this model, but points to, e.g., what's going on with closed Google models as well.
5. One of the most powerful things about "openness" that I've learned is that it cultivates ecosystems of collaborators building on top of one another's brilliance to make systems that are significantly better than they would be if created in isolation.
But, caveat wrt my own philosophy...
6. I do not take it as a given that advancing LLMs is good, and have a lot more to say wrt where I think innovation should focus more. For example, a focus on *data* -- curation, measurement, consent, credit, compensation, safety -- would deeply improve technology for everyone.
7. The transparency this release provides is massive for people who want to *learn* about LLMs. For the next generation of technologists to advance over the current, they MUST be able to learn about what's happening now. (cont...)
alielfilali01
posted an update 15 days ago
Tonic
posted an update 19 days ago
meg
posted an update 20 days ago
🤖 👾 Thanks so much to BBC News and the stellar Suranjana Tewari for having me on to talk about the US <-> China relationship in AI, and what it means for AI ethics.
yjernite
posted an update 24 days ago
First GPAI Model with EU Data Transparency Template? 🇪🇺

With the release of the EU data transparency template this week, we finally got to see one of the most meaningful artifacts to come out of the AI Act implementation so far (haven't you heard? AI's all about the data! 📊📚)

The impact of the template will depend on how effectively it establishes a minimum meaningful transparency standard for companies that don't otherwise offer any transparency into their handling of e.g. personal data or (anti?-)competitive practices in commercial licensing - we'll see how those play out as new models are released after August 2nd 👀


In the meantime, I wanted to see how the template works for a fully open-source + commercially viable model, so I filled it out for SmolLM3, which my colleagues at Hugging Face released earlier this month 🤗 ICYMI, it's fully open-source with 3B parameters and performance matching the best similar-size models (I've switched all my local apps from Qwen3 to it, you should too 💡)

Verdict: congrats to the European Commission AI Office for making it so straightforward! Fully open and transparent models remain a cornerstone of informed regulation and governance, but the different organizational needs of their developers aren't always properly accounted for in new regulation. In this case, it took me all of two hours to fill out and publish the template (including reading the guidelines) - so kudos for making it feasible for smaller and distributed organizations 🙌 Definitely a step forward for transparency 🔍

To learn more have a look at:

- The SmolLM3 model: HuggingFaceTB/SmolLM3-3B
- Its filled out Public Summary of Training Content: hfmlsoc/smollm3-eu-data-transparency
- And if you're interested, some previous remarks on regulatory minimum meaningful standards for data disclosure: https://huggingface.co/blog/yjernite/naiac-data-transparency
Tonic
posted an update about 1 month ago
👋 Hey there folks,

Just submitted my plugin idea to the G-Assist Plugin Hackathon by @nvidia. Check it out, it's a great way to use a local SLA model on a Windows machine to easily and locally get things done! https://github.com/NVIDIA/G-Assist
Tonic
posted an update about 1 month ago
🙋🏻‍♂️ Hey there folks,

Yesterday, Nvidia released a reasoning model that beats o3 on science, math and coding!

Today you can try it out here: Tonic/Nvidia-OpenReasoning

Hope you like it!
evijit
posted an update about 1 month ago
New blog post alert! "What is the Hugging Face Community Building?", with @yjernite and @irenesolaiman

What 1.8 Million Models Reveal About Open Source Innovation: Our latest deep dive into the Hugging Face Hub reveals patterns that challenge conventional AI narratives:

🔗 Models become platforms for innovation: Qwen, Llama, and Gemma models have spawned entire ecosystems of specialized variants. Looking at derivative works shows community adoption better than any single metric.

📊 Datasets reveal the foundation layer
→ Most downloaded datasets are evaluation benchmarks (MMLU, SQuAD, GLUE)
→ Universities and research institutions dominate foundational data
→ Domain-specific datasets thrive across finance, healthcare, robotics, and science
→ Open actors provide the datasets that power most AI development

🏛️ Research institutions lead the charge: AI2 (Allen Institute) emerges as one of the most active contributors, alongside significant activity from IBM, NVIDIA, and international organizations. The open source ecosystem spans far beyond Big Tech.

๐Ÿ” Interactive exploration tools: We've built several tools to help you discover patterns!

ModelVerse Explorer - organizational contributions
DataVerse Explorer - dataset patterns
Organization HeatMap - activity over time
Base Model Explorer - model family trees
Semantic Search - find models by capability

📚 Academic research is thriving: Researchers are already producing valuable insights, including recent work at FAccT 2025: "The Brief and Wondrous Life of Open Models." We've also made hub datasets, weekly snapshots, and other data available for your own analysis.

The bottom line: AI development is far more distributed, diverse, and collaborative than popular narratives suggest. Real innovation happens through community collaboration across specialized domains.

Read: https://huggingface.co/blog/evijit/hf-hub-ecosystem-overview
evijit
in evaleval/Eval_Card_Form about 1 month ago

Update app.py
#6 opened about 2 months ago by
jebasania
Tonic
posted an update about 1 month ago
🙋🏻‍♂️ Normalize adding compute & runtime traces to your model cards
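
For anyone wondering what that could look like in practice, here is a minimal sketch using only the Python standard library. It times a single run and renders the result as a Markdown section you could paste into a model card README. The names `RuntimeTrace`, `trace_run`, and `to_model_card_section` are hypothetical helpers made up for this illustration, not part of any Hugging Face API, and the table layout is just one possible convention.

```python
import platform
import time
from dataclasses import dataclass


@dataclass
class RuntimeTrace:
    """Hypothetical record of one measured run (not a Hugging Face API)."""
    label: str
    wall_seconds: float
    python_version: str
    machine: str


def trace_run(label, fn, *args, **kwargs):
    """Run fn once and return (result, RuntimeTrace) with wall-clock time."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    trace = RuntimeTrace(
        label=label,
        wall_seconds=elapsed,
        python_version=platform.python_version(),
        machine=platform.machine(),
    )
    return result, trace


def to_model_card_section(traces):
    """Render traces as a Markdown table for pasting into a model card."""
    lines = [
        "## Compute & runtime traces",
        "",
        "| Run | Wall time (s) | Python | Machine |",
        "| --- | --- | --- | --- |",
    ]
    for t in traces:
        lines.append(
            f"| {t.label} | {t.wall_seconds:.3f} | {t.python_version} | {t.machine} |"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    # Stand-in workload; swap in a real training or inference call.
    _, trace = trace_run("toy-sum", sum, range(1_000_000))
    print(to_model_card_section([trace]))
```

A real trace would likely also record accelerator type, batch size, and energy use; those are left out here because they depend on hardware-specific tooling.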
Tonic
posted an update about 1 month ago
Who's going to Raise Summit in Paris tomorrow?

If you're around, I would love to meet you :-)