Hugging Face

Enterprise
company
Verified
Activity Feed

AI & ML interests

The AI community building the future.

Recent Activity

lysandre  updated a dataset about 7 hours ago
huggingface/transformers-metadata
qubvel-hf  updated a dataset about 7 hours ago
huggingface/documentation-images
ariG23498  updated a dataset about 12 hours ago
huggingface/documentation-images
View all activity

Articles

huggingface's activity

jsulz 
posted an update about 12 hours ago
view post
Post
856
Time flies!

Six months after joining Hugging Face the Xet team is kicking off the first migrations from LFS to our storage for a number of repositories on the Hub.

More on the nitty gritty details behind the migration soon, but here are the big takeaways:

🤖 We've successfully completed the first migrations from LFS -> Xet to test the infrastructure and prepare for a wider release

✅ No action on your part needed - you can work with a Xet-backed repo like any other repo on the Hub (for now - major improvements on their way!)

👀 Keep an eye out for the Xet logo to see if a repo you know is on our infra! See the screenshots below to spot the difference 👇

⏩ ⏩ ⏩ Blazing uploads and downloads coming soon. W’re gearing up for a full integration with the Hub's Python library that will make building on the Hub faster than ever - special thanks to @celinah and @Wauplin for their assistance.

🎉 Want Early Access? If you’re curious and want to test it out the bleeding edge that will power the development experience on the Hub, we’d love to partner with you. Let me know!

This is the culmination of a lot of effort from the entire team. Big round of applause to @sirahd @brianronan @jgodlewski @hoytak @seanses @assafvayner @znation @saba9 @rajatarya @port8080 @yuchenglow
  • 1 reply
·
davanstrien 
posted an update about 22 hours ago
view post
Post
1384
Hacked together a way to log trl GRPO training completions to a 🤗 dataset repo. This allows you to:

- Track rewards from multiple reward functions
- Treat the completion and rewards from training as a "proper" dataset and do EDA
- Share results for open science

The implementation is super hacky, but I'm curious if people would find this useful.

To push completions to the Hub, you just need two extra parameters:

log_completions=True
log_completions_hub_repo='your-username/repo-name'

Example dataset: davanstrien/test-logs
Colab: https://colab.research.google.com/drive/1wzBFPVthRYYTp-mEYlznLg_e_0Za1M3g

frimelle 
posted an update 1 day ago
view post
Post
1490
What’s in a name? More than you might think, especially for AI.
Whenever I introduce myself, people often start speaking French to me, even though my French is très basic. It turns out that AI systems do something similar:
Large language models infer cultural identity from names, shaping their responses based on presumed backgrounds. But is this helpful personalization or a reinforcement of stereotypes?
In our latest paper, we explored this question by testing DeepSeek, Llama, Aya, Mistral-Nemo, and GPT-4o-mini on how they associate names with cultural identities. We analysed 900 names from 30 cultures and found strong assumptions baked into AI responses: some cultures were overrepresented, while others barely registered.
For example, a name like "Jun" often triggered Japan-related responses, while "Carlos" was linked primarily to Mexico, even though these names exist in multiple countries. Meanwhile, names from places like Ireland led to more generic answers, suggesting weaker associations in the training data.
This has real implications for AI fairness: How should AI systems personalize without stereotyping? Should they adapt at all based on a name?
Work with some of my favourite researchers: @sidicity Arnav Arora and @IAugenstein
Read the full paper here: Presumed Cultural Identity: How Names Shape LLM Responses (2502.11995)
AdinaY 
posted an update 1 day ago