Brigitte Tousignant

BrigitteTousi

AI & ML interests

None yet

Recent Activity

liked a dataset 2 days ago
openai/mrcr
View all activity

Organizations

Hugging Face's profile picture Society & Ethics's profile picture HuggingFaceM4's profile picture Open-Source AI Meetup's profile picture BigCode's profile picture Hugging Face OSS Metrics's profile picture IBM-NASA Prithvi Models Family's profile picture Hugging Face Smol Models Research's profile picture Wikimedia Movement's profile picture LeRobot's profile picture Journalists on Hugging Face's profile picture Women on Hugging Face's profile picture Social Post Explorers's profile picture Dev Mode Explorers's profile picture Hugging Face Science's profile picture Coordination Nationale pour l'IA's profile picture open/ acc's profile picture Bluesky Community's profile picture Sandbox's profile picture Open R1's profile picture

BrigitteTousi's activity

reacted to thomwolf's post with 🤗🔥🚀❤️ 2 days ago
view post
Post
3929
If you've followed the progress of robotics in the past 18 months, you've likely noticed how robotics is increasingly becoming the next frontier that AI will unlock.

At Hugging Face—in robotics and across all AI fields—we believe in a future where AI and robots are open-source, transparent, and affordable; community-built and safe; hackable and fun. We've had so much mutual understanding and passion working with the Pollen Robotics team over the past year that we decided to join forces!

You can already find our open-source humanoid robot platform Reachy 2 on the Pollen website and the Pollen community and people here on the hub at pollen-robotics

We're so excited to build and share more open-source robots with the world in the coming months!
  • 1 reply
·
reacted to fdaudens's post with ❤️ 9 days ago
view post
Post
3592
I read the 456-page AI Index report so you don't have to (kidding). The wild part? While AI gets ridiculously more accessible, the power gap is actually widening:

1️⃣ The democratization of AI capabilities is accelerating rapidly:
- The gap between open and closed models is basically closed: difference in benchmarks like MMLU and HumanEval shrunk to just 1.7% in 2024
- The cost to run GPT-3.5-level performance dropped 280x in 2 years
- Model size is shrinking while maintaining performance - Phi-3-mini hitting 60%+ MMLU at fraction of parameters of early models like PaLM

2️⃣ But we're seeing concerning divides deepening:
- Geographic: US private investment ($109B) dwarfs everyone else - 12x China's $9.3B
- Research concentration: US and China dominate highly-cited papers (50 and 34 respectively in 2023), while next closest is only 7
- Gender: Major gaps in AI skill penetration rates - US shows 2.39 vs 1.71 male/female ratio

The tech is getting more accessible but the benefits aren't being distributed evenly. Worth thinking about as these tools become more central to the economy.

Give it a read - fascinating portrait of where AI is heading! https://hai-production.s3.amazonaws.com/files/hai_ai_index_report_2025.pdf
·
reacted to jsulz's post with 🔥 9 days ago
view post
Post
2024
The Llama 4 release - meta-llama/llama-4-67f0c30d9fe03840bc9d0164 - was a big one for the xet-team with every model backed by the storage infrastructure of the future for the Hub.

It's been a wild few days, and especially 🤯 to see every tensor file with a Xet logo next to it instead of LFS.

The attached graph shows requests per second to our content-addressed store (CAS) right as the release went live.

yellow = GETs; dashed line = launch time.

You can definitely tell when the community started downloading 👀

h/t to @rajatarya for the graph, the entire Xet crew to bring us to this point, and special shoutout to Rajat, @port8080 , @brianronan , @seanses , and @znation who made sure the bytes kept flying all weekend ⚡️
  • 1 reply
·
posted an update 9 days ago
view post
Post
2908
AI agents are transforming how we interact with technology, but how sustainable are they? 🌍

Design choices — like model size and structure — can massively impact energy use and cost. ⚡💰 The key takeaway: smaller, task-specific models can be far more efficient than large, general-purpose ones.

🔑 Open-source models offer greater transparency, allowing us to track energy consumption and make more informed decisions on deployment. 🌱 Open-source = more efficient, eco-friendly, and accountable AI.

Read our latest, led by @sasha with assists from myself + @yjernite 🤗
https://huggingface.co/blog/sasha/ai-agent-sustainability
  • 1 reply
·
reacted to merterbak's post with 🔥 11 days ago
view post
Post
2944
Meta has unveiled its Llama 4 🦙 family of models, featuring native multimodality and mixture-of-experts architecture. Two model families are available now:
Models🤗: meta-llama/llama-4-67f0c30d9fe03840bc9d0164
Blog Post: https://ai.meta.com/blog/llama-4-multimodal-intelligence/
HF's Blog Post: https://huggingface.co/blog/llama4-release

- 🧠 Native Multimodality - Process text and images in a unified architecture
- 🔍 Mixture-of-Experts - First Llama models using MoE for incredible efficiency
- 📏 Super Long Context - Up to 10M tokens
- 🌐 Multilingual Power - Trained on 200 languages with 10x more multilingual tokens than Llama 3 (including over 100 languages with over 1 billion tokens each)

🔹 Llama 4 Scout
- 17B active parameters (109B total)
- 16 experts architecture
- 10M context window
- Fits on a single H100 GPU
- Beats Gemma 3, Gemini 2.0 Flash-Lite, and Mistral 3.1

🔹 Llama 4 Maverick
- 17B active parameters (400B total)
- 128 experts architecture
- It can fit perfectly on DGX H100(8x H100)
- 1M context window
- Outperforms GPT-4o and Gemini 2.0 Flash
- ELO score of 1417 on LMArena currently second best model on arena

🔹 Llama 4 Behemoth (Coming Soon)
- 288B active parameters (2T total)
- 16 experts architecture
- Teacher model for Scout and Maverick
- Outperforms GPT-4.5, Claude Sonnet 3.7, and Gemini 2.0 Pro on STEM benchmarks
reacted to AdinaY's post with 👀 16 days ago
view post
Post
2088
AutoGLM 沉思💫 FREE AI Agent released by ZhipuAI

✨ Think & Act simultaneously
✨ Based on a fully self-developed stack: GLM-4 for general, GLM-Z1 for inference, and GLM-Z1-Rumination for rumination
✨ Will openly share these models on April 14 🤯

Preview version👉 https://autoglm-research.zhipuai.cn/?channel=autoglm_android
  • 1 reply
·
reacted to fdaudens's post with ❤️ 16 days ago
view post
Post
1983
🔥 DeepSeek vibe coding with DeepSite is going viral with awesome projects!

From games to stunning visualizations, 7 wild examples:

📺 AI TV with custom channels and animations https://x.com/_akhaliq/status/1905747381951545647

🚀 Earth to Moon spacecraft journey visualization
Watch this incredible Three.js space simulation with zero external assets:
https://x.com/_akhaliq/status/1905836902533451999

💣 Minesweeper in 2.5 minutes! Built & deployed instantly on DeepSite. Zero setup needed:
https://x.com/cholf5/status/1906031928937218334

🎮 Asked for Game of Life, got a masterpiece. Simple prompt, complex features. See it in action: https://x.com/pbeyssac/status/1906304454824992844

💫 One-shot anime website with perfect UI. DeepSite turned a simple request into a fully-functional anime site: https://x.com/risphereeditor/status/1905961725028913264

📊 10-minute World Indicators Dashboard. Just described what I wanted and got a full interactive dashboard! https://x.com/i/status/1906345214089785634

✨ Ready to build without coding? Imagine it. Build it. Share it! enzostvs/deepsite
reacted to fdaudens's post with 🔥 20 days ago
view post
Post
1980
Want to ramp up your AI skills and start breaking bigger stories? With the Journalists on Hugging Face community, we're launching our first learn-together course!

We'll build AI classifiers that process months of data in minutes. How?

- Work through an interactive version of an excellent course developed by Ben Welsh and Derek Willis
- Share findings and get help in our dedicated community channel
- Build working classifiers you can use in your reporting today

No coding background needed - if you can write a ChatGPT or Claude prompt, you can do this. Journalists are already using these techniques to break stories, from uncovering hidden real estate deals to tracking unusual campaign spending.

Join us—it might give you your next big story!

Thanks to Ben and Derek for letting me adapt their excellent course into this interactive version!

- Check out the course: JournalistsonHF/first-llm-classifier

- Join our Slack community to learn together: https://docs.google.com/forms/d/e/1FAIpQLSfyA7G6Y9q-5hDBSnGc3CFtg9H8fjqKCCuieptXuTqRudGNjQ/viewform
reacted to Kseniase's post with 🔥 about 1 month ago
view post
Post
7801
15 types of attention mechanisms

Attention mechanisms allow models to dynamically focus on specific parts of their input when performing tasks. In our recent article, we discussed Multi-Head Latent Attention (MLA) in detail and now it's time to summarize other existing types of attention.

Here is a list of 15 types of attention mechanisms used in AI models:

1. Soft attention (Deterministic attention) -> Neural Machine Translation by Jointly Learning to Align and Translate (1409.0473)
Assigns a continuous weight distribution over all parts of the input. It produces a weighted sum of the input using attention weights that sum to 1.

2. Hard attention (Stochastic attention) -> Effective Approaches to Attention-based Neural Machine Translation (1508.04025)
Makes a discrete selection of some part of the input to focus on at each step, rather than attending to everything.

3. Self-attention -> Attention Is All You Need (1706.03762)
Each element in the sequence "looks" at other elements and "decides" how much to borrow from each of them for its new representation.

4. Cross-Attention (Encoder-Decoder attention) -> Cross-Attention is All You Need: Adapting Pretrained Transformers for Machine Translation (2104.08771)
The queries come from one sequence and the keys/values come from another sequence. It allows a model to combine information from two different sources.

5. Multi-Head Attention (MHA) -> Attention Is All You Need (1706.03762)
Multiple attention “heads” are run in parallel.​ The model computes several attention distributions (heads), each with its own set of learned projections of queries, keys, and values.

6. Multi-Head Latent Attention (MLA) -> DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model (2405.04434)
Extends MHA by incorporating a latent space where attention heads can dynamically learn different latent factors or representations.

7. Memory-Based attention -> End-To-End Memory Networks (1503.08895)
Involves an external memory and uses attention to read from and write to this memory.

See other types in the comments 👇
  • 1 reply
·
reacted to ginipick's post with 🔥 about 1 month ago
view post
Post
4124
🌐 GraphMind: Phi-3 Instruct Graph Explorer

✨ Extract and visualize knowledge graphs from any text in multiple languages!

GraphMind is a powerful tool that leverages the capabilities of Phi-3 to transform unstructured text into structured knowledge graphs, helping you understand complex relationships within any content.

ginigen/Graph-Mind

🚀 Key Features

Multi-language Support 🌍: Process text in English, Korean, and many other languages
Instant Visualization 🧩: See extracted entities and relationships in an interactive graph
Entity Recognition 🏷️: Automatically identifies and categorizes named entities
Optimized Performance ⚡: Uses caching to deliver faster results for common examples
Intuitive Interface 👆: Simple design makes complex graph extraction accessible to everyone

💡 Use Cases

Content Analysis: Extract key entities and relationships from articles or documents
Research Assistance: Quickly visualize connections between concepts in research papers
Educational Tool: Help students understand the structure of complex texts
Multilingual Processing: Extract knowledge from content in various languages

🔧 How It Works

Enter any text in the input field
Select a model from the dropdown
Click "Extract & Visualize"
Explore the interactive knowledge graph and entity recognition results

GraphMind bridges the gap between raw text and structured knowledge, making it easier to identify patterns, extract insights, and understand relationships within any content. Try it now and transform how you interact with textual information!
#NLP #KnowledgeGraph #TextAnalysis #Visualization #Phi3 #MultilingualAI
  • 1 reply
·
replied to burtenshaw's post about 1 month ago
view reply

brb making a PR to include dog emoji reaction

reacted to burtenshaw's post with 🔥 about 1 month ago
view post
Post
1942
everybody and their dog is fine-tuning Gemma 3 today, so I thought I'd do a longer post on the tips and sharp edges I find. let's go!

1. has to be install everything form main and nightly. this is what I'm working with to get unsloth and TRL running

git+https://github.com/huggingface/transformers@main
git+https://github.com/huggingface/trl.git@main
bitsandbytes
peft


plus this with --no-deps

git+https://github.com/unslothai/unsloth-zoo.git@nightly
git+https://github.com/unslothai/unsloth.git@nightly


2. will brown's code to turn GSM8k into a reasoning dataset is a nice toy experiment https://gist.github.com/willccbb/4676755236bb08cab5f4e54a0475d6fb

3. with a learning rate of 5e-6 rewards and loss stayed flat for the first 100 or so steps.

4. so far none of my runs have undermined the outputs after 1 epoch. therefore, I'm mainly experimenting with bigger LoRA adapters.

from trl import GRPOConfig

training_args = GRPOConfig(
    learning_rate = 5e-6,
    adam_beta1 = 0.9,
    adam_beta2 = 0.99,
    weight_decay = 0.1,
    warmup_ratio = 0.1,
    lr_scheduler_type = "cosine",
    optim = "adamw_8bit",
    logging_steps = 1,
    per_device_train_batch_size = 2,
    gradient_accumulation_steps = 1,
    num_generations = 2,
    max_prompt_length = 256,
    max_completion_length = 1024 - 256,
    num_train_epochs = 1,
    max_steps = 250,
    save_steps = 250,
    max_grad_norm = 0.1,
    report_to = "none",
)


5. vision fine-tuning isn't available in TRL's GRPOTrainer, so stick to text datasets. but no need to load the model differently in transformers or Unsloth

from transformers import AutoModelForImageTextToText

model = AutoModelForImageTextToText.from_pretrained("google/gemma-3-4b-it)


if you want an introduction to GRPO, check out the reasoning course, it walks you through the algorithm, theory, and implementation in a smooth way.

reasoning-course
  • 2 replies
·
reacted to fdaudens's post with 🔥 about 1 month ago
view post
Post
1502
Ever wanted 45 min with one of AI’s most fascinating minds? Was with @thomwolf at HumanX Vegas. Sharing my notes of his Q&A with the press—completely changed how I think about AI’s future:

1️⃣ The next wave of successful AI companies won’t be defined by who has the best model but by who builds the most useful real-world solutions. "We all have engines in our cars, but that’s rarely the only reason we buy one. We expect it to work well, and that’s enough. LLMs will be the same."

2️⃣ Big players are pivoting: "Closed-source companies—OpenAI being the first—have largely shifted from LLM announcements to product announcements."

3️⃣ Open source is changing everything: "DeepSeek was open source AI’s ChatGPT moment. Basically, everyone outside the bubble realized you can get a model for free—and it’s just as good as the paid ones."

4️⃣ Product innovation is being democratized: Take Manus, for example—they built a product on top of Anthropic’s models that’s "actually better than Anthropic’s own product for now, in terms of agents." This proves that anyone can build great products with existing models.

We’re entering a "multi-LLM world," where models are becoming commoditized, and all the tools to build are readily available—just look at the flurry of daily new releases on Hugging Face.

Thom's comparison to the internet era is spot-on: "In the beginning you made a lot of money by making websites... but nowadays the huge internet companies are not the companies that built websites. Like Airbnb, Uber, Facebook, they just use the internet as a medium to make something for real life use cases."

Love to hear your thoughts on this shift!
  • 1 reply
·
reacted to thomwolf's post with 🔥🚀 about 1 month ago
view post
Post
2817
We've kept pushing our Open-R1 project, an open initiative to replicate and extend the techniques behind DeepSeek-R1.

And even we were mind-blown by the results we got with this latest model we're releasing: ⚡️OlympicCoder ( open-r1/OlympicCoder-7B and open-r1/OlympicCoder-32B)

It's beating Claude 3.7 on (competitive) programming –a domain Anthropic has been historically really strong at– and it's getting close to o1-mini/R1 on olympiad level coding with just 7B parameters!

And the best part is that we're open-sourcing all about its training dataset, the new IOI benchmark, and more in our Open-R1 progress report #3: https://huggingface.co/blog/open-r1/update-3

Datasets are are releasing:
- open-r1/codeforces
- open-r1/codeforces-cots
- open-r1/ioi
- open-r1/ioi-test-cases
- open-r1/ioi-sample-solutions
- open-r1/ioi-cots
- open-r1/ioi-2024-model-solutions
reacted to clefourrier's post with 🚀 about 1 month ago
view post
Post
2200
Gemma3 family is out! Reading the tech report, and this section was really interesting to me from a methods/scientific fairness pov.

Instead of doing over-hyped comparisons, they clearly state that **results are reported in a setup which is advantageous to their models**.
(Which everybody does, but people usually don't say)

For a tech report, it makes a lot of sense to report model performance when used optimally!
On leaderboards on the other hand, comparison will be apples to apples, but in a potentially unoptimal way for a given model family (like some user interact sub-optimally with models)

Also contains a cool section (6) on training data memorization rate too! Important to see if your model will output the training data it has seen as such: always an issue for privacy/copyright/... but also very much for evaluation!

Because if your model knows its evals by heart, you're not testing for generalization.