
Maxime Labonne PRO

mlabonne

AI & ML interests

Post-training, model editing, quantization


Organizations

Blog-explorers, Qwen, ZeroGPU Explorers, Merge Crew, Social Post Explorers, Liquid AI, gg-tt, rg-preview

mlabonne's activity

posted an update 2 days ago
🆕 LLM Course 2025 edition!

I updated the LLM Scientist roadmap and added a ton of new information and references. It covers training, datasets, evaluation, quantization, and new trends like test-time compute scaling.

The LLM Course has been incredibly popular (41.3k stars!) and I've been touched to receive many, many messages about how it helped people in their careers.

I know how difficult this stuff can be, so I'm super proud of the impact it had. I want to keep updating it in 2025, especially with the LLM Engineer roadmap.

Thanks everyone, hope you'll enjoy it!

💻 LLM Course: https://huggingface.co/blog/mlabonne/llm-course
replied to CultriX's post 8 days ago
reacted to CultriX's post with ❤️ 8 days ago
# Space for Multi-Agent Workflows using AutoGen

Hi all, I created this "AutoGen Multi-Agent Workflow" space that allows you to experiment with multi-agent workflows.

By default, it allows code generation with built-in quality control and automatic documentation generation. It achieves this by leveraging multiple AI agents working together to produce high-quality code snippets, ensuring they meet the specified requirements.

In addition to the default, the space allows users to set custom system messages for each assistant, potentially completely changing the workflow.

# Workflow Steps
1. User Input:
- The user defines a prompt, such as "Write a random password generator using Python."
- Outcome: A clear task for the primary assistant to accomplish.

2. Primary Assistant Work:
- The primary assistant begins working on the provided prompt and generates an initial code snippet based on the user's request.
- Outcome: An initial proposal for the requested code.

3. Critic Feedback:
- The critic reviews the generated code and provides feedback, or broadcasts the APPROVED message if the output meets the criteria.
(This process repeats until the output is APPROVED or 10 messages have been exchanged.)
- Outcome: A revised Python function that incorporates the critic's feedback.

4. Documentation Generation:
- Once the code is approved, it is passed to a documentation assistant, which generates concise documentation for the final code.
- Outcome: Short documentation covering the function description, parameters, and return values.
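
For reference, here is a minimal sketch of how a workflow like this could be wired up with AutoGen's group chat API. It is an approximation of the steps above, not the actual code behind the space; the model config, agent names, and system messages are assumptions.

```python
# Minimal sketch of the workflow above (assumptions: pyautogen 0.2-style API,
# placeholder model name and API key, simplified system messages).
import autogen

llm_config = {"config_list": [{"model": "gpt-4o-mini", "api_key": "YOUR_API_KEY"}]}

primary = autogen.AssistantAgent(
    name="primary_assistant",
    system_message="Write Python code that satisfies the user's request.",
    llm_config=llm_config,
)
critic = autogen.AssistantAgent(
    name="critic",
    system_message="Review the code and give feedback. Reply APPROVED once it meets the requirements.",
    llm_config=llm_config,
)
doc_writer = autogen.AssistantAgent(
    name="documentation_assistant",
    system_message="Write concise documentation (description, parameters, return values) for the approved code.",
    llm_config=llm_config,
)
user = autogen.UserProxyAgent(
    name="user",
    human_input_mode="NEVER",
    code_execution_config=False,
    # Stop the loop once the APPROVED message is broadcast.
    is_termination_msg=lambda msg: "APPROVED" in (msg.get("content") or ""),
)

# At most 10 messages are exchanged, mirroring the cap described in step 3.
group_chat = autogen.GroupChat(
    agents=[user, primary, critic, doc_writer], messages=[], max_round=10
)
manager = autogen.GroupChatManager(groupchat=group_chat, llm_config=llm_config)

user.initiate_chat(manager, message="Write a random password generator using Python.")
```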

Enjoy!
CultriX/AutoGen-MultiAgent-Example
reacted to burtenshaw's post with ❤️ about 2 months ago
For anyone looking to boost their LLM fine-tuning and alignment skills this December: we're running a free and open course called smol course. It's not big like Li Yin and @mlabonne, it's just smol.

👷 It focuses on practical use cases, so if you're working on something, bring it along.

👯‍♀️ It's peer reviewed and open so you can discuss and get feedback.

🤘 If you're already a smol pro, feel free to drop a star or issue.

> Part 1 starts now, and it's on instruction tuning!

https://github.com/huggingface/smol-course
reacted to cfahlgren1's post with ❤️ 2 months ago
You can clean and format datasets entirely in the browser with a few lines of SQL.

In this post, I replicate the process @mlabonne used to clean the new microsoft/orca-agentinstruct-1M-v1 dataset.

The cleaning process consists of:
- Joining the separate splits together and adding a split column
- Converting the string messages into a list of structs
- Removing empty system prompts
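
For readers who prefer to script it, here is a rough local equivalent of those three steps using the datasets library instead of in-browser SQL (the linked post below has the actual SQL). The column names and JSON-string message encoding are assumptions about the raw dataset.

```python
# Rough local equivalent of the three steps above, using the `datasets` library
# rather than in-browser SQL. Column names ("messages", "role", "content") and
# the JSON-string message encoding are assumptions about the raw dataset.
import json
from datasets import load_dataset, concatenate_datasets

raw = load_dataset("microsoft/orca-agentinstruct-1M-v1")

# 1. Join the separate splits together and add a "split" column.
combined = concatenate_datasets(
    [ds.add_column("split", [name] * len(ds)) for name, ds in raw.items()]
)

# 2. Convert the string-encoded messages into a list of structs (dicts).
combined = combined.map(lambda row: {"messages": json.loads(row["messages"])})

# 3. Remove empty system prompts from each conversation.
def drop_empty_system(row):
    row["messages"] = [
        m for m in row["messages"]
        if not (m["role"] == "system" and not m["content"].strip())
    ]
    return row

cleaned = combined.map(drop_empty_system)
```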

https://huggingface.co/blog/cfahlgren1/the-beginners-guide-to-cleaning-a-dataset

Here's his new cleaned dataset: mlabonne/orca-agentinstruct-1M-v1-cleaned
reacted to tomaarsen's post with 👍 2 months ago
I just released Sentence Transformers v3.3.0 & it's huge! 4.5x speedup for CPU with OpenVINO int8 static quantization, training with prompts for a free perf. boost, PEFT integration, evaluation on NanoBEIR, and more! Details:

1. We integrate Post-Training Static Quantization using OpenVINO, a very efficient solution for CPUs that processes 4.78x as many texts per second on average, while only hurting performance by 0.36% on average. There's a new export_static_quantized_openvino_model method to quantize a model.

2. We add the option to train with prompts, e.g. strings like "query: ", "search_document: " or "Represent this sentence for searching relevant passages: ". It's as simple as using the prompts argument in SentenceTransformerTrainingArguments. Our experiments show that you can easily reach 0.66% to 0.90% relative performance improvement on NDCG@10 at no extra cost by adding "query: " before each training query and "document: " before each training answer.

3. Sentence Transformers now supports training PEFT adapters via 7 new methods for adding new adapters or loading pre-trained ones. You can also directly load a trained adapter with SentenceTransformer as if it's a normal model. Very useful for e.g. 1) training multiple adapters on 1 base model, 2) training bigger models than otherwise possible, or 3) cheaply hosting multiple models by switching multiple adapters on 1 base model.

4. We added easy evaluation on NanoBEIR, a subset of BEIR a.k.a. the MTEB Retrieval benchmark. It contains 13 datasets with 50 queries and up to 10k documents each. Evaluation is fast, and can easily be done during training to track your model's performance on general-purpose information retrieval tasks.
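
As a quick illustration of point 4, here is a minimal sketch of running the new NanoBEIR evaluation; the model name is just an example, see the release notes for details:

```python
# Minimal sketch of point 4: evaluating a model on NanoBEIR. The model name is
# an example; by default the evaluator covers all 13 NanoBEIR datasets.
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import NanoBEIREvaluator

model = SentenceTransformer("all-MiniLM-L6-v2")

evaluator = NanoBEIREvaluator()  # small datasets: 50 queries, up to 10k documents each
results = evaluator(model)
print(results)  # per-dataset and aggregate retrieval metrics such as NDCG@10
```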

Additionally, we also deprecate Python 3.8, add better compatibility with Transformers v4.46.0, and more. Read the full release notes here: https://github.com/UKPLab/sentence-transformers/releases/tag/v3.3.0
replied to their post 3 months ago

Haha thanks for this suggestion @tachyphylaxis, but @failspy is the one who coined the name "abliteration". He has full responsibility for the chaos he unleashed; I'm barely a messenger here.

reacted to tomaarsen's post with 🔥 3 months ago
📣 Sentence Transformers v3.2.0 is out, marking the biggest release for inference in 2 years! 2 new backends for embedding models: ONNX (+ optimization & quantization) and OpenVINO, allowing for speedups up to 2x-3x AND Static Embeddings for 500x speedups at 10-20% accuracy cost.

1๏ธโƒฃ ONNX Backend: This backend uses the ONNX Runtime to accelerate model inference on both CPU and GPU, reaching up to 1.4x-3x speedup depending on the precision. We also introduce 2 helper methods for optimizing and quantizing models for (much) faster inference.
2๏ธโƒฃ OpenVINO Backend: This backend uses Intel their OpenVINO instead, outperforming ONNX in some situations on CPU.

Usage is as simple as SentenceTransformer("all-MiniLM-L6-v2", backend="onnx"). Does your model not have an ONNX or OpenVINO file yet? No worries - it'll be autoexported for you. Thank me later 😉

🔒 Another major new feature is Static Embeddings: think word embeddings like GloVe and word2vec, but modernized. Static Embeddings are bags of token embeddings that are summed together to create text embeddings, allowing for lightning-fast embeddings that don't require any neural networks. They're initialized in one of 2 ways:

1๏ธโƒฃ via Model2Vec, a new technique for distilling any Sentence Transformer models into static embeddings. Either via a pre-distilled model with from_model2vec or with from_distillation where you do the distillation yourself. It'll only take 5 seconds on GPU & 2 minutes on CPU, no dataset needed.
2๏ธโƒฃ Random initialization. This requires finetuning, but finetuning is extremely quick (e.g. I trained with 3 million pairs in 7 minutes). My final model was 6.6% worse than bge-base-en-v1.5, but 500x faster on CPU.

Full release notes: https://github.com/UKPLab/sentence-transformers/releases/tag/v3.2.0
Documentation on Speeding up Inference: https://sbert.net/docs/sentence_transformer/usage/efficiency.html
reacted to MoritzLaurer's post with ❤️ 3 months ago
#phdone - I defended my PhD yesterday! A key lesson: it is amazing how open science and open source can empower beginners with limited resources:

I first learned about instruction-based classifiers like BERT-NLI 3-4 years ago, through the @HuggingFace ZeroShotClassificationPipeline. Digging deeper into this, it was surprisingly easy to find new datasets, newer base models, and reusable fine-tuning scripts on the HF Hub to create my own zeroshot models - although I didn't know much about fine-tuning at the time.

Thanks to the community effect of the Hub, my models were downloaded hundreds of thousands of times after a few months. Seeing my research being useful for people motivated me to improve and upload newer models. Leaving my contact details in the model cards led to academic cooperation and consulting contracts (and eventually my job at HF).

That's the power of open science & open source: learning, sharing, improving, collaborating.

I mean every word in my thesis acknowledgments (screenshot). I'm very grateful to my supervisors @vanatteveldt @CasAndreu @KasperWelbers for their guidance; to @profAndreaRenda and @CEPS_thinktank for enabling me to work part-time during the first year; to @huggingface for creating awesome tools and an awesome platform; and to many others who are not active on social media.

Links to the full thesis and the collection of my most recent models are below.

PS: If someone happens to speak Latin, let me know if my diploma contains some hidden Illuminati code or something :D
replied to Tonic's post 4 months ago

Thanks @Tonic! Sorry, there's no other way to access the API at the moment :( Hopefully, it's just temporary.

reacted to Tonic's post with ❤️ 4 months ago
@mlabonne hey there 🙋🏻‍♂️ I kinda got obsessed with your great model, and I found the endpoint for it on Lambda Labs, but basically I got rate limited / banned for trying to make my DPO dataset project. I was wondering if you all had an OpenAI-compatible solution for me to make a great "thinking" SFT + DPO dataset with all the splits 🙏🏻🙏🏻 Kinda desperate, it's true, but I was looking forward to a nice write-up 🚀🚀🚀
replied to Tonic's post 4 months ago
reacted to Tonic's post with ❤️ 4 months ago
replied to their post 4 months ago

I modified it, thanks again. I recommend using the original model for strong instruction-following capabilities. Self-merges tend to suffer, especially around skills related to reasoning.

replied to their post 4 months ago
replied to their post 4 months ago

I haven't. That's nice, thanks for your feedback. Do you mind sharing the prompt and answer if possible? I'd like to understand what it's good at.

replied to their post 5 months ago

Hey @kweel, thanks for your message. First, I want to say that "abliteration" can be used in many, many ways, and uncensoring models is just one of them (see @failspy's https://huggingface.co/failspy/Llama-3-8B-Instruct-MopeyMule).

I agree that "disabling refusals" and "uncensoring" are not the same thing, but disabling refusals is kind of a superset of uncensoring here. To me, the limitations are more connected to the single direction we target, the lack of high-quality calibration sets, and the performance drop it creates.
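
For readers unfamiliar with the mechanics, here is a schematic sketch of what targeting a single direction means in practice; it illustrates the idea only and is not the actual abliteration implementation:

```python
# Schematic illustration of single-direction ablation (not the actual
# abliteration code): remove the projection onto an estimated "refusal
# direction" from a hidden-state matrix.
import torch

def ablate_direction(hidden: torch.Tensor, direction: torch.Tensor) -> torch.Tensor:
    """Remove the component of `hidden` along `direction` (shape: [d_model])."""
    r = direction / direction.norm()
    return hidden - (hidden @ r).unsqueeze(-1) * r

# The refusal direction is typically estimated as the difference of mean
# activations on "harmful" vs. "harmless" prompts (values here are dummies).
harmful_acts = torch.randn(128, 4096)
harmless_acts = torch.randn(128, 4096)
refusal_dir = harmful_acts.mean(dim=0) - harmless_acts.mean(dim=0)

hidden_states = torch.randn(8, 4096)
print(ablate_direction(hidden_states, refusal_dir).shape)  # torch.Size([8, 4096])
```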

replied to anakin87's post 5 months ago
reacted to anakin87's post with 👍 5 months ago
💬 🇮🇹 Phi 3.5 mini ITA: a Small Language Model for Italian

Lately, I've spent some time fine-tuning language models.

Now I am happy to release Phi 3.5 mini ITA: a fine-tuned version of Phi-3.5-mini-instruct to improve performance on the Italian language.

🔹 Small (3.82B parameters) but capable model
🔹 128k context length

Chat with it on 🤗 Spaces: anakin87/Phi-3.5-mini-ITA
Model card: anakin87/Phi-3.5-mini-ITA

๐Ÿ—ƒ๏ธ Data
Supervised fine-tuning using a good mix of English and Italian data:
- mlabonne/FineTome-100k by @mlabonne
- efederici/capybara-claude-15k-ita by @efederici
๐Ÿ™ Thanks to the authors for the datasets.


🎯 Targeted training with Spectrum
I used Spectrum, a relatively new technique for parameter-efficient learning.
The idea is to train only the layers of the model with high Signal-to-Noise Ratio (SNR) and ❄️ freeze the rest.
I trained the top 30% of model layers.

📝 Spectrum paper: https://arxiv.org/abs/2406.06623


📊 Vibe check and performance on Italian benchmarks seem encouraging
replied to their post 5 months ago

That's an interesting project. The abliteration process relies on the assumption that refusal in LLMs is mediated by a single direction. I don't expect the concept of "cat" to be as simple, however. You could maybe try to narrow your scope?