I am fascinated by models learning from prompts and rewards: unlike in Supervised Fine-Tuning, no example answers are needed.
After the DeepSeek boom, everyone is trying GRPO with GSM8K or the Countdown Game...
I wanted a different challenge: teaching a model to create a schedule from a list of events and priorities.
Choosing an original problem forced me to:
- think about the problem setting
- generate data
- choose the right base model
- design reward functions (and experience reward hacking firsthand; see the sketch below)
- run multiple rounds of training, hoping that my model would learn something
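For a concrete flavor: in GRPO (e.g. with TRL's GRPOTrainer) a reward function simply scores each sampled completion. This is a minimal sketch assuming a hypothetical JSON schedule output format, not the reward set I actually used:

```python
import json

# TRL-style reward function: takes a batch of completions, returns one float each
def format_reward(completions, **kwargs):
    """Reward outputs that parse as a JSON list of schedule entries."""
    rewards = []
    for completion in completions:
        try:
            schedule = json.loads(completion)
            # full reward only for a list of {"event": ..., "start": ...} dicts
            valid = isinstance(schedule, list) and all(
                isinstance(item, dict) and "event" in item and "start" in item
                for item in schedule
            )
            rewards.append(1.0 if valid else 0.2)
        except json.JSONDecodeError:
            rewards.append(0.0)
    return rewards
```

In practice, several such rewards (format, constraint satisfaction, priority coverage) get combined, and any loophole in them is exactly where reward hacking creeps in.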
I am happy to release two new language models for the Italian language!
Gemma 2 9B Neogenesis ITA (anakin87/gemma-2-9b-neogenesis-ita)
Building on the impressive work by VAGO Solutions, I applied Direct Preference Optimization with a mix of Italian and English data. Using Spectrum, I trained 20% of the model layers.
Evaluated on the Open ITA LLM leaderboard (mii-llm/open_ita_llm_leaderboard), this model achieves strong performance. To beat it on this benchmark, you'd need a 27B model!
Gemma 2 2B Neogenesis ITA (anakin87/gemma-2-2b-neogenesis-ita)
This smaller variant is fine-tuned from the original Gemma 2 2B it by Google. Through a combination of Supervised Fine-Tuning and Direct Preference Optimization, I trained 25% of the layers using Spectrum.
Compared to the original model, it shows improved Italian proficiency: a solid result for such a small model.
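For the curious, here is a minimal sketch of the Spectrum + DPO recipe using TRL. The layer names and the preference dataset are placeholders (Spectrum actually selects layers by signal-to-noise ratio, and I used my own data mix), so treat this as an illustration rather than the released training code:

```python
# Freeze everything except a Spectrum-selected subset of layers,
# then run Direct Preference Optimization with TRL.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "google/gemma-2-2b-it"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Spectrum scans weight signal-to-noise ratios and emits a list of layer
# names worth training; here two are hard-coded as an example.
trainable = ["model.layers.10.", "model.layers.11."]
for name, param in model.named_parameters():
    param.requires_grad = any(t in name for t in trainable)

dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")
trainer = DPOTrainer(
    model=model,
    args=DPOConfig(output_dir="gemma-2-2b-dpo", beta=0.1),
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
```

Freezing everything except the Spectrum-selected layers is what keeps the fine-tuning cheap: gradients and optimizer state only exist for a fraction of the weights.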
Hey, it has been a while... I was busy participating in a Gemma competition!
Here's the idea: Gemma open models have a large vocabulary size (256K), so improving them for a specific language or cultural context should be pretty affordable - no need for continued pre-training.
In this notebook, I show how I improved the performance of Gemma 2 2B on Italian via post-training. I believe this method is adaptable to other languages and model sizes.
Key steps:
- Choose reference metrics
- Data curation for Instruction Fine-Tuning: identify existing datasets + generate synthetic data
- Efficient Instruction Fine-Tuning with Spectrum
- Data curation for Preference Tuning: identify existing datasets + generate synthetic data
- Efficient Direct Preference Optimization with Spectrum
- Evaluation
I'm also planning a Gemma Giveaway (on LinkedIn: https://www.linkedin.com/in/stefano-fiorucci) in the next few days, sharing techniques, datasets, and models I used for my project... so stay tuned!
Some time ago, OpenAI published Swarm: an educational framework for building multi-agent systems.
Their approach focuses on two main concepts:
- Routines: each agent follows specific instructions and uses tools to execute them.
- Handoffs: agents can transfer control to one another using tool/function calling.
When I first read these ideas, I thought: simple but powerful! And they pair well with the recent unified tool support in Haystack.
So, I decided to re-implement these concepts using Haystack, and in just a few lines of code, I had a working prototype.
Bonus feature: this implementation isn't tied to a single model provider - different agents can be powered by different models!
I replicated the ACME customer service example from the original article, with 3 agents:
- Triage Agent: Llama 3.2, running on Ollama
- Sales Agent: Anthropic Claude 3.5 Sonnet
- Issues and Repairs Agent: OpenAI GPT-4o mini
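To give a taste of how a handoff looks in Haystack, here's a minimal sketch. The system prompt and helper function are my illustrative choices; the notebook wraps this pattern in a small Agent abstraction:

```python
from haystack.dataclasses import ChatMessage
from haystack.tools import create_tool_from_function
from haystack.components.generators.chat import OpenAIChatGenerator

def transfer_to_sales_agent() -> str:
    """Transfer the conversation to the Sales Agent."""
    return "Sales Agent"

# Handoffs are just tools: calling one tells the loop which agent takes over
handoff = create_tool_from_function(transfer_to_sales_agent)

triage_llm = OpenAIChatGenerator(model="gpt-4o-mini", tools=[handoff])
messages = [
    ChatMessage.from_system("You are the ACME Triage Agent. Hand off sales questions."),
    ChatMessage.from_user("I'd like to buy a new anvil."),
]
reply = triage_llm.run(messages=messages)["replies"][0]

if reply.tool_calls:
    # invoking the tool returns the name of the next active agent
    next_agent = handoff.invoke(**reply.tool_calls[0].arguments)
    print("Handoff to:", next_agent)
```

Since each agent is just a chat generator plus tools, swapping OpenAIChatGenerator for an Ollama or Anthropic chat generator gives you the multi-provider setup above.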
Want to see the full implementation and give it a try? Check out the blog post and notebook!
Magpie is a recent technique for creating synthetic instruction datasets.
It's based on a simple but ingenious idea: if you prompt an instruction-tuned model with just a pre-query template, you can make it generate a plausible user query/instruction.
Here's an example:
- model: Llama-3-8B-Instruct
- pre-query template: "<|begin_of_text|><|start_header_id|>user<|end_header_id|>"
- generated user instruction: "What are some of the responsibilities of a commercial pilot?"
You can then feed this instruction back into the same model to get the assistant response.
By repeating this process, it's possible to generate large synthetic datasets with relatively little effort.
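A minimal sketch of the loop with the transformers pipeline (sampling parameters, stop-token/BOS handling, and the quality filtering of the real recipe are all simplified here):

```python
from transformers import pipeline

generator = pipeline("text-generation", model="meta-llama/Meta-Llama-3-8B-Instruct")

# 1) Send only the pre-query template: the model "completes" it with a user query
pre_query = "<|begin_of_text|><|start_header_id|>user<|end_header_id|>"
instruction = generator(pre_query, max_new_tokens=64, return_full_text=False)[0]["generated_text"]

# 2) Feed the generated instruction back as a normal user message
messages = [{"role": "user", "content": instruction.strip()}]
response = generator(messages, max_new_tokens=256)[0]["generated_text"][-1]["content"]

print(instruction.strip(), "->", response)
```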
The authors demonstrate that using these datasets for Supervised Fine-Tuning (SFT) can yield strong performance, even competitive with the original instruct model.
Most Language Models are primarily trained on English texts, so they tend to produce data in English.
How can we overcome this?
Earlier approaches were complex or costly.
Then @mrm8488 found a simple solution: add the target language to the pre-query template. For Spanish, the template becomes "<|begin_of_text|><|start_header_id|>user<|end_header_id|>spanish:".
This method works for Spanish and German!
Unfortunately, it does not work well for other languages (Italian, Dutch, ...).
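For the languages where the trick does work, the change is literally one string. Building on the Magpie sketch above:

```python
# Same Magpie loop, but nudging the model toward Spanish queries
pre_query_spanish = "<|begin_of_text|><|start_header_id|>user<|end_header_id|>spanish:"
instruction = generator(pre_query_spanish, max_new_tokens=64, return_full_text=False)[0]["generated_text"]
```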
I was excited to explore Llama 3.2, but as a simple EU guy, I don't have access to Meta's multimodal models.
So I thought: why not challenge the small 3B text model with Agentic RAG?
The plan:
- Build a system that tries to answer questions using a knowledge base.
- If the documents don't contain the answer, use web search for additional context.
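One way to wire the fallback is Haystack's ConditionalRouter. A minimal sketch, assuming a generator that returns plain-string replies and a RAG prompt that tells the model to output the sentinel "NO_ANSWER" when the retrieved documents aren't enough (the sentinel and output names are my own choices):

```python
from haystack.components.routers import ConditionalRouter

# Route based on the first LLM reply: fall back to web search on "NO_ANSWER"
routes = [
    {
        "condition": "{{'NO_ANSWER' in replies[0]}}",
        "output": "{{query}}",
        "output_name": "go_to_websearch",
        "output_type": str,
    },
    {
        "condition": "{{'NO_ANSWER' not in replies[0]}}",
        "output": "{{replies[0]}}",
        "output_name": "answer",
        "output_type": str,
    },
]
router = ConditionalRouter(routes=routes)

# In the pipeline, "go_to_websearch" would connect to a web search component
# (e.g. SerperDevWebSearch) whose results feed a second RAG prompt.
```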