Hey, it has been a while... I was busy participating in a Gemma Competition!
Here's the idea: Gemma open models have a large vocabulary size (256K), so improving them for a specific language or cultural context should be pretty affordable - no need for continued pre-training.
My submission: Gemma Neogenesis - Post-Training Gemma for Italian and Beyond
๐ Kaggle notebook: https://www.kaggle.com/code/anakin87/post-training-gemma-for-italian-and-beyond
In this notebook, I show how to improve the performance of Gemma 2 2B on Italian via post-training.
I believe this method is adaptable to other languages and model sizes.
Key Steps
- Choose reference metrics
- Data curation for Instruction Fine-Tuning: identify existing datasets + generate synthetic data
- Efficient Instruction Fine-Tuning with Spectrum
- Data curation for Preference Tuning: identify existing datasets + generate synthetic data
- Efficient Direct Preference Optimization with Spectrum
- Evaluation
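Spectrum makes fine-tuning efficient by computing a signal-to-noise ratio (SNR) per layer and training only the most informative layers while freezing the rest. A minimal sketch of that selection idea (the SNR values and the `select_trainable_layers` helper are illustrative, not Spectrum's actual API):

```python
def select_trainable_layers(layer_snr, fraction=0.3):
    """Spectrum-style selection (illustrative): given a signal-to-noise
    ratio per layer, keep only the top `fraction` of layers trainable
    and freeze the rest."""
    ranked = sorted(layer_snr, key=layer_snr.get, reverse=True)
    k = max(1, round(len(ranked) * fraction))
    return set(ranked[:k])

# Hypothetical SNR scores for four transformer sub-layers
snr = {
    "layers.0.mlp": 0.5,
    "layers.1.self_attn": 3.0,
    "layers.2.mlp": 2.0,
    "layers.3.self_attn": 1.0,
}
trainable = select_trainable_layers(snr, fraction=0.5)
# -> {"layers.1.self_attn", "layers.2.mlp"}
```

In a real run, the returned names would be used to set `requires_grad = False` on every parameter outside the selected layers before training.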
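For the preference step, Direct Preference Optimization trains directly on chosen/rejected answer pairs, with no separate reward model. A toy sketch of the DPO objective for a single pair (the function and its inputs are illustrative; in practice a trainer library computes this over batches):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair (illustrative sketch).
    Inputs are summed log-probabilities of whole responses under the
    policy being trained and under the frozen reference model."""
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(margin)): shrinks as the policy, relative to the
    # reference, favors the chosen answer over the rejected one
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# When the policy and reference agree exactly, the loss is log(2)
print(dpo_loss(-11.0, -11.0, -11.0, -11.0))  # -> 0.6931...
```

Minimizing this loss nudges the policy to assign relatively more probability to the chosen answer than the rejected one, with `beta` controlling how far it may drift from the reference model.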
Hugging Face collection (with models and datasets): anakin87/gemma-neogenesis-67824b7bf13ac9cfe091fe2e
I'm also planning a Gemma Giveaway (on LinkedIn - https://www.linkedin.com/in/stefano-fiorucci) in the next few days - sharing techniques, datasets, and models I used for my project... so stay tuned!