anakin87 committed · verified
Commit 2746b0b · 1 Parent(s): 3ca8858

Update README.md

Files changed (1)
  1. README.md +82 -1

README.md CHANGED
@@ -2,10 +2,91 @@
 license: gemma
 language:
 - it
+ - en
 base_model:
 - VAGOsolutions/SauerkrautLM-gemma-2-9b-it
 pipeline_tag: text-generation
 library_name: transformers
+ datasets:
+ - mii-llm/argilla-math-preferences-it
+ - ruggsea/wsdm2024-cot-dataset
+ - anakin87/evol-dpo-ita-reranked
+ - mlabonne/orpo-dpo-mix-40k
 ---

- Work in progress - Work in progress - Work in progress - Work in progress - Work in progress - Work in progress - Work in progress - Work in progress - Work in progress - Work in progress - Work in progress
+ <h1>Gemma 2 9B Neogenesis ITA</h1>
+
+ <img src="https://github.com/anakin87/gemma-neogenesis/blob/main/images/gemma_neogenesis_9b.jpeg?raw=true" width="450px">
+
+ Fine-tuned version of [VAGOsolutions/SauerkrautLM-gemma-2-9b-it](https://huggingface.co/VAGOsolutions/SauerkrautLM-gemma-2-9b-it), optimized for better performance in Italian.
+
+ - Small yet capable model with 9.24 billion parameters
+ - Supports 8k context length
+
+ # 🎮 Usage
+
+ **Text generation with Transformers**
+
+ ```python
+ import torch
+ from transformers import pipeline
+
+ model_id = "anakin87/gemma-2-9b-neogenesis-ita"
+
+ # Load the model in bfloat16 on the GPU
+ pipe = pipeline(
+     "text-generation",
+     model=model_id,
+     model_kwargs={"torch_dtype": torch.bfloat16},
+     device="cuda",
+ )
+
+ # "What is compound interest? Explain it in a simple and clear way."
+ messages = [{"role": "user", "content": "Cos'è l'interesse composto? Spiega in maniera semplice e chiara."}]
+ outputs = pipe(messages, max_new_tokens=500)
+
+ # The pipeline returns the full chat; the assistant reply is the second message
+ print(outputs[0]["generated_text"][1]["content"])
+ ```
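+
+ The same generation can also be run without the pipeline helper. The sketch below is not part of the original card: it uses the standard `AutoTokenizer`/`AutoModelForCausalLM` API with the model's chat template, and greedy decoding with `max_new_tokens=500` is an illustrative choice.
+
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_id = "anakin87/gemma-2-9b-neogenesis-ita"
+
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id, torch_dtype=torch.bfloat16, device_map="auto"
+ )
+
+ messages = [{"role": "user", "content": "Cos'è l'interesse composto? Spiega in maniera semplice e chiara."}]
+
+ # Build the prompt with the chat template and move it to the model device
+ input_ids = tokenizer.apply_chat_template(
+     messages, add_generation_prompt=True, return_tensors="pt"
+ ).to(model.device)
+
+ with torch.no_grad():
+     output_ids = model.generate(input_ids, max_new_tokens=500, do_sample=False)
+
+ # Decode only the newly generated tokens
+ print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
+ ```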
+
+ # 🏆 Evaluation Results
+
+ The model was submitted and evaluated on the [Open Ita LLM Leaderboard](https://huggingface.co/spaces/mii-llm/open_ita_llm_leaderboard), the most popular leaderboard for Italian language models.
+
+ | Model | MMLU_IT | ARC_IT | HELLASWAG_IT | Average |
+ |-----------------------|---------|--------|--------------|---------|
+ | google/gemma-2-9b-it | 65.67 | 55.6 | 68.95 | 63.41 |
+ | VAGOsolutions/SauerkrautLM-gemma-2-9b-it | 65.76 | **61.25** | 72.10 | 66.37 |
+ | **anakin87/gemma-2-9b-neogenesis-ita** | **65.82** | **61.25** | **73.29** | **66.79** |
+
+ These results establish this model as a strong 9B model for Italian, outperforming 13-14B models and even surpassing some models in the 30-70B range.
+
+ # 🔧 Training details
+
+ The model was fine-tuned with [Hugging Face TRL](https://huggingface.co/docs/trl/index), applying Direct Preference Optimization (DPO).
+
+ I adopted a relatively new technique for parameter-efficient learning: [Spectrum](https://arxiv.org/abs/2406.06623). The idea is to train only the layers of the model with a high Signal-to-Noise Ratio (SNR) and ❄️ freeze the rest. Specifically, training focused on the top 20% most informative layers.
+
+ Batch size: 16; learning rate: 1e-6; epochs: 1.
+
+ The training process took approximately 12 hours on a single NVIDIA A100 GPU (80 GB VRAM).
+
+ For the training code, see the DPO section of this [📓 Kaggle notebook](https://www.kaggle.com/code/anakin87/post-training-gemma-for-italian-and-beyond), modified to use a different base model, different hyperparameters, and no on-policy data.
+
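+ To give a sense of how the pieces fit together, here is a minimal, hypothetical sketch (not the original training script) of DPO with TRL combined with Spectrum-style selective freezing: every parameter is frozen, then gradients are re-enabled only for layers matching SNR-selected patterns. The layer patterns, dataset choice, and per-device batch settings are illustrative assumptions; only the effective batch size, learning rate, and epoch count mirror the values reported above. It targets a recent TRL version.
+
+ ```python
+ import re
+
+ import torch
+ from datasets import load_dataset
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ from trl import DPOConfig, DPOTrainer
+
+ base_model_id = "VAGOsolutions/SauerkrautLM-gemma-2-9b-it"
+ tokenizer = AutoTokenizer.from_pretrained(base_model_id)
+ model = AutoModelForCausalLM.from_pretrained(base_model_id, torch_dtype=torch.bfloat16)
+
+ # Spectrum-style selective training: freeze everything, then re-enable gradients
+ # only for modules matching the patterns produced by the Spectrum SNR analysis.
+ # The patterns below are placeholders, not the actual top-20% selection.
+ spectrum_patterns = [r"layers\.(3[0-9]|4[0-1])\.mlp", r"layers\.(3[0-9]|4[0-1])\.self_attn"]
+ for name, param in model.named_parameters():
+     param.requires_grad = any(re.search(pattern, name) for pattern in spectrum_patterns)
+
+ # One of the preference datasets listed in the Training data section,
+ # assumed here to already be in a prompt/chosen/rejected format
+ train_dataset = load_dataset("anakin87/evol-dpo-ita-reranked", split="train")
+
+ # Effective batch size 16 (2 x 8), learning rate 1e-6, 1 epoch, as reported above
+ training_args = DPOConfig(
+     output_dir="gemma-2-9b-neogenesis-ita-dpo",
+     per_device_train_batch_size=2,
+     gradient_accumulation_steps=8,
+     learning_rate=1e-6,
+     num_train_epochs=1,
+     bf16=True,
+ )
+
+ trainer = DPOTrainer(
+     model=model,
+     args=training_args,
+     train_dataset=train_dataset,
+     processing_class=tokenizer,
+ )
+ trainer.train()
+ ```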
+
+ # 🗃️ Training data
+
+ The model was trained primarily on Italian data, with a small portion of English data included.
+
+ For Direct Preference Optimization:
+ - Italian data
+   - [mii-llm/argilla-math-preferences-it](https://huggingface.co/datasets/mii-llm/argilla-math-preferences-it)
+   - [ruggsea/wsdm2024-cot-dataset](https://huggingface.co/datasets/ruggsea/wsdm2024-cot-dataset)
+   - [anakin87/evol-dpo-ita-reranked](https://huggingface.co/datasets/anakin87/evol-dpo-ita-reranked)
+ - English data
+   - [mlabonne/orpo-dpo-mix-40k](https://huggingface.co/datasets/mlabonne/orpo-dpo-mix-40k)
+
+ 🙏 Thanks to the authors for providing these datasets.
+
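+ To take a quick look at these corpora, a small inspection sketch (not from the original card) with the Hugging Face `datasets` library might look like the following; split names and column layouts differ per dataset, so nothing is assumed beyond what the Hub exposes.
+
+ ```python
+ from datasets import get_dataset_split_names, load_dataset
+
+ # Preference datasets used for DPO (Italian + English), as listed above
+ dataset_ids = [
+     "mii-llm/argilla-math-preferences-it",
+     "ruggsea/wsdm2024-cot-dataset",
+     "anakin87/evol-dpo-ita-reranked",
+     "mlabonne/orpo-dpo-mix-40k",
+ ]
+
+ for dataset_id in dataset_ids:
+     # Take the first available split rather than assuming it is named "train"
+     split = get_dataset_split_names(dataset_id)[0]
+     ds = load_dataset(dataset_id, split=split)
+     # Inspect size and columns before mapping each dataset to a common
+     # prompt/chosen/rejected schema (that mapping is dataset-specific)
+     print(f"{dataset_id}: {len(ds)} rows, columns: {ds.column_names}")
+ ```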
+
+ # 🛡️ Safety
+
+ While this model was not specifically fine-tuned for safety, its selective training with the Spectrum technique helps preserve certain safety features from the original model.