atsuki-yamaguchi commited on
Commit
19900b3
·
verified ·
1 Parent(s): 5329e95

Upload README.md with huggingface_hub

Browse files
Files changed (1) hide show
  1. README.md +66 -0
README.md ADDED
@@ -0,0 +1,66 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+
2
+ ---
3
+ license: apache-2.0
4
+ datasets:
5
+ - allenai/MADLAD-400
6
+ language:
7
+ - gu
8
+ base_model:
9
+ - Qwen/Qwen2.5-7B
10
+ - Qwen/Qwen2.5-7B-Instruct
11
+ - atsuki-yamaguchi/Qwen2.5-7B-gu-madlad-mean-tuned
12
+ library_name: transformers
13
+ ---
14
+ # Qwen2.5 7B for Gujarati: Chat Vector
15
+
16
+ This model is built on top of Qwen2.5 7B adapted for Gujarati using 500M target language tokens sampled from MADLAD-400. It has an additional target vocabulary of 10K. Chat vector was added to the model after continual pre-training.
17
+
18
+ ## Model Details
19
+
20
+ * **Vocabulary**: This model has an additional target vocabulary of 10K.
21
+ * **Target vocabulary initialization**: The target weights of the embedding and LM head were initialized using mean initialization.
22
+ * **Training**: This model was continually pre-trained on 500M target language tokens sampled from MADLAD-400.
23
+ * **Post-processing**: The model was post-processed using the Chat Vector method.
24
+
25
+
26
+ ## Model Description
27
+
28
+ - **Language:** Gujarati
29
+ - **License:** Apache 2.0
30
+ - **Fine-tuned from model:** Qwen/Qwen2.5-7B
31
+
32
+
33
+ ## Model Sources
34
+
35
+ - **Repository:** https://github.com/gucci-j/chat-cve
36
+ - **Paper:** https://arxiv.org/abs/2412.11704
37
+
38
+
39
+ ## How to Get Started with the Model
40
+ Use the code below to get started with the model.
41
+ ```python
42
+ from transformers import AutoTokenizer, AutoModelForCausalLM
43
+
44
+ model = AutoModelForCausalLM.from_pretrained(
45
+ "atsuki-yamaguchi/Qwen2.5-7B-gu-madlad-mean-cv"
46
+ )
47
+ tokenizer = AutoTokenizer.from_pretrained(
48
+ "atsuki-yamaguchi/Qwen2.5-7B-gu-madlad-mean-cv"
49
+ )
50
+ ```
51
+
52
+
53
+ ## Citation
54
+ ```
55
+ @misc{yamaguchi2024vocabularyexpansionchatmodels,
56
+ title={{ElChat}: Adapting Chat Language Models Using Only Target Unlabeled Language Data},
57
+ author={Atsuki Yamaguchi and Terufumi Morishita and Aline Villavicencio and Nikolaos Aletras},
58
+ year={2024},
59
+ eprint={2412.11704},
60
+ archivePrefix={arXiv},
61
+ primaryClass={cs.CL},
62
+ url={https://arxiv.org/abs/2412.11704},
63
+ }
64
+ ```
65
+
66
+