andreaschari committed · verified
Commit cf863fd · 1 parent: f505b96

Update README.md

Files changed (1):
  1. README.md +18 -118
README.md CHANGED
@@ -1,18 +1,24 @@
  ---
- datasets: []
- language: []
  library_name: sentence-transformers
  pipeline_tag: sentence-similarity
  tags:
  - sentence-transformers
- - sentence-similarity
  - feature-extraction
  widget: []
  ---

- # SentenceTransformer

- This is a [sentence-transformers](https://www.SBERT.net) model trained. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

  ## Model Details

 
@@ -26,119 +32,13 @@ This is a [sentence-transformers](https://www.SBERT.net) model trained. It maps
  <!-- - **Language:** Unknown -->
  <!-- - **License:** Unknown -->

- ### Model Sources
-
- - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
-
- ### Full Model Architecture
-
- ```
- SentenceTransformer(
-   (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: XLMRobertaModel
-   (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
-   (2): Normalize()
- )
- ```
-
- ## Usage
-
- ### Direct Usage (Sentence Transformers)
-
- First install the Sentence Transformers library:
-
- ```bash
- pip install -U sentence-transformers
- ```
-
- Then you can load this model and run inference.
- ```python
- from sentence_transformers import SentenceTransformer
-
- # Download from the 🤗 Hub
- model = SentenceTransformer("sentence_transformers_model_id")
- # Run inference
- sentences = [
-     'The weather is lovely today.',
-     "It's so sunny outside!",
-     'He drove to the stadium.',
- ]
- embeddings = model.encode(sentences)
- print(embeddings.shape)
- # [3, 1024]
-
- # Get the similarity scores for the embeddings
- similarities = model.similarity(embeddings, embeddings)
- print(similarities.shape)
- # [3, 3]
- ```
-
- <!--
- ### Direct Usage (Transformers)
-
- <details><summary>Click to see the direct usage in Transformers</summary>
-
- </details>
- -->
-
- <!--
- ### Downstream Usage (Sentence Transformers)
-
- You can finetune this model on your own dataset.
-
- <details><summary>Click to expand</summary>
-
- </details>
- -->
-
- <!--
- ### Out-of-Scope Use
-
- *List how the model may foreseeably be misused and address what users ought not to do with the model.*
- -->
-
- <!--
- ## Bias, Risks and Limitations
-
- *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
- -->
-
- <!--
- ### Recommendations
-
- *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
- -->
-
  ## Training Details

  ### Framework Versions
- - Python: 3.10.14
- - Sentence Transformers: 3.0.1
- - Transformers: 4.41.2
- - PyTorch: 2.4.0.post301
- - Accelerate: 0.32.1
- - Datasets: 2.19.1
- - Tokenizers: 0.19.1
-
- ## Citation
-
- ### BibTeX
-
- <!--
- ## Glossary
-
- *Clearly define terms in order to be accessible across audiences.*
- -->
-
- <!--
- ## Model Card Authors
-
- *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
- -->
-
- <!--
- ## Model Card Contact
-
- *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
- -->
 
  ---
+ datasets:
+ - unicamp-dl/mmarco
  library_name: sentence-transformers
  pipeline_tag: sentence-similarity
  tags:
  - sentence-transformers
  - feature-extraction
+ - sentence-similarity
+ license: mit
  widget: []
+ base_model:
+ - BAAI/bge-m3
  ---

+ # BGE-m3 ZH mMARCO/v2 Transliterated Queries
+
+ This is a [BGE-M3](https://huggingface.co/BAAI/bge-m3) model post-trained on the Chinese dataset from mMARCO/v2.
+ The queries were transliterated from Chinese to English script using [uroman](https://github.com/isi-nlp/uroman).
+
+ The model was used for the SIGIR 2025 short paper "Lost in Transliteration: Bridging the Script Gap in Neural IR".

  ## Model Details

  <!-- - **Language:** Unknown -->
  <!-- - **License:** Unknown -->

  ## Training Details

  ### Framework Versions
+ - Python: 3.10.13
+ - Sentence Transformers: 3.1.1
+ - Transformers: 4.45.1
+ - PyTorch: 2.4.1
+ - Accelerate: 0.34.2
+ - Datasets: 3.0.1
+ - Tokenizers: 0.20.3
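
The update removes the template's usage snippet, but the scoring step is worth keeping in view: per the removed architecture dump, the model ends in a `Normalize()` module, so retrieval scores reduce to dot products of unit-length embeddings. A minimal NumPy sketch with toy 4-dimensional vectors (stand-ins for real 1024-dimensional `model.encode()` output; the romanized query string is purely illustrative):

```python
import numpy as np

# Toy embeddings standing in for model.encode() output
# (illustration only; real vectors are 1024-dimensional).
q = np.array([[0.9, 0.1, 0.3, 0.0]])           # one uroman-romanized query
d = np.array([[0.8, 0.2, 0.4, 0.1],            # relevant passage
              [0.0, 0.9, 0.0, 0.9]])           # off-topic passage

# The model's final Normalize() module makes embeddings unit-length,
# so a plain dot product equals cosine similarity.
q = q / np.linalg.norm(q, axis=1, keepdims=True)
d = d / np.linalg.norm(d, axis=1, keepdims=True)

scores = q @ d.T
print(scores.shape)        # (1, 2)
print(scores[0].argmax())  # 0 -> the relevant passage ranks first
```

In practice `q` would come from encoding uroman-romanized queries and `d` from encoding passages with this model, after which passages are ranked by score exactly as above.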