BanhMiKepThit015 committed
Commit 3481936 · 1 Parent(s): a7d135c

Upload finetuned model with subfolders
.gitattributes CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
+ unigram.json filter=lfs diff=lfs merge=lfs -text
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 384,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
README.md CHANGED
@@ -1,3 +1,710 @@
- ---
- license: apache-2.0
- ---
+ ---
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:38400
+ - loss:TripletLoss
+ - loss:ContrastiveLoss
+ base_model: sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
+ widget:
12
+ - source_sentence: Làm thế nào để cải thiện độ chính xác của mô hình tự động học khi
13
+ làm việc với dữ liệu có cấu trúc cây phức tạp, trong điều kiện tập dữ liệu huấn
14
+ luyện ban đầu bị hạn chế và cần tối ưu quá trình gán nhãn?
15
+ sentences:
16
+ - Nominal set plays a central role in a group-theoretic extension of finite automata
17
+ to those over an infinite set of data values. Moerman et al. proposed an active
18
+ learning algorithm for nominal word automata with the equality symmetry. In this
19
+ paper, we introduce deterministic bottom-up nominal tree automata (DBNTA), which
20
+ operate on trees whose nodes are labelled with elements of an orbit finite nominal
21
+ set. We then prove a Myhill-Nerode theorem for the class of languages recognized
22
+ by DBNTA and propose an active learning algorithm for DBNTA. The algorithm can
23
+ deal with any data symmetry that admits least support, not restricted to the equality
24
+ symmetry and/or the total order symmetry. To prove the termination of the algorithm,
25
+ we define a partial order on nominal sets and show that there is no infinite chain
26
+ of orbit finite nominal sets with respect to this partial order between any two
27
+ orbit finite sets.
28
+ - We introduce the Song Describer dataset (SDD), a new crowdsourced corpus of high-quality
29
+ audio-caption pairs, designed for the evaluation of music-and-language models.
30
+ The dataset consists of 1.1k human-written natural language descriptions of 706
31
+ music recordings, all publicly accessible and released under Creative Common licenses.
32
+ To showcase the use of our dataset, we benchmark popular models on three key music-and-language
33
+ tasks (music captioning, text-to-music generation and music-language retrieval).
34
+ Our experiments highlight the importance of cross-dataset evaluation and offer
35
+ insights into how researchers can use SDD to gain a broader understanding of model
36
+ performance.
37
+ - We present a novel feasibility study on the automatic recognition of Expressed
38
+ Emotion (EE), a family environment concept based on caregivers speaking freely
39
+ about their relative/family member. We describe an automated approach for determining
40
+ the \textit{degree of warmth}, a key component of EE, from acoustic and text features
41
+ acquired from a sample of 37 recorded interviews. These recordings, collected
42
+ over 20 years ago, are derived from a nationally representative birth cohort of
43
+ 2,232 British twin children and were manually coded for EE. We outline the core
44
+ steps of extracting usable information from recordings with highly variable audio
45
+ quality and assess the efficacy of four machine learning approaches trained with
46
+ different combinations of acoustic and text features. Despite the challenges of
47
+ working with this legacy data, we demonstrated that the degree of warmth can be
48
+ predicted with an $F_{1}$-score of \textbf{61.5\%}. In this paper, we summarise
49
+ our learning and provide recommendations for future work using real-world speech
50
+ samples.
51
+ - source_sentence: Làm thế nào để quản lý khóa bảo mật một cách an toàn và tiện lợi
52
+ khi cần truy cập từ nhiều thiết bị khác nhau?
53
+ sentences:
54
+ - 'Personal cryptographic keys are the foundation of many secure services, but storing
55
+ these keys securely is a challenge, especially if they are used from multiple
56
+ devices. Storing keys in a centralized location, like an Internet-accessible server,
57
+ raises serious security concerns (e.g. server compromise). Hardware-based Trusted
58
+ Execution Environments (TEEs) are a well-known solution for protecting sensitive
59
+ data in untrusted environments, and are now becoming available on commodity server
60
+ platforms.
61
+
62
+ Although the idea of protecting keys using a server-side TEE is straight-forward,
63
+ in this paper we validate this approach and show that it enables new desirable
64
+ functionality. We describe the design, implementation, and evaluation of a TEE-based
65
+ Cloud Key Store (CKS), an online service for securely generating, storing, and
66
+ using personal cryptographic keys. Using remote attestation, users receive strong
67
+ assurance about the behaviour of the CKS, and can authenticate themselves using
68
+ passwords while avoiding typical risks of password-based authentication like password
69
+ theft or phishing. In addition, this design allows users to i) define policy-based
70
+ access controls for keys; ii) delegate keys to other CKS users for a specified
71
+ time and/or a limited number of uses; and iii) audit all key usages via a secure
72
+ audit log. We have implemented a proof of concept CKS using Intel SGX and integrated
73
+ this into GnuPG on Linux and OpenKeychain on Android. Our CKS implementation performs
74
+ approximately 6,000 signature operations per second on a single desktop PC. The
75
+ latency is in the same order of magnitude as using locally-stored keys, and 20x
76
+ faster than smart cards.'
77
+ - The problem of constrained coverage path planning involves a robot trying to cover
78
+ maximum area of an environment under some constraints that appear as obstacles
79
+ in the map. Out of the several coverage path planning methods, we consider augmenting
80
+ the linear sweep-based coverage method to achieve minimum energy/ time optimality
81
+ along with maximum area coverage. In addition, we also study the effects of variation
82
+ of different parameters on the performance of the modified method.
83
+ - 'We present a new technique for efficiently removing almost all short cycles in
84
+ a graph without unintentionally removing its triangles. Consequently, triangle
85
+ finding problems do not become easy even in almost $k$-cycle free graphs, for
86
+ any constant $k\geq 4$.
87
+
88
+ Triangle finding is at the base of many conditional lower bounds in P, mainly
89
+ for distance computation problems, and the existence of many $4$- or $5$-cycles
90
+ in a worst-case instance had been the obstacle towards resolving major open questions.
91
+
92
+ Hardness of approximation: Are there distance oracles with $m^{1+o(1)}$ preprocessing
93
+ time and $m^{o(1)}$ query time that achieve a constant approximation? Existing
94
+ algorithms with such desirable time bounds only achieve super-constant approximation
95
+ factors, while only $3-\epsilon$ factors were conditionally ruled out (Pătraşcu,
96
+ Roditty, and Thorup; FOCS 2012). We prove that no $O(1)$ approximations are possible,
97
+ assuming the $3$-SUM or APSP conjectures. In particular, we prove that $k$-approximations
98
+ require $\Omega(m^{1+1/ck})$ time, which is tight up to the constant $c$. The
99
+ lower bound holds even for the offline version where we are given the queries
100
+ in advance, and extends to other problems such as dynamic shortest paths.
101
+
102
+ The $4$-Cycle problem: An infamous open question in fine-grained complexity is
103
+ to establish any surprising consequences from a subquadratic or even linear-time
104
+ algorithm for detecting a $4$-cycle in a graph. We prove that $\Omega(m^{1.1194})$
105
+ time is needed for $k$-cycle detection for all $k\geq 4$, unless we can detect
106
+ a triangle in $\sqrt{n}$-degree graphs in $O(n^{2-\delta})$ time; a breakthrough
107
+ that is not known to follow even from optimal matrix multiplication algorithms.'
108
+ - source_sentence: Làm thế nào để đánh giá hiệu suất của các hệ thống lưu trữ dữ liệu
109
+ lớn một cách khách quan và hiệu quả?
110
+ sentences:
111
+ - Data warehouse architectural choices and optimization techniques are critical
112
+ to decision support query performance. To facilitate these choices, the performance
113
+ of the designed data warehouse must be assessed, usually with benchmarks. These
114
+ tools can either help system users comparing the performances of different systems,
115
+ or help system engineers testing the effect of various design choices. While the
116
+ Transaction Processing Performance Council's standard benchmarks address the first
117
+ point, they are not tunable enough to address the second one and fail to model
118
+ different data warehouse schemas. By contrast, our Data Warehouse Engineering
119
+ Benchmark (DWEB) allows generating various ad-hoc synthetic data warehouses and
120
+ workloads. DWEB is implemented as a Java free software that can be interfaced
121
+ with most existing relational database management systems. The full specifications
122
+ of DWEB, as well as experiments we performed to illustrate how our benchmark may
123
+ be used, are provided in this paper.
124
+ - 'Determinant maximization problem gives a general framework that models problems
125
+ arising in as diverse fields as statistics \cite{pukelsheim2006optimal}, convex
126
+ geometry \cite{Khachiyan1996}, fair allocations\linebreak \cite{anari2016nash},
127
+ combinatorics \cite{AnariGV18}, spectral graph theory \cite{nikolov2019proportional},
128
+ network design, and random processes \cite{kulesza2012determinantal}. In an instance
129
+ of a determinant maximization problem, we are given a collection of vectors $U=\{v_1,\ldots,
130
+ v_n\} \subset \RR^d$, and a goal is to pick a subset $S\subseteq U$ of given vectors
131
+ to maximize the determinant of the matrix $\sum_{i\in S} v_i v_i^\top $. Often,
132
+ the set $S$ of picked vectors must satisfy additional combinatorial constraints
133
+ such as cardinality constraint $\left(|S|\leq k\right)$ or matroid constraint
134
+ ($S$ is a basis of a matroid defined on the vectors).
135
+
136
+ In this paper, we give a polynomial-time deterministic algorithm that returns
137
+ a $r^{O(r)}$-approximation for any matroid of rank $r\leq d$. This improves previous
138
+ results that give $e^{O(r^2)}$-approximation algorithms relying on $e^{O(r)}$-approximate
139
+ \emph{estimation} algorithms \cite{NikolovS16,anari2017generalization,AnariGV18,madan2020maximizing}
140
+ for any $r\leq d$. All previous results use convex relaxations and their relationship
141
+ to stable polynomials and strongly log-concave polynomials. In contrast, our algorithm
142
+ builds on combinatorial algorithms for matroid intersection, which iteratively
143
+ improve any solution by finding an \emph{alternating negative cycle} in the \emph{exchange
144
+ graph} defined by the matroids. While the $\det(.)$ function is not linear, we
145
+ show that taking appropriate linear approximations at each iteration suffice to
146
+ give the improved approximation algorithm.'
147
+ - Generating value from data requires the ability to find, access and make sense
148
+ of datasets. There are many efforts underway to encourage data sharing and reuse,
149
+ from scientific publishers asking authors to submit data alongside manuscripts
150
+ to data marketplaces, open data portals and data communities. Google recently
151
+ beta released a search service for datasets, which allows users to discover data
152
+ stored in various online repositories via keyword queries. These developments
153
+ foreshadow an emerging research field around dataset search or retrieval that
154
+ broadly encompasses frameworks, methods and tools that help match a user data
155
+ need against a collection of datasets. Here, we survey the state of the art of
156
+ research and commercial systems in dataset retrieval. We identify what makes dataset
157
+ search a research field in its own right, with unique challenges and methods and
158
+ highlight open problems. We look at approaches and implementations from related
159
+ areas dataset search is drawing upon, including information retrieval, databases,
160
+ entity-centric and tabular search in order to identify possible paths to resolve
161
+ these open problems as well as immediate next steps that will take the field forward.
162
+ - source_sentence: Làm thế nào để cải thiện độ chính xác của hệ thống dịch tự động
163
+ từ giọng nói sang văn bản mà không cần phụ thuộc vào dữ liệu văn bản gốc?
164
+ sentences:
165
+ - 'Intelligent voice assistants, such as Apple Siri and Amazon Alexa, are widely
166
+ used nowadays. These task-oriented dialogue systems require a semantic parsing
167
+ module in order to process user utterances and understand the action to be performed.
168
+ This semantic parsing component was initially implemented by rule-based or statistical
169
+ slot-filling approaches for processing simple queries; however, the appearance
170
+ of more complex utterances demanded the application of shift-reduce parsers or
171
+ sequence-to-sequence models. Although shift-reduce approaches were initially considered
172
+ the most promising option, the emergence of sequence-to-sequence neural systems
173
+ has propelled them to the forefront as the highest-performing method for this
174
+ particular task. In this article, we advance the research on shift-reduce semantic
175
+ parsing for task-oriented dialogue. We implement novel shift-reduce parsers that
176
+ rely on Stack-Transformers. This framework allows to adequately model transition
177
+ systems on the Transformer neural architecture, notably boosting shift-reduce
178
+ parsing performance. Furthermore, our approach goes beyond the conventional top-down
179
+ algorithm: we incorporate alternative bottom-up and in-order transition systems
180
+ derived from constituency parsing into the realm of task-oriented parsing. We
181
+ extensively test our approach on multiple domains from the Facebook TOP benchmark,
182
+ improving over existing shift-reduce parsers and state-of-the-art sequence-to-sequence
183
+ models in both high-resource and low-resource settings. We also empirically prove
184
+ that the in-order algorithm substantially outperforms the commonly-used top-down
185
+ strategy. Through the creation of innovative transition systems and harnessing
186
+ the capabilities of a robust neural architecture, our study showcases the superiority
187
+ of shift-reduce parsers over leading sequence-to-sequence methods on the main
188
+ benchmark.'
189
+ - We investigate end-to-end speech-to-text translation on a corpus of audiobooks
190
+ specifically augmented for this task. Previous works investigated the extreme
191
+ case where source language transcription is not available during learning nor
192
+ decoding, but we also study a midway case where source language transcription
193
+ is available at training time only. In this case, a single model is trained to
194
+ decode source speech into target text in a single pass. Experimental results show
195
+ that it is possible to train compact and efficient end-to-end speech translation
196
+ models in this setup. We also distribute the corpus and hope that our speech translation
197
+ baseline on this corpus will be challenged in the future.
198
+ - Advanced Persistent Threats (APTs) are a main impendence in cyber security of
199
+ computer networks. In 2015, a successful breach remains undetected 146 days on
200
+ average, reported by [Fi16].With our work we demonstrate a feasible and fast way
201
+ to analyse real world log data to detect breaches or breach attempts. By adapting
202
+ well-known kill chain mechanisms and a combine of a time series database and an
203
+ abstracted graph approach, it is possible to create flexible attack profiles.
204
+ Using this approach, it can be demonstrated that the graph analysis successfully
205
+ detects simulated attacks by analysing the log data of a simulated computer network.
206
+ Considering another source for log data, the framework is capable to deliver sufficient
207
+ performance for analysing real-world data in short time. By using the computing
208
+ power of the graph database it is possible to identify the attacker and furthermore
209
+ it is feasible to detect other affected system components. We believe to significantly
210
+ reduce the detection time of breaches with this approach and react fast to new
211
+ attack vectors.
212
+ - source_sentence: Làm thế nào để nhận biết và theo dõi các khái niệm mới xuất hiện
213
+ trong dữ liệu văn bản trước khi chúng trở thành kiến thức phổ biến?
214
+ sentences:
215
+ - 'The outbreak of COVID-19 pandemic has exposed an urgent need for effective contact
216
+ tracing solutions through mobile phone applications to prevent the infection from
217
+ spreading further. However, due to the nature of contact tracing, public concern
218
+ on privacy issues has been a bottleneck to the existing solutions, which is significantly
219
+ affecting the uptake of contact tracing applications across the globe. In this
220
+ paper, we present a blockchain-enabled privacy-preserving contact tracing scheme:
221
+ BeepTrace, where we propose to adopt blockchain bridging the user/patient and
222
+ the authorized solvers to desensitize the user ID and location information. Compared
223
+ with recently proposed contract tracing solutions, our approach shows higher security
224
+ and privacy with the additional advantages of being battery friendly and globally
225
+ accessible. Results show viability in terms of the required resource at both server
226
+ and mobile phone perspectives. Through breaking the privacy concerns of the public,
227
+ the proposed BeepTrace solution can provide a timely framework for authorities,
228
+ companies, software developers and researchers to fast develop and deploy effective
229
+ digital contact tracing applications, to conquer COVID-19 pandemic soon. Meanwhile,
230
+ the open initiative of BeepTrace allows worldwide collaborations, integrate existing
231
+ tracing and positioning solutions with the help of blockchain technology.'
232
+ - Digital holography is a 3D imaging technique by emitting a laser beam with a plane
233
+ wavefront to an object and measuring the intensity of the diffracted waveform,
234
+ called holograms. The object's 3D shape can be obtained by numerical analysis
235
+ of the captured holograms and recovering the incurred phase. Recently, deep learning
236
+ (DL) methods have been used for more accurate holographic processing. However,
237
+ most supervised methods require large datasets to train the model, which is rarely
238
+ available in most DH applications due to the scarcity of samples or privacy concerns.
239
+ A few one-shot DL-based recovery methods exist with no reliance on large datasets
240
+ of paired images. Still, most of these methods often neglect the underlying physics
241
+ law that governs wave propagation. These methods offer a black-box operation,
242
+ which is not explainable, generalizable, and transferrable to other samples and
243
+ applications. In this work, we propose a new DL architecture based on generative
244
+ adversarial networks that uses a discriminative network for realizing a semantic
245
+ measure for reconstruction quality while using a generative network as a function
246
+ approximator to model the inverse of hologram formation. We impose smoothness
247
+ on the background part of the recovered image using a progressive masking module
248
+ powered by simulated annealing to enhance the reconstruction quality. The proposed
249
+ method is one of its kind that exhibits high transferability to similar samples,
250
+ which facilitates its fast deployment in time-sensitive applications without the
251
+ need for retraining the network. The results show a considerable improvement to
252
+ competitor methods in reconstruction quality (about 5 dB PSNR gain) and robustness
253
+ to noise (about 50% reduction in PSNR vs noise increase rate).
254
+ - 'We study how collective memories are formed online. We do so by tracking entities
255
+ that emerge in public discourse, that is, in online text streams such as social
256
+ media and news streams, before they are incorporated into Wikipedia, which, we
257
+ argue, can be viewed as an online place for collective memory. By tracking how
258
+ entities emerge in public discourse, i.e., the temporal patterns between their
259
+ first mention in online text streams and subsequent incorporation into collective
260
+ memory, we gain insights into how the collective remembrance process happens online.
261
+ Specifically, we analyze nearly 80,000 entities as they emerge in online text
262
+ streams before they are incorporated into Wikipedia. The online text streams we
263
+ use for our analysis comprise of social media and news streams, and span over
264
+ 579 million documents in a timespan of 18 months. We discover two main emergence
265
+ patterns: entities that emerge in a "bursty" fashion, i.e., that appear in public
266
+ discourse without a precedent, blast into activity and transition into collective
267
+ memory. Other entities display a "delayed" pattern, where they appear in public
268
+ discourse, experience a period of inactivity, and then resurface before transitioning
269
+ into our cultural collective memory.'
+ pipeline_tag: sentence-similarity
+ library_name: sentence-transformers
+ metrics:
+ - cosine_accuracy
+ - cosine_accuracy_threshold
+ - cosine_f1
+ - cosine_f1_threshold
+ - cosine_precision
+ - cosine_recall
+ - cosine_ap
+ - cosine_mcc
+ model-index:
+ - name: SentenceTransformer based on sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
+   results:
+   - task:
+       type: triplet
+       name: Triplet
+     dataset:
+       name: triplet eval
+       type: triplet_eval
+     metrics:
+     - type: cosine_accuracy
+       value: 0.9681249856948853
+       name: Cosine Accuracy
+   - task:
+       type: binary-classification
+       name: Binary Classification
+     dataset:
+       name: binary eval
+       type: binary_eval
+     metrics:
+     - type: cosine_accuracy
+       value: 0.90546875
+       name: Cosine Accuracy
+     - type: cosine_accuracy_threshold
+       value: 0.4351062774658203
+       name: Cosine Accuracy Threshold
+     - type: cosine_f1
+       value: 0.9093632958801497
+       name: Cosine F1
+     - type: cosine_f1_threshold
+       value: 0.4334937036037445
+       name: Cosine F1 Threshold
+     - type: cosine_precision
+       value: 0.8989928909952607
+       name: Cosine Precision
+     - type: cosine_recall
+       value: 0.9199757502273416
+       name: Cosine Recall
+     - type: cosine_ap
+       value: 0.9677234802511296
+       name: Cosine Ap
+     - type: cosine_mcc
+       value: 0.8108508280342461
+       name: Cosine Mcc
+ ---
+
+ # SentenceTransformer based on sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
+
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2). It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2) <!-- at revision 86741b4e3f5cb7765a600d3a3d55a0f6a6cb443d -->
+ - **Maximum Sequence Length:** 128 tokens
+ - **Output Dimensionality:** 384 dimensions
+ - **Similarity Function:** Cosine Similarity
+ <!-- - **Training Dataset:** Unknown -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: BertModel
+   (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+ )
+ ```
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("sentence_transformers_model_id")
+ # Run inference
+ sentences = [
+     'Làm thế nào để nhận biết và theo dõi các khái niệm mới xuất hiện trong dữ liệu văn bản trước khi chúng trở thành kiến thức phổ biến?',
+ 'We study how collective memories are formed online. We do so by tracking entities that emerge in public discourse, that is, in online text streams such as social media and news streams, before they are incorporated into Wikipedia, which, we argue, can be viewed as an online place for collective memory. By tracking how entities emerge in public discourse, i.e., the temporal patterns between their first mention in online text streams and subsequent incorporation into collective memory, we gain insights into how the collective remembrance process happens online. Specifically, we analyze nearly 80,000 entities as they emerge in online text streams before they are incorporated into Wikipedia. The online text streams we use for our analysis comprise of social media and news streams, and span over 579 million documents in a timespan of 18 months. We discover two main emergence patterns: entities that emerge in a "bursty" fashion, i.e., that appear in public discourse without a precedent, blast into activity and transition into collective memory. Other entities display a "delayed" pattern, where they appear in public discourse, experience a period of inactivity, and then resurface before transitioning into our cultural collective memory.',
+ "Digital holography is a 3D imaging technique by emitting a laser beam with a plane wavefront to an object and measuring the intensity of the diffracted waveform, called holograms. The object's 3D shape can be obtained by numerical analysis of the captured holograms and recovering the incurred phase. Recently, deep learning (DL) methods have been used for more accurate holographic processing. However, most supervised methods require large datasets to train the model, which is rarely available in most DH applications due to the scarcity of samples or privacy concerns. A few one-shot DL-based recovery methods exist with no reliance on large datasets of paired images. Still, most of these methods often neglect the underlying physics law that governs wave propagation. These methods offer a black-box operation, which is not explainable, generalizable, and transferrable to other samples and applications. In this work, we propose a new DL architecture based on generative adversarial networks that uses a discriminative network for realizing a semantic measure for reconstruction quality while using a generative network as a function approximator to model the inverse of hologram formation. We impose smoothness on the background part of the recovered image using a progressive masking module powered by simulated annealing to enhance the reconstruction quality. The proposed method is one of its kind that exhibits high transferability to similar samples, which facilitates its fast deployment in time-sensitive applications without the need for retraining the network. The results show a considerable improvement to competitor methods in reconstruction quality (about 5 dB PSNR gain) and robustness to noise (about 50% reduction in PSNR vs noise increase rate).",
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 384]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # [3, 3]
+ ```
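
The same embeddings can also be produced without the Sentence Transformers wrapper. The snippet below is a minimal sketch with plain 🤗 Transformers that applies the attention-mask-weighted mean pooling described by the `Pooling` module above; `sentence_transformers_model_id` is the same placeholder as in the snippet before it, not an actual repository name.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("sentence_transformers_model_id")
model = AutoModel.from_pretrained("sentence_transformers_model_id")

sentences = ["Làm thế nào để quản lý khóa bảo mật một cách an toàn và tiện lợi khi cần truy cập từ nhiều thiết bị khác nhau?"]

# Tokenize with the model's 128-token limit, then mean-pool over non-padding tokens.
encoded = tokenizer(sentences, padding=True, truncation=True, max_length=128, return_tensors="pt")
with torch.no_grad():
    token_embeddings = model(**encoded).last_hidden_state  # (batch, seq_len, 384)

mask = encoded["attention_mask"].unsqueeze(-1).float()
embeddings = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
print(embeddings.shape)  # torch.Size([1, 384])
```
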
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ ## Evaluation
+
+ ### Metrics
+
+ #### Triplet
+
+ * Dataset: `triplet_eval`
+ * Evaluated with [<code>TripletEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.TripletEvaluator)
+
+ | Metric              | Value      |
+ |:--------------------|:-----------|
+ | **cosine_accuracy** | **0.9681** |
+
+ #### Binary Classification
+
+ * Dataset: `binary_eval`
+ * Evaluated with [<code>BinaryClassificationEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.BinaryClassificationEvaluator)
+
+ | Metric                    | Value      |
+ |:--------------------------|:-----------|
+ | cosine_accuracy           | 0.9055     |
+ | cosine_accuracy_threshold | 0.4351     |
+ | cosine_f1                 | 0.9094     |
+ | cosine_f1_threshold       | 0.4335     |
+ | cosine_precision          | 0.899      |
+ | cosine_recall             | 0.92       |
+ | **cosine_ap**             | **0.9677** |
+ | cosine_mcc                | 0.8109     |
+
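
The two tables above come from the `TripletEvaluator` and `BinaryClassificationEvaluator` named in each subsection. A rough sketch of how such an evaluation is wired up is shown below; the anchor/positive/negative lists and labelled pairs are placeholders, since the held-out evaluation split itself is not part of this repository.

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import BinaryClassificationEvaluator, TripletEvaluator

model = SentenceTransformer("sentence_transformers_model_id")  # placeholder id, as above

# Placeholder evaluation data: Vietnamese questions paired with relevant / unrelated abstracts.
anchors = ["Làm thế nào để đánh giá hiệu suất của các hệ thống lưu trữ dữ liệu lớn?"]
positives = ["An abstract about data warehouse benchmarking ..."]
negatives = ["An abstract about an unrelated topic ..."]

triplet_eval = TripletEvaluator(anchors=anchors, positives=positives, negatives=negatives, name="triplet_eval")
binary_eval = BinaryClassificationEvaluator(
    sentences1=anchors + anchors,
    sentences2=positives + negatives,
    labels=[1, 0],  # 1 = relevant pair, 0 = unrelated pair
    name="binary_eval",
)

print(triplet_eval(model))  # e.g. {'triplet_eval_cosine_accuracy': ...}
print(binary_eval(model))   # e.g. {'binary_eval_cosine_ap': ...}
```
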
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Datasets
+
+ #### Unnamed Dataset
+
+ * Size: 12,800 training samples
+ * Columns: <code>sentence_0</code>, <code>sentence_1</code>, and <code>sentence_2</code>
+ * Approximate statistics based on the first 1000 samples:
+ |         | sentence_0 | sentence_1 | sentence_2 |
+ |:--------|:-----------|:-----------|:-----------|
+ | type    | string     | string     | string     |
+ | details | <ul><li>min: 21 tokens</li><li>mean: 39.66 tokens</li><li>max: 79 tokens</li></ul> | <ul><li>min: 37 tokens</li><li>mean: 126.26 tokens</li><li>max: 128 tokens</li></ul> | <ul><li>min: 40 tokens</li><li>mean: 125.44 tokens</li><li>max: 128 tokens</li></ul> |
+ * Samples:
+ | sentence_0 | sentence_1 | sentence_2 |
+ |:-----------|:-----------|:-----------|
+ | <code>Làm thế nào để đánh giá hiệu quả của các thuật toán nâng cao chất lượng ảnh trong điều kiện thực tế?</code> | <code>Over the past decades, various super-resolution (SR) techniques have been developed to enhance the spatial resolution of digital images. Despite the great number of methodical contributions, there is still a lack of comparative validations of SR under practical conditions, as capturing real ground truth data is a challenging task. Therefore, current studies are either evaluated 1) on simulated data or 2) on real data without a pixel-wise ground truth.<br>To facilitate comprehensive studies, this paper introduces the publicly available Super-Resolution Erlangen (SupER) database that includes real low-resolution images along with high-resolution ground truth data. Our database comprises image sequences with more than 20k images captured from 14 scenes under various types of motions and photometric conditions. The datasets cover four spatial resolution levels using camera hardware binning. With this database, we benchmark 15 single-image and multi-frame SR algorithms. Our experiments quantit...</code> | <code>Deep learning is ubiquitous across many areas areas of computer vision. It often requires large scale datasets for training before being fine-tuned on small-to-medium scale problems. Activity, or, in other words, action recognition, is one of many application areas of deep learning. While there exist many Convolutional Neural Network architectures that work with the RGB and optical flow frames, training on the time sequences of 3D body skeleton joints is often performed via recurrent networks such as LSTM.<br>In this paper, we propose a new representation which encodes sequences of 3D body skeleton joints in texture-like representations derived from mathematically rigorous kernel methods. Such a representation becomes the first layer in a standard CNN network e.g., ResNet-50, which is then used in the supervised domain adaptation pipeline to transfer information from the source to target dataset. This lets us leverage the available Kinect-based data beyond training on a single dataset and...</code> |
+ | <code>Làm thế nào để xử lý lượng lớn dữ liệu thực thể một cách hiệu quả mà vẫn đảm bảo độ chính xác khi thời gian và tài nguyên tính toán bị giới hạn Trong điều kiện phải làm việc với các nguồn dữ liệu không đồng nhất, liệu có phương pháp nào có thể áp dụng lược đồ linh hoạt để cải thiện hiệu suất?</code> | <code>Entity Resolution (ER) is the task of finding entity profiles that correspond to the same real-world entity. Progressive ER aims to efficiently resolve large datasets when limited time and/or computational resources are available. In practice, its goal is to provide the best possible partial solution by approximating the optimal comparison order of the entity profiles. So far, Progressive ER has only been examined in the context of structured (relational) data sources, as the existing methods rely on schema knowledge to save unnecessary comparisons: they restrict their search space to similar entities with the help of schema-based blocking keys (i.e., signatures that represent the entity profiles). As a result, these solutions are not applicable in Big Data integration applications, which involve large and heterogeneous datasets, such as relational and RDF databases, JSON files, Web corpus etc. To cover this gap, we propose a family of schema-agnostic Progressive ER methods, which do n...</code> | <code>Pattern matching on large graphs is the foundation for a variety of application domains. Strict latency requirements and continuously increasing graph sizes demand the usage of highly parallel in-memory graph processing engines that need to consider non-uniform memory access (NUMA) and concurrency issues to scale up on modern multiprocessor systems. To tackle these aspects, graph partitioning becomes increasingly important. Hence, we present a technique to process graph pattern matching on NUMA systems in this paper. As a scalable pattern matching processing infrastructure, we leverage a data-oriented architecture that preserves data locality and minimizes concurrency-related bottlenecks on NUMA systems. We show in detail, how graph pattern matching can be asynchronously processed on a multiprocessor system.</code> |
+ | <code>Làm thế nào để tối ưu hóa việc truyền dữ liệu từ các thiết bị theo dõi động vật trong điều kiện băng thông hạn chế mà vẫn đảm bảo độ chính xác của thông tin sinh thái học?</code> | <code>Bio-loggers, electronic devices used to track animal behaviour through various sensors, have become essential in wildlife research.<br>Despite continuous improvements in their capabilities, bio-loggers still face significant limitations in storage, processing, and data transmission due to the constraints of size and weight, which are necessary to avoid disturbing the animals.<br>This study aims to explore how selective data transmission, guided by machine learning, can reduce the energy consumption of bio-loggers, thereby extending their operational lifespan without requiring hardware modifications.</code> | <code>T-distributed stochastic neighbor embedding (tSNE) is a popular and prize-winning approach for dimensionality reduction and visualizing high-dimensional data. However, tSNE is non-parametric: once visualization is built, tSNE is not designed to incorporate additional data into existing representation. It highly limits the applicability of tSNE to the scenarios where data are added or updated over time (like dashboards or series of data snapshots).<br>In this paper we propose, analyze and evaluate LION-tSNE (Local Interpolation with Outlier coNtrol) - a novel approach for incorporating new data into tSNE representation. LION-tSNE is based on local interpolation in the vicinity of training data, outlier detection and a special outlier mapping algorithm. We show that LION-tSNE method is robust both to outliers and to new samples from existing clusters. We also discuss multiple possible improvements for special cases.<br>We compare LION-tSNE to a comprehensive list of possible benchmark approach...</code> |
+ * Loss: [<code>TripletLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#tripletloss) with these parameters:
+   ```json
+   {
+       "distance_metric": "TripletDistanceMetric.COSINE",
+       "triplet_margin": 0.5
+   }
+   ```
+
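
In Sentence Transformers code, that configuration corresponds to roughly the following construction (a sketch, not the exact training script; `model` is the base checkpoint being fine-tuned):

```python
from sentence_transformers import SentenceTransformer, losses

model = SentenceTransformer("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")

# Triplet loss over (anchor, positive, negative) rows: cosine distance, margin 0.5.
triplet_loss = losses.TripletLoss(
    model=model,
    distance_metric=losses.TripletDistanceMetric.COSINE,
    triplet_margin=0.5,
)
```
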
+ #### Unnamed Dataset
+
+ * Size: 25,600 training samples
+ * Columns: <code>sentence_0</code>, <code>sentence_1</code>, and <code>label</code>
+ * Approximate statistics based on the first 1000 samples:
+ |         | sentence_0 | sentence_1 | label |
+ |:--------|:-----------|:-----------|:------|
+ | type    | string     | string     | float |
+ | details | <ul><li>min: 21 tokens</li><li>mean: 40.09 tokens</li><li>max: 80 tokens</li></ul> | <ul><li>min: 28 tokens</li><li>mean: 126.39 tokens</li><li>max: 128 tokens</li></ul> | <ul><li>min: 0.0</li><li>mean: 0.52</li><li>max: 1.0</li></ul> |
+ * Samples:
+ | sentence_0 | sentence_1 | label |
+ |:-----------|:-----------|:------|
+ | <code>Làm thế nào để tối ưu hóa thuật toán tìm đường đi ngắn nhất trong đồ thị có trọng số khi thời gian tính toán trọng số cạnh là yếu tố quan trọng cần xem xét?</code> | <code>The shortest path problem in graphs is fundamental to AI. Nearly all variants of the problem and relevant algorithms that solve them ignore edge-weight computation time and its common relation to weight uncertainty. This implies that taking these factors into consideration can potentially lead to a performance boost in relevant applications. Recently, a generalized framework for weighted directed graphs was suggested, where edge-weight can be computed (estimated) multiple times, at increasing accuracy and run-time expense. We build on this framework to introduce the problem of finding the tightest admissible shortest path (TASP); a path with the tightest suboptimality bound on the optimal cost. This is a generalization of the shortest path problem to bounded uncertainty, where edge-weight uncertainty can be traded for computational cost. We present a complete algorithm for solving TASP, with guarantees on solution quality. Empirical evaluation supports the effectiveness of this approac...</code> | <code>1.0</code> |
+ | <code>Làm thế nào để thiết kế bộ nhân xấp xỉ tiết kiệm năng lượng cho các thiết bị IoT khi yêu cầu độ chính xác có thể linh hoạt và tài nguyên phần cứng bị giới hạn?</code> | <code>Given the stringent requirements of energy efficiency for Internet-of-Things edge devices, approximate multipliers, as a basic component of many processors and accelerators, have been constantly proposed and studied for decades, especially in error-resilient applications. The computation error and energy efficiency largely depend on how and where the approximation is introduced into a design. Thus, this article aims to provide a comprehensive review of the approximation techniques in multiplier designs ranging from algorithms and architectures to circuits. We have implemented representative approximate multiplier designs in each category to understand the impact of the design techniques on accuracy and efficiency. The designs can then be effectively deployed in high-level applications, such as machine learning, to gain energy efficiency at the cost of slight accuracy loss.</code> | <code>1.0</code> |
+ | <code>Làm thế nào để cải thiện tính tự nhiên trong giọng nói tổng hợp khi hệ thống hiện tại thường tạo ra ngữ điệu đơn điệu do phụ thuộc vào dữ liệu huấn luyện trung bình?</code> | <code>This work presents a SystemC-TLM based simulator for a RISC-V microcontroller. This simulator is focused on simplicity and easy expandable of a RISC-V. It is built around a full RISC-V instruction set simulator that supports full RISC-V ISA and extensions M, A, C, Zicsr and Zifencei. The ISS is encapsulated in a TLM-2 wrapper that enables it to communicate with any other TLM-2 compatible module. The simulator also includes a very basic set of peripherals to enable a complete SoC simulator. The running code can be compiled with standard tools and using standard C libraries without modifications. The simulator is able to correctly execute the riscv-compliance suite. The entire simulator is published as a docker image to ease its installation and use by developers. A porting of FreeRTOSv10.2.1 for the simulated SoC is also published.</code> | <code>0.0</code> |
+ * Loss: [<code>ContrastiveLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#contrastiveloss) with these parameters:
+   ```json
+   {
+       "distance_metric": "SiameseDistanceMetric.COSINE_DISTANCE",
+       "margin": 0.5,
+       "size_average": true
+   }
+   ```
+
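
Likewise, for the labelled pairs the parameters above map onto roughly this construction (again a sketch rather than the exact training script):

```python
from sentence_transformers import SentenceTransformer, losses

model = SentenceTransformer("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")

# Contrastive loss over (sentence_0, sentence_1, label) rows: cosine distance, margin 0.5,
# with the loss averaged over the batch (size_average=True).
contrastive_loss = losses.ContrastiveLoss(
    model=model,
    distance_metric=losses.SiameseDistanceMetric.COSINE_DISTANCE,
    margin=0.5,
    size_average=True,
)
```
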
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `eval_strategy`: steps
+ - `per_device_train_batch_size`: 64
+ - `per_device_eval_batch_size`: 64
+ - `num_train_epochs`: 1
+ - `multi_dataset_batch_sampler`: round_robin
+
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: steps
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 64
+ - `per_device_eval_batch_size`: 64
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 5e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1
+ - `num_train_epochs`: 1
+ - `max_steps`: -1
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.0
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: False
+ - `fp16`: False
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: False
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `tp_size`: 0
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: None
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `include_for_metrics`: []
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`:
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `use_liger_kernel`: False
+ - `eval_use_gather_object`: False
+ - `average_tokens_across_devices`: False
+ - `prompts`: None
+ - `batch_sampler`: batch_sampler
+ - `multi_dataset_batch_sampler`: round_robin
+
+ </details>
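
Taken together with the two datasets and losses described above, a training run with these non-default settings would look roughly like the sketch below. It is illustrative only: the dataset contents, the output directory, and the evaluation wiring (the actual run also evaluated every 100 steps, per the Training Logs below) are placeholders, not the author's script.

```python
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
    losses,
)
from sentence_transformers.training_args import MultiDatasetBatchSamplers

model = SentenceTransformer("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")

# Placeholder rows standing in for the 12,800 triplets and 25,600 labelled pairs described above.
triplet_ds = Dataset.from_dict({
    "sentence_0": ["Làm thế nào để tối ưu hóa thuật toán tìm đường đi ngắn nhất?"],  # anchor question
    "sentence_1": ["A relevant abstract ..."],                                        # positive
    "sentence_2": ["An unrelated abstract ..."],                                      # negative
})
pair_ds = Dataset.from_dict({
    "sentence_0": ["Làm thế nào để tối ưu hóa thuật toán tìm đường đi ngắn nhất?"],
    "sentence_1": ["A candidate abstract ..."],
    "label": [1.0],
})

args = SentenceTransformerTrainingArguments(
    output_dir="finetuned-model",  # illustrative path
    num_train_epochs=1,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    multi_dataset_batch_sampler=MultiDatasetBatchSamplers.ROUND_ROBIN,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset={"triplet": triplet_ds, "pair": pair_ds},
    loss={
        "triplet": losses.TripletLoss(model, distance_metric=losses.TripletDistanceMetric.COSINE, triplet_margin=0.5),
        "pair": losses.ContrastiveLoss(model, distance_metric=losses.SiameseDistanceMetric.COSINE_DISTANCE, margin=0.5),
    },
)
trainer.train()
```
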
+
+ ### Training Logs
+ | Epoch | Step | triplet_eval_cosine_accuracy | binary_eval_cosine_ap |
+ |:-----:|:----:|:----------------------------:|:---------------------:|
+ | 0.5   | 100  | 0.9609                       | 0.9544                |
+ | 1.0   | 200  | 0.9681                       | 0.9677                |
+
+
+ ### Framework Versions
+ - Python: 3.11.11
+ - Sentence Transformers: 3.4.1
+ - Transformers: 4.51.1
+ - PyTorch: 2.5.1+cu124
+ - Accelerate: 1.3.0
+ - Datasets: 3.5.0
+ - Tokenizers: 0.21.0
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### TripletLoss
+ ```bibtex
+ @misc{hermans2017defense,
+     title={In Defense of the Triplet Loss for Person Re-Identification},
+     author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
+     year={2017},
+     eprint={1703.07737},
+     archivePrefix={arXiv},
+     primaryClass={cs.CV}
+ }
+ ```
+
+ #### ContrastiveLoss
+ ```bibtex
+ @inproceedings{hadsell2006dimensionality,
+     author={Hadsell, R. and Chopra, S. and LeCun, Y.},
+     booktitle={2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)},
+     title={Dimensionality Reduction by Learning an Invariant Mapping},
+     year={2006},
+     volume={2},
+     number={},
+     pages={1735-1742},
+     doi={10.1109/CVPR.2006.100}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,25 @@
+ {
+   "architectures": [
+     "BertModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "gradient_checkpointing": false,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 384,
+   "initializer_range": 0.02,
+   "intermediate_size": 1536,
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "torch_dtype": "float32",
+   "transformers_version": "4.51.1",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 250037
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "__version__": {
+     "sentence_transformers": "3.4.1",
+     "transformers": "4.51.1",
+     "pytorch": "2.5.1+cu124"
+   },
+   "prompts": {},
+   "default_prompt_name": null,
+   "similarity_fn_name": "cosine"
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e2697fb3ee669a6170fbad0a28d1c7d9de59858db35c2181c45db785d3d01e00
+ size 470637416
modules.json ADDED
@@ -0,0 +1,14 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   }
+ ]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 128,
+   "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,51 @@
+ {
+   "bos_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "cls_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "<mask>",
+     "lstrip": true,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "<pad>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "<unk>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cad551d5600a84242d0973327029452a1e3672ba6313c2a3c3d69c4310e12719
+ size 17082987
tokenizer_config.json ADDED
@@ -0,0 +1,65 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "<s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "<pad>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "</s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "3": {
+       "content": "<unk>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "250001": {
+       "content": "<mask>",
+       "lstrip": true,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "bos_token": "<s>",
+   "clean_up_tokenization_spaces": false,
+   "cls_token": "<s>",
+   "do_lower_case": true,
+   "eos_token": "</s>",
+   "extra_special_tokens": {},
+   "mask_token": "<mask>",
+   "max_length": 128,
+   "model_max_length": 128,
+   "pad_to_multiple_of": null,
+   "pad_token": "<pad>",
+   "pad_token_type_id": 0,
+   "padding_side": "right",
+   "sep_token": "</s>",
+   "stride": 0,
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "truncation_side": "right",
+   "truncation_strategy": "longest_first",
+   "unk_token": "<unk>"
+ }
unigram.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:da145b5e7700ae40f16691ec32a0b1fdc1ee3298db22a31ea55f57a966c4a65d
+ size 14763260