fjavigv committed on
Commit 35d9494 · verified · 1 Parent(s): 6b2d179

Upload 12 files

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+ "word_embedding_dimension": 768,
+ "pooling_mode_cls_token": true,
+ "pooling_mode_mean_tokens": false,
+ "pooling_mode_max_tokens": false,
+ "pooling_mode_mean_sqrt_len_tokens": false,
+ "pooling_mode_weightedmean_tokens": false,
+ "pooling_mode_lasttoken": false,
+ "include_prompt": true
+ }
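This pooling config enables only CLS-token pooling (`pooling_mode_cls_token: true`). The model card's architecture also lists a `Normalize()` module after the `Pooling` step. The stdlib-only sketch below shows roughly what those two steps compute for one sentence. It assumes token embeddings arrive as a list of per-token vectors with the `[CLS]` token first, as in BERT. `cls_pool_and_normalize` is a hypothetical helper, not part of the sentence-transformers API.

```python
import math
from typing import List

Vector = List[float]


def cls_pool_and_normalize(token_embeddings: List[Vector]) -> Vector:
    """Keep the first (CLS) token vector and rescale it to unit L2 norm.

    Illustrates the Pooling(cls_token) -> Normalize() idea only; this is not
    the sentence-transformers implementation.
    """
    if not token_embeddings:
        raise ValueError("need at least one token embedding")
    cls_vector = token_embeddings[0]  # BERT tokenizers place [CLS] at position 0
    norm = math.sqrt(sum(x * x for x in cls_vector))
    if norm == 0.0:
        return list(cls_vector)  # avoid dividing by zero for an all-zero vector
    return [x / norm for x in cls_vector]


def cosine_of_normalized(a: Vector, b: Vector) -> float:
    # For unit-length vectors, cosine similarity is just the dot product.
    return sum(x * y for x, y in zip(a, b))


# Toy 4-dimensional token embeddings (the real model produces 768 dims per token).
tokens = [[3.0, 0.0, 4.0, 0.0], [1.0, 1.0, 1.0, 1.0], [9.0, 9.0, 9.0, 9.0]]
sentence_vec = cls_pool_and_normalize(tokens)
print(sentence_vec)  # [0.6, 0.0, 0.8, 0.0]
```

The later tokens have no effect on the result. Mean pooling would average all token vectors instead, but this config disables it (`pooling_mode_mean_tokens: false`).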
README.md CHANGED
@@ -1,3 +1,1098 @@
- ---
- license: mit
- ---
+ ---
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:29911
+ - loss:MatryoshkaLoss
+ - loss:MultipleNegativesRankingLoss
+ base_model: Snowflake/snowflake-arctic-embed-m-v1.5
+ widget:
+ - source_sentence: What strategies can be implemented to effectively leverage private
+ financing opportunities for small and medium-sized enterprises (SMEs)?
+ sentences:
+ - (13) While the energy savings potential remains large in all sectors, there is
+ a particular challenge relating to transport, as it is responsible for more than
+ 30 % of final energy consumption, and to buildings, since 75 % of the Union’s
+ building stock has a poor energy performance. Another increasingly important sector
+ is the information and communications technology (ICT) sector, which is responsible
+ for 5 to 9 % of the world’s total electricity use and more than 2 % of global
+ emissions. In 2018, data centres accounted for 2,7 % of the electricity demand
+ in the EU-28. In that context, the Commission, in its communication of 19 February
+ 2020 on ‘Shaping Europe's digital future’ (the ‘Union’s Digital Strategy’), highlighted
+ the need for highly energy-efficient and sustainable data centres and transparency
+ measures for telecoms operators as regards their environmental footprint. Furthermore,
+ the possible increase in industry’s energy demand that may result from its decarbonisation,
+ particularly for energy intensive processes, should also be taken into account.
+ - SMEs in order to leverage and trigger private financing for SMEs.
+ - ►M5 — ◄ K Gases (petroleum), refinery; Refinery gas (A complex combination obtained
+ from various petroleum refining operations. It consists of hydrogen and hydrocarbons
+ having carbon numbers predominantly in the range of C1 through C3.) 649-153-00-0
+ 272-338-9 68814-67-5 ►M5 — ◄ K Gases (petroleum), platformer products separator
+ off; Refinery gas (A complex combination obtained from the chemical reforming
+ of naphthenes to aromatics. It consists of hydrogen and saturated aliphatic hydrocarbons
+ having carbon numbers predominantly in the range of C2 through C4.) 649-154-00-6
+ 272-343-6 68814-90-4 ►M5 — ◄ K Gases (petroleum), hydrotreated sour kerosine
+ depentaniser stabiliser off; Refinery gas (The complex combination obtained from
+ the
+ - source_sentence: How can an undertaking identify and leverage opportunities related
+ to sustainability matters within its business model and strategy?
+ sentences:
+ - 'i.
+
+
+ focusses on specific activities, business relationships, geographies or other
+ factors that give rise to heightened risk of adverse impacts;
+
+
+ ii.
+
+
+ considers the impacts with which the undertaking is involved through its own operations
+ or as a result of its business relationships;
+
+
+ iii.
+
+
+ includes consultation with affected stakeholders to understand how they may be
+ impacted and with external experts;
+
+
+ iv.
+
+
+ prioritises negative impacts based on their relative severity and likelihood,
+ (see ESRS 1 section 3.4 Impact materiality) and, if applicable, positive impacts
+ on their relative scale, scope and likelihood, and determines which sustainability
+ matters are material for reporting purposes, including the qualitative or quantitative
+ thresholds and other criteria used as prescribed by ESRS 1 section 3.4 Impact
+ materiality;
+
+
+ (c)
+
+
+ an overview of the process used to identify, assess, prioritise and monitor risks
+ and opportunities that have or may have financial effects . The disclosure shall
+ include:
+
+
+ i.
+
+
+ how the undertaking has considered the connections of its impacts and dependencies
+ with the risks and opportunities that may arise from those impacts and dependencies;
+
+
+ ii.
+
+
+ ►C1 how the undertaking assesses the likelihood, magnitude, and nature of effects
+ of the identified risk and opportunities (such as the qualitative or quantitative
+ thresholds and other criteria used as prescribed by ESRS 1 section 3.5 Financial
+ materiality); ◄
+
+
+ iii.
+
+
+ how the undertaking prioritises sustainability-related risks relative to other
+ types of risks, including its use of risk-assessment tools;
+
+
+ (d)
+
+
+ a description of the decision-making process and the related internal control
+ procedures;
+
+
+ (e)
+
+
+ the extent to which and how the process to identify, assess and manage impacts
+ and risks is integrated into the undertaking’s overall risk management process
+ and used to evaluate the undertaking’s overall risk profile and risk management
+ processes;
+
+
+ (f)
+
+
+ the extent to which and how the process to identify, assess and manage opportunities
+ is integrated into the undertaking’s overall management process where applicable;
+
+
+ (g)
+
+
+ the input parameters it uses (for example, data sources, the scope of operations
+ covered and the detail used in assumptions); and
+
+
+ (h)
+
+
+ whether and how the process has changed compared to the prior reporting period,
+ when the process was modified for the last time and future revision dates of the
+ materiality assessment.
+
+
+ Disclosure Requirement IRO-2 – Disclosure Requirements in ESRS covered by the
+ undertaking’s sustainability statement
+
+
+ The undertaking shall report on the Disclosure Requirements complied with in its
+ sustainability statements.
+
+
+ The objective of this Disclosure Requirement is to provide an understanding of
+ the Disclosure Requirements included in the undertaking’s sustainability statement
+ and of the topics that have been omitted as not material, as a result of the materiality
+ assessment.
+
+
+ The undertaking shall include a list of the Disclosure Requirements complied with
+ in preparing the sustainability statement , following the outcome of the materiality
+ assessment (see ESRS 1 chapter 3), including the page numbers and/or paragraphs
+ where the related disclosures are located in the sustainability statement. This
+ may be presented as a content index. The undertaking shall also include a table
+ of all the datapoints that derive from other EU legislation as listed in Appendix
+ B of this standard, indicating where they can be found in the sustainability statement
+ and including those that the undertaking has assessed as not material, in which
+ case the undertaking shall indicate ‘Not material’ in the table in accordance
+ with ESRS 1 paragraph 35.
+
+
+ If the undertaking concludes that climate change is not material and therefore
+ omits all disclosure requirements in ESRS E1 Climate change, it shall disclose
+ a detailed explanation of the conclusions of its materiality assessment with regard
+ to climate change (see ESRS 2 IRO-2 Disclosure Requirements in ESRS covered by
+ the undertaking’s sustainability statement), including a forward-looking analysis
+ of the conditions that could lead the undertaking to conclude that climate change
+ is material in the future.
+
+
+ If the undertaking concludes that a topic other than climate change is not material
+ and therefore omits all the Disclosure Requirements in the corresponding topical
+ ESRS, it may provide a brief explanation of the conclusions of its materiality
+ assessment for that topic.'
+ - '(b)
+
+
+ the number and type of market participants, including the ratio of market participants
+ to traded instruments in a particular product;
+
+
+ (c)
+
+
+ the average size of spreads, where available;
+
+
+ (26)
+
+
+ ‘competent authority’ means the authority, designated by each Member State in
+ accordance with Article 67, unless otherwise specified in this Directive;
+
+
+ (27)
+
+
+ ‘credit institution’ means a credit institution as defined in point (1) of Article
+ 4(1) of Regulation (EU) No 575/2013;
+
+
+ (28)
+
+
+ ‘UCITS management company’ means a management company as defined in point (b)
+ of Article 2(1) of Directive 2009/65/EC of the European Parliament and of the
+ Council ( 4 );
+
+
+ (29)'
+ - '(a)
+
+
+ a brief description of the undertaking’s business model and strategy, including:
+
+
+ (i)
+
+
+ the resilience of the undertaking’s business model and strategy in relation to
+ risks related to sustainability matters;
+
+
+ (ii)
+
+
+ the opportunities for the undertaking related to sustainability matters;
+
+
+ (iii)'
+ - source_sentence: What are the conditions under which an undertaking with an average
+ number of 750 employees can omit certain sustainability information while still
+ needing to disclose the materiality assessment of those topics?
+ sentences:
+ - '(c)
+
+
+ impose restrictions on non-EU AIFMs relating to the management of an AIF where
+ its activities potentially constitute an important source of counterparty risk
+ to a credit institution or other systemically relevant institutions.
+
+
+ 5.
+
+
+ ESMA may take a decision under paragraph 4 and subject to the requirements set
+ out in paragraph 6 if both of the following conditions are met:
+
+
+ (a)
+
+
+ a substantial threat exists, originating or aggravated by the activities of AIFMs,
+ to the orderly functioning and integrity of the financial market or to the stability
+ of the whole or a part of the financial system in the Union and there are cross
+ border implications; and
+
+
+ (b)'
+ - '▼B
+
+
+ If an undertaking or group not exceeding on its balance sheet date the average
+ number of 750 employees during the financial year decides to omit the information
+ required by ESRS E4, ESRS S1, ESRS S2, ESRS S3 or ESRS S4 in accordance with Appendix
+ C of ESRS 1, it shall nevertheless disclose whether the sustainability topics
+ covered respectively by ESRS E4, ESRS S1, ESRS S2, ESRS S3 and ESRS S4 have been
+ assessed to be material as a result of the undertaking’s materiality assessment.
+ In addition, if one or more of these topics has been assessed to be material,
+ the undertaking shall, for each material topic:
+
+
+ (a)'
+ - '9.
+
+
+ The Commission shall establish and keep up-to-date a register of recognised schemes.
+ That register shall be made publicly available on a free-access website. That
+ website shall also allow for the collation of feedback from all relevant stakeholders
+ concerning the implementation of recognised schemes. Such feedback shall be submitted
+ to the relevant scheme owners for consideration.
+
+
+ Article 31
+
+
+ Environmental footprint declaration
+
+
+ 1.'
+ - source_sentence: What are the specific roles and responsibilities of the InvestEU
+ Advisory Hub in relation to project development assistance for public authorities
+ and project promoters?
+ sentences:
+ - 'System B
+
+
+ Alternative characterisation Physical and chemical factors that determine the
+ characteristics of the coastal water and hence the biological community structure
+ and composition Obligatory factors latitude longitude tidal range salinity Optional
+ factors current velocity wave exposure mean water temperature mixing characteristics
+ turbidity retention time (of enclosed bays) mean substratum composition water
+ temperature range
+
+
+ 1.3. Establishment of type-specific reference conditions for surface water body
+ types'
+ - newly implemented since 31 December 2008 that continue to have an impact in 2020
+ with respect to the obligation period referred to in paragraph 1, first subparagraph,
+ point (a), and beyond 2020 with respect to the period referred to in point (b)(i),
+ of that subparagraph, and which can be measured and verified; --- --- (e) count
+ towards the amount of required energy savings, energy savings that stem from policy
+ measures, provided that it can be demonstrated that those measures result in individual
+ actions carried out from 1 January 2018 to 31 December 2020 which deliver savings
+ after 31 December 2020; --- --- (f) exclude from the calculation of the amount
+ of required energy savings pursuant to paragraph 1, first subparagraph, points
+ (a) and
+ - 'Advisory initiatives shall be available as a component under each policy window
+ referred to in Article 8(1), covering sectors under that window. In addition,
+ advisory initiatives shall be available under a cross-sectoral component.
+
+
+ 2.
+
+
+ The InvestEU Advisory Hub shall in particular:
+
+
+ (a)
+
+
+ provide a central point of entry, managed and hosted by the Commission, for project
+ development assistance under the InvestEU Advisory Hub for public authorities
+ and for project promoters;
+
+
+ (b)
+
+
+ disseminate to public authorities and project promoters all available additional
+ information regarding the investment guidelines, including information on their
+ application or on the interpretation provided by the Commission;
+
+
+ (c)'
+ - source_sentence: What is the definition of a preliminary economic assessment in
+ the context of evaluating projects for the recovery of critical raw materials?
+ sentences:
+ - 'For the purposes of the first subparagraph of this paragraph, insurance undertakings
+ referred to in point (a) of the first subparagraph of Article 1(3) of this Directive
+ that are part of a group, on the basis of financial relationships referred to
+ in point (c)(ii) of Article 212(1) of Directive 2009/138/EC, and which are subject
+ to group supervision in accordance with points (a) to (c) of Article 213(2) of
+ that Directive shall be treated as subsidiary undertakings of the parent undertaking
+ of that group.
+
+
+ 9.'
+ - '(a)
+
+
+ progress in the implementation of the Strategic Project, in particular with regard
+ to the permit-granting process;
+
+
+ (b)
+
+
+ where relevant, reasons for delays compared to the timetable referred to in Article
+ 7(1), point (c) and a plan to overcome such delays;
+
+
+ (c)
+
+
+ progress in financing the Strategic Project, including information on public financial
+ support.
+
+
+ The Commission shall submit a copy of the report referred to in the first subparagraph
+ of this paragraph to the Board in order to facilitate the discussions referred
+ to in Article 36(7), point (c).
+
+
+ 2.
+
+
+ The Commission may, where necessary, request additional information from project
+ promoters relevant to the implementation of the Strategic Project to ascertain
+ the continuing fulfilment of the criteria laid down in Article 6(1).
+
+
+ 3.
+
+
+ The project promoter shall notify the Commission of:
+
+
+ (a)
+
+
+ changes to the Strategic Project affecting its fulfilment of the criteria laid
+ down in Article 6(1);
+
+
+ (b)
+
+
+ changes in control of the undertakings involved in the Strategic Project on a
+ lasting basis, compared to the information referred to in Article 7(1), point
+ (e).
+
+
+ 4.
+
+
+ The Commission may adopt implementing acts establishing a single template to be
+ used by project promoters to provide all the information required for the reports
+ referred to in paragraph 1 of this Article. The single template may indicate how
+ the information referred to in paragraph 1 of this Article is to be expressed.
+ Those implementing acts shall be adopted in accordance with the advisory procedure
+ referred to in Article 39(2).
+
+
+ The extent of documentation required to complete the single template referred
+ to in the first subparagraph shall be reasonable.
+
+
+ 5.'
+ - '(39)
+
+
+ ‘preliminary economic assessment’ means an early-stage, conceptual assessment
+ of the potential economic viability of a project for the recovery of critical
+ raw materials from extractive waste;
+
+
+ (40)
+
+
+ ‘magnetic resonance imaging device’ means a non-invasive medical device that uses
+ magnetic fields to make anatomical images or any other device that uses magnetic
+ fields to make images of the inside of object;
+
+
+ (41)
+
+
+ ‘wind energy generator’ means the part of an onshore or offshore wind turbine
+ that converts the mechanical energy of the rotor into electrical energy;
+
+
+ (42)'
+ pipeline_tag: sentence-similarity
+ library_name: sentence-transformers
+ metrics:
+ - cosine_accuracy@1
+ - cosine_accuracy@3
+ - cosine_accuracy@5
+ - cosine_accuracy@10
+ - cosine_precision@1
+ - cosine_precision@3
+ - cosine_precision@5
+ - cosine_precision@10
+ - cosine_recall@1
+ - cosine_recall@3
+ - cosine_recall@5
+ - cosine_recall@10
+ - cosine_ndcg@10
+ - cosine_mrr@10
+ - cosine_map@100
+ model-index:
+ - name: SentenceTransformer based on Snowflake/snowflake-arctic-embed-m-v1.5
+ results:
+ - task:
+ type: information-retrieval
+ name: Information Retrieval
+ dataset:
+ name: Unknown
+ type: unknown
+ metrics:
+ - type: cosine_accuracy@1
+ value: 0.822517355870812
+ name: Cosine Accuracy@1
+ - type: cosine_accuracy@3
+ value: 0.9526109266525807
+ name: Cosine Accuracy@3
+ - type: cosine_accuracy@5
+ value: 0.9725324479323876
+ name: Cosine Accuracy@5
+ - type: cosine_accuracy@10
+ value: 0.9873226682764865
+ name: Cosine Accuracy@10
+ - type: cosine_precision@1
+ value: 0.822517355870812
+ name: Cosine Precision@1
+ - type: cosine_precision@3
+ value: 0.31753697555086025
+ name: Cosine Precision@3
+ - type: cosine_precision@5
+ value: 0.1945064895864775
+ name: Cosine Precision@5
+ - type: cosine_precision@10
+ value: 0.09873226682764866
+ name: Cosine Precision@10
+ - type: cosine_recall@1
+ value: 0.822517355870812
+ name: Cosine Recall@1
+ - type: cosine_recall@3
+ value: 0.9526109266525807
+ name: Cosine Recall@3
+ - type: cosine_recall@5
+ value: 0.9725324479323876
+ name: Cosine Recall@5
+ - type: cosine_recall@10
+ value: 0.9873226682764865
+ name: Cosine Recall@10
+ - type: cosine_ndcg@10
+ value: 0.9140763784801484
+ name: Cosine Ndcg@10
+ - type: cosine_mrr@10
+ value: 0.8895886335216252
+ name: Cosine Mrr@10
+ - type: cosine_map@100
+ value: 0.8902791958273809
+ name: Cosine Map@100
+ ---
+
+ # SentenceTransformer based on Snowflake/snowflake-arctic-embed-m-v1.5
+
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [Snowflake/snowflake-arctic-embed-m-v1.5](https://huggingface.co/Snowflake/snowflake-arctic-embed-m-v1.5). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [Snowflake/snowflake-arctic-embed-m-v1.5](https://huggingface.co/Snowflake/snowflake-arctic-embed-m-v1.5) <!-- at revision 8e4eaca09c27ad3d501908636ec7c8bc3561b6de -->
+ - **Maximum Sequence Length:** 512 tokens
+ - **Output Dimensionality:** 768 dimensions
+ - **Similarity Function:** Cosine Similarity
+ <!-- - **Training Dataset:** Unknown -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+ (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
+ (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+ (2): Normalize()
+ )
+ ```
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("sentence_transformers_model_id")
+ # Run inference
+ sentences = [
+ 'What is the definition of a preliminary economic assessment in the context of evaluating projects for the recovery of critical raw materials?',
+ '(39)\n\n‘preliminary economic assessment’ means an early-stage, conceptual assessment of the potential economic viability of a project for the recovery of critical raw materials from extractive waste;\n\n(40)\n\n‘magnetic resonance imaging device’ means a non-invasive medical device that uses magnetic fields to make anatomical images or any other device that uses magnetic fields to make images of the inside of object;\n\n(41)\n\n‘wind energy generator’ means the part of an onshore or offshore wind turbine that converts the mechanical energy of the rotor into electrical energy;\n\n(42)',
+ 'For the purposes of the first subparagraph of this paragraph, insurance undertakings referred to in point (a) of the first subparagraph of Article 1(3) of this Directive that are part of a group, on the basis of financial relationships referred to in point (c)(ii) of Article 212(1) of Directive 2009/138/EC, and which are subject to group supervision in accordance with points (a) to (c) of Article 213(2) of that Directive shall be treated as subsidiary undertakings of the parent undertaking of that group.\n\n9.',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 768]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # [3, 3]
+ ```
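The model was trained with `MatryoshkaLoss` over the dimensions 768, 512, 256, 128 and 64 (see the loss parameters under Training Details). Its embeddings are therefore meant to stay usable when truncated to one of those prefix lengths. Recent sentence-transformers releases accept a `truncate_dim` argument on the `SentenceTransformer` constructor for this. The card itself reports retrieval metrics only at the full 768 dimensions. The stdlib-only sketch below shows the underlying operation, assuming truncation means keeping the first `dim` components and re-normalizing them. `truncate_and_normalize` and the toy vector are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]

# Sizes listed under "matryoshka_dims" in this card's MatryoshkaLoss parameters.
MATRYOSHKA_DIMS = (768, 512, 256, 128, 64)


def truncate_and_normalize(embedding: Sequence[float], dim: int) -> Vector:
    """Keep the first `dim` components and rescale them to unit L2 norm."""
    if dim not in MATRYOSHKA_DIMS:
        raise ValueError(f"dim must be one of the trained sizes {MATRYOSHKA_DIMS}")
    if len(embedding) < dim:
        raise ValueError("embedding is shorter than the requested dimension")
    prefix = list(embedding[:dim])
    norm = math.sqrt(sum(x * x for x in prefix))
    return [x / norm for x in prefix] if norm > 0.0 else prefix


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    # Equals cosine similarity when both inputs have unit length.
    return sum(x * y for x, y in zip(a, b))


# Deterministic toy vector standing in for a 768-dim model embedding.
vec = [float(i % 7) - 3.0 for i in range(768)]
for d in MATRYOSHKA_DIMS:
    v = truncate_and_normalize(vec, d)
    print(d, len(v), round(dot(v, v), 6))
```

Re-normalizing after truncation keeps `model.similarity`-style cosine scores comparable across sizes. Smaller sizes usually trade some retrieval quality for storage and speed, so measure on your own queries before choosing one.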
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ ## Evaluation
+
+ ### Metrics
+
+ #### Information Retrieval
+
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
+
+ | Metric | Value |
+ |:--------------------|:-----------|
+ | cosine_accuracy@1 | 0.8225 |
+ | cosine_accuracy@3 | 0.9526 |
+ | cosine_accuracy@5 | 0.9725 |
+ | cosine_accuracy@10 | 0.9873 |
+ | cosine_precision@1 | 0.8225 |
+ | cosine_precision@3 | 0.3175 |
+ | cosine_precision@5 | 0.1945 |
+ | cosine_precision@10 | 0.0987 |
+ | cosine_recall@1 | 0.8225 |
+ | cosine_recall@3 | 0.9526 |
+ | cosine_recall@5 | 0.9725 |
+ | cosine_recall@10 | 0.9873 |
+ | **cosine_ndcg@10** | **0.9141** |
+ | cosine_mrr@10 | 0.8896 |
+ | cosine_map@100 | 0.8903 |
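The table's values fit a setup in which each query has exactly one relevant passage. In every row, recall@k equals accuracy@k, and precision@k equals accuracy@k divided by k (0.9526 / 3 ≈ 0.3175, 0.9725 / 5 ≈ 0.1945, 0.9873 / 10 ≈ 0.0987). Under that assumption, every metric follows from the rank at which the single relevant passage is retrieved. The stdlib-only sketch below is not `InformationRetrievalEvaluator`'s code, and the `ranks` list is hypothetical data, not the evaluation set behind this table.

```python
import math
from typing import Dict, List, Optional


def single_relevant_metrics(ranks: List[Optional[int]], k: int = 10) -> Dict[str, float]:
    """Retrieval metrics when each query has exactly one relevant passage.

    ranks[i] is the 1-based position of query i's relevant passage in the
    ranked results, or None if it was not retrieved at all.
    """
    n = len(ranks)
    hits = [r is not None and r <= k for r in ranks]
    accuracy = sum(hits) / n
    return {
        f"accuracy@{k}": accuracy,
        f"precision@{k}": accuracy / k,  # at most one relevant item among the top k
        f"recall@{k}": accuracy,         # per-query recall is either 0 or 1
        f"mrr@{k}": sum(1.0 / r for r, h in zip(ranks, hits) if h) / n,
        # Ideal DCG is 1 (relevant item at rank 1), so nDCG = 1 / log2(rank + 1).
        f"ndcg@{k}": sum(1.0 / math.log2(r + 1) for r, h in zip(ranks, hits) if h) / n,
    }


# Hypothetical ranks for four queries: hits at 1, 2 and 4, plus one miss.
ranks = [1, 2, 4, None]
m = single_relevant_metrics(ranks, k=3)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

This also explains why cosine_map@100 (0.8903) is slightly above cosine_mrr@10 (0.8896). With one relevant passage per query, average precision reduces to the reciprocal rank. MAP@100 additionally credits hits at ranks 11 to 100.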
+
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Dataset
+
+ #### Unnamed Dataset
+
+ * Size: 29,911 training samples
+ * Columns: <code>sentence_0</code> and <code>sentence_1</code>
+ * Approximate statistics based on the first 1000 samples:
+ | | sentence_0 | sentence_1 |
+ |:--------|:------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|
+ | type | string | string |
+ | details | <ul><li>min: 13 tokens</li><li>mean: 41.63 tokens</li><li>max: 252 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 233.72 tokens</li><li>max: 512 tokens</li></ul> |
+ * Samples:
+ | sentence_0 | sentence_1 |
+ |:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+ | <code>What measures must Member States take to ensure that workers who believe they have been discriminated against in terms of equal pay can establish their case before a competent authority or national court?</code> | <code>Article 18<br><br>Shift of burden of proof<br><br>1. Member States shall take the appropriate measures, in accordance with their national judicial systems, to ensure that, when workers who consider themselves wronged because the principle of equal pay has not been applied to them establish before a competent authority or national court facts from which it may be presumed that there has been direct or indirect discrimination, it shall be for the respondent to prove that there has been no direct or indirect discrimination in relation to pay.<br><br>2. Member States shall ensure that, in administrative procedures or court proceedings regarding alleged direct or indirect discrimination in relation to pay, where an employer has not implemented the pay transparency obligations set out in Articles 5, 6, 7, 9 and 10, it is for the employer to prove that there has been no such discrimination.<br><br>The first subparagraph of this paragraph shall not apply where the employer proves that the infringement of the obligati...</code> |
+ | <code>What are the key considerations for recognizing and addressing discrimination in the context of compensation and penalties, particularly in relation to the gender pay gap?</code> | <code>discrimination, in particular for substantive and procedural purposes, including to recognise the existence of discrimination, to decide on the appropriate comparator, to assess the proportionality, and to determine, where relevant, the level of compensation awarded or penalties imposed. An intersectional approach is important for understanding and addressing the gender pay gap. This clarification should not change the scope of employers’ obligations in regard to the pay transparency measures under this Directive. In particular, employers should not be required to gather data related to protected grounds other than sex.</code> |
+ | <code>What is the process for aircraft operators and shipping companies regarding the surrendering of allowances in relation to their total emissions from the previous calendar year?</code> | <code>(b)<br><br>each aircraft operator surrenders a number of allowances that is equal to its total emissions during the preceding calendar year, as verified in accordance with Article 15;<br><br>(c)<br><br>each shipping company surrenders a number of allowances that is equal to its total emissions during the preceding calendar year, as verified in accordance with Article 3ge.<br><br>Member States, administering Member States and administering authorities in respect of a shipping company shall ensure that allowances surrendered in accordance with the first subparagraph are subsequently cancelled.<br><br>▼M15<br><br>3-e.</code> |
+ * Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
+ ```json
+ {
+     "loss": "MultipleNegativesRankingLoss",
+     "matryoshka_dims": [
+         768,
+         512,
+         256,
+         128,
+         64
+     ],
+     "matryoshka_weights": [
+         1,
+         1,
+         1,
+         1,
+         1
+     ],
+     "n_dims_per_step": -1
+ }
+ ```
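The loss settings above wrap `MultipleNegativesRankingLoss` in `MatryoshkaLoss`: the base loss is evaluated on the embeddings truncated to each entry of `matryoshka_dims`, and the per-dimension losses are summed using `matryoshka_weights`. The pure-Python sketch below illustrates that computation on toy vectors. It assumes the in-batch-negatives form of the ranking loss (cross-entropy over scaled cosine similarities, with each anchor's own positive as the target and the other positives in the batch as negatives; with a batch size of 6 that means 5 negatives per query) and a similarity scale of 20, which is believed to be the library default but is not recorded in this card. It is an illustration, not the Sentence Transformers implementation.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mnrl(anchors, positives, scale=20.0):
    """In-batch negatives ranking loss (sketch).

    For anchor i the matching positive i is the target; every other positive
    in the batch acts as a negative. scale=20.0 is an assumed default.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, p) for p in positives]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # cross-entropy with target index i
    return total / len(anchors)


def matryoshka_loss(anchors, positives, dims, weights):
    """Weighted sum of the base loss on embeddings truncated to each dimension."""
    loss = 0.0
    for dim, weight in zip(dims, weights):
        truncated_a = [v[:dim] for v in anchors]  # keep the first `dim` components
        truncated_p = [v[:dim] for v in positives]
        loss += weight * mnrl(truncated_a, truncated_p)
    return loss


if __name__ == "__main__":
    anchors = [[1.0, 0.0, 0.2, 0.1], [0.0, 1.0, 0.1, 0.3]]
    positives = [[0.9, 0.1, 0.2, 0.0], [0.1, 0.8, 0.0, 0.4]]
    # Toy analogue of matryoshka_dims [768, 512, 256, 128, 64] with unit weights
    print(matryoshka_loss(anchors, positives, dims=[4, 2], weights=[1, 1]))
```

Because cosine similarity rescales each truncated vector, the sketch needs no separate re-normalization step; `n_dims_per_step: -1` corresponds to evaluating every listed dimension at every step, as the loop does.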
+
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `eval_strategy`: steps
+ - `per_device_train_batch_size`: 6
+ - `per_device_eval_batch_size`: 6
+ - `num_train_epochs`: 4
+ - `multi_dataset_batch_sampler`: round_robin
+
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: steps
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 6
+ - `per_device_eval_batch_size`: 6
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 5e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1
+ - `num_train_epochs`: 4
+ - `max_steps`: -1
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.0
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: False
+ - `fp16`: False
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: False
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: None
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `include_for_metrics`: []
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`:
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `dispatch_batches`: None
+ - `split_batches`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `use_liger_kernel`: False
+ - `eval_use_gather_object`: False
+ - `average_tokens_across_devices`: False
+ - `prompts`: None
+ - `batch_sampler`: batch_sampler
+ - `multi_dataset_batch_sampler`: round_robin
+
+ </details>
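With `lr_scheduler_type: linear`, no warmup (`warmup_steps: 0`, `warmup_ratio: 0.0`) and `learning_rate: 5e-05`, the learning rate should fall linearly from 5e-05 to zero over the run. The sketch below assumes 19,944 optimizer steps in total (4 epochs of 4,986 steps, the step count the evaluation CSV in this commit records at epoch 4.0) and one optimizer step per batch (`gradient_accumulation_steps: 1`). It follows the usual linear-with-warmup formula but is not the Trainer's own code.

```python
def linear_lr(step, base_lr=5e-05, total_steps=19944, warmup_steps=0):
    """Learning rate after `step` optimizer steps under a linear schedule.

    total_steps=19944 is an assumption: 4 epochs x 4986 steps per epoch.
    """
    if step < warmup_steps:
        # Linear ramp-up; unused here because warmup_steps is 0
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * (remaining / max(1, total_steps - warmup_steps))


if __name__ == "__main__":
    # Learning rate at the start and at each epoch boundary
    for step in (0, 4986, 9972, 14958, 19944):
        print(step, linear_lr(step))
```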
+
+ ### Training Logs
+ <details><summary>Click to expand</summary>
+
+ | Epoch | Step | Training Loss | cosine_ndcg@10 |
+ |:------:|:-----:|:-------------:|:--------------:|
+ | 0.0201 | 100 | - | 0.6629 |
+ | 0.0401 | 200 | - | 0.7746 |
+ | 0.0602 | 300 | - | 0.8233 |
+ | 0.0802 | 400 | - | 0.8515 |
+ | 0.1003 | 500 | 0.4694 | 0.8621 |
+ | 0.1203 | 600 | - | 0.8680 |
+ | 0.1404 | 700 | - | 0.8733 |
+ | 0.1604 | 800 | - | 0.8774 |
+ | 0.1805 | 900 | - | 0.8757 |
+ | 0.2006 | 1000 | 0.1568 | 0.8795 |
+ | 0.2206 | 1100 | - | 0.8808 |
+ | 0.2407 | 1200 | - | 0.8789 |
+ | 0.2607 | 1300 | - | 0.8796 |
+ | 0.2808 | 1400 | - | 0.8822 |
+ | 0.3008 | 1500 | 0.1015 | 0.8821 |
+ | 0.3209 | 1600 | - | 0.8814 |
+ | 0.3410 | 1700 | - | 0.8756 |
+ | 0.3610 | 1800 | - | 0.8822 |
+ | 0.3811 | 1900 | - | 0.8848 |
+ | 0.4011 | 2000 | 0.0836 | 0.8843 |
+ | 0.4212 | 2100 | - | 0.8841 |
+ | 0.4412 | 2200 | - | 0.8803 |
+ | 0.4613 | 2300 | - | 0.8851 |
+ | 0.4813 | 2400 | - | 0.8818 |
+ | 0.5014 | 2500 | 0.0865 | 0.8849 |
+ | 0.5215 | 2600 | - | 0.8877 |
+ | 0.5415 | 2700 | - | 0.8806 |
+ | 0.5616 | 2800 | - | 0.8832 |
+ | 0.5816 | 2900 | - | 0.8930 |
+ | 0.6017 | 3000 | 0.0842 | 0.8928 |
+ | 0.6217 | 3100 | - | 0.8882 |
+ | 0.6418 | 3200 | - | 0.8858 |
+ | 0.6619 | 3300 | - | 0.8863 |
+ | 0.6819 | 3400 | - | 0.8828 |
+ | 0.7020 | 3500 | 0.0669 | 0.8839 |
+ | 0.7220 | 3600 | - | 0.8835 |
+ | 0.7421 | 3700 | - | 0.8854 |
+ | 0.7621 | 3800 | - | 0.8839 |
+ | 0.7822 | 3900 | - | 0.8882 |
+ | 0.8022 | 4000 | 0.0695 | 0.8871 |
+ | 0.8223 | 4100 | - | 0.8854 |
+ | 0.8424 | 4200 | - | 0.8822 |
+ | 0.8624 | 4300 | - | 0.8847 |
+ | 0.8825 | 4400 | - | 0.8863 |
+ | 0.9025 | 4500 | 0.0575 | 0.8819 |
+ | 0.9226 | 4600 | - | 0.8815 |
+ | 0.9426 | 4700 | - | 0.8836 |
+ | 0.9627 | 4800 | - | 0.8862 |
+ | 0.9828 | 4900 | - | 0.8889 |
+ | 1.0 | 4986 | - | 0.8927 |
+ | 1.0028 | 5000 | 0.0712 | 0.8935 |
+ | 1.0229 | 5100 | - | 0.8890 |
+ | 1.0429 | 5200 | - | 0.8919 |
+ | 1.0630 | 5300 | - | 0.8949 |
+ | 1.0830 | 5400 | - | 0.8950 |
+ | 1.1031 | 5500 | 0.0485 | 0.8934 |
+ | 1.1231 | 5600 | - | 0.8964 |
+ | 1.1432 | 5700 | - | 0.8953 |
+ | 1.1633 | 5800 | - | 0.8942 |
+ | 1.1833 | 5900 | - | 0.8929 |
+ | 1.2034 | 6000 | 0.0465 | 0.8912 |
+ | 1.2234 | 6100 | - | 0.8890 |
+ | 1.2435 | 6200 | - | 0.8914 |
+ | 1.2635 | 6300 | - | 0.8847 |
+ | 1.2836 | 6400 | - | 0.8873 |
+ | 1.3037 | 6500 | 0.0324 | 0.8912 |
+ | 1.3237 | 6600 | - | 0.8956 |
+ | 1.3438 | 6700 | - | 0.8954 |
+ | 1.3638 | 6800 | - | 0.8946 |
+ | 1.3839 | 6900 | - | 0.8931 |
+ | 1.4039 | 7000 | 0.0205 | 0.8951 |
+ | 1.4240 | 7100 | - | 0.8967 |
+ | 1.4440 | 7200 | - | 0.8960 |
+ | 1.4641 | 7300 | - | 0.8943 |
+ | 1.4842 | 7400 | - | 0.9003 |
+ | 1.5042 | 7500 | 0.0489 | 0.8946 |
+ | 1.5243 | 7600 | - | 0.8986 |
+ | 1.5443 | 7700 | - | 0.8945 |
+ | 1.5644 | 7800 | - | 0.8960 |
+ | 1.5844 | 7900 | - | 0.8987 |
+ | 1.6045 | 8000 | 0.039 | 0.8991 |
+ | 1.6245 | 8100 | - | 0.8959 |
+ | 1.6446 | 8200 | - | 0.8948 |
+ | 1.6647 | 8300 | - | 0.8933 |
+ | 1.6847 | 8400 | - | 0.8926 |
+ | 1.7048 | 8500 | 0.0297 | 0.8937 |
+ | 1.7248 | 8600 | - | 0.8974 |
+ | 1.7449 | 8700 | - | 0.8977 |
+ | 1.7649 | 8800 | - | 0.8973 |
+ | 1.7850 | 8900 | - | 0.8989 |
+ | 1.8051 | 9000 | 0.0248 | 0.8974 |
+ | 1.8251 | 9100 | - | 0.8980 |
+ | 1.8452 | 9200 | - | 0.8970 |
+ | 1.8652 | 9300 | - | 0.8997 |
+ | 1.8853 | 9400 | - | 0.9007 |
+ | 1.9053 | 9500 | 0.0534 | 0.9009 |
+ | 1.9254 | 9600 | - | 0.9015 |
+ | 1.9454 | 9700 | - | 0.9014 |
+ | 1.9655 | 9800 | - | 0.9008 |
+ | 1.9856 | 9900 | - | 0.9024 |
+ | 2.0 | 9972 | - | 0.9052 |
+ | 2.0056 | 10000 | 0.0295 | 0.9041 |
+ | 2.0257 | 10100 | - | 0.9009 |
+ | 2.0457 | 10200 | - | 0.9030 |
+ | 2.0658 | 10300 | - | 0.9028 |
+ | 2.0858 | 10400 | - | 0.9051 |
+ | 2.1059 | 10500 | 0.027 | 0.9063 |
+ | 2.1260 | 10600 | - | 0.9059 |
+ | 2.1460 | 10700 | - | 0.9044 |
+ | 2.1661 | 10800 | - | 0.9024 |
+ | 2.1861 | 10900 | - | 0.9005 |
+ | 2.2062 | 11000 | 0.0201 | 0.8996 |
+ | 2.2262 | 11100 | - | 0.9037 |
+ | 2.2463 | 11200 | - | 0.9029 |
+ | 2.2663 | 11300 | - | 0.9047 |
+ | 2.2864 | 11400 | - | 0.9030 |
+ | 2.3065 | 11500 | 0.0097 | 0.9041 |
+ | 2.3265 | 11600 | - | 0.9011 |
+ | 2.3466 | 11700 | - | 0.9000 |
+ | 2.3666 | 11800 | - | 0.8972 |
+ | 2.3867 | 11900 | - | 0.8985 |
+ | 2.4067 | 12000 | 0.0165 | 0.8979 |
+ | 2.4268 | 12100 | - | 0.8996 |
+ | 2.4469 | 12200 | - | 0.9026 |
+ | 2.4669 | 12300 | - | 0.9034 |
+ | 2.4870 | 12400 | - | 0.9054 |
+ | 2.5070 | 12500 | 0.0165 | 0.9029 |
+ | 2.5271 | 12600 | - | 0.9052 |
+ | 2.5471 | 12700 | - | 0.9057 |
+ | 2.5672 | 12800 | - | 0.9059 |
+ | 2.5872 | 12900 | - | 0.9092 |
+ | 2.6073 | 13000 | 0.0144 | 0.9081 |
+ | 2.6274 | 13100 | - | 0.9095 |
+ | 2.6474 | 13200 | - | 0.9102 |
+ | 2.6675 | 13300 | - | 0.9113 |
+ | 2.6875 | 13400 | - | 0.9103 |
+ | 2.7076 | 13500 | 0.0159 | 0.9105 |
+ | 2.7276 | 13600 | - | 0.9073 |
+ | 2.7477 | 13700 | - | 0.9084 |
+ | 2.7677 | 13800 | - | 0.9080 |
+ | 2.7878 | 13900 | - | 0.9083 |
+ | 2.8079 | 14000 | 0.0183 | 0.9083 |
+ | 2.8279 | 14100 | - | 0.9070 |
+ | 2.8480 | 14200 | - | 0.9085 |
+ | 2.8680 | 14300 | - | 0.9078 |
+ | 2.8881 | 14400 | - | 0.9075 |
+ | 2.9081 | 14500 | 0.0257 | 0.9073 |
+ | 2.9282 | 14600 | - | 0.9098 |
+ | 2.9483 | 14700 | - | 0.9089 |
+ | 2.9683 | 14800 | - | 0.9097 |
+ | 2.9884 | 14900 | - | 0.9079 |
+ | 3.0 | 14958 | - | 0.9081 |
+ | 3.0084 | 15000 | 0.0144 | 0.9084 |
+ | 3.0285 | 15100 | - | 0.9083 |
+ | 3.0485 | 15200 | - | 0.9078 |
+ | 3.0686 | 15300 | - | 0.9079 |
+ | 3.0886 | 15400 | - | 0.9089 |
+ | 3.1087 | 15500 | 0.0082 | 0.9093 |
+ | 3.1288 | 15600 | - | 0.9098 |
+ | 3.1488 | 15700 | - | 0.9106 |
+ | 3.1689 | 15800 | - | 0.9103 |
+ | 3.1889 | 15900 | - | 0.9110 |
+ | 3.2090 | 16000 | 0.0185 | 0.9117 |
+ | 3.2290 | 16100 | - | 0.9116 |
+ | 3.2491 | 16200 | - | 0.9125 |
+ | 3.2692 | 16300 | - | 0.9111 |
+ | 3.2892 | 16400 | - | 0.9109 |
+ | 3.3093 | 16500 | 0.0105 | 0.9125 |
+ | 3.3293 | 16600 | - | 0.9117 |
+ | 3.3494 | 16700 | - | 0.9118 |
+ | 3.3694 | 16800 | - | 0.9117 |
+ | 3.3895 | 16900 | - | 0.9137 |
+ | 3.4095 | 17000 | 0.019 | 0.9134 |
+ | 3.4296 | 17100 | - | 0.9129 |
+ | 3.4497 | 17200 | - | 0.9126 |
+ | 3.4697 | 17300 | - | 0.9133 |
+ | 3.4898 | 17400 | - | 0.9136 |
+ | 3.5098 | 17500 | 0.0109 | 0.9120 |
+ | 3.5299 | 17600 | - | 0.9124 |
+ | 3.5499 | 17700 | - | 0.9122 |
+ | 3.5700 | 17800 | - | 0.9129 |
+ | 3.5901 | 17900 | - | 0.9132 |
+ | 3.6101 | 18000 | 0.0207 | 0.9139 |
+ | 3.6302 | 18100 | - | 0.9134 |
+ | 3.6502 | 18200 | - | 0.9135 |
+ | 3.6703 | 18300 | - | 0.9139 |
+ | 3.6903 | 18400 | - | 0.9141 |
+ | 3.7104 | 18500 | 0.0105 | 0.9139 |
+ | 3.7304 | 18600 | - | 0.9138 |
+ | 3.7505 | 18700 | - | 0.9136 |
+ | 3.7706 | 18800 | - | 0.9141 |
+
+ </details>
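`cosine_ndcg@10` is normalized discounted cumulative gain over the top 10 passages ranked by cosine similarity. In the evaluation CSV in this commit, Precision@k equals Accuracy@k divided by k and Recall@k equals Accuracy@k at every cutoff, which suggests each query has exactly one relevant passage. Under that assumption the ideal DCG is 1 and a query scores 1 / log2(rank + 1) when its passage is ranked within the top 10, otherwise 0. The sketch below averages that over hypothetical ranks; it is a simplified illustration rather than the evaluator used during training.

```python
import math


def ndcg_at_k_single_relevant(ranks, k=10):
    """Mean NDCG@k assuming exactly one relevant document per query.

    ranks: 1-based rank of the relevant document for each query, or None if
    it was not retrieved. With one relevant document the ideal DCG is
    1 / log2(1 + 1) = 1, so each query contributes 1 / log2(rank + 1).
    """
    total = 0.0
    for rank in ranks:
        if rank is not None and rank <= k:
            total += 1.0 / math.log2(rank + 1)
    return total / len(ranks)


def mrr_at_k(ranks, k=10):
    """Mean reciprocal rank under the same single-relevant assumption."""
    return sum(1.0 / r for r in ranks if r is not None and r <= k) / len(ranks)


if __name__ == "__main__":
    # Hypothetical ranks for five queries, not taken from the training run
    ranks = [1, 1, 2, 4, None]
    print(ndcg_at_k_single_relevant(ranks), mrr_at_k(ranks))
```

Since 1 / log2(r + 1) is at least 1 / r for every rank r of 1 or more, NDCG@10 can never fall below MRR@10 under this assumption, which is consistent with the CSV (0.9134 versus 0.8888 at epoch 4.0).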
+
+ ### Framework Versions
+ - Python: 3.10.11
+ - Sentence Transformers: 3.4.1
+ - Transformers: 4.48.1
+ - PyTorch: 2.4.0+cu121
+ - Accelerate: 1.4.0
+ - Datasets: 3.3.2
+ - Tokenizers: 0.21.0
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### MatryoshkaLoss
+ ```bibtex
+ @misc{kusupati2024matryoshka,
+     title={Matryoshka Representation Learning},
+     author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
+     year={2024},
+     eprint={2205.13147},
+     archivePrefix={arXiv},
+     primaryClass={cs.LG}
+ }
+ ```
+
+ #### MultipleNegativesRankingLoss
+ ```bibtex
+ @misc{henderson2017efficient,
+     title={Efficient Natural Language Response Suggestion for Smart Reply},
+     author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+     year={2017},
+     eprint={1705.00652},
+     archivePrefix={arXiv},
+     primaryClass={cs.CL}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,26 @@
+ {
+   "_name_or_path": "Snowflake/snowflake-arctic-embed-m-v1.5",
+   "architectures": [
+     "BertModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "gradient_checkpointing": false,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "torch_dtype": "float32",
+   "transformers_version": "4.48.1",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 30522
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,12 @@
+ {
+   "__version__": {
+     "sentence_transformers": "3.4.1",
+     "transformers": "4.48.1",
+     "pytorch": "2.4.0+cu121"
+   },
+   "prompts": {
+     "query": "Represent this sentence for searching relevant passages: "
+   },
+   "default_prompt_name": null,
+   "similarity_fn_name": "cosine"
+ }
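`config_sentence_transformers.json` stores one named prompt, `query`, and leaves `default_prompt_name` null. As far as we understand Sentence Transformers' prompt handling, the prefix is therefore added only when the caller requests it (for example `model.encode(queries, prompt_name="query")`), and passages are encoded without it. The plain-Python sketch below mimics only the string handling; `apply_prompt` is a hypothetical helper, not a library function. Note that the training arguments list `prompts: None`, so it may be worth comparing retrieval quality with and without the prefix.

```python
# Mirrors the "prompts" entry of config_sentence_transformers.json
PROMPTS = {"query": "Represent this sentence for searching relevant passages: "}
DEFAULT_PROMPT_NAME = None  # mirrors "default_prompt_name": null


def apply_prompt(texts, prompt_name=None):
    """Hypothetical helper: prepend the named prompt (if any) to each text."""
    name = prompt_name if prompt_name is not None else DEFAULT_PROMPT_NAME
    prefix = PROMPTS[name] if name is not None else ""
    return [prefix + text for text in texts]


if __name__ == "__main__":
    # Queries get the prefix on request; passages are left unchanged
    print(apply_prompt(["What does Article 18 say about the burden of proof?"], prompt_name="query"))
    print(apply_prompt(["Article 18 Shift of burden of proof"]))
```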
eval/Information-Retrieval_evaluation_results.csv ADDED
@@ -0,0 +1,5 @@
+ epoch,steps,cosine-Accuracy@1,cosine-Accuracy@3,cosine-Accuracy@5,cosine-Accuracy@10,cosine-Precision@1,cosine-Recall@1,cosine-Precision@3,cosine-Recall@3,cosine-Precision@5,cosine-Recall@5,cosine-Precision@10,cosine-Recall@10,cosine-MRR@10,cosine-NDCG@10,cosine-MAP@100
+ 1.0,4986,0.7850890431632961,0.9338967702988228,0.9628735285239963,0.9809840024147298,0.7850890431632961,0.7850890431632961,0.31129892343294097,0.9338967702988228,0.1925747057047993,0.9628735285239963,0.09809840024147298,0.9809840024147298,0.8632908360043888,0.8926674753228052,0.8642022196604484
+ 2.0,9972,0.8071234530636885,0.9444612134017507,0.967702988228192,0.9858134621189254,0.8071234530636885,0.8071234530636885,0.31482040446725024,0.9444612134017507,0.19354059764563838,0.967702988228192,0.09858134621189255,0.9858134621189254,0.8783682846314909,0.9051829518534917,0.8790679750501477
+ 3.0,14958,0.8119529127678841,0.9477814669483852,0.9713250830063387,0.9846060971928765,0.8119529127678841,0.8119529127678841,0.3159271556494617,0.9477814669483852,0.19426501660126774,0.9713250830063387,0.09846060971928766,0.9846060971928765,0.8825035095032083,0.9081351902421758,0.8833347238346708
+ 4.0,19944,0.8222155146392998,0.9514035617265318,0.9740416540899487,0.9870208270449743,0.8222155146392998,0.8222155146392998,0.31713452057551056,0.9514035617265318,0.19480833081798973,0.9740416540899487,0.09870208270449743,0.9870208270449743,0.8888231306205954,0.9134132315268919,0.8895360861777215
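In every row of this CSV, Precision@k equals Accuracy@k divided by k and Recall@k equals Accuracy@k, which is what one would expect if each evaluation query has exactly one relevant passage (an inference from the numbers, not something the file states). The standard-library sketch below parses the header and the epoch-4.0 row, copied verbatim, and checks those relations; `looks_single_relevant` is a name introduced here for illustration.

```python
import csv
import io

# Header and final (epoch 4.0) row copied from
# eval/Information-Retrieval_evaluation_results.csv
HEADER = (
    "epoch,steps,cosine-Accuracy@1,cosine-Accuracy@3,cosine-Accuracy@5,cosine-Accuracy@10,"
    "cosine-Precision@1,cosine-Recall@1,cosine-Precision@3,cosine-Recall@3,"
    "cosine-Precision@5,cosine-Recall@5,cosine-Precision@10,cosine-Recall@10,"
    "cosine-MRR@10,cosine-NDCG@10,cosine-MAP@100"
)
ROW = (
    "4.0,19944,0.8222155146392998,0.9514035617265318,0.9740416540899487,0.9870208270449743,"
    "0.8222155146392998,0.8222155146392998,0.31713452057551056,0.9514035617265318,"
    "0.19480833081798973,0.9740416540899487,0.09870208270449743,0.9870208270449743,"
    "0.8888231306205954,0.9134132315268919,0.8895360861777215"
)


def looks_single_relevant(row, ks=(1, 3, 5, 10), tol=1e-9):
    """True if Precision@k == Accuracy@k / k and Recall@k == Accuracy@k for every k."""
    for k in ks:
        acc = float(row[f"cosine-Accuracy@{k}"])
        if abs(float(row[f"cosine-Precision@{k}"]) - acc / k) > tol:
            return False
        if abs(float(row[f"cosine-Recall@{k}"]) - acc) > tol:
            return False
    return True


rows = list(csv.DictReader(io.StringIO(HEADER + "\n" + ROW + "\n")))
print(rows[0]["epoch"], looks_single_relevant(rows[0]))
```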
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e44dc93d7c4889856e33f55a9ed8b80e1afe299e9f38272390e0587aa4bb79e4
+ size 435588776
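The LFS pointer records a 435,588,776-byte `model.safetensors`. As a rough sanity check, the sketch below counts the parameters of a BERT encoder from the sizes in `config.json`. It assumes the standard BERT layout (word, position and token-type embeddings plus a LayerNorm; per layer, four attention projections, two feed-forward projections, all with biases, and two LayerNorms), no pooler layer, and float32 storage (`torch_dtype: float32`). Under those assumptions the weights occupy 435,566,592 bytes; attributing the remaining 22,184 bytes to the safetensors header is a guess, not something the commit states.

```python
CONFIG = {  # values copied from config.json
    "vocab_size": 30522,
    "max_position_embeddings": 512,
    "type_vocab_size": 2,
    "hidden_size": 768,
    "intermediate_size": 3072,
    "num_hidden_layers": 12,
}
FILE_SIZE = 435588776  # bytes, from the model.safetensors LFS pointer


def dense(n_in, n_out):
    """Parameters of a dense layer: weight matrix plus bias."""
    return n_in * n_out + n_out


def bert_param_count(cfg, include_pooler=False):
    h, i = cfg["hidden_size"], cfg["intermediate_size"]
    layer_norm = 2 * h  # scale and shift vectors
    rows = cfg["vocab_size"] + cfg["max_position_embeddings"] + cfg["type_vocab_size"]
    embeddings = rows * h + layer_norm
    per_layer = (
        4 * dense(h, h)  # query, key, value and attention output projections
        + layer_norm     # post-attention LayerNorm
        + dense(h, i)    # feed-forward up-projection
        + dense(i, h)    # feed-forward down-projection
        + layer_norm     # post-feed-forward LayerNorm
    )
    total = embeddings + cfg["num_hidden_layers"] * per_layer
    if include_pooler:
        total += dense(h, h)
    return total


params = bert_param_count(CONFIG)
weight_bytes = 4 * params  # float32 = 4 bytes per parameter
print(params, weight_bytes, FILE_SIZE - weight_bytes)  # 108891648 435566592 22184
```

Including a pooler would give 109,482,240 parameters (437,928,960 bytes), more than the whole file, which is why the sketch leaves it out.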
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
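`modules.json` chains a Transformer, a Pooling module (whose `1_Pooling/config.json` enables only `pooling_mode_cls_token`) and a Normalize module. Once the Transformer has produced per-token vectors, the last two stages amount to taking the first ([CLS]) token's vector and scaling it to unit L2 norm, after which a plain dot product equals the cosine similarity named by `similarity_fn_name`. The sketch below uses made-up 3-dimensional vectors in place of the 768-dimensional hidden states and is an illustration, not the library code.

```python
import math


def cls_pool(token_embeddings):
    """CLS pooling: keep the vector of the first token ([CLS])."""
    return list(token_embeddings[0])


def l2_normalize(vec):
    """Normalize step: scale the vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def embed(token_embeddings):
    """Pooling followed by normalization, as in modules.json."""
    return l2_normalize(cls_pool(token_embeddings))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    # Made-up per-token outputs; only the first row ([CLS]) is used
    query = embed([[3.0, 4.0, 0.0], [9.0, 9.0, 9.0]])
    passage = embed([[0.0, 4.0, 3.0], [1.0, 1.0, 1.0]])
    # For unit vectors the dot product equals the cosine similarity
    print(query, dot(query, passage))
```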
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 512,
+   "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render.
tokenizer_config.json ADDED
@@ -0,0 +1,63 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_lower_case": true,
+   "extra_special_tokens": {},
+   "mask_token": "[MASK]",
+   "max_length": 512,
+   "model_max_length": 512,
+   "pad_to_multiple_of": null,
+   "pad_token": "[PAD]",
+   "pad_token_type_id": 0,
+   "padding_side": "right",
+   "sep_token": "[SEP]",
+   "stride": 0,
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "truncation_side": "right",
+   "truncation_strategy": "longest_first",
+   "unk_token": "[UNK]"
+ }
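`tokenizer_config.json` maps `[PAD]`, `[UNK]`, `[CLS]`, `[SEP]` and `[MASK]` to ids 0, 100, 101, 102 and 103, pads and truncates on the right, and caps inputs at 512 tokens (matching `max_seq_length` in `sentence_bert_config.json`). The sketch below shows how a single sequence is assumed to be framed once WordPiece has produced ids; `frame_single` is a hypothetical helper, the ids passed in are placeholders, and lower-casing and WordPiece splitting themselves are not reproduced.

```python
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0  # from added_tokens_decoder
MAX_LEN = 512  # model_max_length


def frame_single(wordpiece_ids, max_len=MAX_LEN, pad_to=None):
    """Wrap ids as [CLS] ... [SEP], truncating and padding on the right."""
    body = list(wordpiece_ids)[: max_len - 2]  # reserve room for [CLS] and [SEP]
    input_ids = [CLS_ID] + body + [SEP_ID]
    attention_mask = [1] * len(input_ids)
    if pad_to is not None and pad_to > len(input_ids):
        n_pad = pad_to - len(input_ids)
        input_ids += [PAD_ID] * n_pad  # padding_side: right
        attention_mask += [0] * n_pad  # padded positions are masked out
    return input_ids, attention_mask


if __name__ == "__main__":
    # Placeholder WordPiece ids, padded to a batch length of 6
    print(frame_single([7592, 2088], pad_to=6))
```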
vocab.txt ADDED
The diff for this file is too large to render.