bnkc123 committed on
Commit f972798 · verified · 1 Parent(s): 21dae82

Model save

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
{
  "word_embedding_dimension": 768,
  "pooling_mode_cls_token": true,
  "pooling_mode_mean_tokens": false,
  "pooling_mode_max_tokens": false,
  "pooling_mode_mean_sqrt_len_tokens": false,
  "pooling_mode_weightedmean_tokens": false,
  "pooling_mode_lasttoken": false,
  "include_prompt": true
}
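With this configuration only the `[CLS]` token is used to form the sentence embedding (mean, max, weighted-mean and last-token pooling are all disabled). As a rough illustration of what CLS pooling does (a toy sketch with a random tensor, not the library's internal code):

```python
import torch

def cls_pool(token_embeddings: torch.Tensor) -> torch.Tensor:
    # token_embeddings: (batch_size, seq_len, word_embedding_dimension)
    # CLS pooling keeps only the hidden state of the first ([CLS]) token.
    return token_embeddings[:, 0]

# Toy activations shaped like BERT-base output: batch of 2, 12 tokens, 768 dims.
hidden_states = torch.randn(2, 12, 768)
print(cls_pool(hidden_states).shape)  # torch.Size([2, 768])
```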
README.md ADDED
@@ -0,0 +1,862 @@
---
language:
- en
license: apache-2.0
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:6300
- loss:MatryoshkaLoss
- loss:MultipleNegativesRankingLoss
base_model: BAAI/bge-base-en-v1.5
widget:
- source_sentence: How many hours of training and development did Bank of America provide to its employees in 2023?
  sentences:
  - In recent years, several jurisdictions have enhanced their laws and regulations in this area, increased their enforcement activities, and/or increased the level of cross-border coordination and information sharing.
  - In 2023, Bank of America delivered approximately 6.7 million hours of training and development to its teammates through Bank of America Academy.
  - Chevron affiliates manage a total of 338 thousand net acres, as detailed in the table for acreage distribution as of December 31, 2023.
- source_sentence: What are the projected trends for Comcast's residential connectivity revenue in 2023?
  sentences:
  - In 2023, Switzerland’s Federal Council passed legislation which would implement a federal minimum tax in Switzerland of 15% in 2024.
  - We believe our residential connectivity revenue will increase as a result of growth in average domestic broadband revenue per customer, as well as increases in domestic wireless and international connectivity revenue.
  - Approximately 97% of our debt securities were investment-grade quality, with a weighted average credit rating of AA- at the end of 2023.
- source_sentence: What type of merchandise is included under seasonal and electronics merchandise?
  sentences:
  - Seasonal and electronics merchandise at the company includes items related to Christmas, Easter, Halloween, and Valentine's Day, along with personal electronics like pre-paid cellular phones and services.
  - Hewlett Packard Enterprise, with over half of their revenue generated overseas, experiences impact from fluctuations in foreign currency exchange rates. These fluctuations have increased product costs and moderated revenue and earnings growth, particularly in recent periods.
  - Net investment income grew from $597 million in 2021 to $895 million in 2023, which is a 43.0% increase.
- source_sentence: What are the types of dialysis available for ESKD patients and how often is hemodialysis typically performed?
  sentences:
  - Note 16 is important in a Form 10-K for providing detailed information on legal proceedings as 'Commitments and Contingencies.'
  - Amazon believes that the principal competitive factors in its retail businesses include selection, price, and convenience, including fast and reliable fulfillment.
  - Dialysis options for ESKD patients include hemodialysis, which is usually performed three times per week, and peritoneal dialysis.
- source_sentence: How much total cash did The Hershey Company use for share repurchases in 2023 excluding excise tax?
  sentences:
  - In 2023, The Hershey Company used a total of $267.3 million in cash for share repurchases, excluding any excise tax.
  - Operating income increased $5.8 billion, or 72.8%, in 2023 compared to 2022. The increase in operating income was primarily driven by the absence of $5.8 billion of opioid litigation charges recorded in 2022 and increases in the Pharmacy & Consumer Wellness segment, primarily driven by the absence of a $2.5 billion loss on assets held for sale recorded in 2022 related to the write-down of the Company’s Omnicare® long-term care business which was partially offset by continued pharmacy reimbursement pressure and decreased COVID-19 vaccinations and diagnostic testing compared to 2022, as well as an increase in the Health Services segment.
  - Net income for the year ended December 31, 2023, was $307,568, contrasting with a net loss of $694,288 in 2022.
datasets:
- philschmid/finanical-rag-embedding-dataset
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy@1
- cosine_accuracy@3
- cosine_accuracy@5
- cosine_accuracy@10
- cosine_precision@1
- cosine_precision@3
- cosine_precision@5
- cosine_precision@10
- cosine_recall@1
- cosine_recall@3
- cosine_recall@5
- cosine_recall@10
- cosine_ndcg@10
- cosine_mrr@10
- cosine_map@100
model-index:
- name: BGE base Financial Matryoshka
  results:
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 768
      type: dim_768
    metrics:
    - type: cosine_accuracy@1
      value: 0.6914285714285714
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.8257142857142857
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.8685714285714285
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.9228571428571428
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.6914285714285714
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.2752380952380952
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.1737142857142857
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.09228571428571428
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.6914285714285714
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.8257142857142857
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.8685714285714285
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.9228571428571428
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.8071406101424283
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.770200113378685
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.7731689567146356
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 512
      type: dim_512
    metrics:
    - type: cosine_accuracy@1
      value: 0.6985714285714286
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.8314285714285714
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.8685714285714285
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.9142857142857143
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.6985714285714286
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.27714285714285714
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.17371428571428568
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.09142857142857141
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.6985714285714286
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.8314285714285714
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.8685714285714285
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.9142857142857143
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.8065430842560983
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.7719557823129252
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.775512801809706
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 256
      type: dim_256
    metrics:
    - type: cosine_accuracy@1
      value: 0.6842857142857143
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.8214285714285714
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.8671428571428571
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.9057142857142857
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.6842857142857143
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.2738095238095238
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.1734285714285714
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.09057142857142855
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.6842857142857143
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.8214285714285714
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.8671428571428571
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.9057142857142857
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.7965883498968402
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.7613792517006803
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.7655926405987631
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 128
      type: dim_128
    metrics:
    - type: cosine_accuracy@1
      value: 0.6828571428571428
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.8157142857142857
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.8557142857142858
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.9057142857142857
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.6828571428571428
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.27190476190476187
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.17114285714285712
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.09057142857142855
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.6828571428571428
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.8157142857142857
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.8557142857142858
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.9057142857142857
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.7942960704612301
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.7586780045351473
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.7624961899058385
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 64
      type: dim_64
    metrics:
    - type: cosine_accuracy@1
      value: 0.6485714285714286
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.7771428571428571
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.8171428571428572
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.87
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.6485714285714286
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.2590476190476191
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.16342857142857142
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.087
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.6485714285714286
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.7771428571428571
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.8171428571428572
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.87
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.7582844308652432
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.7225646258503399
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.7276362979042951
      name: Cosine Map@100
---

# BGE base Financial Matryoshka

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) on the [finanical-rag-embedding-dataset](https://huggingface.co/datasets/philschmid/finanical-rag-embedding-dataset) dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) <!-- at revision a5beb1e3e68b9ab74eb54cfd186867f64f240e1a -->
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
- **Training Dataset:**
    - [finanical-rag-embedding-dataset](https://huggingface.co/datasets/philschmid/finanical-rag-embedding-dataset)
- **Language:** en
- **License:** apache-2.0

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```

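For reference, an equivalent three-module pipeline can be assembled by hand from the standard `sentence_transformers.models` building blocks. This is an illustrative sketch, not the exact code that produced this checkpoint:

```python
from sentence_transformers import SentenceTransformer, models

word_embedding = models.Transformer("BAAI/bge-base-en-v1.5", max_seq_length=512)
pooling = models.Pooling(
    word_embedding.get_word_embedding_dimension(),  # 768
    pooling_mode="cls",  # matches 1_Pooling/config.json
)
normalize = models.Normalize()

model = SentenceTransformer(modules=[word_embedding, pooling, normalize])
print(model)  # Transformer -> Pooling -> Normalize, as listed above
```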
## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("bnkc123/bge-base-financial-matryoshka")
# Run inference
sentences = [
    'How much total cash did The Hershey Company use for share repurchases in 2023 excluding excise tax?',
    'In 2023, The Hershey Company used a total of $267.3 million in cash for share repurchases, excluding any excise tax.',
    'Operating income increased $5.8 billion, or 72.8%, in 2023 compared to 2022. The increase in operating income was primarily driven by the absence of $5.8 billion of opioid litigation charges recorded in 2022 and increases in the Pharmacy & Consumer Wellness segment, primarily driven by the absence of a $2.5 billion loss on assets held for sale recorded in 2022 related to the write-down of the Company’s Omnicare® long-term care business which was partially offset by continued pharmacy reimbursement pressure and decreased COVID-19 vaccinations and diagnostic testing compared to 2022, as well as an increase in the Health Services segment.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
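
Because the model was trained with MatryoshkaLoss, embeddings can also be truncated to 512, 256, 128, or 64 dimensions with only a modest drop in retrieval quality (see the evaluation tables below). A sketch using the `truncate_dim` argument:

```python
from sentence_transformers import SentenceTransformer

# Keep only the first 256 dimensions of every embedding.
model_256 = SentenceTransformer("bnkc123/bge-base-financial-matryoshka", truncate_dim=256)

embeddings = model_256.encode([
    "How much cash did FedEx have at the end of May 2023?",
    "FedEx reported having $6.9 billion in cash and cash equivalents at the end of May 2023.",
])
print(embeddings.shape)
# (2, 256)
```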

<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

## Evaluation

### Metrics

#### Information Retrieval

* Dataset: `dim_768`
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) with these parameters:
  ```json
  {
      "truncate_dim": 768
  }
  ```

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.6914     |
| cosine_accuracy@3   | 0.8257     |
| cosine_accuracy@5   | 0.8686     |
| cosine_accuracy@10  | 0.9229     |
| cosine_precision@1  | 0.6914     |
| cosine_precision@3  | 0.2752     |
| cosine_precision@5  | 0.1737     |
| cosine_precision@10 | 0.0923     |
| cosine_recall@1     | 0.6914     |
| cosine_recall@3     | 0.8257     |
| cosine_recall@5     | 0.8686     |
| cosine_recall@10    | 0.9229     |
| **cosine_ndcg@10**  | **0.8071** |
| cosine_mrr@10       | 0.7702     |
| cosine_map@100      | 0.7732     |

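The table above (and the ones that follow) were produced by the evaluator named in each block. A sketch of how such an evaluation can be reproduced; the tiny `queries`/`corpus`/`relevant_docs` dictionaries here are placeholders for the real held-out evaluation split:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

model = SentenceTransformer("bnkc123/bge-base-financial-matryoshka")

# Placeholder data; the reported numbers come from the actual evaluation split.
queries = {"q1": "How much cash did FedEx have at the end of May 2023?"}
corpus = {
    "d1": "FedEx reported having $6.9 billion in cash and cash equivalents at the end of May 2023.",
    "d2": "Dialysis options for ESKD patients include hemodialysis and peritoneal dialysis.",
}
relevant_docs = {"q1": {"d1"}}

evaluator = InformationRetrievalEvaluator(
    queries=queries,
    corpus=corpus,
    relevant_docs=relevant_docs,
    name="dim_768",
    truncate_dim=768,  # swap in 512 / 256 / 128 / 64 to mirror the other tables
)
results = evaluator(model)
print(results["dim_768_cosine_ndcg@10"])
```
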
#### Information Retrieval

* Dataset: `dim_512`
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) with these parameters:
  ```json
  {
      "truncate_dim": 512
  }
  ```

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.6986     |
| cosine_accuracy@3   | 0.8314     |
| cosine_accuracy@5   | 0.8686     |
| cosine_accuracy@10  | 0.9143     |
| cosine_precision@1  | 0.6986     |
| cosine_precision@3  | 0.2771     |
| cosine_precision@5  | 0.1737     |
| cosine_precision@10 | 0.0914     |
| cosine_recall@1     | 0.6986     |
| cosine_recall@3     | 0.8314     |
| cosine_recall@5     | 0.8686     |
| cosine_recall@10    | 0.9143     |
| **cosine_ndcg@10**  | **0.8065** |
| cosine_mrr@10       | 0.772      |
| cosine_map@100      | 0.7755     |

#### Information Retrieval

* Dataset: `dim_256`
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) with these parameters:
  ```json
  {
      "truncate_dim": 256
  }
  ```

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.6843     |
| cosine_accuracy@3   | 0.8214     |
| cosine_accuracy@5   | 0.8671     |
| cosine_accuracy@10  | 0.9057     |
| cosine_precision@1  | 0.6843     |
| cosine_precision@3  | 0.2738     |
| cosine_precision@5  | 0.1734     |
| cosine_precision@10 | 0.0906     |
| cosine_recall@1     | 0.6843     |
| cosine_recall@3     | 0.8214     |
| cosine_recall@5     | 0.8671     |
| cosine_recall@10    | 0.9057     |
| **cosine_ndcg@10**  | **0.7966** |
| cosine_mrr@10       | 0.7614     |
| cosine_map@100      | 0.7656     |

#### Information Retrieval

* Dataset: `dim_128`
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) with these parameters:
  ```json
  {
      "truncate_dim": 128
  }
  ```

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.6829     |
| cosine_accuracy@3   | 0.8157     |
| cosine_accuracy@5   | 0.8557     |
| cosine_accuracy@10  | 0.9057     |
| cosine_precision@1  | 0.6829     |
| cosine_precision@3  | 0.2719     |
| cosine_precision@5  | 0.1711     |
| cosine_precision@10 | 0.0906     |
| cosine_recall@1     | 0.6829     |
| cosine_recall@3     | 0.8157     |
| cosine_recall@5     | 0.8557     |
| cosine_recall@10    | 0.9057     |
| **cosine_ndcg@10**  | **0.7943** |
| cosine_mrr@10       | 0.7587     |
| cosine_map@100      | 0.7625     |

#### Information Retrieval

* Dataset: `dim_64`
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) with these parameters:
  ```json
  {
      "truncate_dim": 64
  }
  ```

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.6486     |
| cosine_accuracy@3   | 0.7771     |
| cosine_accuracy@5   | 0.8171     |
| cosine_accuracy@10  | 0.87       |
| cosine_precision@1  | 0.6486     |
| cosine_precision@3  | 0.259      |
| cosine_precision@5  | 0.1634     |
| cosine_precision@10 | 0.087      |
| cosine_recall@1     | 0.6486     |
| cosine_recall@3     | 0.7771     |
| cosine_recall@5     | 0.8171     |
| cosine_recall@10    | 0.87       |
| **cosine_ndcg@10**  | **0.7583** |
| cosine_mrr@10       | 0.7226     |
| cosine_map@100      | 0.7276     |

<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Dataset

#### finanical-rag-embedding-dataset

* Dataset: [finanical-rag-embedding-dataset](https://huggingface.co/datasets/philschmid/finanical-rag-embedding-dataset) at [e0b1781](https://huggingface.co/datasets/philschmid/finanical-rag-embedding-dataset/tree/e0b17819cf52d444066c99f4a176f5717e066300)
* Size: 6,300 training samples
* Columns: <code>anchor</code> and <code>positive</code>
* Approximate statistics based on the first 1000 samples:
  |         | anchor | positive |
  |:--------|:-------|:---------|
  | type    | string | string |
  | details | <ul><li>min: 9 tokens</li><li>mean: 20.65 tokens</li><li>max: 51 tokens</li></ul> | <ul><li>min: 10 tokens</li><li>mean: 45.4 tokens</li><li>max: 512 tokens</li></ul> |
* Samples:
  | anchor | positive |
  |:-------|:---------|
  | <code>How much cash did FedEx have at the end of May 2023?</code> | <code>FedEx reported having $6.9 billion in cash and cash equivalents at the end of May 2023.</code> |
  | <code>What were Caterpillar's total obligations for the purchase of goods and services as of December 31, 2023?</code> | <code>We have short-term obligations related to the purchase of goods and services made in the ordinary course of business. These consist of invoices received and recorded as liabilities as of December 31, 2023, but scheduled for payment in 2024 of $7.91 billion.</code> |
  | <code>What was the total number of outstanding stock option awards at the beginning and end of 2023, and what were their weighted average exercise prices?</code> | <code>Stock option activity under the Plan for the years ended reveals that stock options both started and ended with 6.2 million outstanding in 2023. The weighted average exercise price at the beginning of the year was $50.40 and $50.42 at the end.</code> |
* Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
  ```json
  {
      "loss": "MultipleNegativesRankingLoss",
      "matryoshka_dims": [
          768,
          512,
          256,
          128,
          64
      ],
      "matryoshka_weights": [
          1,
          1,
          1,
          1,
          1
      ],
      "n_dims_per_step": -1
  }
  ```
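
A sketch of how this loss configuration is typically constructed in Sentence Transformers (illustrative; not necessarily the exact training script behind this commit):

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss

model = SentenceTransformer("BAAI/bge-base-en-v1.5")

# MultipleNegativesRankingLoss scores each (anchor, positive) pair against the
# other positives in the batch as in-batch negatives; MatryoshkaLoss applies the
# same objective at every truncation dimension listed above.
inner_loss = MultipleNegativesRankingLoss(model)
train_loss = MatryoshkaLoss(model, inner_loss, matryoshka_dims=[768, 512, 256, 128, 64])
```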

### Training Hyperparameters
#### Non-Default Hyperparameters

- `eval_strategy`: epoch
- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 16
- `gradient_accumulation_steps`: 16
- `learning_rate`: 2e-05
- `num_train_epochs`: 4
- `lr_scheduler_type`: cosine
- `warmup_ratio`: 0.1
- `bf16`: True
- `tf32`: True
- `load_best_model_at_end`: True
- `optim`: adamw_torch_fused
- `push_to_hub`: True
- `hub_model_id`: bnkc123/bge-base-financial-matryoshka
- `batch_sampler`: no_duplicates

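For orientation, these non-default values roughly correspond to a `SentenceTransformerTrainingArguments` configuration like the sketch below (the `output_dir` and `save_strategy` values are assumptions; they are not listed in this card):

```python
from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="bge-base-financial-matryoshka",  # assumed
    num_train_epochs=4,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=16,
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    bf16=True,
    tf32=True,
    optim="adamw_torch_fused",
    eval_strategy="epoch",
    save_strategy="epoch",  # assumed; load_best_model_at_end needs matching strategies
    load_best_model_at_end=True,
    push_to_hub=True,
    hub_model_id="bnkc123/bge-base-financial-matryoshka",
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)
```

A `SentenceTransformerTrainer` would then combine these arguments with the model, the training dataset, the loss above, and the evaluators.
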
663
+ <details><summary>Click to expand</summary>
664
+
665
+ - `overwrite_output_dir`: False
666
+ - `do_predict`: False
667
+ - `eval_strategy`: epoch
668
+ - `prediction_loss_only`: True
669
+ - `per_device_train_batch_size`: 32
670
+ - `per_device_eval_batch_size`: 16
671
+ - `per_gpu_train_batch_size`: None
672
+ - `per_gpu_eval_batch_size`: None
673
+ - `gradient_accumulation_steps`: 16
674
+ - `eval_accumulation_steps`: None
675
+ - `torch_empty_cache_steps`: None
676
+ - `learning_rate`: 2e-05
677
+ - `weight_decay`: 0.0
678
+ - `adam_beta1`: 0.9
679
+ - `adam_beta2`: 0.999
680
+ - `adam_epsilon`: 1e-08
681
+ - `max_grad_norm`: 1.0
682
+ - `num_train_epochs`: 4
683
+ - `max_steps`: -1
684
+ - `lr_scheduler_type`: cosine
685
+ - `lr_scheduler_kwargs`: {}
686
+ - `warmup_ratio`: 0.1
687
+ - `warmup_steps`: 0
688
+ - `log_level`: passive
689
+ - `log_level_replica`: warning
690
+ - `log_on_each_node`: True
691
+ - `logging_nan_inf_filter`: True
692
+ - `save_safetensors`: True
693
+ - `save_on_each_node`: False
694
+ - `save_only_model`: False
695
+ - `restore_callback_states_from_checkpoint`: False
696
+ - `no_cuda`: False
697
+ - `use_cpu`: False
698
+ - `use_mps_device`: False
699
+ - `seed`: 42
700
+ - `data_seed`: None
701
+ - `jit_mode_eval`: False
702
+ - `use_ipex`: False
703
+ - `bf16`: True
704
+ - `fp16`: False
705
+ - `fp16_opt_level`: O1
706
+ - `half_precision_backend`: auto
707
+ - `bf16_full_eval`: False
708
+ - `fp16_full_eval`: False
709
+ - `tf32`: True
710
+ - `local_rank`: 0
711
+ - `ddp_backend`: None
712
+ - `tpu_num_cores`: None
713
+ - `tpu_metrics_debug`: False
714
+ - `debug`: []
715
+ - `dataloader_drop_last`: False
716
+ - `dataloader_num_workers`: 0
717
+ - `dataloader_prefetch_factor`: None
718
+ - `past_index`: -1
719
+ - `disable_tqdm`: False
720
+ - `remove_unused_columns`: True
721
+ - `label_names`: None
722
+ - `load_best_model_at_end`: True
723
+ - `ignore_data_skip`: False
724
+ - `fsdp`: []
725
+ - `fsdp_min_num_params`: 0
726
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
727
+ - `tp_size`: 0
728
+ - `fsdp_transformer_layer_cls_to_wrap`: None
729
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
730
+ - `deepspeed`: None
731
+ - `label_smoothing_factor`: 0.0
732
+ - `optim`: adamw_torch_fused
733
+ - `optim_args`: None
734
+ - `adafactor`: False
735
+ - `group_by_length`: False
736
+ - `length_column_name`: length
737
+ - `ddp_find_unused_parameters`: None
738
+ - `ddp_bucket_cap_mb`: None
739
+ - `ddp_broadcast_buffers`: False
740
+ - `dataloader_pin_memory`: True
741
+ - `dataloader_persistent_workers`: False
742
+ - `skip_memory_metrics`: True
743
+ - `use_legacy_prediction_loop`: False
744
+ - `push_to_hub`: True
745
+ - `resume_from_checkpoint`: None
746
+ - `hub_model_id`: bnkc123/bge-base-financial-matryoshka
747
+ - `hub_strategy`: every_save
748
+ - `hub_private_repo`: None
749
+ - `hub_always_push`: False
750
+ - `gradient_checkpointing`: False
751
+ - `gradient_checkpointing_kwargs`: None
752
+ - `include_inputs_for_metrics`: False
753
+ - `include_for_metrics`: []
754
+ - `eval_do_concat_batches`: True
755
+ - `fp16_backend`: auto
756
+ - `push_to_hub_model_id`: None
757
+ - `push_to_hub_organization`: None
758
+ - `mp_parameters`:
759
+ - `auto_find_batch_size`: False
760
+ - `full_determinism`: False
761
+ - `torchdynamo`: None
762
+ - `ray_scope`: last
763
+ - `ddp_timeout`: 1800
764
+ - `torch_compile`: False
765
+ - `torch_compile_backend`: None
766
+ - `torch_compile_mode`: None
767
+ - `include_tokens_per_second`: False
768
+ - `include_num_input_tokens_seen`: False
769
+ - `neftune_noise_alpha`: None
770
+ - `optim_target_modules`: None
771
+ - `batch_eval_metrics`: False
772
+ - `eval_on_start`: False
773
+ - `use_liger_kernel`: False
774
+ - `eval_use_gather_object`: False
775
+ - `average_tokens_across_devices`: False
776
+ - `prompts`: None
777
+ - `batch_sampler`: no_duplicates
778
+ - `multi_dataset_batch_sampler`: proportional
779
+
780
+ </details>
781
+
782
+ ### Training Logs
783
+ | Epoch | Step | Training Loss | dim_768_cosine_ndcg@10 | dim_512_cosine_ndcg@10 | dim_256_cosine_ndcg@10 | dim_128_cosine_ndcg@10 | dim_64_cosine_ndcg@10 |
784
+ |:---------:|:------:|:-------------:|:----------------------:|:----------------------:|:----------------------:|:----------------------:|:---------------------:|
785
+ | 0.8122 | 10 | 25.3869 | - | - | - | - | - |
786
+ | 1.0 | 13 | - | 0.7943 | 0.7907 | 0.7884 | 0.7756 | 0.7419 |
787
+ | 1.5685 | 20 | 9.8731 | - | - | - | - | - |
788
+ | 2.0 | 26 | - | 0.8040 | 0.8032 | 0.7939 | 0.7906 | 0.7553 |
789
+ | 2.3249 | 30 | 7.6627 | - | - | - | - | - |
790
+ | 3.0 | 39 | - | 0.8067 | 0.8054 | 0.7989 | 0.7930 | 0.7584 |
791
+ | 3.0812 | 40 | 6.5397 | - | - | - | - | - |
792
+ | **3.731** | **48** | **-** | **0.8071** | **0.8065** | **0.7966** | **0.7943** | **0.7583** |
793
+
794
+ * The bold row denotes the saved checkpoint.
795
+
796
+ ### Framework Versions
797
+ - Python: 3.12.6
798
+ - Sentence Transformers: 4.1.0
799
+ - Transformers: 4.51.3
800
+ - PyTorch: 2.7.0+cu126
801
+ - Accelerate: 1.6.0
802
+ - Datasets: 3.5.1
803
+ - Tokenizers: 0.21.1
804
+
805
+ ## Citation
806
+
807
+ ### BibTeX
808
+
809
+ #### Sentence Transformers
810
+ ```bibtex
811
+ @inproceedings{reimers-2019-sentence-bert,
812
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
813
+ author = "Reimers, Nils and Gurevych, Iryna",
814
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
815
+ month = "11",
816
+ year = "2019",
817
+ publisher = "Association for Computational Linguistics",
818
+ url = "https://arxiv.org/abs/1908.10084",
819
+ }
820
+ ```
821
+
822
+ #### MatryoshkaLoss
823
+ ```bibtex
824
+ @misc{kusupati2024matryoshka,
825
+ title={Matryoshka Representation Learning},
826
+ author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
827
+ year={2024},
828
+ eprint={2205.13147},
829
+ archivePrefix={arXiv},
830
+ primaryClass={cs.LG}
831
+ }
832
+ ```
833
+
834
+ #### MultipleNegativesRankingLoss
835
+ ```bibtex
836
+ @misc{henderson2017efficient,
837
+ title={Efficient Natural Language Response Suggestion for Smart Reply},
838
+ author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
839
+ year={2017},
840
+ eprint={1705.00652},
841
+ archivePrefix={arXiv},
842
+ primaryClass={cs.CL}
843
+ }
844
+ ```
845
+
846
+ <!--
847
+ ## Glossary
848
+
849
+ *Clearly define terms in order to be accessible across audiences.*
850
+ -->
851
+
852
+ <!--
853
+ ## Model Card Authors
854
+
855
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
856
+ -->
857
+
858
+ <!--
859
+ ## Model Card Contact
860
+
861
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
862
+ -->
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
{
  "__version__": {
    "sentence_transformers": "4.1.0",
    "transformers": "4.51.3",
    "pytorch": "2.7.0+cu126"
  },
  "prompts": {},
  "default_prompt_name": null,
  "similarity_fn_name": "cosine"
}
modules.json ADDED
@@ -0,0 +1,20 @@
[
  {
    "idx": 0,
    "name": "0",
    "path": "",
    "type": "sentence_transformers.models.Transformer"
  },
  {
    "idx": 1,
    "name": "1",
    "path": "1_Pooling",
    "type": "sentence_transformers.models.Pooling"
  },
  {
    "idx": 2,
    "name": "2",
    "path": "2_Normalize",
    "type": "sentence_transformers.models.Normalize"
  }
]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
{
  "max_seq_length": 512,
  "do_lower_case": true
}