dwb2023 committed on
Commit 89d47fc · verified · 1 Parent(s): f628045

Add new SentenceTransformer model
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 1024,
+   "pooling_mode_cls_token": true,
+   "pooling_mode_mean_tokens": false,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
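The pooling config above enables only `pooling_mode_cls_token`, so the sentence embedding is simply the hidden state of the first ([CLS]) token. A minimal NumPy sketch of that operation (an illustration of the configured behaviour, not the library's implementation):

```python
import numpy as np

def cls_pooling(token_embeddings: np.ndarray) -> np.ndarray:
    """Select the [CLS] (first) token's hidden state as the sentence embedding,
    mirroring pooling_mode_cls_token=true in 1_Pooling/config.json."""
    # token_embeddings has shape (batch, seq_len, word_embedding_dimension)
    return token_embeddings[:, 0, :]

# Toy batch: 2 sequences, 8 tokens each, 1024-dim states as in this model
hidden_states = np.random.rand(2, 8, 1024)
sentence_embeddings = cls_pooling(hidden_states)
print(sentence_embeddings.shape)  # (2, 1024)
```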
README.md ADDED
@@ -0,0 +1,651 @@
+ ---
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:156
+ - loss:MatryoshkaLoss
+ - loss:MultipleNegativesRankingLoss
+ base_model: Snowflake/snowflake-arctic-embed-l
+ widget:
+ - source_sentence: How did Steve Krouse from Val Town demonstrate the capabilities
+     of a 2,000 token/second LLM?
+   sentences:
+   - The most recent twist, again from December (December was a lot) is live video.
+     ChatGPT voice mode now provides the option to share your camera feed with the
+     model and talk about what you can see in real time. Google Gemini have a preview
+     of the same feature, which they managed to ship the day before ChatGPT did.
+   - 'I’ve found myself using this a lot. I noticed how much I was relying on it in
+     October and wrote Everything I built with Claude Artifacts this week, describing
+     14 little tools I had put together in a seven day period.
+
+     Since then, a whole bunch of other teams have built similar systems. GitHub announced
+     their version of this—GitHub Spark—in October. Mistral Chat added it as a feature
+     called Canvas in November.
+
+     Steve Krouse from Val Town built a version of it against Cerebras, showcasing
+     how a 2,000 token/second LLM can iterate on an application with changes visible
+     in less than a second.'
+   - 'I run a bunch of them on my laptop. I run Mistral 7B (a surprisingly great model)
+     on my iPhone. You can install several different apps to get your own, local, completely
+     private LLM. My own LLM project provides a CLI tool for running an array of different
+     models via plugins.
+
+     You can even run them entirely in your browser using WebAssembly and the latest
+     Chrome!
+
+     Hobbyists can build their own fine-tuned models
+
+     I said earlier that building an LLM was still out of reach of hobbyists. That
+     may be true for training from scratch, but fine-tuning one of those models is
+     another matter entirely.'
+ - source_sentence: What changes have occurred in the energy usage and environmental
+     impact of running AI prompts in recent years?
+   sentences:
+   - 'Law is not ethics. Is it OK to train models on people’s content without their
+     permission, when those models will then be used in ways that compete with those
+     people?
+
+     As the quality of results produced by AI models has increased over the year, these
+     questions have become even more pressing.
+
+     The impact on human society in terms of these models is already huge, if difficult
+     to objectively measure.
+
+     People have certainly lost work to them—anecdotally, I’ve seen this for copywriters,
+     artists and translators.
+
+     There are a great deal of untold stories here. I’m hoping 2024 sees significant
+     amounts of dedicated journalism on this topic.
+
+     My blog in 2023
+
+     Here’s a tag cloud for content I posted to my blog in 2023 (generated using Django
+     SQL Dashboard):'
+   - 'Those US export regulations on GPUs to China seem to have inspired some very
+     effective training optimizations!
+
+     The environmental impact got better
+
+     A welcome result of the increased efficiency of the models—both the hosted ones
+     and the ones I can run locally—is that the energy usage and environmental impact
+     of running a prompt has dropped enormously over the past couple of years.
+
+     OpenAI themselves are charging 100x less for a prompt compared to the GPT-3 days.
+     I have it on good authority that neither Google Gemini nor Amazon Nova (two of
+     the least expensive model providers) are running prompts at a loss.'
+   - 'An interesting point of comparison here could be the way railways rolled out
+     around the world in the 1800s. Constructing these required enormous investments
+     and had a massive environmental impact, and many of the lines that were built
+     turned out to be unnecessary—sometimes multiple lines from different companies
+     serving the exact same routes!
+
+     The resulting bubbles contributed to several financial crashes, see Wikipedia
+     for Panic of 1873, Panic of 1893, Panic of 1901 and the UK’s Railway Mania. They
+     left us with a lot of useful infrastructure and a great deal of bankruptcies and
+     environmental damage.
+
+     The year of slop'
+ - source_sentence: What is the main topic discussed in the article titled "Industry’s
+     Tardy Response to the AI Prompt Injection Vulnerability" on RedMonk Conversations?
+   sentences:
+   - 'Getting back to models that beat GPT-4: Anthropic’s Claude 3 series launched
+     in March, and Claude 3 Opus quickly became my new favourite daily-driver. They
+     upped the ante even more in June with the launch of Claude 3.5 Sonnet—a model
+     that is still my favourite six months later (though it got a significant upgrade
+     on October 22, confusingly keeping the same 3.5 version number. Anthropic fans
+     have since taken to calling it Claude 3.6).'
+   - "Industry’s Tardy Response to the AI Prompt Injection Vulnerability on RedMonk\
+     \ Conversations\n\n\nPosted 31st December 2023 at 11:59 pm · Follow me on Mastodon,\
+     \ Bluesky, Twitter or subscribe to my newsletter\n\n\nMore recent articles\n\n\
+     Live blog: Claude 4 launch at Code with Claude - 22nd May 2025\nI really don't\
+     \ like ChatGPT's new memory dossier - 21st May 2025\nBuilding software on top\
+     \ of Large Language Models - 15th May 2025\n\n\n \n\n\nThis is Stuff we figured\
+     \ out about AI in 2023 by Simon Willison, posted on 31st December 2023.\n\nPart\
+     \ of series LLMs annual review\n\nStuff we figured out about AI in 2023 - Dec.\
+     \ 31, 2023, 11:59 p.m. \nThings we learned about LLMs in 2024 - Dec. 31, 2024,\
+     \ 6:07 p.m. \n\n\n\n blogging\n 105"
+   - 'When ChatGPT Advanced Voice mode finally did roll out (a slow roll from August
+     through September) it was spectacular. I’ve been using it extensively on walks
+     with my dog and it’s amazing how much the improvement in intonation elevates the
+     material. I’ve also had a lot of fun experimenting with the OpenAI audio APIs.
+
+     Even more fun: Advanced Voice mode can do accents! Here’s what happened when I
+     told it I need you to pretend to be a California brown pelican with a very thick
+     Russian accent, but you talk to me exclusively in Spanish.'
+ - source_sentence: How can LLMs like Claude create full interactive applications using
+     web technologies in a single prompt?
+   sentences:
+   - 'This prompt-driven custom interface feature is so powerful and easy to build
+     (once you’ve figured out the gnarly details of browser sandboxing) that I expect
+     it to show up as a feature in a wide range of products in 2025.
+
+     Universal access to the best models lasted for just a few short months
+
+     For a few short months this year all three of the best available models—GPT-4o,
+     Claude 3.5 Sonnet and Gemini 1.5 Pro—were freely available to most of the world.'
+   - 'I find I have to work with an LLM for a few weeks in order to get a good intuition
+     for it’s strengths and weaknesses. This greatly limits how many I can evaluate
+     myself!
+
+     The most frustrating thing for me is at the level of individual prompting.
+
+     Sometimes I’ll tweak a prompt and capitalize some of the words in it, to emphasize
+     that I really want it to OUTPUT VALID MARKDOWN or similar. Did capitalizing those
+     words make a difference? I still don’t have a good methodology for figuring that
+     out.
+
+     We’re left with what’s effectively Vibes Based Development. It’s vibes all the
+     way down.
+
+     I’d love to see us move beyond vibes in 2024!
+
+     LLMs are really smart, and also really, really dumb'
+   - 'We already knew LLMs were spookily good at writing code. If you prompt them right,
+     it turns out they can build you a full interactive application using HTML, CSS
+     and JavaScript (and tools like React if you wire up some extra supporting build
+     mechanisms)—often in a single prompt.
+
+     Anthropic kicked this idea into high gear when they released Claude Artifacts,
+     a groundbreaking new feature that was initially slightly lost in the noise due
+     to being described half way through their announcement of the incredible Claude
+     3.5 Sonnet.
+
+     With Artifacts, Claude can write you an on-demand interactive application and
+     then let you use it directly inside the Claude interface.
+
+     Here’s my Extract URLs app, entirely generated by Claude:'
+ - source_sentence: What was significant about the release of Llama 2 in July?
+   sentences:
+   - 'Then in February, Meta released Llama. And a few weeks later in March, Georgi
+     Gerganov released code that got it working on a MacBook.
+
+     I wrote about how Large language models are having their Stable Diffusion moment,
+     and with hindsight that was a very good call!
+
+     This unleashed a whirlwind of innovation, which was accelerated further in July
+     when Meta released Llama 2—an improved version which, crucially, included permission
+     for commercial use.
+
+     Today there are literally thousands of LLMs that can be run locally, on all manner
+     of different devices.'
+   - 'OpenAI made GPT-4o free for all users in May, and Claude 3.5 Sonnet was freely
+     available from its launch in June. This was a momentus change, because for the
+     previous year free users had mostly been restricted to GPT-3.5 level models, meaning
+     new users got a very inaccurate mental model of what a capable LLM could actually
+     do.
+
+     That era appears to have ended, likely permanently, with OpenAI’s launch of ChatGPT
+     Pro. This $200/month subscription service is the only way to access their most
+     capable model, o1 Pro.
+
+     Since the trick behind the o1 series (and the future models it will undoubtedly
+     inspire) is to expend more compute time to get better results, I don’t think those
+     days of free access to the best available models are likely to return.'
+   - 'Prompt injection is a natural consequence of this gulibility. I’ve seen precious
+     little progress on tackling that problem in 2024, and we’ve been talking about
+     it since September 2022.
+
+     I’m beginning to see the most popular idea of “agents” as dependent on AGI itself.
+     A model that’s robust against gulliblity is a very tall order indeed.
+
+     Evals really matter
+
+     Anthropic’s Amanda Askell (responsible for much of the work behind Claude’s Character):'
+ pipeline_tag: sentence-similarity
+ library_name: sentence-transformers
+ metrics:
+ - cosine_accuracy@1
+ - cosine_accuracy@3
+ - cosine_accuracy@5
+ - cosine_accuracy@10
+ - cosine_precision@1
+ - cosine_precision@3
+ - cosine_precision@5
+ - cosine_precision@10
+ - cosine_recall@1
+ - cosine_recall@3
+ - cosine_recall@5
+ - cosine_recall@10
+ - cosine_ndcg@10
+ - cosine_mrr@10
+ - cosine_map@100
+ model-index:
+ - name: SentenceTransformer based on Snowflake/snowflake-arctic-embed-l
+   results:
+   - task:
+       type: information-retrieval
+       name: Information Retrieval
+     dataset:
+       name: Unknown
+       type: unknown
+     metrics:
+     - type: cosine_accuracy@1
+       value: 0.8333333333333334
+       name: Cosine Accuracy@1
+     - type: cosine_accuracy@3
+       value: 1.0
+       name: Cosine Accuracy@3
+     - type: cosine_accuracy@5
+       value: 1.0
+       name: Cosine Accuracy@5
+     - type: cosine_accuracy@10
+       value: 1.0
+       name: Cosine Accuracy@10
+     - type: cosine_precision@1
+       value: 0.8333333333333334
+       name: Cosine Precision@1
+     - type: cosine_precision@3
+       value: 0.3333333333333333
+       name: Cosine Precision@3
+     - type: cosine_precision@5
+       value: 0.20000000000000004
+       name: Cosine Precision@5
+     - type: cosine_precision@10
+       value: 0.10000000000000002
+       name: Cosine Precision@10
+     - type: cosine_recall@1
+       value: 0.8333333333333334
+       name: Cosine Recall@1
+     - type: cosine_recall@3
+       value: 1.0
+       name: Cosine Recall@3
+     - type: cosine_recall@5
+       value: 1.0
+       name: Cosine Recall@5
+     - type: cosine_recall@10
+       value: 1.0
+       name: Cosine Recall@10
+     - type: cosine_ndcg@10
+       value: 0.9384882922619097
+       name: Cosine Ndcg@10
+     - type: cosine_mrr@10
+       value: 0.9166666666666666
+       name: Cosine Mrr@10
+     - type: cosine_map@100
+       value: 0.9166666666666666
+       name: Cosine Map@100
+ ---
+
+ # SentenceTransformer based on Snowflake/snowflake-arctic-embed-l
+
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [Snowflake/snowflake-arctic-embed-l](https://huggingface.co/Snowflake/snowflake-arctic-embed-l). It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [Snowflake/snowflake-arctic-embed-l](https://huggingface.co/Snowflake/snowflake-arctic-embed-l) <!-- at revision d8fb21ca8d905d2832ee8b96c894d3298964346b -->
+ - **Maximum Sequence Length:** 512 tokens
+ - **Output Dimensionality:** 1024 dimensions
+ - **Similarity Function:** Cosine Similarity
+ <!-- - **Training Dataset:** Unknown -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
+   (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+   (2): Normalize()
+ )
+ ```
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("dwb2023/legal-ft-b5869012-93ce-4e45-bca9-2eb86f3ef4b9")
+ # Run inference
+ sentences = [
+     'What was significant about the release of Llama 2 in July?',
+     'Then in February, Meta released Llama. And a few weeks later in March, Georgi Gerganov released code that got it working on a MacBook.\nI wrote about how Large language models are having their Stable Diffusion moment, and with hindsight that was a very good call!\nThis unleashed a whirlwind of innovation, which was accelerated further in July when Meta released Llama 2—an improved version which, crucially, included permission for commercial use.\nToday there are literally thousands of LLMs that can be run locally, on all manner of different devices.',
+     'Prompt injection is a natural consequence of this gulibility. I’ve seen precious little progress on tackling that problem in 2024, and we’ve been talking about it since September 2022.\nI’m beginning to see the most popular idea of “agents” as dependent on AGI itself. A model that’s robust against gulliblity is a very tall order indeed.\nEvals really matter\nAnthropic’s Amanda Askell (responsible for much of the work behind Claude’s Character):',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 1024]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # [3, 3]
+ ```
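Because the architecture ends with a `Normalize()` module, the returned embeddings are unit-length, so the cosine similarity that `model.similarity` computes reduces to a plain dot product. A small NumPy sketch of that equivalence (illustrative only; it does not load the model):

```python
import numpy as np

def normalize(v: np.ndarray) -> np.ndarray:
    """L2-normalize rows, as the model's final Normalize() module does."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# Stand-in for raw pooled embeddings; the real model outputs 1024-dim vectors
raw = np.array([[3.0, 4.0], [1.0, 0.0]])
emb = normalize(raw)

# For unit vectors, cosine similarity is just the dot product
cosine = emb @ emb.T
print(np.round(cosine, 4))
# [[1.  0.6]
#  [0.6 1. ]]
```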
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ ## Evaluation
+
+ ### Metrics
+
+ #### Information Retrieval
+
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
+
+ | Metric              | Value      |
+ |:--------------------|:-----------|
+ | cosine_accuracy@1   | 0.8333     |
+ | cosine_accuracy@3   | 1.0        |
+ | cosine_accuracy@5   | 1.0        |
+ | cosine_accuracy@10  | 1.0        |
+ | cosine_precision@1  | 0.8333     |
+ | cosine_precision@3  | 0.3333     |
+ | cosine_precision@5  | 0.2        |
+ | cosine_precision@10 | 0.1        |
+ | cosine_recall@1     | 0.8333     |
+ | cosine_recall@3     | 1.0        |
+ | cosine_recall@5     | 1.0        |
+ | cosine_recall@10    | 1.0        |
+ | **cosine_ndcg@10**  | **0.9385** |
+ | cosine_mrr@10       | 0.9167     |
+ | cosine_map@100      | 0.9167     |
+
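With a single relevant passage per query, as in this evaluation, the metrics above are tightly coupled: recall@k equals accuracy@k and precision@k equals accuracy@k divided by k. A quick arithmetic check against the table values (illustrative only, not the evaluator itself):

```python
# accuracy@k values taken from the metrics table above
accuracy = {1: 0.8333333333333334, 3: 1.0, 5: 1.0, 10: 1.0}

# With exactly one relevant document per query:
precision = {k: acc / k for k, acc in accuracy.items()}
recall = dict(accuracy)  # recall@k equals accuracy@k

print(round(precision[3], 4))   # 0.3333 — matches cosine_precision@3
print(round(precision[10], 4))  # 0.1 — matches cosine_precision@10
```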
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Dataset
+
+ #### Unnamed Dataset
+
+ * Size: 156 training samples
+ * Columns: <code>sentence_0</code> and <code>sentence_1</code>
+ * Approximate statistics based on the first 156 samples:
+   |         | sentence_0                                                                         | sentence_1                                                                          |
+   |:--------|:-----------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|
+   | type    | string                                                                             | string                                                                              |
+   | details | <ul><li>min: 12 tokens</li><li>mean: 20.89 tokens</li><li>max: 32 tokens</li></ul> | <ul><li>min: 43 tokens</li><li>mean: 135.1 tokens</li><li>max: 214 tokens</li></ul> |
+ * Samples:
+   | sentence_0 | sentence_1 |
+   |:---------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+   | <code>What are some of the topics covered in the annotated presentations given in 2023?</code> | <code>I also gave a bunch of talks and podcast appearances. I’ve started habitually turning my talks into annotated presentations—here are my best from 2023:<br><br>Prompt injection explained, with video, slides, and a transcript<br>Catching up on the weird world of LLMs<br>Making Large Language Models work for you<br>Open questions for AI engineering<br>Embeddings: What they are and why they matter<br>Financial sustainability for open source projects at GitHub Universe<br><br>And in podcasts:<br><br><br>What AI can do for you on the Theory of Change<br><br>Working in public on Path to Citus Con<br><br>LLMs break the internet on the Changelog<br><br>Talking Large Language Models on Rooftop Ruby<br><br>Thoughts on the OpenAI board situation on Newsroom Robots</code> |
+   | <code>Which podcasts featured discussions related to Large Language Models and AI topics?</code> | <code>I also gave a bunch of talks and podcast appearances. I’ve started habitually turning my talks into annotated presentations—here are my best from 2023:<br><br>Prompt injection explained, with video, slides, and a transcript<br>Catching up on the weird world of LLMs<br>Making Large Language Models work for you<br>Open questions for AI engineering<br>Embeddings: What they are and why they matter<br>Financial sustainability for open source projects at GitHub Universe<br><br>And in podcasts:<br><br><br>What AI can do for you on the Theory of Change<br><br>Working in public on Path to Citus Con<br><br>LLMs break the internet on the Changelog<br><br>Talking Large Language Models on Rooftop Ruby<br><br>Thoughts on the OpenAI board situation on Newsroom Robots</code> |
+   | <code>What is the main subject of the New York Times' lawsuit against OpenAI and Microsoft?</code> | <code>Just this week, the New York Times launched a landmark lawsuit against OpenAI and Microsoft over this issue. The 69 page PDF is genuinely worth reading—especially the first few pages, which lay out the issues in a way that’s surprisingly easy to follow. The rest of the document includes some of the clearest explanations of what LLMs are, how they work and how they are built that I’ve read anywhere.<br>The legal arguments here are complex. I’m not a lawyer, but I don’t think this one will be easily decided. Whichever way it goes, I expect this case to have a profound impact on how this technology develops in the future.</code> |
+ * Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
+   ```json
+   {
+       "loss": "MultipleNegativesRankingLoss",
+       "matryoshka_dims": [
+           768,
+           512,
+           256,
+           128,
+           64
+       ],
+       "matryoshka_weights": [
+           1,
+           1,
+           1,
+           1,
+           1
+       ],
+       "n_dims_per_step": -1
+   }
+   ```
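MatryoshkaLoss trains the embedding so that leading-dimension prefixes (here 768, 512, 256, 128 and 64) remain useful on their own. A NumPy sketch of how such embeddings are typically shrunk at inference time: truncate to a trained prefix length, then re-normalize before cosine comparison (an assumption about downstream usage, not code from this repository):

```python
import numpy as np

def truncate_embedding(emb: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` dimensions and re-normalize, the usual way
    Matryoshka-trained embeddings are reduced at inference time."""
    truncated = emb[..., :dim]
    return truncated / np.linalg.norm(truncated, axis=-1, keepdims=True)

rng = np.random.default_rng(0)
full = rng.normal(size=(4, 1024))      # stand-in for 1024-dim model outputs
small = truncate_embedding(full, 256)  # 256 is one of the trained matryoshka_dims

print(small.shape)  # (4, 256)
```

The truncated vectors are unit-length again, so cosine similarity between them stays a simple dot product.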
+
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `eval_strategy`: steps
+ - `per_device_train_batch_size`: 10
+ - `per_device_eval_batch_size`: 10
+ - `num_train_epochs`: 10
+ - `multi_dataset_batch_sampler`: round_robin
+
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: steps
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 10
+ - `per_device_eval_batch_size`: 10
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 5e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1
+ - `num_train_epochs`: 10
+ - `max_steps`: -1
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.0
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: False
+ - `fp16`: False
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: False
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `tp_size`: 0
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: None
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `include_for_metrics`: []
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`: 
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `use_liger_kernel`: False
+ - `eval_use_gather_object`: False
+ - `average_tokens_across_devices`: False
+ - `prompts`: None
+ - `batch_sampler`: batch_sampler
+ - `multi_dataset_batch_sampler`: round_robin
+
+ </details>
+
+ ### Training Logs
+ | Epoch | Step | cosine_ndcg@10 |
+ |:-----:|:----:|:--------------:|
+ | 1.0   | 16   | 0.9330         |
+ | 2.0   | 32   | 0.9539         |
+ | 3.0   | 48   | 0.9484         |
+ | 3.125 | 50   | 0.9484         |
+ | 4.0   | 64   | 0.9385         |
+ | 5.0   | 80   | 0.9539         |
+ | 6.0   | 96   | 0.9539         |
+ | 6.25  | 100  | 0.9539         |
+ | 7.0   | 112  | 0.9385         |
+ | 8.0   | 128  | 0.9385         |
+ | 9.0   | 144  | 0.9385         |
+ | 9.375 | 150  | 0.9385         |
+ | 10.0  | 160  | 0.9385         |
+
+
+ ### Framework Versions
+ - Python: 3.11.12
+ - Sentence Transformers: 4.1.0
+ - Transformers: 4.51.3
+ - PyTorch: 2.6.0+cu124
+ - Accelerate: 1.6.0
+ - Datasets: 3.6.0
+ - Tokenizers: 0.21.1
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### MatryoshkaLoss
+ ```bibtex
+ @misc{kusupati2024matryoshka,
+     title={Matryoshka Representation Learning},
+     author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
+     year={2024},
+     eprint={2205.13147},
+     archivePrefix={arXiv},
+     primaryClass={cs.LG}
+ }
+ ```
+
+ #### MultipleNegativesRankingLoss
+ ```bibtex
+ @misc{henderson2017efficient,
+     title={Efficient Natural Language Response Suggestion for Smart Reply},
+     author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+     year={2017},
+     eprint={1705.00652},
+     archivePrefix={arXiv},
+     primaryClass={cs.CL}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,24 @@
+ {
+   "architectures": [
+     "BertModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 1024,
+   "initializer_range": 0.02,
+   "intermediate_size": 4096,
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 16,
+   "num_hidden_layers": 24,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "torch_dtype": "float32",
+   "transformers_version": "4.51.3",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 30522
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,12 @@
+ {
+   "__version__": {
+     "sentence_transformers": "4.1.0",
+     "transformers": "4.51.3",
+     "pytorch": "2.6.0+cu124"
+   },
+   "prompts": {
+     "query": "Represent this sentence for searching relevant passages: "
+   },
+   "default_prompt_name": null,
+   "similarity_fn_name": "cosine"
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9da3dfb70385b718cca7460e7f19f5946dfd3714234bd25c440002518ea33d4d
+ size 1336413848
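As a rough sanity check (an estimate, not stated in the repo): the LFS pointer above records a 1,336,413,848-byte payload, and `config.json` says the weights are float32 (4 bytes each), which implies about 334M parameters, consistent with a BERT-large-sized encoder (24 layers, hidden size 1024):

```python
# Estimate the parameter count from the float32 checkpoint size.
size_bytes = 1_336_413_848   # "size" line from the LFS pointer
bytes_per_param = 4          # "torch_dtype": "float32"
params = size_bytes // bytes_per_param
print(f"~{params / 1e6:.0f}M parameters")
```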
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
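The three modules above form the inference pipeline: the Transformer produces per-token embeddings, Pooling keeps only the `[CLS]` token vector (per `pooling_mode_cls_token: true` in `1_Pooling/config.json`), and Normalize scales it to unit L2 norm so the configured cosine similarity reduces to a dot product. A toy sketch of the last two steps (illustrative only; the real modules operate on batched tensors of shape `[batch, seq_len, 1024]`):

```python
import math

def cls_pool(token_embeddings: list[list[float]]) -> list[float]:
    # CLS pooling: keep only the first token's vector, discard the rest.
    return token_embeddings[0]

def l2_normalize(vec: list[float]) -> list[float]:
    # Normalize module: scale the vector to unit L2 norm.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

# Toy "token embeddings" for a 3-token sequence with dimension 4.
tokens = [[3.0, 4.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0], [0.5, 0.5, 0.5, 0.5]]
emb = l2_normalize(cls_pool(tokens))
print(emb)  # [0.6, 0.8, 0.0, 0.0] -- unit length
```

Because every output embedding has unit norm, ranking by cosine similarity and ranking by dot product give identical results.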
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 512,
+   "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,63 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_lower_case": true,
+   "extra_special_tokens": {},
+   "mask_token": "[MASK]",
+   "max_length": 512,
+   "model_max_length": 512,
+   "pad_to_multiple_of": null,
+   "pad_token": "[PAD]",
+   "pad_token_type_id": 0,
+   "padding_side": "right",
+   "sep_token": "[SEP]",
+   "stride": 0,
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "truncation_side": "right",
+   "truncation_strategy": "longest_first",
+   "unk_token": "[UNK]"
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff