Gurveer05 committed
Commit c70df6e · verified · 1 Parent(s): 50bb180

MAP@25: 0.30918104653393014
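The commit-message metric, MAP@25, is mean average precision over the top 25 retrieved misconceptions per query; since the Eedi 2024 task has a single gold misconception per query, each query contributes 1/rank of the gold id when it appears in the top 25, and 0 otherwise. A minimal sketch of that computation, assuming one gold label per query — the function name `map_at_25` and the example ids are illustrative, not part of this repo:

```python
def map_at_25(ranked_ids: list[list[int]], gold_ids: list[int]) -> float:
    """MAP@25 with one relevant item per query: mean over queries of
    1/rank when the gold id is in the top 25, else 0."""
    total = 0.0
    for preds, gold in zip(ranked_ids, gold_ids):
        for rank, pred in enumerate(preds[:25], start=1):
            if pred == gold:
                total += 1.0 / rank
                break
    return total / len(gold_ids)

# Gold ranked 1st, 4th, and absent -> (1 + 0.25 + 0) / 3 = 0.4166...
print(map_at_25([[7, 2, 3], [9, 8, 1, 4], [5, 6]], [7, 4, 0]))
```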
Files changed (4):
  1. README.md +118 -95
  2. config.json +1 -1
  3. config_sentence_transformers.json +2 -2
  4. model.safetensors +1 -1
README.md CHANGED
@@ -8,49 +8,42 @@ tags:
  - feature-extraction
  - generated_from_trainer
  - dataset_size:2442
- - loss:MultipleNegativesSymmetricRankingLoss
+ - loss:MultipleNegativesRankingLoss
  widget:
- - source_sentence: Carry out a subtraction problem with positive integers where the
-   answer is less than 0 598-1000= This problem cannot be solved
+ - source_sentence: ' Confusing height with width or depth for calculating the base
+   area.'
  sentences:
- - Rounds to the wrong degree of accuracy (rounds too much)
- - When subtracting fractions, subtracts the numerators and denominators
- - Believes it is impossible to subtract a bigger number from a smaller number
- - source_sentence: Given the sketch of a curve in the form (x + a)(x + b), work out
-   its factorised form Which of the following could be the equation of this curve?
-   ![A graph of a quadratic curve that crosses the x axis at (1,0) and (3,0) and
-   crosses the y axis at (0,3).]() y=(x+1)(x+3)
+ - Does not understand the first significant value is the first non-zero digit number
+ - Does not realise that subtracting a larger number will give a smaller answer
+ - Cannot identify the correct side lengths to use when asked to find the area of
+   a face
+ - source_sentence: ' Confusing exponentiation with multiplication.'
  sentences:
- - Does not use the associative property of multiplication to find other factors
-   of a number
- - Believes they only need to multiply the first and last pairs of terms when expanding
-   double brackets
- - Forgets to swap the sign of roots when placing into brackets
- - source_sentence: For a given output find the input of a function machine ![Image
-   of a function machine. The function is add one third, and the output is 7]() What
-   is the input of this function machine? 7 1/3
+ - Estimated when not appropriate
+ - Mixes up squaring and multiplying by 2 or doubling
+ - Writes the index as a digit on the end of a number
+ - source_sentence: ' Not recognizing the pattern of subtracting 4 from each term.'
  sentences:
- - When finding an input of a function machine thinks you apply the operations given
-   rather than the inverse operation.
- - Believes the solution to mx + c = a is the y intercept of y = mx +c
- - Squares when asked to find the square root
- - source_sentence: Count a number of objects 1,3,5,7, ? Which pattern matches the
-   sequence above? ![A sequence of 4 patterns. The first pattern is 1 green dot.
-   The second pattern is green dots arranged in a 2 by 2 square shape. The third
-   pattern is green dots arranged in a 3 by 3 square shape. The fourth pattern is
-   green dots arranged in a 4 by 4 square shape. ]()
+ - Identifies the term-to-term rule rather than the next term in a sequence
+ - Finds the median instead of the mode
+ - Rounds incorrectly by changing multiple place values
+ - source_sentence: ' Believing that remainders are needed to divide a group into equal
+   parts, rather than factors.'
  sentences:
- - 'Subtracts instead of adds when answering worded problems '
- - When multiplying a decimal less than 1 by an integer, gives an answer 10 times
-   smaller than it should be
- - When given a linear sequence, cannot match it to a visual pattern
- - source_sentence: Express one quantity as a fraction of another A group of 8 friends
-   share £6 equally. What fraction of the money do they each get? 1/8
+ - Does not follow the arrows through a function machine, changes the order of the
+   operations asked.
+ - When factorising into double brackets, believes the product of the constants in
+   the brackets is of the opposite sign to the constant in the expanded equation.
+ - Does not understand that factors are divisors which split the number into equal
+   groups
+ - source_sentence: ' Believing that the perimeter is divided equally among all sides
+   without considering the number of sides in the shape.'
  sentences:
- - Thinks the fraction 1/n can express sharing any number of items between n people
- - 'Does not understand that in the ratio 1:n the total number of parts would be
-   1+n '
- - Does not recognise the distributive property
+ - When given the perimeter of a regular polygon, multiplies instead of divides to
+   find each side length
+ - Does not understand the value of zeros as placeholders
+ - When asked to solve simultaneous equations, believes they can just find values
+   that work in one equation
  ---

  # SentenceTransformer based on BAAI/bge-large-en-v1.5
@@ -104,9 +97,9 @@ from sentence_transformers import SentenceTransformer
  model = SentenceTransformer("Gurveer05/bge-large-eedi-2024")
  # Run inference
  sentences = [
-     'Express one quantity as a fraction of another A group of 8 friends share £6 equally. What fraction of the money do they each get? 1/8',
-     'Thinks the fraction 1/n can express sharing any number of items between n people',
-     'Does not recognise the distributive property',
+     ' Believing that the perimeter is divided equally among all sides without considering the number of sides in the shape.',
+     'When given the perimeter of a regular polygon, multiplies instead of divides to find each side length',
+     'When asked to solve simultaneous equations, believes they can just find values that work in one equation',
  ]
  embeddings = model.encode(sentences)
  print(embeddings.shape)
@@ -162,19 +155,45 @@ You can finetune this model on your own dataset.

  * Dataset: csv
  * Size: 2,442 training samples
- * Columns: <code>sentence1</code> and <code>sentence2</code>
+ * Columns: <code>PredictedMisconception</code> and <code>MisconceptionName</code>
  * Approximate statistics based on the first 1000 samples:
- |         | sentence1 | sentence2 |
- |:--------|:----------|:----------|
- | type    | string | string |
- | details | <ul><li>min: 13 tokens</li><li>mean: 56.55 tokens</li><li>max: 306 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 15.13 tokens</li><li>max: 40 tokens</li></ul> |
+ |         | PredictedMisconception | MisconceptionName |
+ |:--------|:-----------------------|:------------------|
+ | type    | string | string |
+ | details | <ul><li>min: 8 tokens</li><li>mean: 17.15 tokens</li><li>max: 72 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 15.15 tokens</li><li>max: 40 tokens</li></ul> |
  * Samples:
- | sentence1 | sentence2 |
- |:----------|:----------|
- | <code>Calculate the distance travelled using a speed-time graph Here is a speed-time graph for a car. Which of the following gives the best estimate for the distance travelled between 8 and 10 seconds? ![A graph showing time in seconds on the x axis and speed in metres per second on the y axis. The curve passes through the points (8,15) and (10,24)]() 48 m</code> | <code>Believes that when finding area under graph you can use the upper y value rather than average of upper and lower</code> |
- | <code>Add proper fractions with the same denominator Work out: 4/11+7/11 Write your answer in its simplest form. 11/11</code> | <code>Forgot to simplify the fraction</code> |
- | <code>Count a number of objects 1,3,5,7, … ? Which pattern matches the sequence above? ![A sequence of 4 patterns. The first pattern is 1 green dot. The second pattern is green dots arranged in a 2 by 2 square shape. The third pattern is green dots arranged in a 3 by 3 square shape. The fourth pattern is green dots arranged in a 4 by 4 square shape. ]()</code> | <code>When given a linear sequence, cannot match it to a visual pattern</code> |
- * Loss: [<code>MultipleNegativesSymmetricRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativessymmetricrankingloss) with these parameters:
+ | PredictedMisconception | MisconceptionName |
+ |:-----------------------|:------------------|
+ | <code> Believing equilateral triangles have varying side lengths.</code> | <code>Does not know the meaning of equilateral</code> |
+ | <code> Believing that the side length of a square is the square root of the area, but incorrectly calculating it as the square of the area.</code> | <code>Confuses perimeter and area</code> |
+ | <code> The longest edge length is necessary for volume calculation in a triangular prism.</code> | <code>Finds area of one face when asked for volume</code> |
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
+   ```json
+   {
+       "scale": 20.0,
+       "similarity_fct": "cos_sim"
+   }
+   ```
+ 
+ ### Evaluation Dataset
+ 
+ #### csv
+ 
+ * Dataset: csv
+ * Size: 1,928 evaluation samples
+ * Columns: <code>PredictedMisconception</code> and <code>MisconceptionName</code>
+ * Approximate statistics based on the first 1000 samples:
+ |         | PredictedMisconception | MisconceptionName |
+ |:--------|:-----------------------|:------------------|
+ | type    | string | string |
+ | details | <ul><li>min: 8 tokens</li><li>mean: 16.66 tokens</li><li>max: 70 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 14.32 tokens</li><li>max: 40 tokens</li></ul> |
+ * Samples:
+ | PredictedMisconception | MisconceptionName |
+ |:-----------------------|:------------------|
+ | <code> Believing the sequence's common difference is positive, leading to an incorrect nth-term formula.</code> | <code>When finding the nth term of a linear sequence, thinks the the first term is the coefficient in front of n.</code> |
+ | <code> Incorrect application of the nth term formula for integer sequences.</code> | <code>When solving an equation, uses the same operation rather than the inverse.</code> |
+ | <code> Belief that shapes with more sides have higher rotational symmetry.</code> | <code>Does not know how to find order of rotational symmetry</code> |
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
  ```json
  {
      "scale": 20.0,
@@ -186,12 +205,16 @@ You can finetune this model on your own dataset.
  #### Non-Default Hyperparameters

  - `eval_strategy`: steps
- - `per_device_train_batch_size`: 16
- - `per_device_eval_batch_size`: 16
+ - `per_device_train_batch_size`: 64
+ - `per_device_eval_batch_size`: 64
+ - `gradient_accumulation_steps`: 8
+ - `weight_decay`: 0.01
  - `num_train_epochs`: 20
+ - `lr_scheduler_type`: cosine_with_restarts
  - `warmup_ratio`: 0.1
  - `fp16`: True
  - `load_best_model_at_end`: True
+ - `gradient_checkpointing`: True
  - `batch_sampler`: no_duplicates

  #### All Hyperparameters
@@ -201,22 +224,22 @@ You can finetune this model on your own dataset.
  - `do_predict`: False
  - `eval_strategy`: steps
  - `prediction_loss_only`: True
- - `per_device_train_batch_size`: 16
- - `per_device_eval_batch_size`: 16
+ - `per_device_train_batch_size`: 64
+ - `per_device_eval_batch_size`: 64
  - `per_gpu_train_batch_size`: None
  - `per_gpu_eval_batch_size`: None
- - `gradient_accumulation_steps`: 1
+ - `gradient_accumulation_steps`: 8
  - `eval_accumulation_steps`: None
  - `torch_empty_cache_steps`: None
  - `learning_rate`: 5e-05
- - `weight_decay`: 0.0
+ - `weight_decay`: 0.01
  - `adam_beta1`: 0.9
  - `adam_beta2`: 0.999
  - `adam_epsilon`: 1e-08
  - `max_grad_norm`: 1.0
  - `num_train_epochs`: 20
  - `max_steps`: -1
- - `lr_scheduler_type`: linear
+ - `lr_scheduler_type`: cosine_with_restarts
  - `lr_scheduler_kwargs`: {}
  - `warmup_ratio`: 0.1
  - `warmup_steps`: 0
@@ -281,7 +304,7 @@ You can finetune this model on your own dataset.
  - `hub_strategy`: every_save
  - `hub_private_repo`: False
  - `hub_always_push`: False
- - `gradient_checkpointing`: False
+ - `gradient_checkpointing`: True
  - `gradient_checkpointing_kwargs`: None
  - `include_inputs_for_metrics`: False
  - `eval_do_concat_batches`: True
@@ -312,47 +335,35 @@ You can finetune this model on your own dataset.
  </details>

  ### Training Logs
- | Epoch | Step | Training Loss |
- |:---------:|:-------:|:-------------:|
- | 0.3766 | 29 | 1.4411 |
- | 0.7532 | 58 | 1.0084 |
- | 1.1299 | 87 | 0.7363 |
- | 1.5065 | 116 | 0.5658 |
- | 1.8831 | 145 | 0.4697 |
- | 2.2597 | 174 | 0.307 |
- | 2.6364 | 203 | 0.2828 |
- | 3.0130 | 232 | 0.1616 |
- | 3.3896 | 261 | 0.1542 |
- | 3.7662 | 290 | 0.1315 |
- | 4.1429 | 319 | 0.0984 |
- | 4.5195 | 348 | 0.1066 |
- | 4.8961 | 377 | 0.0768 |
- | 5.2727 | 406 | 0.0641 |
- | 5.6494 | 435 | 0.0558 |
- | 6.0260 | 464 | 0.0495 |
- | 6.4026 | 493 | 0.0459 |
- | 6.7792 | 522 | 0.0397 |
- | 7.1558 | 551 | 0.0255 |
- | 7.5325 | 580 | 0.0278 |
- | 7.9091 | 609 | 0.0237 |
- | 8.2857 | 638 | 0.0238 |
- | 8.6623 | 667 | 0.0248 |
- | **9.039** | **696** | **0.0158** |
- | 9.4156 | 725 | 0.0176 |
- | 9.7922 | 754 | 0.017 |
- | 10.1688 | 783 | 0.0116 |
- | 10.5455 | 812 | 0.0192 |
- | 10.9221 | 841 | 0.0076 |
- | 11.2987 | 870 | 0.009 |
+ | Epoch | Step | Training Loss | loss |
+ |:----------:|:------:|:-------------:|:----------:|
+ | 0.4103 | 2 | 2.5492 | - |
+ | 0.6154 | 3 | - | 1.4112 |
+ | 0.8205 | 4 | 2.319 | - |
+ | 1.2308 | 6 | 1.7499 | 1.2462 |
+ | 1.6410 | 8 | 1.7464 | - |
+ | 1.8462 | 9 | - | 1.1584 |
+ | 2.0513 | 10 | 1.4739 | - |
+ | 2.4615 | 12 | 1.3037 | 1.0487 |
+ | 2.8718 | 14 | 1.2155 | - |
+ | 3.0769 | 15 | - | 1.0078 |
+ | 3.2821 | 16 | 0.9292 | - |
+ | 3.6923 | 18 | 0.8923 | 0.9539 |
+ | 4.1026 | 20 | 0.7312 | - |
+ | **4.3077** | **21** | **-** | **0.9079** |
+ | 4.5128 | 22 | 0.6182 | - |
+ | 4.9231 | 24 | 0.5942 | 0.9088 |
+ | 5.3333 | 26 | 0.4158 | - |
+ | 5.5385 | 27 | - | 0.9095 |

  * The bold row denotes the saved checkpoint.

  ### Framework Versions
- - Python: 3.10.14
+ - Python: 3.10.12
  - Sentence Transformers: 3.1.1
- - Transformers: 4.44.0
- - PyTorch: 2.4.0
- - Accelerate: 0.33.0
+ - Transformers: 4.44.2
+ - PyTorch: 2.4.1+cu121
+ - Accelerate: 0.34.2
  - Datasets: 2.19.2
  - Tokenizers: 0.19.1

@@ -373,6 +384,18 @@ You can finetune this model on your own dataset.
  }
  ```

+ #### MultipleNegativesRankingLoss
+ ```bibtex
+ @misc{henderson2017efficient,
+     title={Efficient Natural Language Response Suggestion for Smart Reply},
+     author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+     year={2017},
+     eprint={1705.00652},
+     archivePrefix={arXiv},
+     primaryClass={cs.CL}
+ }
+ ```
+ 
  <!--
  ## Glossary

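In summary, the README diff above retargets the card from question-text/misconception pairs (`sentence1`/`sentence2`) to model-predicted vs. gold misconceptions (`PredictedMisconception`/`MisconceptionName`), swaps the symmetric loss for plain MultipleNegativesRankingLoss, and raises the effective train batch to 64 × 8 = 512 per device. A minimal sketch of the corresponding Sentence Transformers 3.x setup, assuming the hyperparameters listed in the card; the one-row datasets are placeholders for the real 2,442/1,928-row csv splits, and this is not the author's actual training script:

```python
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss
from sentence_transformers.training_args import BatchSamplers

model = SentenceTransformer("BAAI/bge-large-en-v1.5")

# Placeholder rows; the real run used 2,442 train / 1,928 eval pairs from csv.
pair = {
    "PredictedMisconception": [" Confusing exponentiation with multiplication."],
    "MisconceptionName": ["Mixes up squaring and multiplying by 2 or doubling"],
}
train_ds, eval_ds = Dataset.from_dict(pair), Dataset.from_dict(pair)

args = SentenceTransformerTrainingArguments(
    output_dir="bge-large-eedi-2024",
    num_train_epochs=20,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    gradient_accumulation_steps=8,   # effective batch size: 64 * 8 = 512
    learning_rate=5e-5,
    weight_decay=0.01,
    warmup_ratio=0.1,
    lr_scheduler_type="cosine_with_restarts",
    fp16=True,                       # as on the card; requires a GPU
    gradient_checkpointing=True,     # trades compute for memory at batch 64
    eval_strategy="steps",
    load_best_model_at_end=True,
    batch_sampler=BatchSamplers.NO_DUPLICATES,  # keep duplicate positives out of a batch
)

# In-batch negatives ranking loss; scale=20.0 and cosine similarity are the defaults,
# matching the card's {"scale": 20.0, "similarity_fct": "cos_sim"}.
loss = MultipleNegativesRankingLoss(model)

SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    loss=loss,
).train()
```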
config.json CHANGED
@@ -25,7 +25,7 @@
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "torch_dtype": "float32",
- "transformers_version": "4.44.0",
+ "transformers_version": "4.44.2",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
config_sentence_transformers.json CHANGED
@@ -1,8 +1,8 @@
  {
    "__version__": {
      "sentence_transformers": "3.1.1",
-     "transformers": "4.44.0",
-     "pytorch": "2.4.0"
+     "transformers": "4.44.2",
+     "pytorch": "2.4.1+cu121"
    },
    "prompts": {},
    "default_prompt_name": null,
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:0a05fe01c79e9d58438063e8a0f24a4341a0671378aaa11eee7fa7a304ce60e5
+ oid sha256:43cd7df34b025417b63ee647fd91159e5fe741ebcc584907ed2b6533605a8703
  size 1340612432
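The model.safetensors pointer swaps the LFS object while the byte size stays 1340612432, i.e. the same architecture with retrained weights. A git-lfs oid is the SHA-256 of the file contents, so you can check which weights a local copy corresponds to; the local path below is an assumption:

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file and return its hex SHA-256 digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Expected oid for this commit, from the LFS pointer above.
expected = "43cd7df34b025417b63ee647fd91159e5fe741ebcc584907ed2b6533605a8703"
print(sha256_of("model.safetensors") == expected)
```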