MAP@25: 0.30918104653393014
Files changed:
- README.md (+118 −95)
- config.json (+1 −1)
- config_sentence_transformers.json (+2 −2)
- model.safetensors (+1 −1)
README.md CHANGED

@@ -8,49 +8,42 @@ tags:
  - feature-extraction
  - generated_from_trainer
  - dataset_size:2442
- - loss:
+ - loss:MultipleNegativesRankingLoss
  widget:
- - source_sentence:
-
+ - source_sentence: ' Confusing height with width or depth for calculating the base
+     area.'
    sentences:
- -
- -
- -
-
-
-   ![A graph of a quadratic curve that crosses the x axis at (1,0) and (3,0) and
-   crosses the y axis at (0,3).]() y=(x+1)(x+3)
+ - Does not understand the first significant value is the first non-zero digit number
+ - Does not realise that subtracting a larger number will give a smaller answer
+ - Cannot identify the correct side lengths to use when asked to find the area of
+   a face
+ - source_sentence: ' Confusing exponentiation with multiplication.'
    sentences:
- -
-
- -
-
- - Forgets to swap the sign of roots when placing into brackets
- - source_sentence: For a given output find the input of a function machine ![Image
-   of a function machine. The function is add one third, and the output is 7]() What
-   is the input of this function machine? 7 1/3
+ - Estimated when not appropriate
+ - Mixes up squaring and multiplying by 2 or doubling
+ - Writes the index as a digit on the end of a number
+ - source_sentence: ' Not recognizing the pattern of subtracting 4 from each term.'
    sentences:
- -
-
- -
-
-
-   sequence above? ![A sequence of 4 patterns. The first pattern is 1 green dot.
-   The second pattern is green dots arranged in a 2 by 2 square shape. The third
-   pattern is green dots arranged in a 3 by 3 square shape. The fourth pattern is
-   green dots arranged in a 4 by 4 square shape. ]()
+ - Identifies the term-to-term rule rather than the next term in a sequence
+ - Finds the median instead of the mode
+ - Rounds incorrectly by changing multiple place values
+ - source_sentence: ' Believing that remainders are needed to divide a group into equal
+     parts, rather than factors.'
    sentences:
- -
-
-
-
- -
-
+ - Does not follow the arrows through a function machine, changes the order of the
+   operations asked.
+ - When factorising into double brackets, believes the product of the constants in
+   the brackets is of the opposite sign to the constant in the expanded equation.
+ - Does not understand that factors are divisors which split the number into equal
+   groups
+ - source_sentence: ' Believing that the perimeter is divided equally among all sides
+     without considering the number of sides in the shape.'
    sentences:
- -
-
-
- -
+ - When given the perimeter of a regular polygon, multiplies instead of divides to
+   find each side length
+ - Does not understand the value of zeros as placeholders
+ - When asked to solve simultaneous equations, believes they can just find values
+   that work in one equation
  ---
  
  # SentenceTransformer based on BAAI/bge-large-en-v1.5
@@ -104,9 +97,9 @@ from sentence_transformers import SentenceTransformer
  model = SentenceTransformer("Gurveer05/bge-large-eedi-2024")
  # Run inference
  sentences = [
-     '
-     '
-     '
+     ' Believing that the perimeter is divided equally among all sides without considering the number of sides in the shape.',
+     'When given the perimeter of a regular polygon, multiplies instead of divides to find each side length',
+     'When asked to solve simultaneous equations, believes they can just find values that work in one equation',
  ]
  embeddings = model.encode(sentences)
  print(embeddings.shape)
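The three strings added above are a predicted misconception followed by two candidate misconception names from the widget examples. A minimal retrieval sketch built on the same snippet; the three-item candidate pool is illustrative (the real task ranks the full misconception bank), and `model.similarity` defaulting to cosine similarity is Sentence Transformers 3.x behaviour:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Gurveer05/bge-large-eedi-2024")

# Illustrative candidate pool; in practice this is every known misconception name.
candidates = [
    "When given the perimeter of a regular polygon, multiplies instead of divides to find each side length",
    "Does not understand the value of zeros as placeholders",
    "When asked to solve simultaneous equations, believes they can just find values that work in one equation",
]
query = (" Believing that the perimeter is divided equally among all sides"
         " without considering the number of sides in the shape.")

query_emb = model.encode([query])
cand_emb = model.encode(candidates)

scores = model.similarity(query_emb, cand_emb)[0]  # cosine similarity by default
for idx in scores.argsort(descending=True)[:25]:   # top 25, matching MAP@25
    i = int(idx)
    print(f"{scores[i]:.3f}  {candidates[i]}")
```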
@@ -162,19 +155,45 @@ You can finetune this model on your own dataset.
  
  * Dataset: csv
  * Size: 2,442 training samples
- * Columns: <code>
+ * Columns: <code>PredictedMisconception</code> and <code>MisconceptionName</code>
  * Approximate statistics based on the first 1000 samples:
- | |
-
- | type | string
- | details | <ul><li>min:
+   |         | PredictedMisconception | MisconceptionName |
+   |:--------|:-----------------------|:------------------|
+   | type    | string                 | string            |
+   | details | <ul><li>min: 8 tokens</li><li>mean: 17.15 tokens</li><li>max: 72 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 15.15 tokens</li><li>max: 40 tokens</li></ul> |
  * Samples:
- |
-
- | <code>
- | <code>
- | <code>
- * Loss: [<code>
+   | PredictedMisconception | MisconceptionName |
+   |:-----------------------|:------------------|
+   | <code> Believing equilateral triangles have varying side lengths.</code> | <code>Does not know the meaning of equilateral</code> |
+   | <code> Believing that the side length of a square is the square root of the area, but incorrectly calculating it as the square of the area.</code> | <code>Confuses perimeter and area</code> |
+   | <code> The longest edge length is necessary for volume calculation in a triangular prism.</code> | <code>Finds area of one face when asked for volume</code> |
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
+   ```json
+   {
+       "scale": 20.0,
+       "similarity_fct": "cos_sim"
+   }
+   ```
+ 
+ ### Evaluation Dataset
+ 
+ #### csv
+ 
+ * Dataset: csv
+ * Size: 1,928 evaluation samples
+ * Columns: <code>PredictedMisconception</code> and <code>MisconceptionName</code>
+ * Approximate statistics based on the first 1000 samples:
+   |         | PredictedMisconception | MisconceptionName |
+   |:--------|:-----------------------|:------------------|
+   | type    | string                 | string            |
+   | details | <ul><li>min: 8 tokens</li><li>mean: 16.66 tokens</li><li>max: 70 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 14.32 tokens</li><li>max: 40 tokens</li></ul> |
+ * Samples:
+   | PredictedMisconception | MisconceptionName |
+   |:-----------------------|:------------------|
+   | <code> Believing the sequence's common difference is positive, leading to an incorrect nth-term formula.</code> | <code>When finding the nth term of a linear sequence, thinks the the first term is the coefficient in front of n.</code> |
+   | <code> Incorrect application of the nth term formula for integer sequences.</code> | <code>When solving an equation, uses the same operation rather than the inverse.</code> |
+   | <code> Belief that shapes with more sides have higher rotational symmetry.</code> | <code>Does not know how to find order of rotational symmetry</code> |
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
  ```json
  {
      "scale": 20.0,
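For context, the `scale: 20.0` and `similarity_fct: "cos_sim"` parameters above feed the in-batch contrastive objective that MultipleNegativesRankingLoss implements. A minimal sketch of the computation, not the library's implementation:

```python
import torch
import torch.nn.functional as F

def mnr_loss(anchors: torch.Tensor, positives: torch.Tensor, scale: float = 20.0) -> torch.Tensor:
    """Cross-entropy over scaled cosine similarities: row i's true pair is
    column i, and every other positive in the batch acts as a negative,
    which is why the card pairs this loss with the no_duplicates sampler."""
    a = F.normalize(anchors, dim=-1)
    p = F.normalize(positives, dim=-1)
    scores = scale * (a @ p.T)                    # (batch, batch)
    labels = torch.arange(scores.size(0), device=scores.device)
    return F.cross_entropy(scores, labels)
```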
@@ -186,12 +205,16 @@ You can finetune this model on your own dataset.
  #### Non-Default Hyperparameters
  
  - `eval_strategy`: steps
- - `per_device_train_batch_size`:
- - `per_device_eval_batch_size`:
+ - `per_device_train_batch_size`: 64
+ - `per_device_eval_batch_size`: 64
+ - `gradient_accumulation_steps`: 8
+ - `weight_decay`: 0.01
  - `num_train_epochs`: 20
+ - `lr_scheduler_type`: cosine_with_restarts
  - `warmup_ratio`: 0.1
  - `fp16`: True
  - `load_best_model_at_end`: True
+ - `gradient_checkpointing`: True
  - `batch_sampler`: no_duplicates
  
  #### All Hyperparameters
@@ -201,22 +224,22 @@ You can finetune this model on your own dataset.
  - `do_predict`: False
  - `eval_strategy`: steps
  - `prediction_loss_only`: True
- - `per_device_train_batch_size`:
- - `per_device_eval_batch_size`:
+ - `per_device_train_batch_size`: 64
+ - `per_device_eval_batch_size`: 64
  - `per_gpu_train_batch_size`: None
  - `per_gpu_eval_batch_size`: None
- - `gradient_accumulation_steps`:
+ - `gradient_accumulation_steps`: 8
  - `eval_accumulation_steps`: None
  - `torch_empty_cache_steps`: None
  - `learning_rate`: 5e-05
- - `weight_decay`: 0.
+ - `weight_decay`: 0.01
  - `adam_beta1`: 0.9
  - `adam_beta2`: 0.999
  - `adam_epsilon`: 1e-08
  - `max_grad_norm`: 1.0
  - `num_train_epochs`: 20
  - `max_steps`: -1
- - `lr_scheduler_type`:
+ - `lr_scheduler_type`: cosine_with_restarts
  - `lr_scheduler_kwargs`: {}
  - `warmup_ratio`: 0.1
  - `warmup_steps`: 0
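The new non-default values can be reproduced with the Sentence Transformers 3.x trainer API. A sketch under stated assumptions: only the values in the hunks above come from the card; `output_dir` and the explicit `save_strategy` (required for `load_best_model_at_end`) are assumptions:

```python
from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="bge-large-eedi-2024",   # hypothetical path
    eval_strategy="steps",
    save_strategy="steps",              # assumed to match eval_strategy
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    gradient_accumulation_steps=8,      # effective batch size 64 * 8 = 512
    weight_decay=0.01,
    num_train_epochs=20,
    lr_scheduler_type="cosine_with_restarts",
    warmup_ratio=0.1,
    fp16=True,
    load_best_model_at_end=True,
    gradient_checkpointing=True,
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)
```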
@@ -281,7 +304,7 @@ You can finetune this model on your own dataset.
  - `hub_strategy`: every_save
  - `hub_private_repo`: False
  - `hub_always_push`: False
- - `gradient_checkpointing`:
+ - `gradient_checkpointing`: True
  - `gradient_checkpointing_kwargs`: None
  - `include_inputs_for_metrics`: False
  - `eval_do_concat_batches`: True
@@ -312,47 +335,35 @@ You can finetune this model on your own dataset.
  </details>
  
  ### Training Logs
- | Epoch
-
- | 0.
- | 0.
-
- | 1.
- | 1.
-
- | 2.
-
-
- | 3.
-
-
- | 4.
-
-
-
-
-
- | 7.1558 | 551 | 0.0255 |
- | 7.5325 | 580 | 0.0278 |
- | 7.9091 | 609 | 0.0237 |
- | 8.2857 | 638 | 0.0238 |
- | 8.6623 | 667 | 0.0248 |
- | **9.039** | **696** | **0.0158** |
- | 9.4156 | 725 | 0.0176 |
- | 9.7922 | 754 | 0.017 |
- | 10.1688 | 783 | 0.0116 |
- | 10.5455 | 812 | 0.0192 |
- | 10.9221 | 841 | 0.0076 |
- | 11.2987 | 870 | 0.009 |
+ | Epoch      | Step   | Training Loss | loss       |
+ |:----------:|:------:|:-------------:|:----------:|
+ | 0.4103     | 2      | 2.5492        | -          |
+ | 0.6154     | 3      | -             | 1.4112     |
+ | 0.8205     | 4      | 2.319         | -          |
+ | 1.2308     | 6      | 1.7499        | 1.2462     |
+ | 1.6410     | 8      | 1.7464        | -          |
+ | 1.8462     | 9      | -             | 1.1584     |
+ | 2.0513     | 10     | 1.4739        | -          |
+ | 2.4615     | 12     | 1.3037        | 1.0487     |
+ | 2.8718     | 14     | 1.2155        | -          |
+ | 3.0769     | 15     | -             | 1.0078     |
+ | 3.2821     | 16     | 0.9292        | -          |
+ | 3.6923     | 18     | 0.8923        | 0.9539     |
+ | 4.1026     | 20     | 0.7312        | -          |
+ | **4.3077** | **21** | **-**         | **0.9079** |
+ | 4.5128     | 22     | 0.6182        | -          |
+ | 4.9231     | 24     | 0.5942        | 0.9088     |
+ | 5.3333     | 26     | 0.4158        | -          |
+ | 5.5385     | 27     | -             | 0.9095     |
  
  * The bold row denotes the saved checkpoint.
  
  ### Framework Versions
- - Python: 3.10.
+ - Python: 3.10.12
  - Sentence Transformers: 3.1.1
- - Transformers: 4.44.
- - PyTorch: 2.4.
- - Accelerate: 0.
+ - Transformers: 4.44.2
+ - PyTorch: 2.4.1+cu121
+ - Accelerate: 0.34.2
  - Datasets: 2.19.2
  - Tokenizers: 0.19.1
  
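The new log's step-to-epoch ratio is consistent with the updated hyperparameters: 2,442 samples at batch size 64 give about 39 batches per epoch, and with `gradient_accumulation_steps: 8` that is 39 / 8 = 4.875 optimizer steps per epoch, so step 21 lands at epoch 21 / 4.875 ≈ 4.31, matching the bolded checkpoint row.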
|
@@ -373,6 +384,18 @@ You can finetune this model on your own dataset.
  }
  ```
  
+ #### MultipleNegativesRankingLoss
+ ```bibtex
+ @misc{henderson2017efficient,
+     title={Efficient Natural Language Response Suggestion for Smart Reply},
+     author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+     year={2017},
+     eprint={1705.00652},
+     archivePrefix={arXiv},
+     primaryClass={cs.CL}
+ }
+ ```
+ 
  <!--
  ## Glossary
  
config.json CHANGED

@@ -25,7 +25,7 @@
    "pad_token_id": 0,
    "position_embedding_type": "absolute",
    "torch_dtype": "float32",
-   "transformers_version": "4.44.
+   "transformers_version": "4.44.2",
    "type_vocab_size": 2,
    "use_cache": true,
    "vocab_size": 30522
config_sentence_transformers.json CHANGED

@@ -1,8 +1,8 @@
  {
    "__version__": {
      "sentence_transformers": "3.1.1",
-     "transformers": "4.44.
-     "pytorch": "2.4.
+     "transformers": "4.44.2",
+     "pytorch": "2.4.1+cu121"
    },
    "prompts": {},
    "default_prompt_name": null,
model.safetensors CHANGED

@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:
+ oid sha256:43cd7df34b025417b63ee647fd91159e5fe741ebcc584907ed2b6533605a8703
  size 1340612432