MAP@25: 0.30918104653393014
- README.md +118 -95
- config.json +1 -1
- config_sentence_transformers.json +2 -2
- model.safetensors +1 -1
README.md CHANGED

@@ -8,49 +8,42 @@ tags:
  - feature-extraction
  - generated_from_trainer
  - dataset_size:2442
- - loss:…
  widget:
- - source_sentence: …
  sentences:
- …
- - … ![A graph of a quadratic curve that crosses the x axis at (1,0) and (3,0) and
-   crosses the y axis at (0,3).]() y=(x+1)(x+3)
  sentences:
- …
- - Forgets to swap the sign of roots when placing into brackets
- - source_sentence: For a given output find the input of a function machine ![Image
-   of a function machine. The function is add one third, and the output is 7]() What
-   is the input of this function machine? 7 1/3
  sentences:
- …
- - … sequence above? ![A sequence of 4 patterns. The first pattern is 1 green dot.
-   The second pattern is green dots arranged in a 2 by 2 square shape. The third
-   pattern is green dots arranged in a 3 by 3 square shape. The fourth pattern is
-   green dots arranged in a 4 by 4 square shape. ]()
  sentences:
- …
  sentences:
- …
  ---

  # SentenceTransformer based on BAAI/bge-large-en-v1.5

@@ -104,9 +97,9 @@ from sentence_transformers import SentenceTransformer
  model = SentenceTransformer("Gurveer05/bge-large-eedi-2024")
  # Run inference
  sentences = [
-     '…',
-     '…',
-     '…',
  ]
  embeddings = model.encode(sentences)
  print(embeddings.shape)

@@ -162,19 +155,45 @@ You can finetune this model on your own dataset.

  * Dataset: csv
  * Size: 2,442 training samples
- * Columns: <code>…
  * Approximate statistics based on the first 1000 samples:
- | | …
- | type | string …
- | details | <ul><li>min: …
  * Samples:
- …
- | <code>…
- | <code>…
- | <code>…
- * Loss: [<code>…
  ```json
  {
      "scale": 20.0,

@@ -186,12 +205,16 @@ You can finetune this model on your own dataset.
  #### Non-Default Hyperparameters

  - `eval_strategy`: steps
- - `per_device_train_batch_size`: …
- - `per_device_eval_batch_size`: …
  - `num_train_epochs`: 20
  - `warmup_ratio`: 0.1
  - `fp16`: True
  - `load_best_model_at_end`: True
  - `batch_sampler`: no_duplicates

  #### All Hyperparameters

@@ -201,22 +224,22 @@ You can finetune this model on your own dataset.
  - `do_predict`: False
  - `eval_strategy`: steps
  - `prediction_loss_only`: True
- - `per_device_train_batch_size`: …
- - `per_device_eval_batch_size`: …
  - `per_gpu_train_batch_size`: None
  - `per_gpu_eval_batch_size`: None
- - `gradient_accumulation_steps`: …
  - `eval_accumulation_steps`: None
  - `torch_empty_cache_steps`: None
  - `learning_rate`: 5e-05
- - `weight_decay`: 0.…
  - `adam_beta1`: 0.9
  - `adam_beta2`: 0.999
  - `adam_epsilon`: 1e-08
  - `max_grad_norm`: 1.0
  - `num_train_epochs`: 20
  - `max_steps`: -1
- - `lr_scheduler_type`: …
  - `lr_scheduler_kwargs`: {}
  - `warmup_ratio`: 0.1
  - `warmup_steps`: 0

@@ -281,7 +304,7 @@ You can finetune this model on your own dataset.
  - `hub_strategy`: every_save
  - `hub_private_repo`: False
  - `hub_always_push`: False
- - `gradient_checkpointing`: …
  - `gradient_checkpointing_kwargs`: None
  - `include_inputs_for_metrics`: False
  - `eval_do_concat_batches`: True

@@ -312,47 +335,35 @@ You can finetune this model on your own dataset.
  </details>

  ### Training Logs
- | Epoch … | … | … |
- | 0.… | … | … |
- | 1.… | … | … |
- | 2.… | … | … |
- | 3.… | … | … |
- | 4.… | … | … |
- …
- | 7.1558 | 551 | 0.0255 |
- | 7.5325 | 580 | 0.0278 |
- | 7.9091 | 609 | 0.0237 |
- | 8.2857 | 638 | 0.0238 |
- | 8.6623 | 667 | 0.0248 |
- | **9.039** | **696** | **0.0158** |
- | 9.4156 | 725 | 0.0176 |
- | 9.7922 | 754 | 0.017 |
- | 10.1688 | 783 | 0.0116 |
- | 10.5455 | 812 | 0.0192 |
- | 10.9221 | 841 | 0.009 |
- | 11.2987 | 870 | 0.009 |

  * The bold row denotes the saved checkpoint.

  ### Framework Versions
- - Python: 3.10.…
  - Sentence Transformers: 3.1.1
- - Transformers: 4.44.…
- - PyTorch: 2.4.…
- - Accelerate: 0.…
  - Datasets: 2.19.2
  - Tokenizers: 0.19.1

@@ -373,6 +384,18 @@ You can finetune this model on your own dataset.
  }
  ```

  <!--
  ## Glossary

@@ -8,49 +8,42 @@ tags:
  - feature-extraction
  - generated_from_trainer
  - dataset_size:2442
+ - loss:MultipleNegativesRankingLoss
  widget:
+ - source_sentence: ' Confusing height with width or depth for calculating the base
+     area.'
  sentences:
+ - Does not understand the first significant value is the first non-zero digit number
+ - Does not realise that subtracting a larger number will give a smaller answer
+ - Cannot identify the correct side lengths to use when asked to find the area of
+     a face
+ - source_sentence: ' Confusing exponentiation with multiplication.'
  sentences:
+ - Estimated when not appropriate
+ - Mixes up squaring and multiplying by 2 or doubling
+ - Writes the index as a digit on the end of a number
+ - source_sentence: ' Not recognizing the pattern of subtracting 4 from each term.'
  sentences:
+ - Identifies the term-to-term rule rather than the next term in a sequence
+ - Finds the median instead of the mode
+ - Rounds incorrectly by changing multiple place values
+ - source_sentence: ' Believing that remainders are needed to divide a group into equal
+     parts, rather than factors.'
  sentences:
+ - Does not follow the arrows through a function machine, changes the order of the
+     operations asked.
+ - When factorising into double brackets, believes the product of the constants in
+     the brackets is of the opposite sign to the constant in the expanded equation.
+ - Does not understand that factors are divisors which split the number into equal
+     groups
+ - source_sentence: ' Believing that the perimeter is divided equally among all sides
+     without considering the number of sides in the shape.'
  sentences:
+ - When given the perimeter of a regular polygon, multiplies instead of divides to
+     find each side length
+ - Does not understand the value of zeros as placeholders
+ - When asked to solve simultaneous equations, believes they can just find values
+     that work in one equation
  ---

  # SentenceTransformer based on BAAI/bge-large-en-v1.5

@@ -104,9 +97,9 @@ from sentence_transformers import SentenceTransformer
  model = SentenceTransformer("Gurveer05/bge-large-eedi-2024")
  # Run inference
  sentences = [
+     ' Believing that the perimeter is divided equally among all sides without considering the number of sides in the shape.',
+     'When given the perimeter of a regular polygon, multiplies instead of divides to find each side length',
+     'When asked to solve simultaneous equations, believes they can just find values that work in one equation',
  ]
  embeddings = model.encode(sentences)
  print(embeddings.shape)
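
For a quick sanity check, the embeddings can be scored against one another with the model's `similarity` method (cosine similarity by default). A minimal sketch, reusing `model`, `sentences`, and `embeddings` from the snippet above:

```python
# Compare the query misconception (first sentence) against the two candidates.
# SentenceTransformer.similarity accepts numpy arrays or tensors.
scores = model.similarity(embeddings[0:1], embeddings[1:])  # shape: (1, 2)
for sentence, score in zip(sentences[1:], scores[0]):
    print(f"{float(score):.4f}  {sentence}")
```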

@@ -162,19 +155,45 @@ You can finetune this model on your own dataset.

  * Dataset: csv
  * Size: 2,442 training samples
+ * Columns: <code>PredictedMisconception</code> and <code>MisconceptionName</code>
  * Approximate statistics based on the first 1000 samples:
+   |         | PredictedMisconception                                                            | MisconceptionName                                                                  |
+   |:--------|:----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|
+   | type    | string                                                                             | string                                                                             |
+   | details | <ul><li>min: 8 tokens</li><li>mean: 17.15 tokens</li><li>max: 72 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 15.15 tokens</li><li>max: 40 tokens</li></ul>  |
+ * Samples:
+   | PredictedMisconception | MisconceptionName |
+   |:-----------------------|:------------------|
+   | <code> Believing equilateral triangles have varying side lengths.</code> | <code>Does not know the meaning of equilateral</code> |
+   | <code> Believing that the side length of a square is the square root of the area, but incorrectly calculating it as the square of the area.</code> | <code>Confuses perimeter and area</code> |
+   | <code> The longest edge length is necessary for volume calculation in a triangular prism.</code> | <code>Finds area of one face when asked for volume</code> |
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
+   ```json
+   {
+       "scale": 20.0,
+       "similarity_fct": "cos_sim"
+   }
+   ```
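
Constructed in code, a loss configured this way would look roughly like the sketch below; `model` is assumed to be the SentenceTransformer being fine-tuned, and `scale=20.0` with cosine similarity mirrors the parameters above:

```python
from sentence_transformers import SentenceTransformer, losses, util

model = SentenceTransformer("BAAI/bge-large-en-v1.5")

# Scaled cosine similarity with in-batch negatives: for each
# (PredictedMisconception, MisconceptionName) pair, every other
# MisconceptionName in the batch serves as a negative.
loss = losses.MultipleNegativesRankingLoss(
    model,
    scale=20.0,
    similarity_fct=util.cos_sim,
)
```

The `no_duplicates` batch sampler listed under the hyperparameters further below keeps duplicate texts out of a batch, where they would otherwise act as false negatives for this loss.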

+ ### Evaluation Dataset

+ #### csv

+ * Dataset: csv
+ * Size: 1,928 evaluation samples
+ * Columns: <code>PredictedMisconception</code> and <code>MisconceptionName</code>
+ * Approximate statistics based on the first 1000 samples:
+   |         | PredictedMisconception                                                            | MisconceptionName                                                                  |
+   |:--------|:----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|
+   | type    | string                                                                             | string                                                                             |
+   | details | <ul><li>min: 8 tokens</li><li>mean: 16.66 tokens</li><li>max: 70 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 14.32 tokens</li><li>max: 40 tokens</li></ul>  |
+ * Samples:
+   | PredictedMisconception | MisconceptionName |
+   |:-----------------------|:------------------|
+   | <code> Believing the sequence's common difference is positive, leading to an incorrect nth-term formula.</code> | <code>When finding the nth term of a linear sequence, thinks the the first term is the coefficient in front of n.</code> |
+   | <code> Incorrect application of the nth term formula for integer sequences.</code> | <code>When solving an equation, uses the same operation rather than the inverse.</code> |
+   | <code> Belief that shapes with more sides have higher rotational symmetry.</code> | <code>Does not know how to find order of rotational symmetry</code> |
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
    ```json
    {
        "scale": 20.0,

@@ -186,12 +205,16 @@ You can finetune this model on your own dataset.
  #### Non-Default Hyperparameters

  - `eval_strategy`: steps
+ - `per_device_train_batch_size`: 64
+ - `per_device_eval_batch_size`: 64
+ - `gradient_accumulation_steps`: 8
+ - `weight_decay`: 0.01
  - `num_train_epochs`: 20
+ - `lr_scheduler_type`: cosine_with_restarts
  - `warmup_ratio`: 0.1
  - `fp16`: True
  - `load_best_model_at_end`: True
+ - `gradient_checkpointing`: True
  - `batch_sampler`: no_duplicates
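
Expressed with the trainer API, these overrides would look roughly like the sketch below (the output directory is a placeholder; the other values mirror the list above):

```python
from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="outputs",  # placeholder path
    eval_strategy="steps",
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    gradient_accumulation_steps=8,  # effective batch size 64 * 8 = 512
    weight_decay=0.01,
    num_train_epochs=20,
    lr_scheduler_type="cosine_with_restarts",
    warmup_ratio=0.1,
    fp16=True,
    load_best_model_at_end=True,
    gradient_checkpointing=True,
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)
```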
#### All Hyperparameters

@@ -201,22 +224,22 @@ You can finetune this model on your own dataset.
  - `do_predict`: False
  - `eval_strategy`: steps
  - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 64
+ - `per_device_eval_batch_size`: 64
  - `per_gpu_train_batch_size`: None
  - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 8
  - `eval_accumulation_steps`: None
  - `torch_empty_cache_steps`: None
  - `learning_rate`: 5e-05
+ - `weight_decay`: 0.01
  - `adam_beta1`: 0.9
  - `adam_beta2`: 0.999
  - `adam_epsilon`: 1e-08
  - `max_grad_norm`: 1.0
  - `num_train_epochs`: 20
  - `max_steps`: -1
+ - `lr_scheduler_type`: cosine_with_restarts
  - `lr_scheduler_kwargs`: {}
  - `warmup_ratio`: 0.1
  - `warmup_steps`: 0

@@ -281,7 +304,7 @@ You can finetune this model on your own dataset.
  - `hub_strategy`: every_save
  - `hub_private_repo`: False
  - `hub_always_push`: False
+ - `gradient_checkpointing`: True
  - `gradient_checkpointing_kwargs`: None
  - `include_inputs_for_metrics`: False
  - `eval_do_concat_batches`: True

@@ -312,47 +335,35 @@ You can finetune this model on your own dataset.
  </details>

  ### Training Logs
+ | Epoch      | Step   | Training Loss | loss       |
+ |:----------:|:------:|:-------------:|:----------:|
+ | 0.4103     | 2      | 2.5492        | -          |
+ | 0.6154     | 3      | -             | 1.4112     |
+ | 0.8205     | 4      | 2.319         | -          |
+ | 1.2308     | 6      | 1.7499        | 1.2462     |
+ | 1.6410     | 8      | 1.7464        | -          |
+ | 1.8462     | 9      | -             | 1.1584     |
+ | 2.0513     | 10     | 1.4739        | -          |
+ | 2.4615     | 12     | 1.3037        | 1.0487     |
+ | 2.8718     | 14     | 1.2155        | -          |
+ | 3.0769     | 15     | -             | 1.0078     |
+ | 3.2821     | 16     | 0.9292        | -          |
+ | 3.6923     | 18     | 0.8923        | 0.9539     |
+ | 4.1026     | 20     | 0.7312        | -          |
+ | **4.3077** | **21** | **-**         | **0.9079** |
+ | 4.5128     | 22     | 0.6182        | -          |
+ | 4.9231     | 24     | 0.5942        | 0.9088     |
+ | 5.3333     | 26     | 0.4158        | -          |
+ | 5.5385     | 27     | -             | 0.9095     |

  * The bold row denotes the saved checkpoint.

  ### Framework Versions
+ - Python: 3.10.12
  - Sentence Transformers: 3.1.1
+ - Transformers: 4.44.2
+ - PyTorch: 2.4.1+cu121
+ - Accelerate: 0.34.2
  - Datasets: 2.19.2
  - Tokenizers: 0.19.1

@@ -373,6 +384,18 @@ You can finetune this model on your own dataset.
  }
  ```

+ #### MultipleNegativesRankingLoss
+ ```bibtex
+ @misc{henderson2017efficient,
+     title={Efficient Natural Language Response Suggestion for Smart Reply},
+     author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+     year={2017},
+     eprint={1705.00652},
+     archivePrefix={arXiv},
+     primaryClass={cs.CL}
+ }
+ ```

  <!--
  ## Glossary

config.json CHANGED

@@ -25,7 +25,7 @@
    "pad_token_id": 0,
    "position_embedding_type": "absolute",
    "torch_dtype": "float32",
-   "transformers_version": "4.44.…",
+   "transformers_version": "4.44.2",
    "type_vocab_size": 2,
    "use_cache": true,
    "vocab_size": 30522
config_sentence_transformers.json CHANGED

@@ -1,8 +1,8 @@
  {
    "__version__": {
      "sentence_transformers": "3.1.1",
-     "transformers": "4.44.…",
-     "pytorch": "2.4.…"
+     "transformers": "4.44.2",
+     "pytorch": "2.4.1+cu121"
    },
    "prompts": {},
    "default_prompt_name": null,
model.safetensors CHANGED

@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:…
+ oid sha256:43cd7df34b025417b63ee647fd91159e5fe741ebcc584907ed2b6533605a8703
  size 1340612432