MAP@25: 0.30918104653393014
- README.md +118 -95
- config.json +1 -1
- config_sentence_transformers.json +2 -2
- model.safetensors +1 -1
README.md CHANGED

@@ -8,49 +8,42 @@ tags:
  - feature-extraction
  - generated_from_trainer
  - dataset_size:2442
- - loss:…
  widget:
- - source_sentence: …
  sentences:
- …
- - … ![A graph of a quadratic curve that crosses the x axis at (1,0) and (3,0) and
-   crosses the y axis at (0,3).]() y=(x+1)(x+3)
  sentences:
- …
- - Forgets to swap the sign of roots when placing into brackets
- - source_sentence: For a given output find the input of a function machine ![Image
-   of a function machine. The function is add one third, and the output is 7]() What
-   is the input of this function machine? 7 1/3
  sentences:
- …
- - … sequence above? ![A sequence of 4 patterns. The first pattern is 1 green dot.
-   The second pattern is green dots arranged in a 2 by 2 square shape. The third
-   pattern is green dots arranged in a 3 by 3 square shape. The fourth pattern is
-   green dots arranged in a 4 by 4 square shape. ]()
  sentences:
- …
  sentences:
- …
  ---

  # SentenceTransformer based on BAAI/bge-large-en-v1.5

@@ -104,9 +97,9 @@ from sentence_transformers import SentenceTransformer
  model = SentenceTransformer("Gurveer05/bge-large-eedi-2024")
  # Run inference
  sentences = [
-     '…',
-     '…',
-     '…',
  ]
  embeddings = model.encode(sentences)
  print(embeddings.shape)

@@ -162,19 +155,45 @@ You can finetune this model on your own dataset.

  * Dataset: csv
  * Size: 2,442 training samples
- * Columns: <code>…
  * Approximate statistics based on the first 1000 samples:
- | | …
- | type | string …
- | details | <ul><li>min: …
  * Samples:
- …
- | <code>…
- | <code>…
- | <code>…
- * Loss: [<code>…
  ```json
  {
      "scale": 20.0,

@@ -186,12 +205,16 @@ You can finetune this model on your own dataset.
  #### Non-Default Hyperparameters

  - `eval_strategy`: steps
- - `per_device_train_batch_size`: …
- - `per_device_eval_batch_size`: …
  - `num_train_epochs`: 20
  - `warmup_ratio`: 0.1
  - `fp16`: True
  - `load_best_model_at_end`: True
  - `batch_sampler`: no_duplicates

  #### All Hyperparameters

@@ -201,22 +224,22 @@ You can finetune this model on your own dataset.
  - `do_predict`: False
  - `eval_strategy`: steps
  - `prediction_loss_only`: True
- - `per_device_train_batch_size`: …
- - `per_device_eval_batch_size`: …
  - `per_gpu_train_batch_size`: None
  - `per_gpu_eval_batch_size`: None
- - `gradient_accumulation_steps`: …
  - `eval_accumulation_steps`: None
  - `torch_empty_cache_steps`: None
  - `learning_rate`: 5e-05
- - `weight_decay`: 0.…
  - `adam_beta1`: 0.9
  - `adam_beta2`: 0.999
  - `adam_epsilon`: 1e-08
  - `max_grad_norm`: 1.0
  - `num_train_epochs`: 20
  - `max_steps`: -1
- - `lr_scheduler_type`: …
  - `lr_scheduler_kwargs`: {}
  - `warmup_ratio`: 0.1
  - `warmup_steps`: 0

@@ -281,7 +304,7 @@ You can finetune this model on your own dataset.
  - `hub_strategy`: every_save
  - `hub_private_repo`: False
  - `hub_always_push`: False
- - `gradient_checkpointing`: …
  - `gradient_checkpointing_kwargs`: None
  - `include_inputs_for_metrics`: False
  - `eval_do_concat_batches`: True

@@ -312,47 +335,35 @@ You can finetune this model on your own dataset.
  </details>

  ### Training Logs
- | Epoch … | … | … |
- | 0.… | … | … |
- | 1.… | … | … |
- | 2.… | … | … |
- | 3.… | … | … |
- | 4.… | … | … |
- …
- | 7.1558 | 551 | 0.0255 |
- | 7.5325 | 580 | 0.0278 |
- | 7.9091 | 609 | 0.0237 |
- | 8.2857 | 638 | 0.0238 |
- | 8.6623 | 667 | 0.0248 |
- | **9.039** | **696** | **0.0158** |
- | 9.4156 | 725 | 0.0176 |
- | 9.7922 | 754 | 0.017 |
- | 10.1688 | 783 | 0.0116 |
- | 10.5455 | 812 | 0.0192 |
- | 10.9221 | 841 | 0.009 |
- | 11.2987 | 870 | 0.009 |

  * The bold row denotes the saved checkpoint.

  ### Framework Versions
- - Python: 3.10.…
  - Sentence Transformers: 3.1.1
- - Transformers: 4.44.…
- - PyTorch: 2.4.…
- - Accelerate: 0.…
  - Datasets: 2.19.2
  - Tokenizers: 0.19.1

@@ -373,6 +384,18 @@ You can finetune this model on your own dataset.
  }
  ```

  <!--
  ## Glossary

@@ -8,49 +8,42 @@ tags:
  - feature-extraction
  - generated_from_trainer
  - dataset_size:2442
+ - loss:MultipleNegativesRankingLoss
  widget:
+ - source_sentence: ' Confusing height with width or depth for calculating the base
+     area.'
  sentences:
+ - Does not understand the first significant value is the first non-zero digit number
+ - Does not realise that subtracting a larger number will give a smaller answer
+ - Cannot identify the correct side lengths to use when asked to find the area of
+     a face
+ - source_sentence: ' Confusing exponentiation with multiplication.'
  sentences:
+ - Estimated when not appropriate
+ - Mixes up squaring and multiplying by 2 or doubling
+ - Writes the index as a digit on the end of a number
+ - source_sentence: ' Not recognizing the pattern of subtracting 4 from each term.'
  sentences:
+ - Identifies the term-to-term rule rather than the next term in a sequence
+ - Finds the median instead of the mode
+ - Rounds incorrectly by changing multiple place values
+ - source_sentence: ' Believing that remainders are needed to divide a group into equal
+     parts, rather than factors.'
  sentences:
+ - Does not follow the arrows through a function machine, changes the order of the
+     operations asked.
+ - When factorising into double brackets, believes the product of the constants in
+     the brackets is of the opposite sign to the constant in the expanded equation.
+ - Does not understand that factors are divisors which split the number into equal
+     groups
+ - source_sentence: ' Believing that the perimeter is divided equally among all sides
+     without considering the number of sides in the shape.'
  sentences:
+ - When given the perimeter of a regular polygon, multiplies instead of divides to
+     find each side length
+ - Does not understand the value of zeros as placeholders
+ - When asked to solve simultaneous equations, believes they can just find values
+     that work in one equation
  ---

  # SentenceTransformer based on BAAI/bge-large-en-v1.5

@@ -104,9 +97,9 @@ from sentence_transformers import SentenceTransformer
  model = SentenceTransformer("Gurveer05/bge-large-eedi-2024")
  # Run inference
  sentences = [
+     ' Believing that the perimeter is divided equally among all sides without considering the number of sides in the shape.',
+     'When given the perimeter of a regular polygon, multiplies instead of divides to find each side length',
+     'When asked to solve simultaneous equations, believes they can just find values that work in one equation',
  ]
  embeddings = model.encode(sentences)
  print(embeddings.shape)
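
For a quick sanity check, the embeddings can be scored against one another with the model's `similarity` method (cosine similarity by default). A minimal sketch, reusing `model`, `sentences`, and `embeddings` from the snippet above:

```python
# Compare the query misconception (first sentence) against the two candidates.
# SentenceTransformer.similarity accepts numpy arrays or tensors.
scores = model.similarity(embeddings[0:1], embeddings[1:])  # shape: (1, 2)
for sentence, score in zip(sentences[1:], scores[0]):
    print(f"{float(score):.4f}  {sentence}")
```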

@@ -162,19 +155,45 @@ You can finetune this model on your own dataset.

  * Dataset: csv
  * Size: 2,442 training samples
+ * Columns: <code>PredictedMisconception</code> and <code>MisconceptionName</code>
  * Approximate statistics based on the first 1000 samples:
+   |         | PredictedMisconception                                                            | MisconceptionName                                                                  |
+   |:--------|:----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|
+   | type    | string                                                                             | string                                                                             |
+   | details | <ul><li>min: 8 tokens</li><li>mean: 17.15 tokens</li><li>max: 72 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 15.15 tokens</li><li>max: 40 tokens</li></ul>  |
+ * Samples:
+   | PredictedMisconception | MisconceptionName |
+   |:-----------------------|:------------------|
+   | <code> Believing equilateral triangles have varying side lengths.</code> | <code>Does not know the meaning of equilateral</code> |
+   | <code> Believing that the side length of a square is the square root of the area, but incorrectly calculating it as the square of the area.</code> | <code>Confuses perimeter and area</code> |
+   | <code> The longest edge length is necessary for volume calculation in a triangular prism.</code> | <code>Finds area of one face when asked for volume</code> |
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
+   ```json
+   {
+       "scale": 20.0,
+       "similarity_fct": "cos_sim"
+   }
+   ```
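
Constructed in code, a loss configured this way would look roughly like the sketch below; `model` is assumed to be the SentenceTransformer being fine-tuned, and `scale=20.0` with cosine similarity mirrors the parameters above:

```python
from sentence_transformers import SentenceTransformer, losses, util

model = SentenceTransformer("BAAI/bge-large-en-v1.5")

# Scaled cosine similarity with in-batch negatives: for each
# (PredictedMisconception, MisconceptionName) pair, every other
# MisconceptionName in the batch serves as a negative.
loss = losses.MultipleNegativesRankingLoss(
    model,
    scale=20.0,
    similarity_fct=util.cos_sim,
)
```

The `no_duplicates` batch sampler listed under the hyperparameters further below keeps duplicate texts out of a batch, where they would otherwise act as false negatives for this loss.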

+ ### Evaluation Dataset

+ #### csv

+ * Dataset: csv
+ * Size: 1,928 evaluation samples
+ * Columns: <code>PredictedMisconception</code> and <code>MisconceptionName</code>
+ * Approximate statistics based on the first 1000 samples:
+   |         | PredictedMisconception                                                            | MisconceptionName                                                                  |
+   |:--------|:----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|
+   | type    | string                                                                             | string                                                                             |
+   | details | <ul><li>min: 8 tokens</li><li>mean: 16.66 tokens</li><li>max: 70 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 14.32 tokens</li><li>max: 40 tokens</li></ul>  |
+ * Samples:
+   | PredictedMisconception | MisconceptionName |
+   |:-----------------------|:------------------|
+   | <code> Believing the sequence's common difference is positive, leading to an incorrect nth-term formula.</code> | <code>When finding the nth term of a linear sequence, thinks the the first term is the coefficient in front of n.</code> |
+   | <code> Incorrect application of the nth term formula for integer sequences.</code> | <code>When solving an equation, uses the same operation rather than the inverse.</code> |
+   | <code> Belief that shapes with more sides have higher rotational symmetry.</code> | <code>Does not know how to find order of rotational symmetry</code> |
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
    ```json
    {
        "scale": 20.0,

@@ -186,12 +205,16 @@ You can finetune this model on your own dataset.
  #### Non-Default Hyperparameters

  - `eval_strategy`: steps
+ - `per_device_train_batch_size`: 64
+ - `per_device_eval_batch_size`: 64
+ - `gradient_accumulation_steps`: 8
+ - `weight_decay`: 0.01
  - `num_train_epochs`: 20
+ - `lr_scheduler_type`: cosine_with_restarts
  - `warmup_ratio`: 0.1
  - `fp16`: True
  - `load_best_model_at_end`: True
+ - `gradient_checkpointing`: True
  - `batch_sampler`: no_duplicates
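
Expressed with the trainer API, these overrides would look roughly like the sketch below (the output directory is a placeholder; the other values mirror the list above):

```python
from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="outputs",  # placeholder path
    eval_strategy="steps",
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    gradient_accumulation_steps=8,  # effective batch size 64 * 8 = 512
    weight_decay=0.01,
    num_train_epochs=20,
    lr_scheduler_type="cosine_with_restarts",
    warmup_ratio=0.1,
    fp16=True,
    load_best_model_at_end=True,
    gradient_checkpointing=True,
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)
```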
#### All Hyperparameters

@@ -201,22 +224,22 @@ You can finetune this model on your own dataset.
  - `do_predict`: False
  - `eval_strategy`: steps
  - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 64
+ - `per_device_eval_batch_size`: 64
  - `per_gpu_train_batch_size`: None
  - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 8
  - `eval_accumulation_steps`: None
  - `torch_empty_cache_steps`: None
  - `learning_rate`: 5e-05
+ - `weight_decay`: 0.01
  - `adam_beta1`: 0.9
  - `adam_beta2`: 0.999
  - `adam_epsilon`: 1e-08
  - `max_grad_norm`: 1.0
  - `num_train_epochs`: 20
  - `max_steps`: -1
+ - `lr_scheduler_type`: cosine_with_restarts
  - `lr_scheduler_kwargs`: {}
  - `warmup_ratio`: 0.1
  - `warmup_steps`: 0

@@ -281,7 +304,7 @@ You can finetune this model on your own dataset.
  - `hub_strategy`: every_save
  - `hub_private_repo`: False
  - `hub_always_push`: False
+ - `gradient_checkpointing`: True
  - `gradient_checkpointing_kwargs`: None
  - `include_inputs_for_metrics`: False
  - `eval_do_concat_batches`: True

@@ -312,47 +335,35 @@ You can finetune this model on your own dataset.
  </details>

  ### Training Logs
+ | Epoch      | Step   | Training Loss | loss       |
+ |:----------:|:------:|:-------------:|:----------:|
+ | 0.4103     | 2      | 2.5492        | -          |
+ | 0.6154     | 3      | -             | 1.4112     |
+ | 0.8205     | 4      | 2.319         | -          |
+ | 1.2308     | 6      | 1.7499        | 1.2462     |
+ | 1.6410     | 8      | 1.7464        | -          |
+ | 1.8462     | 9      | -             | 1.1584     |
+ | 2.0513     | 10     | 1.4739        | -          |
+ | 2.4615     | 12     | 1.3037        | 1.0487     |
+ | 2.8718     | 14     | 1.2155        | -          |
+ | 3.0769     | 15     | -             | 1.0078     |
+ | 3.2821     | 16     | 0.9292        | -          |
+ | 3.6923     | 18     | 0.8923        | 0.9539     |
+ | 4.1026     | 20     | 0.7312        | -          |
+ | **4.3077** | **21** | **-**         | **0.9079** |
+ | 4.5128     | 22     | 0.6182        | -          |
+ | 4.9231     | 24     | 0.5942        | 0.9088     |
+ | 5.3333     | 26     | 0.4158        | -          |
+ | 5.5385     | 27     | -             | 0.9095     |

  * The bold row denotes the saved checkpoint.

  ### Framework Versions
+ - Python: 3.10.12
  - Sentence Transformers: 3.1.1
+ - Transformers: 4.44.2
+ - PyTorch: 2.4.1+cu121
+ - Accelerate: 0.34.2
  - Datasets: 2.19.2
  - Tokenizers: 0.19.1

@@ -373,6 +384,18 @@ You can finetune this model on your own dataset.
  }
  ```

+ #### MultipleNegativesRankingLoss
+ ```bibtex
+ @misc{henderson2017efficient,
+     title={Efficient Natural Language Response Suggestion for Smart Reply},
+     author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+     year={2017},
+     eprint={1705.00652},
+     archivePrefix={arXiv},
+     primaryClass={cs.CL}
+ }
+ ```

  <!--
  ## Glossary

config.json CHANGED

@@ -25,7 +25,7 @@
    "pad_token_id": 0,
    "position_embedding_type": "absolute",
    "torch_dtype": "float32",
-   "transformers_version": "4.44.…",
+   "transformers_version": "4.44.2",
    "type_vocab_size": 2,
    "use_cache": true,
    "vocab_size": 30522
config_sentence_transformers.json CHANGED

@@ -1,8 +1,8 @@
  {
    "__version__": {
      "sentence_transformers": "3.1.1",
-     "transformers": "4.44.…",
-     "pytorch": "2.4.…"
+     "transformers": "4.44.2",
+     "pytorch": "2.4.1+cu121"
    },
    "prompts": {},
    "default_prompt_name": null,
model.safetensors CHANGED

@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:…
+ oid sha256:43cd7df34b025417b63ee647fd91159e5fe741ebcc584907ed2b6533605a8703
  size 1340612432