Add new SentenceTransformer model.

Files changed:
- README.md (+82, -93)
- config_sentence_transformers.json (+1, -1)
- model.safetensors (+1, -1)
README.md
CHANGED
@@ -7,54 +7,50 @@ tags:
 - sentence-similarity
 - feature-extraction
 - generated_from_trainer
-- dataset_size:
 - loss:MultipleNegativesSymmetricRankingLoss
 widget:
-- source_sentence:
-    outside of the object The triangle is enlarged by scale factor 3, with the centre
-    of enlargement at (1,0). What are the new coordinates of the point marked T ?
-    ![A coordinate grid with the x-axis going from -1 to 10 and the y-axis going from
-    -1 to 7. 3 points are plotted and joined with straight lines to form a triangle.
-    The points are (1,1), (1,4) and (3,1). Point (3,1) is labelled as T. Point (1,0)
-    is also plotted.]() (9,3)
   sentences:
-  A
-  1'
   sentences:
   sentences:
   sentences:
-- source_sentence:
-    ![Two rectangles of different sizes. One rectangle has width 2cm and height 3cm.
-    The other rectangle has width 4cm and height 9cm. ]() Katie says these two rectangles
-    are similar ![Two rectangles of different sizes. One rectangle has width 4cm and
-    height 6cm. The other rectangle has width 7cm and height 9cm. ]() Only Katie
   sentences:
 ---

 # SentenceTransformer based on BAAI/bge-large-en-v1.5
@@ -108,9 +104,9 @@ from sentence_transformers import SentenceTransformer
 model = SentenceTransformer("Gurveer05/bge-large-eedi-2024")
 # Run inference
 sentences = [
-    '
-    'Thinks
-
 ]
 embeddings = model.encode(sentences)
 print(embeddings.shape)
@@ -165,19 +161,19 @@ You can finetune this model on your own dataset.
 #### csv

 * Dataset: csv
-* Size: 2,
 * Columns: <code>sentence1</code> and <code>sentence2</code>
 * Approximate statistics based on the first 1000 samples:
   |         | sentence1 | sentence2 |
   |:--------|:----------|:----------|
   | type    | string    | string    |
-  | details | <ul><li>min: 13 tokens</li><li>mean: 56.
 * Samples:
-  | sentence1
-  | <code>
-  | <code>
-  | <code>
 * Loss: [<code>MultipleNegativesSymmetricRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativessymmetricrankingloss) with these parameters:
   ```json
   {
@@ -193,6 +189,7 @@ You can finetune this model on your own dataset.
 - `per_device_train_batch_size`: 16
 - `per_device_eval_batch_size`: 16
 - `num_train_epochs`: 20
 - `fp16`: True
 - `load_best_model_at_end`: True
 - `batch_sampler`: no_duplicates
@@ -221,7 +218,7 @@ You can finetune this model on your own dataset.
 - `max_steps`: -1
 - `lr_scheduler_type`: linear
 - `lr_scheduler_kwargs`: {}
-- `warmup_ratio`: 0.
 - `warmup_steps`: 0
 - `log_level`: passive
 - `log_level_replica`: warning
@@ -315,52 +312,44 @@ You can finetune this model on your own dataset.
 </details>

 ### Training Logs
-| Epoch
-| 0.
-| 0.
-| 1.
-| 1.
-| 7.75 | 713 | 0.0605 |
-| **8.0** | **736** | **0.0431** |
-| 8.25 | 759 | 0.0224 |
-| 8.5 | 782 | 0.0381 |
-| 8.75 | 805 | 0.0451 |
-| 9.0 | 828 | 0.0169 |
-| 9.25 | 851 | 0.0228 |
-| 9.5 | 874 | 0.0257 |

 * The bold row denotes the saved checkpoint.

 ### Framework Versions
 - Python: 3.10.14
-- Sentence Transformers: 3.1.
 - Transformers: 4.44.0
 - PyTorch: 2.4.0
 - Accelerate: 0.33.0
README.md (updated), lines 7-56:

- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:2442
- loss:MultipleNegativesSymmetricRankingLoss
widget:
- source_sentence: Carry out a subtraction problem with positive integers where the
    answer is less than 0 598-1000= This problem cannot be solved
  sentences:
  - Rounds to the wrong degree of accuracy (rounds too much)
  - When subtracting fractions, subtracts the numerators and denominators
  - Believes it is impossible to subtract a bigger number from a smaller number
- source_sentence: Given the sketch of a curve in the form (x + a)(x + b), work out
    its factorised form Which of the following could be the equation of this curve?
    ![A graph of a quadratic curve that crosses the x axis at (1,0) and (3,0) and
    crosses the y axis at (0,3).]() y=(x+1)(x+3)
  sentences:
  - Does not use the associative property of multiplication to find other factors
    of a number
  - Believes they only need to multiply the first and last pairs of terms when expanding
    double brackets
  - Forgets to swap the sign of roots when placing into brackets
- source_sentence: For a given output find the input of a function machine ![Image
    of a function machine. The function is add one third, and the output is 7]() What
    is the input of this function machine? 7 1/3
  sentences:
  - When finding an input of a function machine thinks you apply the operations given
    rather than the inverse operation.
  - Believes the solution to mx + c = a is the y intercept of y = mx +c
  - Squares when asked to find the square root
- source_sentence: Count a number of objects 1,3,5,7, … ? Which pattern matches the
    sequence above? ![A sequence of 4 patterns. The first pattern is 1 green dot.
    The second pattern is green dots arranged in a 2 by 2 square shape. The third
    pattern is green dots arranged in a 3 by 3 square shape. The fourth pattern is
    green dots arranged in a 4 by 4 square shape. ]()
  sentences:
  - 'Subtracts instead of adds when answering worded problems '
  - When multiplying a decimal less than 1 by an integer, gives an answer 10 times
    smaller than it should be
  - When given a linear sequence, cannot match it to a visual pattern
- source_sentence: Express one quantity as a fraction of another A group of 8 friends
    share £6 equally. What fraction of the money do they each get? 1/8
  sentences:
  - Thinks the fraction 1/n can express sharing any number of items between n people
  - 'Does not understand that in the ratio 1:n the total number of parts would be
    1+n '
  - Does not recognise the distributive property
---

# SentenceTransformer based on BAAI/bge-large-en-v1.5
README.md (updated), lines 104-112:

model = SentenceTransformer("Gurveer05/bge-large-eedi-2024")
# Run inference
sentences = [
    'Express one quantity as a fraction of another A group of 8 friends share £6 equally. What fraction of the money do they each get? 1/8',
    'Thinks the fraction 1/n can express sharing any number of items between n people',
    'Does not recognise the distributive property',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
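The snippet above yields one embedding per sentence. Ranking candidate misconceptions against a question then reduces to cosine similarity between the rows of that array. A minimal sketch with NumPy, using random stand-in vectors so it runs without downloading the model (`cosine_similarity` is a hypothetical helper added here for illustration, not part of the card):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Cosine similarity between each row of `a` and each row of `b`."""
    a_norm = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_norm = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_norm @ b_norm.T

# Stand-in for `model.encode(sentences)`: three random 1024-dim vectors
# (1024 is the usual hidden size of bge-large-style models).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(3, 1024)).astype(np.float32)

# Similarity of the question (row 0) to each candidate misconception (rows 1-2).
scores = cosine_similarity(embeddings[:1], embeddings[1:])
print(scores.shape)  # (1, 2)
```

With the real model, the first sentence (the maths question) should score highest against the misconception actually paired with it in training.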
README.md (updated), lines 161-179:

#### csv

* Dataset: csv
* Size: 2,442 training samples
* Columns: <code>sentence1</code> and <code>sentence2</code>
* Approximate statistics based on the first 1000 samples:
  |         | sentence1 | sentence2 |
  |:--------|:----------|:----------|
  | type    | string    | string    |
  | details | <ul><li>min: 13 tokens</li><li>mean: 56.55 tokens</li><li>max: 306 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 15.13 tokens</li><li>max: 40 tokens</li></ul> |
* Samples:
  | sentence1 | sentence2 |
  |:----------|:----------|
  | <code>Calculate the distance travelled using a speed-time graph Here is a speed-time graph for a car. Which of the following gives the best estimate for the distance travelled between 8 and 10 seconds? ![A graph showing time in seconds on the x axis and speed in metres per second on the y axis. The curve passes through the points (8,15) and (10,24)]() 48 m</code> | <code>Believes that when finding area under graph you can use the upper y value rather than average of upper and lower</code> |
  | <code>Add proper fractions with the same denominator Work out: 4/11+7/11 Write your answer in its simplest form. 11/11</code> | <code>Forgot to simplify the fraction</code> |
  | <code>Count a number of objects 1,3,5,7, … ? Which pattern matches the sequence above? ![A sequence of 4 patterns. The first pattern is 1 green dot. The second pattern is green dots arranged in a 2 by 2 square shape. The third pattern is green dots arranged in a 3 by 3 square shape. The fourth pattern is green dots arranged in a 4 by 4 square shape. ]()</code> | <code>When given a linear sequence, cannot match it to a visual pattern</code> |
* Loss: [<code>MultipleNegativesSymmetricRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativessymmetricrankingloss) with these parameters:
  ```json
  {
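The loss named above scores each (sentence1, sentence2) pair against every other in-batch pair as a negative, in both directions (question→misconception and misconception→question). A rough NumPy sketch of that idea, under stated assumptions (`mnsr_loss` is a hypothetical helustration-only helper; `scale=20.0` mirrors the sentence-transformers default for this loss family), not the library's implementation:

```python
import numpy as np

def mnsr_loss(anchor_emb: np.ndarray, positive_emb: np.ndarray,
              scale: float = 20.0) -> float:
    """Sketch of a symmetric multiple-negatives ranking loss: row i of
    `anchor_emb` is paired with row i of `positive_emb`; every other row
    in the batch acts as a negative, and the cross-entropy is averaged
    over both ranking directions."""
    a = anchor_emb / np.linalg.norm(anchor_emb, axis=1, keepdims=True)
    p = positive_emb / np.linalg.norm(positive_emb, axis=1, keepdims=True)
    scores = scale * (a @ p.T)  # (batch, batch) cosine similarities

    def cross_entropy(logits):
        # The correct "class" for row i is column i.
        logits = logits - logits.max(axis=1, keepdims=True)
        log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # Symmetric: anchor→positive plus positive→anchor.
    return (cross_entropy(scores) + cross_entropy(scores.T)) / 2.0
```

Perfectly aligned pairs (each anchor identical only to its own positive) drive the loss toward zero, which is what the training-log curve below shows happening over the epochs.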
README.md (updated), lines 189-195:

- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `num_train_epochs`: 20
- `warmup_ratio`: 0.1
- `fp16`: True
- `load_best_model_at_end`: True
- `batch_sampler`: no_duplicates
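`batch_sampler: no_duplicates` matters because the in-batch-negatives loss would penalise two samples sharing the same misconception text as if they were negatives of each other. A toy greedy sketch of that constraint (`no_duplicates_batches` is a hypothetical helper for illustration, not the sentence-transformers sampler):

```python
def no_duplicates_batches(labels, batch_size):
    """Greedy sketch of a 'no duplicates' batch sampler: no two items with
    the same label (here: the same misconception text) share a batch, so
    in-batch negatives are never accidental positives."""
    remaining = list(range(len(labels)))
    batches = []
    while remaining:
        batch, seen = [], set()
        for i in list(remaining):
            if labels[i] not in seen:
                batch.append(i)
                seen.add(labels[i])
                remaining.remove(i)
                if len(batch) == batch_size:
                    break
        batches.append(batch)
    return batches

labels = ["A", "B", "A", "C", "B", "A"]
print(no_duplicates_batches(labels, batch_size=2))  # [[0, 1], [2, 3], [4, 5]]
```

Each emitted batch contains distinct labels, at the cost of a non-sequential iteration order.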
README.md (updated), lines 218-224:

- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
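`warmup_ratio: 0.1` with a linear scheduler means roughly: ramp the learning rate from zero to its peak over the first 10% of optimiser steps, then decay it linearly back to zero. A small illustrative function (an approximation of the schedule's shape, not the trainer's exact code; the step counts are made up):

```python
def linear_schedule_with_warmup(step: int, total_steps: int,
                                warmup_ratio: float = 0.1) -> float:
    """Learning-rate multiplier: linear warmup over the first
    `warmup_ratio` fraction of steps, then linear decay to zero."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

print(linear_schedule_with_warmup(0, 1000),     # 0.0 (start of warmup)
      linear_schedule_with_warmup(100, 1000),   # 1.0 (peak after 10% of steps)
      linear_schedule_with_warmup(1000, 1000))  # 0.0 (end of training)
```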
README.md (updated), lines 312-355:

</details>

### Training Logs
| Epoch     | Step    | Training Loss |
|:---------:|:-------:|:-------------:|
| 0.3766    | 29      | 1.4411        |
| 0.7532    | 58      | 1.0084        |
| 1.1299    | 87      | 0.7363        |
| 1.5065    | 116     | 0.5658        |
| 1.8831    | 145     | 0.4697        |
| 2.2597    | 174     | 0.307         |
| 2.6364    | 203     | 0.2828        |
| 3.0130    | 232     | 0.1616        |
| 3.3896    | 261     | 0.1542        |
| 3.7662    | 290     | 0.1315        |
| 4.1429    | 319     | 0.0984        |
| 4.5195    | 348     | 0.1066        |
| 4.8961    | 377     | 0.0768        |
| 5.2727    | 406     | 0.0641        |
| 5.6494    | 435     | 0.0558        |
| 6.0260    | 464     | 0.0495        |
| 6.4026    | 493     | 0.0459        |
| 6.7792    | 522     | 0.0397        |
| 7.1558    | 551     | 0.0255        |
| 7.5325    | 580     | 0.0278        |
| 7.9091    | 609     | 0.0237        |
| 8.2857    | 638     | 0.0238        |
| 8.6623    | 667     | 0.0248        |
| **9.039** | **696** | **0.0158**    |
| 9.4156    | 725     | 0.0176        |
| 9.7922    | 754     | 0.017         |
| 10.1688   | 783     | 0.0116        |
| 10.5455   | 812     | 0.0192        |
| 10.9221   | 841     | 0.0076        |
| 11.2987   | 870     | 0.009         |

* The bold row denotes the saved checkpoint.

### Framework Versions
- Python: 3.10.14
- Sentence Transformers: 3.1.1
- Transformers: 4.44.0
- PyTorch: 2.4.0
- Accelerate: 0.33.0
config_sentence_transformers.json
CHANGED

@@ -1,6 +1,6 @@
 {
   "__version__": {
-    "sentence_transformers": "3.1.
+    "sentence_transformers": "3.1.1",
     "transformers": "4.44.0",
     "pytorch": "2.4.0"
   },
model.safetensors
CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:0a05fe01c79e9d58438063e8a0f24a4341a0671378aaa11eee7fa7a304ce60e5
 size 1340612432
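The lines above are a Git LFS pointer file, not the weights themselves: the repository stores only the object id and size, and the ~1.3 GB safetensors blob lives in LFS storage. A small sketch of reading those fields (`parse_lfs_pointer` is a hypothetical helper for illustration):

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its space-separated key/value fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:0a05fe01c79e9d58438063e8a0f24a4341a0671378aaa11eee7fa7a304ce60e5
size 1340612432
"""
info = parse_lfs_pointer(pointer)
print(int(info["size"]) / 1e9)  # 1.340612432 -- size in GB of the actual weights
```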