---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:21196
- loss:DenoisingAutoEncoderLoss
base_model: google-bert/bert-base-uncased
widget:
- source_sentence: Oldham also has Fencing, Marshall Club is a Competitive of its
competing on stage . They train times old South High.
sentences:
- Several leading batsmen opposed the new law , including the professional Herbert
<unk> , known as an exponent of pad @-@ play , and amateurs Errol Holmes and Bob
Wyatt . Wisden <unk> ' <unk> noted that these three improved their batting records
during the 1935 season , but batsmen generally were less successful . There were
also fewer drawn matches . There was an increase in the number of lbws — out of
1 @,@ 560 lbw dismissals in first @-@ class matches in 1935 , 483 were given under
the amended law . Wisden judged the experiment a success and several of its opponents
changed their mind by the end of the season ; batsmen soon became accustomed to
the alteration . Although Australian authorities were less convinced , and did
not immediately introduce the revision into domestic first @-@ class cricket ,
in 1937 the new rule became part of the Laws of Cricket .
- Oldham also has a Fencing Club , Marshall Fencing Club is a Competitive Fencing
Club with most of its members competing on the national stage . They train three
times a week at the old South Chadderton High School .
- While overseeing an initial <unk> of American involvement in the Vietnam War ,
he subsequently ended U.S. involvement in 1973 , and eliminated the draft . <unk>
, his administration generally embraced policies that transferred power from Washington
to the states . Among other things , he initiated wars on cancer and drugs , imposed
wage and price controls , enforced <unk> of Southern schools and established the
Environmental Protection Agency . Though he presided over Apollo 11 and the subsequent
lunar landings , he later scaled back manned space exploration . In 1972 , he
was reelected by a landslide , the largest to that date . The Watergate scandal
, which would consume the larger part of his second term , resulted in his ultimate
resignation on August 4 , 1974 .
- source_sentence: Randy contributed tracks on the album, was the honest record's
made . 't care thought of the lyrics . They were only important her "Carey of
Def Leppard's song "Bringin Heartbreak . During the photo shoot for Charmbracelet,
Carey happened to Def Leppard's Vault (1995), which contains the song and decided
to cover In an interview with, Carey said the song is "an of her musical diversity
. Jackson on "My Saving ", which Carey said describes thoughts about and process
. While Capri Carey became ill cancer and she returned to New York to spend time
he after In his, Carey and produced the song Sunflowers Carey song represents
"his of the family is kind of hard to talk about . "be "for, and she sang it only
in studio DJ <unk> songs for the, but of them were
sentences:
- = = Taxonomy and phylogeny = =
- 'Outside Japan , Destiny 2 was released in China and South Korea by Sony Computer
Entertainment in 2003 : the Korean version was released on March 27 , and the
Chinese version was released on August 14 . The game was going to be part of a
world tour by Sony Computer Entertainment to promote the next generation of role
@-@ playing games , but the tensions between America and Iraq at the time and
the consequent risks of a terrorist attack caused them to cancel the trip . Asked
at the launch event whether an overseas version of the game was being developed
, producer Makoto <unk> said he was " not certain " . The PSP port was released
in South Korea by Namco Bandai Games ''s local branch on March 5 , 2007 . Neither
the original nor the port has been released in the west , making it one of three
mainline Tales titles to remain exclusive to Japan .'
- Randy Jackson contributed to four tracks on the album , and said it was " the
most real and honest record she 's made . She didn 't care what anyone thought
of the lyrics . They were only important to her . " Carey included a cover of
Def Leppard 's song " Bringin ' On the Heartbreak " . During the photo shoot for
Charmbracelet at Capri , Carey happened to listen to Def Leppard 's album Vault
( 1995 ) , which contains the song , and decided to cover it . In an interview
with Billboard , Carey said that the song is " an example of her musical diversity
" . Jackson also worked on " My Saving Grace " , which Carey said describes her
thoughts about the writing , recording and mastering process . While working in
Capri , Carey 's father became ill with cancer and she returned to New York to
spend some time with him ; he died soon after . In his memory , Carey wrote and
produced the song " Sunflowers for Alfred Roy " . Carey said that the song represents
" his side of the family and is kind of hard to talk about . " The song proved
to be " very emotional " for Carey , and she sang it only once in the studio .
DJ <unk> also produced songs for the album , but none of them were included .
- source_sentence: = = = Allied =
sentences:
- = = = Allied planning = = =
- = = = English colonists = = =
- Germany advocated quick recognition of Croatia , stating that it wanted to stop
ongoing violence in Serb @-@ inhabited areas . It was opposed by France , the
United Kingdom , and the Netherlands , but the countries agreed to pursue a common
approach and avoid unilateral actions . On 10 October , two days after the Croatian
Parliament confirmed the declaration of independence , the EEC decided to postpone
any decision to recognize Croatia for two months , deciding to recognize Croatian
independence in two months if the war had not ended by then . As the deadline
expired , Germany presented its decision to recognize Croatia as its policy and
duty — a position supported by Italy and Denmark . France and the UK attempted
to prevent the recognition by drafting a United Nations resolution requesting
no unilateral actions which could <unk> the situation , but backed down during
the Security Council debate on 14 December , when Germany appeared determined
to defy the UN resolution . On 17 December , the EEC formally agreed to grant
Croatia diplomatic recognition on 15 January 1992 , relying on opinion of the
Badinter <unk> Committee . The Committee ruled that Croatia 's independence should
not be recognized immediately , because the new Croatian Constitution did not
provide protection of minorities required by the EEC . In response , the President
Franjo Tuđman gave written <unk> to Robert Badinter that the deficit would be
<unk> . The <unk> formally declared its separation from Croatia on 19 December
, but its statehood and independence were not recognized internationally . On
26 December , Yugoslav authorities announced plans for a smaller state , which
could include the territory captured from Croatia , but the plan was rejected
by the UN General Assembly .
- source_sentence: During the night, the German and force Katia and was Oghratina
when Division to the Zealand Rifles Brigades and 5th Mounted Brigade were Oghratina
Despite by brigades to the enemy, they were forced to make a on strongly <unk>
positions which carefully artillery Meanwhile, the divisions Katia Abu Hamra and
Lawrence moved his forward from Kantara to The 3rd Light Brigade the right towards
<unk, but could make small progress, positions securely held by German and.
sentences:
- State Route 243 ( SR 243 ) , or the Banning @-@ Idyllwild <unk> Highway , is a
30 @-@ mile ( 50 kilometer ) two @-@ lane highway that runs from Banning , California
( in the north ) to Idyllwild , California ( in the south ) in Riverside County
, California . The highway is a connector between Interstate 10 ( I @-@ 10 ) and
SR 74 . Along its route , it provides access to the San <unk> National Forest
. A road from Banning to Idyllwild was planned around the turn of the twentieth
century , and was open by 1910 . The road was added to the state highway system
in 1970 .
- During the previous night , the German and Ottoman force evacuated Katia and was
moving towards Oghratina when Chauvel ordered the Anzac Mounted Division to continue
the attack . The New Zealand Mounted Rifles Brigades and the 5th Mounted Brigade
were ordered to capture Oghratina . Despite attempts by these two brigades to
turn the enemy flank , they were forced to make a frontal attack on strongly entrenched
<unk> in positions which favoured the defenders and which were supported by carefully
positioned artillery . Meanwhile , the two infantry divisions moved to garrison
Katia and Abu Hamra and Lawrence moved his headquarters forward from Kantara to
Romani . The 3rd Light Horse Brigade on the right advanced towards <unk> , but
could only make small progress , against positions securely held by German and
Ottoman forces .
- The current training ground is located at Bodymoor Heath near <unk> in north Warwickshire
, the site for which was purchased by former chairman Doug Ellis in the early
1970s from a local farmer . Although Bodymoor Heath was state @-@ of @-@ the @-@
art in the 1970s , by the late 1990s the facilities had started to look dated
. In November 2005 , Ellis and Aston Villa <unk> announced a state of the art
GB £ 13 million redevelopment of Bodymoor in two phases . However , work on Bodymoor
was suspended by Ellis due to financial problems , and was left in an unfinished
state until new owner Randy Lerner made it one of his priorities to make the site
one of the best in world football . The new training ground was officially unveiled
on 6 May 2007 , by then manager Martin O 'Neill , then team captain Gareth Barry
and 1982 European Cup winning team captain Dennis Mortimer , with the Aston Villa
squad moving in for the 2007 – 08 season .
- source_sentence: album five @ -, in an with Billboard magazine, said it was previously
"something I wanted to revisit as been doing a while . "The medley a written whereas
McCartney had worked the Beatles' was made of "bits we had knocking . "The off
with Vintage "McCartney sat one to looking back [and looking back . about life
followed by the bass @ - @ led That Was Me, which is his school days and ",, "from
there . songs "Feet the Clouds "about the inactivity while is up of ", about the
life being a celebrity The final song medley, The End of ", written McCartney's
unk> playing on his, Jim's piano
sentences:
- Severe Tropical Storm Domoina in 1984 caused 100 year floods in South Africa and
record rainfall in Swaziland . The fourth named storm of the season , Domoina
developed on January 16 off the northeast coast of Madagascar . With a ridge to
the north , the storm tracked generally westward and later southwestward . On
January 21 , Domoina struck eastern Madagascar , the third storm in six weeks
to affect the nation ; collectively , the storms caused 42 deaths and $ 25 million
in damage ( 1984 USD ) . After crossing the country , Domoina strengthened in
the Mozambique Channel to peak 10 minute sustained winds of 95 km / h ( 60 mph
) . On January 28 , the storm made landfall in southern Mozambique , and slowly
weakened over land . Domoina crossed into Swaziland and later eastern South Africa
before dissipating on February 2 .
- The album features a five song @-@ medley , which in an interview with Billboard
magazine , McCartney said that it was previously " something I wanted to revisit
" as " nobody had been doing that for a while . " The medley was a group of intentionally
written material , whereas McCartney had worked on the Beatles ' Abbey Road which
, however , was actually made up of " bits we had knocking around . " The medley
starts off with " Vintage Clothes " , which McCartney " sat down one day " to
write , that was " looking back , [ and ] looking back . " , about life . It was
followed by the bass @-@ led " That Was Me " , which is about his " school days
and teachers " , the medley , as McCartney stated , then " progressed from there
. " The next songs are " Feet in the Clouds " , about the inactivity while one
is growing up , and " House of Wax " , about the life of being a celebrity . The
final song in medley , " The End of the End " , was written at McCartney 's <unk>
Avenue home while playing on his father , Jim 's , piano .
- Varanasi grew as an important industrial centre , famous for its muslin and silk
<unk> , perfumes , ivory works , and sculpture . Buddha is believed to have founded
Buddhism here around <unk> BC when he gave his first sermon , " The Setting in
Motion of the Wheel of Dharma " , at nearby <unk> . The city 's religious importance
continued to grow in the 8th century , when Adi <unk> established the worship
of Shiva as an official sect of Varanasi . Despite the Muslim rule , Varanasi
remained the centre of activity for Hindu intellectuals and theologians during
the Middle Ages , which further contributed to its reputation as a cultural centre
of religion and education . <unk> Tulsidas wrote his epic poem on Lord Rama 's
life called Ram <unk> Manas in Varanasi . Several other major figures of the Bhakti
movement were born in Varanasi , including Kabir and Ravidas . Guru Nanak Dev
visited Varanasi for <unk> in <unk> , a trip that played a large role in the founding
of <unk> . In the 16th century , Varanasi experienced a cultural revival under
the Muslim Mughal emperor <unk> who invested in the city , and built two large
temples dedicated to Shiva and Vishnu , though much of modern Varanasi was built
during the 18th century , by the Maratha and <unk> kings . The kingdom of Benares
was given official status by the <unk> in 1737 , and continued as a dynasty @-@
governed area until Indian independence in 1947 . The city is governed by the
Varanasi Nagar Nigam ( Municipal Corporation ) and is represented in the Parliament
of India by the current Prime Minister of India <unk> <unk> , who won the <unk>
<unk> elections in 2014 by a huge margin . Silk weaving , carpets and crafts and
tourism employ a significant number of the local population , as do the <unk>
<unk> Works and Bharat Heavy <unk> Limited . Varanasi Hospital was established
in 1964 .
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- pearson_cosine
- spearman_cosine
model-index:
- name: SentenceTransformer based on google-bert/bert-base-uncased
  results:
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: sts dev
      type: sts-dev
    metrics:
    - type: pearson_cosine
      value: 0.6552233601802461
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.6640796604094039
      name: Spearman Cosine
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: sts test
      type: sts-test
    metrics:
    - type: pearson_cosine
      value: 0.7355355958065635
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.7320302276487962
      name: Spearman Cosine
---
# SentenceTransformer based on google-bert/bert-base-uncased
This is a [sentence-transformers](https://www.SBERT.net) model fine-tuned from [google-bert/bert-base-uncased](https://huggingface.co/google-bert/bert-base-uncased). It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
## Model Details
### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [google-bert/bert-base-uncased](https://huggingface.co/google-bert/bert-base-uncased) <!-- at revision 86b5e0934494bd15c9632b12f734a8a67f723594 -->
- **Maximum Sequence Length:** 75 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
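
The card lists cosine similarity as the similarity function. As a rough illustration of what that computes for a pair of embedding vectors, here is a dependency-free sketch. The function name `cosine_similarity` and the toy 3-dimensional vectors are illustrative assumptions and are not part of the library; real embeddings from this model have 768 dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy vectors standing in for embeddings (hypothetical values).
a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]   # points in the same direction as a
c = [-3.0, 0.0, 1.0]  # orthogonal to a

print(cosine_similarity(a, b))  # close to 1: same direction
print(cosine_similarity(a, c))  # close to 0: orthogonal
```

Because the score depends only on direction, scaling a vector does not change it, which is why `a` and `b` score as near-identical.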
<!-- - **Training Dataset:** Unknown -->
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->
### Model Sources
- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
### Full Model Architecture
```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 75, 'do_lower_case': False}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```
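
The pooling configuration above enables only `pooling_mode_cls_token`, so the sentence embedding is taken from the first token's output vector rather than from an average over tokens. The sketch below illustrates that difference on plain Python lists. The helper names `cls_pool` and `mean_pool` and the 2-dimensional token vectors are hypothetical, chosen only for illustration; this is not the library's implementation.

```python
def cls_pool(token_embeddings):
    """Return the first token's vector ([CLS] for BERT-style inputs)."""
    return list(token_embeddings[0])


def mean_pool(token_embeddings, attention_mask):
    """Average the vectors of non-padding tokens (shown for contrast only)."""
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m == 1]
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Hypothetical 2-dimensional token vectors for "[CLS] hello [SEP] [PAD]".
tokens = [[0.5, -1.0], [2.0, 2.0], [0.5, 2.0], [9.0, 9.0]]
mask = [1, 1, 1, 0]  # the last position is padding

print(cls_pool(tokens))         # the [CLS] vector, unchanged
print(mean_pool(tokens, mask))  # average over the three real tokens
```

With CLS pooling the attention mask plays no role in the pooling step itself, since only position 0 is read.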
## Usage
### Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("tartspuppy/bert-base-uncased-tsdae-encoder")
# Run inference
sentences = [
'album five @ -, in an with Billboard magazine, said it was previously "something I wanted to revisit as been doing a while . "The medley a written whereas McCartney had worked the Beatles\' was made of "bits we had knocking . "The off with Vintage "McCartney sat one to looking back [and looking back . about life followed by the bass @ - @ led That Was Me, which is his school days and ",, "from there . songs "Feet the Clouds "about the inactivity while is up of ", about the life being a celebrity The final song medley, The End of ", written McCartney\'s unk> playing on his, Jim\'s piano',
'The album features a five song @-@ medley , which in an interview with Billboard magazine , McCartney said that it was previously " something I wanted to revisit " as " nobody had been doing that for a while . " The medley was a group of intentionally written material , whereas McCartney had worked on the Beatles \' Abbey Road which , however , was actually made up of " bits we had knocking around . " The medley starts off with " Vintage Clothes " , which McCartney " sat down one day " to write , that was " looking back , [ and ] looking back . " , about life . It was followed by the bass @-@ led " That Was Me " , which is about his " school days and teachers " , the medley , as McCartney stated , then " progressed from there . " The next songs are " Feet in the Clouds " , about the inactivity while one is growing up , and " House of Wax " , about the life of being a celebrity . The final song in medley , " The End of the End " , was written at McCartney \'s <unk> Avenue home while playing on his father , Jim \'s , piano .',
'Varanasi grew as an important industrial centre , famous for its muslin and silk <unk> , perfumes , ivory works , and sculpture . Buddha is believed to have founded Buddhism here around <unk> BC when he gave his first sermon , " The Setting in Motion of the Wheel of Dharma " , at nearby <unk> . The city \'s religious importance continued to grow in the 8th century , when Adi <unk> established the worship of Shiva as an official sect of Varanasi . Despite the Muslim rule , Varanasi remained the centre of activity for Hindu intellectuals and theologians during the Middle Ages , which further contributed to its reputation as a cultural centre of religion and education . <unk> Tulsidas wrote his epic poem on Lord Rama \'s life called Ram <unk> Manas in Varanasi . Several other major figures of the Bhakti movement were born in Varanasi , including Kabir and Ravidas . Guru Nanak Dev visited Varanasi for <unk> in <unk> , a trip that played a large role in the founding of <unk> . In the 16th century , Varanasi experienced a cultural revival under the Muslim Mughal emperor <unk> who invested in the city , and built two large temples dedicated to Shiva and Vishnu , though much of modern Varanasi was built during the 18th century , by the Maratha and <unk> kings . The kingdom of Benares was given official status by the <unk> in 1737 , and continued as a dynasty @-@ governed area until Indian independence in 1947 . The city is governed by the Varanasi Nagar Nigam ( Municipal Corporation ) and is represented in the Parliament of India by the current Prime Minister of India <unk> <unk> , who won the <unk> <unk> elections in 2014 by a huge margin . Silk weaving , carpets and crafts and tourism employ a significant number of the local population , as do the <unk> <unk> Works and Bharat Heavy <unk> Limited . Varanasi Hospital was established in 1964 .',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
```
<!--
### Direct Usage (Transformers)
<details><summary>Click to see the direct usage in Transformers</summary>
</details>
-->
<!--
### Downstream Usage (Sentence Transformers)
You can finetune this model on your own dataset.
<details><summary>Click to expand</summary>
</details>
-->
<!--
### Out-of-Scope Use
*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->
## Evaluation
### Metrics
#### Semantic Similarity
* Datasets: `sts-dev` and `sts-test`
* Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
| Metric              | sts-dev    | sts-test   |
|:--------------------|:-----------|:-----------|
| pearson_cosine      | 0.6552     | 0.7355     |
| **spearman_cosine** | **0.6641** | **0.7320** |
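
The two metrics in the table are correlations between the model's cosine similarities for sentence pairs and the gold similarity scores. The following dependency-free sketch shows how Pearson and Spearman correlation can be computed. The helper names and the four toy score pairs are hypothetical and are not taken from the STS data or from the evaluator's source.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical cosine similarities and gold scores for four sentence pairs.
predicted = [0.10, 0.40, 0.35, 0.80]
gold = [0.0, 2.0, 3.0, 5.0]

print(pearson(predicted, gold))
print(spearman(predicted, gold))
```

Spearman depends only on the ordering of the scores, so it is unaffected by any monotonic rescaling of the cosine similarities, whereas Pearson is sensitive to how linearly they track the gold scores.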
<!--
## Bias, Risks and Limitations
*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->
<!--
### Recommendations
*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->
## Training Details
### Training Dataset
#### Unnamed Dataset
* Size: 21,196 training samples
* Columns: <code>text</code>
* Approximate statistics based on the first 1000 samples:
| | text |
|:--------|:----------------------------------------------------------------------------------|
| type | string |
| details | <ul><li>min: 6 tokens</li><li>mean: 51.01 tokens</li><li>max: 75 tokens</li></ul> |
* Samples:
| text |
|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| <code>To promote the album , Carey announced a world tour in April 2003 . As of 2003 , " Charmbracelet World Tour : An Intimate Evening with Mariah Carey " was her most extensive tour , lasting over eight months and performing sixty @-@ nine shows in venues worldwide . Before tickets went on sale in the US , venues were switched from large arenas to smaller , more intimate theater shows . According to Carey , the change was made in order to give fans a more intimate show , and something more Broadway @-@ influenced . She said , " It 's much more intimate so you 'll feel like you had an experience . You experience a night with me . " However , while smaller productions were booked for the US leg of the tour , Carey performed at stadia and arenas in Asia and Europe , and performed for a crowd of over 35 @,@ 000 in Manila , 50 @,@ 000 in Malaysia , and to over 70 @,@ 000 people in China . In the UK , it was Carey 's first tour to feature shows outside London ; she performed in Glasgow , Birming...</code> |
| <code>By 1916 , these raiding forces were causing serious concern in the Admiralty as the proximity of Bruges to the British coast , to the troopship lanes across the English Channel and for the U @-@ boats , to the Western Approaches ; the heaviest shipping lanes in the World at the time . In the late spring of 1915 , Admiral Reginald <unk> had attempted without success to destroy the lock gates at Ostend with monitors . This effort failed , and Bruges became increasingly important in the Atlantic Campaign , which reached its height in 1917 . By early 1918 , the Admiralty was seeking ever more radical solutions to the problems raised by unrestricted submarine warfare , including instructing the " Allied Naval and Marine Forces " department to plan attacks on U @-@ boat bases in Belgium .</code> |
| <code>PWI International Heavyweight Championship ( 1 time )</code> |
* Loss: [<code>DenoisingAutoEncoderLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#denoisingautoencoderloss)
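
`DenoisingAutoEncoderLoss` trains the encoder so that a decoder can reconstruct the original text from the embedding of a corrupted copy; the `source_sentence` entries in the front matter look like such corrupted copies, with words removed. The sketch below shows one simple form of that corruption, random word deletion on whitespace-split text. The function name `delete_words` is hypothetical, and the deletion ratio of 0.6 is an assumption, since the card does not state which ratio or tokenizer was used.

```python
import random


def delete_words(text, del_ratio=0.6, rng=None):
    """Drop each whitespace-separated word with probability del_ratio."""
    if rng is None:
        rng = random.Random()
    words = text.split()
    if not words:
        return text
    kept = [w for w in words if rng.random() >= del_ratio]
    if not kept:
        # Keep at least one word so the encoder never sees an empty input.
        kept = [rng.choice(words)]
    return " ".join(kept)


# A fragment of one of the card's sample texts; the ratio is an assumed value.
original = "Oldham also has a Fencing Club"
noisy = delete_words(original, del_ratio=0.6, rng=random.Random(0))

print(original)
print(noisy)
```

The surviving words keep their original order, so the corrupted text is always a subsequence of the input; the training target is the uncorrupted sentence.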
### Evaluation Dataset
#### Unnamed Dataset
* Size: 2,355 evaluation samples
* Columns: <code>text</code>
* Approximate statistics based on the first 1000 samples:
| | text |
|:--------|:----------------------------------------------------------------------------------|
| type | string |
| details | <ul><li>min: 4 tokens</li><li>mean: 51.08 tokens</li><li>max: 75 tokens</li></ul> |
* Samples:
| text |
|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| <code>Wilde 's two final comedies , An Ideal Husband and The Importance of Being Earnest , were still on stage in London at the time of his prosecution , and they were soon closed as the details of his case became public . After two years in prison with hard labour , Wilde went into exile in Paris , sick and depressed , his reputation destroyed in England . In 1898 , when no @-@ one else would , Leonard Smithers agreed with Wilde to publish the two final plays . Wilde proved to be a <unk> <unk> , sending detailed instructions on stage directions , character listings and the presentation of the book , and insisting that a <unk> from the first performance be reproduced inside . Ellmann argues that the proofs show a man " very much in command of himself and of the play " . Wilde 's name did not appear on the cover , it was " By the Author of Lady Windermere 's Fan " . His return to work was brief though , as he refused to write anything else , " I can write , but have lost the joy of writing " ...</code> |
| <code>= = = = Ely Viaduct = = = =</code> |
| <code>= = World War I = =</code> |
* Loss: [<code>DenoisingAutoEncoderLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#denoisingautoencoderloss)
### Training Hyperparameters
#### Non-Default Hyperparameters
- `eval_strategy`: steps
- `per_device_train_batch_size`: 64
- `per_device_eval_batch_size`: 64
- `learning_rate`: 3e-05
- `num_train_epochs`: 100
- `warmup_ratio`: 0.1
- `fp16`: True
- `dataloader_num_workers`: 2
- `load_best_model_at_end`: True
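
To make these settings concrete, the sketch below works out how many optimizer steps they imply and how a linear schedule with warmup would shape the learning rate. It assumes a single device, since the card does not say how many were used, and no gradient accumulation. The function name `linear_schedule_lr` is hypothetical; the formula is the usual linear warmup followed by linear decay to zero, not code taken from the training run.

```python
import math


def linear_schedule_lr(step, peak_lr, warmup_steps, total_steps):
    """Linear warmup from 0 to peak_lr, then linear decay back to 0."""
    if step < warmup_steps:
        return peak_lr * (step / max(1, warmup_steps))
    remaining = max(0, total_steps - step)
    return peak_lr * (remaining / max(1, total_steps - warmup_steps))


# Values from the card; a single device is an assumption.
train_samples = 21_196
batch_size = 64
epochs = 100
warmup_ratio = 0.1
peak_lr = 3e-5

steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)

print(steps_per_epoch, total_steps, warmup_steps)
for step in (0, warmup_steps // 2, warmup_steps, total_steps):
    print(step, linear_schedule_lr(step, peak_lr, warmup_steps, total_steps))
```

Since `load_best_model_at_end` is enabled, the published weights may come from an earlier evaluation checkpoint rather than from the final step of this schedule.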
#### All Hyperparameters
<details><summary>Click to expand</summary>
- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 64
- `per_device_eval_batch_size`: 64
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 3e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 100
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: True
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 2
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `tp_size`: 0
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: proportional
</details>
### Training Logs
<details><summary>Click to expand</summary>
| Epoch | Step | Training Loss | Validation Loss | sts-dev_spearman_cosine | sts-test_spearman_cosine |
|:-----------:|:--------:|:-------------:|:---------------:|:-----------------------:|:------------------------:|
| -1 | -1 | - | - | 0.3173 | - |
| 0.6024 | 100 | 8.2676 | - | - | - |
| 1.2048 | 200 | 6.0396 | - | - | - |
| 1.8072 | 300 | 4.7794 | - | - | - |
| 2.4096 | 400 | 4.2732 | - | - | - |
| 3.0120 | 500 | 3.9759 | - | - | - |
| 3.6145 | 600 | 3.7263 | - | - | - |
| 4.2169 | 700 | 3.5471 | - | - | - |
| 4.8193 | 800 | 3.4097 | - | - | - |
| 5.4217 | 900 | 3.2513 | - | - | - |
| 6.0241 | 1000 | 3.1646 | 3.3052 | 0.7232 | - |
| 6.6265 | 1100 | 3.0129 | - | - | - |
| 7.2289 | 1200 | 2.9307 | - | - | - |
| 7.8313 | 1300 | 2.8372 | - | - | - |
| 8.4337 | 1400 | 2.7232 | - | - | - |
| 9.0361 | 1500 | 2.6845 | - | - | - |
| 9.6386 | 1600 | 2.546 | - | - | - |
| 10.2410 | 1700 | 2.4931 | - | - | - |
| 10.8434 | 1800 | 2.4064 | - | - | - |
| 11.4458 | 1900 | 2.3145 | - | - | - |
| 12.0482 | 2000 | 2.2715 | 3.1490 | 0.7177 | - |
| 12.6506 | 2100 | 2.1495 | - | - | - |
| 13.2530 | 2200 | 2.1164 | - | - | - |
| 13.8554 | 2300 | 2.0398 | - | - | - |
| 14.4578 | 2400 | 1.9538 | - | - | - |
| 15.0602 | 2500 | 1.9311 | - | - | - |
| 15.6627 | 2600 | 1.8264 | - | - | - |
| 16.2651 | 2700 | 1.7786 | - | - | - |
| 16.8675 | 2800 | 1.7256 | - | - | - |
| 17.4699 | 2900 | 1.6395 | - | - | - |
| 18.0723 | 3000 | 1.6082 | 3.4656 | 0.6894 | - |
| 18.6747 | 3100 | 1.5152 | - | - | - |
| 19.2771 | 3200 | 1.4678 | - | - | - |
| 19.8795 | 3300 | 1.425 | - | - | - |
| 20.4819 | 3400 | 1.3395 | - | - | - |
| 21.0843 | 3500 | 1.3203 | - | - | - |
| 21.6867 | 3600 | 1.2275 | - | - | - |
| 22.2892 | 3700 | 1.1955 | - | - | - |
| 22.8916 | 3800 | 1.1612 | - | - | - |
| 23.4940 | 3900 | 1.0792 | - | - | - |
| 24.0964 | 4000 | 1.0557 | 3.9473 | 0.6822 | - |
| 24.6988 | 4100 | 0.9793 | - | - | - |
| 25.3012 | 4200 | 0.9516 | - | - | - |
| 25.9036 | 4300 | 0.9095 | - | - | - |
| 26.5060 | 4400 | 0.8408 | - | - | - |
| 27.1084 | 4500 | 0.8338 | - | - | - |
| 27.7108 | 4600 | 0.7713 | - | - | - |
| 28.3133 | 4700 | 0.8312 | - | - | - |
| 28.9157 | 4800 | 0.8437 | - | - | - |
| 29.5181 | 4900 | 0.6952 | - | - | - |
| 30.1205 | 5000 | 0.6825 | 4.3702 | 0.6671 | - |
| 30.7229 | 5100 | 1.7624 | - | - | - |
| 31.3253 | 5200 | 6.9439 | - | - | - |
| 31.9277 | 5300 | 6.2218 | - | - | - |
| 32.5301 | 5400 | 5.9866 | - | - | - |
| 33.1325 | 5500 | 5.8608 | - | - | - |
| 33.7349 | 5600 | 5.7661 | - | - | - |
| 34.3373 | 5700 | 5.7114 | - | - | - |
| 34.9398 | 5800 | 5.6526 | - | - | - |
| 35.5422 | 5900 | 5.5982 | - | - | - |
| **36.1446** | **6000** | **5.5632** | **5.6696** | **0.7876** | **-** |
| 36.7470 | 6100 | 5.5455 | - | - | - |
| 37.3494 | 6200 | 5.4853 | - | - | - |
| 37.9518 | 6300 | 5.4709 | - | - | - |
| 38.5542 | 6400 | 5.4372 | - | - | - |
| 39.1566 | 6500 | 5.405 | - | - | - |
| 39.7590 | 6600 | 5.4011 | - | - | - |
| 40.3614 | 6700 | 5.3779 | - | - | - |
| 40.9639 | 6800 | 5.3684 | - | - | - |
| 41.5663 | 6900 | 5.3462 | - | - | - |
| 42.1687 | 7000 | 5.335 | 5.5090 | 0.7515 | - |
| 42.7711 | 7100 | 5.3273 | - | - | - |
| 43.3735 | 7200 | 5.3078 | - | - | - |
| 43.9759 | 7300 | 5.3005 | - | - | - |
| 44.5783 | 7400 | 5.2836 | - | - | - |
| 45.1807 | 7500 | 5.2732 | - | - | - |
| 45.7831 | 7600 | 5.2707 | - | - | - |
| 46.3855 | 7700 | 5.2525 | - | - | - |
| 46.9880 | 7800 | 5.2439 | - | - | - |
| 47.5904 | 7900 | 5.2316 | - | - | - |
| 48.1928 | 8000 | 5.2121 | 5.4451 | 0.7316 | - |
| 48.7952 | 8100 | 5.2142 | - | - | - |
| 49.3976 | 8200 | 5.1939 | - | - | - |
| 50.0 | 8300 | 5.186 | - | - | - |
| 50.6024 | 8400 | 5.166 | - | - | - |
| 51.2048 | 8500 | 5.1727 | - | - | - |
| 51.8072 | 8600 | 5.1555 | - | - | - |
| 52.4096 | 8700 | 5.1538 | - | - | - |
| 53.0120 | 8800 | 5.1413 | - | - | - |
| 53.6145 | 8900 | 5.1343 | - | - | - |
| 54.2169 | 9000 | 5.1257 | 5.3939 | 0.7142 | - |
| 54.8193 | 9100 | 5.1183 | - | - | - |
| 55.4217 | 9200 | 5.116 | - | - | - |
| 56.0241 | 9300 | 5.0999 | - | - | - |
| 56.6265 | 9400 | 5.0922 | - | - | - |
| 57.2289 | 9500 | 5.0756 | - | - | - |
| 57.8313 | 9600 | 5.0792 | - | - | - |
| 58.4337 | 9700 | 5.061 | - | - | - |
| 59.0361 | 9800 | 5.0663 | - | - | - |
| 59.6386 | 9900 | 5.0493 | - | - | - |
| 60.2410 | 10000 | 5.0487 | 5.3613 | 0.7019 | - |
| 60.8434 | 10100 | 5.0462 | - | - | - |
| 61.4458 | 10200 | 5.0356 | - | - | - |
| 62.0482 | 10300 | 5.0379 | - | - | - |
| 62.6506 | 10400 | 5.0243 | - | - | - |
| 63.2530 | 10500 | 5.0091 | - | - | - |
| 63.8554 | 10600 | 5.0128 | - | - | - |
| 64.4578 | 10700 | 5.0099 | - | - | - |
| 65.0602 | 10800 | 5.0078 | - | - | - |
| 65.6627 | 10900 | 4.9965 | - | - | - |
| 66.2651 | 11000 | 4.9907 | 5.3310 | 0.6963 | - |
| 66.8675 | 11100 | 4.9918 | - | - | - |
| 67.4699 | 11200 | 4.9724 | - | - | - |
| 68.0723 | 11300 | 4.984 | - | - | - |
| 68.6747 | 11400 | 4.9689 | - | - | - |
| 69.2771 | 11500 | 4.9636 | - | - | - |
| 69.8795 | 11600 | 4.9622 | - | - | - |
| 70.4819 | 11700 | 4.9547 | - | - | - |
| 71.0843 | 11800 | 4.9527 | - | - | - |
| 71.6867 | 11900 | 4.9467 | - | - | - |
| 72.2892 | 12000 | 4.9397 | 5.3186 | 0.6832 | - |
| 72.8916 | 12100 | 4.9387 | - | - | - |
| 73.4940 | 12200 | 4.9299 | - | - | - |
| 74.0964 | 12300 | 4.9454 | - | - | - |
| 74.6988 | 12400 | 4.9267 | - | - | - |
| 75.3012 | 12500 | 4.9258 | - | - | - |
| 75.9036 | 12600 | 4.9244 | - | - | - |
| 76.5060 | 12700 | 4.9214 | - | - | - |
| 77.1084 | 12800 | 4.9125 | - | - | - |
| 77.7108 | 12900 | 4.9122 | - | - | - |
| 78.3133 | 13000 | 4.9108 | 5.3026 | 0.6840 | - |
| 78.9157 | 13100 | 4.9073 | - | - | - |
| 79.5181 | 13200 | 4.8944 | - | - | - |
| 80.1205 | 13300 | 4.8987 | - | - | - |
| 80.7229 | 13400 | 4.9013 | - | - | - |
| 81.3253 | 13500 | 4.8915 | - | - | - |
| 81.9277 | 13600 | 4.8883 | - | - | - |
| 82.5301 | 13700 | 4.8861 | - | - | - |
| 83.1325 | 13800 | 4.882 | - | - | - |
| 83.7349 | 13900 | 4.8812 | - | - | - |
| 84.3373 | 14000 | 4.8805 | 5.2968 | 0.6695 | - |
| 84.9398 | 14100 | 4.8839 | - | - | - |
| 85.5422 | 14200 | 4.8747 | - | - | - |
| 86.1446 | 14300 | 4.8652 | - | - | - |
| 86.7470 | 14400 | 4.8734 | - | - | - |
| 87.3494 | 14500 | 4.872 | - | - | - |
| 87.9518 | 14600 | 4.8621 | - | - | - |
| 88.5542 | 14700 | 4.8599 | - | - | - |
| 89.1566 | 14800 | 4.8649 | - | - | - |
| 89.7590 | 14900 | 4.8621 | - | - | - |
| 90.3614 | 15000 | 4.8483 | 5.2860 | 0.6694 | - |
| 90.9639 | 15100 | 4.8538 | - | - | - |
| 91.5663 | 15200 | 4.86 | - | - | - |
| 92.1687 | 15300 | 4.8463 | - | - | - |
| 92.7711 | 15400 | 4.8582 | - | - | - |
| 93.3735 | 15500 | 4.8444 | - | - | - |
| 93.9759 | 15600 | 4.8482 | - | - | - |
| 94.5783 | 15700 | 4.848 | - | - | - |
| 95.1807 | 15800 | 4.8489 | - | - | - |
| 95.7831 | 15900 | 4.8403 | - | - | - |
| 96.3855 | 16000 | 4.8425 | 5.2828 | 0.6641 | - |
| 96.9880 | 16100 | 4.8423 | - | - | - |
| 97.5904 | 16200 | 4.8377 | - | - | - |
| 98.1928 | 16300 | 4.8448 | - | - | - |
| 98.7952 | 16400 | 4.8384 | - | - | - |
| 99.3976 | 16500 | 4.8381 | - | - | - |
| 100.0 | 16600 | 4.8389 | - | - | - |
| -1 | -1 | - | - | - | 0.7320 |
* The bold row denotes the saved checkpoint.
</details>
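
The evaluation rows in the log above can be compared directly to see which criterion matches the bold (saved) row. The sketch below copies the step, validation loss and `sts-dev_spearman_cosine` values from the table and picks the best row under each criterion. The helper name `best_by` is hypothetical, and the card does not list `metric_for_best_model`, so treating the dev Spearman score as the selection metric is an inference from the numbers, not a documented setting.

```python
# (step, validation_loss, sts-dev_spearman_cosine), copied from the training log.
EVAL_ROWS = [
    (1000, 3.3052, 0.7232), (2000, 3.1490, 0.7177), (3000, 3.4656, 0.6894),
    (4000, 3.9473, 0.6822), (5000, 4.3702, 0.6671), (6000, 5.6696, 0.7876),
    (7000, 5.5090, 0.7515), (8000, 5.4451, 0.7316), (9000, 5.3939, 0.7142),
    (10000, 5.3613, 0.7019), (11000, 5.3310, 0.6963), (12000, 5.3186, 0.6832),
    (13000, 5.3026, 0.6840), (14000, 5.2968, 0.6695), (15000, 5.2860, 0.6694),
    (16000, 5.2828, 0.6641),
]


def best_by(rows, column, maximize):
    """Return the row with the largest (or smallest) value in the given column."""
    chooser = max if maximize else min
    return chooser(rows, key=lambda row: row[column])


best_spearman = best_by(EVAL_ROWS, column=2, maximize=True)
lowest_loss = best_by(EVAL_ROWS, column=1, maximize=False)

print("highest dev Spearman at step", best_spearman[0])  # step 6000
print("lowest validation loss at step", lowest_loss[0])  # step 2000
```

The highest dev Spearman score falls on step 6000, the bold row, while the lowest validation loss is at step 2000. This is consistent with the checkpoint having been chosen by the STS dev metric and not by validation loss.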
### Framework Versions
- Python: 3.12.9
- Sentence Transformers: 4.0.1
- Transformers: 4.50.1
- PyTorch: 2.6.0+cu124
- Accelerate: 1.5.2
- Datasets: 3.4.1
- Tokenizers: 0.21.1
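
To check whether a local environment matches the versions listed above, a small standard-library script is enough. The sketch below is only a convenience: the helper name `compare_versions` is hypothetical, and the distribution names (for example `torch` for PyTorch) are the usual PyPI names, assumed here because the card does not state them.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions as listed in this card; keys are assumed PyPI distribution names.
EXPECTED = {
    "sentence-transformers": "4.0.1",
    "transformers": "4.50.1",
    "torch": "2.6.0+cu124",
    "accelerate": "1.5.2",
    "datasets": "3.4.1",
    "tokenizers": "0.21.1",
}


def compare_versions(expected):
    """Map each package to (wanted, installed or None, whether they match)."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (wanted, installed, installed == wanted)
    return report


if __name__ == "__main__":
    for name, (wanted, installed, ok) in compare_versions(EXPECTED).items():
        status = "ok" if ok else "differs"
        print(f"{name}: wanted {wanted}, found {installed} ({status})")
```

A mismatch does not necessarily prevent the model from loading; the comparison only reports where the environment differs from the one used for training.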
## Citation
### BibTeX
#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
```
#### DenoisingAutoEncoderLoss
```bibtex
@inproceedings{wang-2021-TSDAE,
title = "TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning",
author = "Wang, Kexin and Reimers, Nils and Gurevych, Iryna",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2021",
month = nov,
year = "2021",
address = "Punta Cana, Dominican Republic",
publisher = "Association for Computational Linguistics",
pages = "671--688",
url = "https://arxiv.org/abs/2104.06979",
}
```
<!--
## Glossary
*Clearly define terms in order to be accessible across audiences.*
-->
<!--
## Model Card Authors
*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->
<!--
## Model Card Contact
*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->