tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:25743
- loss:MultipleNegativesRankingLoss
base_model: Lajavaness/bilingual-embedding-small
widget:
- source_sentence: >-
Luciano Hang da HAVAN, send 200 oxygen cylinders to Manaus in their
planes. attitude of man of true, bravo! O
sentences:
- The photo of Pedro Sánchez «enjoying while Gran Canaria burns»
- Havan owner Luciano Hang donated 200 oxygen cylinders to Manaus
- Video of the show in Shanghai staged by robots made in China
- source_sentence: '"PERSEVERANCE" SEND THE FIRST COLOR IMAGES FROM THE SURFACE OF MARS'
sentences:
- If an election has 51% of the votes cast, the election is annulled.
- >-
This video shows Indian Air Force attack helicopters flying over Pangong
Lake in Ladakh.
- The first video images of Mars from the Perseverance rover
- source_sentence: >-
SPEECH BY PEDRO CASTILLO, IT WAS BASED ON THE HATE OF SPAIN OF A PAST
PRE-HISPANIC THAT I ONLY KNOW EXPLAINS FROM THE MOST ABSOLUTE IGNORANCE
AND STUPIDITY" KING OF SPAINIn fact, between the president of Colombia,
Duque, and the king of Spain, the most regretful of having come to the
inauguration of the clown with a hat is the latter.
sentences:
- >-
Felipe VI said that Pedro Castillo's speech is explained from ignorance
and stupidity
- >-
"Population poorly tolerated quarantine and social distancing measures
during the Spanish flu, when the first deconfinement took place,
abandoning all precautionary measures"
- >-
Genuine photo of Philippine lawmaker Sarah Elago supporting mandatory
military training in schools
- source_sentence: >-
Australia Day has nothing to do with Captain Cook or Botany Bay The
Landing of Captain Cook at the site of Sydney happened on the 28th of
April 1770 - NOT on the 26th of January 1770. The first fleet arrived in
Australia on 18 January 1788 and landed at Botany Bay on 20 January 1788.
AUSTRALIA DAY CELEBRATES THE DAY ALL AUSTRALIANS STOPPED BEING BRITISH
CITIZENS AND BECAME AUSTRALIAN CITIZENS IN 1949. Facts about Australia Day
Our Education system and the popular press is not competently advising our
children !! Twisting the truth a bit. Don't expect the media to educate
you, that's not part of their agenda. Australia Day does not celebrate the
arrival of the first fleet or the invasion of anything. The First Fleet
arrived in Botany Bay on the 18th of January. However, Captain Cook's
landing was included in Australia Day celebrations as a reminder of a
significant historical event. Since the extravagant bicentenary
celebrations of 1988, when Sydney-siders decided Captain Cook's landing
should become the focus of the Australia Day commemoration, the importance
of this date for all Australians has begun to fade. Now, a generation
later, it's all but lost. This is because our politicians and educators
have not been doing a good job promoting the day. Our politicians have not
been advertising the real reason for Australia Day, and our educators have
not been teaching our children the importance of the 26th of January to
all Australians. The media, as usual, is happy to twist the truth for the
sake of controversy. In recent years, the media has helped fan the flames
of discontent among the Aboriginal community. Many are now so offended by
what they see as a celebration of the beginning of the darkest days of
Aboriginal history, they want the date changed. Various local Councils are
seeking to remove themselves from Australia Day celebrations, even
refusing to participate in citizenship ceremonies, and calls are going out
to have Australia Day on a different day. The big question is, why has the
Government allowed this misconception to continue? Captain Cook didn't
land on the 26th of January. So changing the date of any celebration of
Captain Cook's landing would not have any impact on Australia Day, but
maybe it would clear the way for the truth about Australia Day. The
reality is, the Aborigines in this country suffered under the hands of
British colonialism. This is as much Australia's history as the landing of
the first fleet, and both should be remembered, equally. Both should be
taught, side by side, in our schools. Australians of today reject what was
done under British governance to the Aborigines. We reject what was done
under British governance to the Irish and many other cultures around the
world. So, after the horrors of WWII, we decided to fix it. We became our
own people. On the 26th of January 1949, the Australian nationality came
into existence when the Nationality and Citizenship Act 1948 was enacted.
That was the day we were first called Australians and allowed to travel
with Passports as Australians. Under the Nationality Act 1920 (Cth), all
Aborigines and Torres Strait Islanders born after January 1, 1921, gained
the status of British subjects. In 1949, therefore, they automatically
became Australian citizens under the Nationality and Citizenship Act 1948.
Before that special date, all people living in Australia, including
Aborigines born after 1921, were called 'British Subjects' and forced to
travel on British Passports and fight in British wars. We all became
Australians on the same day! This is why we celebrate Australia Day on the
26th of January! This was the day Australians became free to make our own
decisions about which wars we would fight and how our citizens would be
treated. It was the day Aborigines were declared Australians. Until this
date, Aborigines were not protected by law. For the first time since
Cook's landing, this new Act gave Aboriginal Australians by inference and
precedent the full protection of Australian law. Because of this Act, the
government became free to help Aborigines, and since that day much has
been done to assist Aboriginal Australians, including saying 'sorry' for
the previous atrocities done before this law came into being. This was a
great day for all Australians! This is why the 26th of January is the day
new Australians receive their citizenship. It is a day which celebrates
the implementation of the Nationality and Citizenship Act of 1948 - the
Act which gave freedom and protection to the first Australians and gives
all Australians, old and new, the right to live under the protection of
Australian Law, united as one nation. Now, isn't that cause for
celebration? Education is key! There is a great need for education on the
real reason we celebrate Australia Day on the 26th of January. This reason
needs to be advertised and taught in schools. We all need to remember this
one very special day in Australia's history, when freedom came to all
Australians. What was achieved that day is something for which all
Australians can be proud! We need to remember both the good and the bad in
our history, but the emphasis must be the freedom and unity all
Australians now have, because of what was done on the 26th of January
1949, to allow all of us to live without fear in a land of peace. Isn't it
time all Australians were taught the real reason we celebrate Australia
Day on Jan 26th?
sentences:
- >-
Australia Day is commemorated on the day when Australian citizenship law
passed
- >-
Sri Lankan Defense Secretary Kamal Gunarathne praised ex-minister Rishad
Bathiudeen in speech
- The corona virus is possibly caused due to the use of rhino horn
- source_sentence: This is how the buildings moved this morning in Mexico City
sentences:
- The video captures the earthquake of June 23, 2020
- Photo shows bombing in Gaza in January 2022
- Photo shows former South African president Jacob Zuma in prison
pipeline_tag: sentence-similarity
library_name: sentence-transformers
# SentenceTransformer based on Lajavaness/bilingual-embedding-small
This is a sentence-transformers model finetuned from Lajavaness/bilingual-embedding-small. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
## Model Details
### Model Description
- Model Type: Sentence Transformer
- Base model: Lajavaness/bilingual-embedding-small
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 384 dimensions
- Similarity Function: Cosine Similarity
### Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
### Full Model Architecture
```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BilingualModel
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```
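The Pooling module above uses mean pooling (`pooling_mode_mean_tokens: True`): the token embeddings from the transformer are averaged into one sentence vector, with padding positions excluded via the attention mask. A minimal numpy sketch of that operation, with illustrative shapes and values (not real model outputs):

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings over the sequence, ignoring padding.

    token_embeddings: (batch, seq_len, dim); attention_mask: (batch, seq_len) of 0/1.
    """
    mask = attention_mask[..., None].astype(float)   # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=1)   # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)   # avoid division by zero
    return summed / counts

# Toy batch of 1 sentence, 3 token positions (last one is padding), dim 2
tokens = np.array([[[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]])
mask = np.array([[1, 1, 0]])
print(mean_pool(tokens, mask))  # [[2. 3.]] -- the padded token is ignored
```

The padded position contributes nothing: only the two real tokens are averaged.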
## Usage
### Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    'This is how the buildings moved this morning in Mexico City',
    'The video captures the earthquake of June 23, 2020',
    'Photo shows bombing in Gaza in January 2022',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
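`model.similarity` applies the cosine similarity function listed in the model description. A small numpy sketch of the same computation on placeholder embeddings — the vectors below are random stand-ins for illustration, not real model outputs:

```python
import numpy as np

def cosine_similarity_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    # Normalize each row to unit length, then take pairwise dot products.
    a_norm = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_norm = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_norm @ b_norm.T

# Placeholder 384-d embeddings for 3 sentences (random, illustration only)
rng = np.random.default_rng(0)
emb = rng.normal(size=(3, 384))
sims = cosine_similarity_matrix(emb, emb)
print(sims.shape)                        # (3, 3)
print(np.allclose(np.diag(sims), 1.0))   # True: each vector matches itself
```

Each entry lies in [-1, 1]; the diagonal is 1 because every embedding is maximally similar to itself.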
## Training Details
### Training Dataset
#### Unnamed Dataset
- Size: 25,743 training samples
- Columns: `sentence_0`, `sentence_1`, and `label`
- Approximate statistics based on the first 1000 samples:

  |         | sentence_0 | sentence_1 | label |
  |:--------|:-----------|:-----------|:------|
  | type    | string     | string     | float |
  | details | min: 2 tokens, mean: 110.17 tokens, max: 512 tokens | min: 6 tokens, mean: 19.45 tokens, max: 190 tokens | min: 1.0, mean: 1.0, max: 1.0 |
- Samples:

  | sentence_0 | sentence_1 | label |
  |:-----------|:-----------|:------|
  | best music k.m KOSE CELLIE HINS GUINOT SKIN CARE KWhat people fear most is not being physically disabled, but giving up on themselves. There are still many beautiful things in life to aspire to! This stunning performance, known as the American spirit, brought tears to the eyes of 10,000 spectators. Male dancer Babo has been blind since childhood due to a fire in his home. In order to protect him, his mother held him tightly in her arms and jumped from the 7th floor. The mother died as a result, and the little baby became blind due to bleeding from the fundus. His mother was an ice skater before he died, and Babo also had a soft spot for ice skating. Although he couldn't see anything, he still pursued dance enthusiastically. He danced the famous tango "La Cumparsita" with his partner at the World Figure Skating Championships in Helsinki! 1. His ears are like bats that can measure the sound and distance around him. 2. The female dancer is very amazing. She danced with him and led him for... | Performance by a blind American ice dancer | 1.0 |
  | Photo from 2016. "Good" times when health was "fine" and the press did not report anything about. Bunch of Hypocrites...Let's go fight my people... . left right not army above all | Photo of a hospital in 2016. Good times when health was "good" and the press didn't report anything about it | 1.0 |
  | Haifa Oh Tel Aviv-Yafo Oh N WEST BANK Jerusalem is GAZA STRIPE Be'er Sheva Israel 65 65 35 35 15 M5 10 40Google and Apple maps have officially removed Palestine from the World Maps. Today Palestine was erased from the maps tomorrow Palestine will be erased from the world. PUT PALESTINE BACK ON THE MAP. Please unite now Pakistanio. Enemy is very strong if we are divided. Think just about Pakistan. Support each other, support Pakistan and support your leadership. | Google and Apple removed Palestine from its maps | 1.0 |
- Loss: `MultipleNegativesRankingLoss` with these parameters:
  ```json
  {
      "scale": 20.0,
      "similarity_fct": "cos_sim"
  }
  ```
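MultipleNegativesRankingLoss uses the other positives in a batch as in-batch negatives: scaled cosine similarities between anchors and positives form a score matrix, and a cross-entropy objective pushes each anchor toward its own positive on the diagonal. A numpy sketch of that objective under the parameters above (`scale=20.0`, cosine similarity); function names here are illustrative, not the library's internals:

```python
import numpy as np

def mnr_loss(anchors: np.ndarray, positives: np.ndarray, scale: float = 20.0) -> float:
    """Sketch of in-batch negatives ranking loss: for each anchor i,
    positives[i] is the match and all other positives act as negatives."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    scores = scale * (a @ p.T)                             # (batch, batch) scaled cosine sims
    shifted = scores - scores.max(axis=1, keepdims=True)   # numerical stability
    log_softmax = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_softmax)))           # cross-entropy, targets on the diagonal

# Matching pairs should score a lower loss than deliberately misaligned ones
rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 16))
print(mnr_loss(emb, emb) < mnr_loss(emb, np.roll(emb, 1, axis=0)))  # True
```

This is why the dataset only needs positive pairs (all labels are 1.0): negatives come for free from the rest of the batch.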
## Training Hyperparameters
### Non-Default Hyperparameters
- `num_train_epochs`: 1
- `multi_dataset_batch_sampler`: round_robin
### All Hyperparameters
- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: no
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 8
- `per_device_eval_batch_size`: 8
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1
- `num_train_epochs`: 1
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.0
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: round_robin
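The non-default settings above can be reproduced with the `SentenceTransformerTrainer` API available in Sentence Transformers 3.x. The following is a hedged configuration sketch, not the author's actual training script: the training rows shown are placeholders standing in for the unnamed 25,743-pair dataset, and loading the base model may require `trust_remote_code=True` for its custom `BilingualModel` code.

```python
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss

# Base model; trust_remote_code is assumed necessary for the custom BilingualModel class
model = SentenceTransformer("Lajavaness/bilingual-embedding-small", trust_remote_code=True)

# Placeholder (anchor, positive) pairs -- the real dataset has 25,743 such rows
train_dataset = Dataset.from_dict({
    "sentence_0": ["This is how the buildings moved this morning in Mexico City"],
    "sentence_1": ["The video captures the earthquake of June 23, 2020"],
})

loss = MultipleNegativesRankingLoss(model)  # defaults: scale=20.0, cos_sim

args = SentenceTransformerTrainingArguments(
    output_dir="output",
    num_train_epochs=1,                         # non-default settings from this card
    per_device_train_batch_size=8,
    learning_rate=5e-5,
    multi_dataset_batch_sampler="round_robin",
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()
```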
### Training Logs

| Epoch  | Step | Training Loss |
|:------:|:----:|:-------------:|
| 0.1554 | 500  | 0.1021 |
| 0.3108 | 1000 | 0.0732 |
| 0.4661 | 1500 | 0.0781 |
| 0.6215 | 2000 | 0.0762 |
| 0.7769 | 2500 | 0.0763 |
| 0.9323 | 3000 | 0.0739 |
| 0.1554 | 500  | 0.0474 |
| 0.3108 | 1000 | 0.0478 |
| 0.4661 | 1500 | 0.0558 |
| 0.6215 | 2000 | 0.0542 |
| 0.7769 | 2500 | 0.0457 |
| 0.9323 | 3000 | 0.0395 |
## Framework Versions
- Python: 3.11.11
- Sentence Transformers: 3.4.1
- Transformers: 4.48.3
- PyTorch: 2.5.1+cu124
- Accelerate: 1.3.0
- Datasets: 3.3.1
- Tokenizers: 0.21.0
## Citation
### BibTeX
#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```
#### MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```