---
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:25743
  - loss:MultipleNegativesRankingLoss
base_model: Lajavaness/bilingual-embedding-small
widget:
  - source_sentence: >-
      Luciano Hang da HAVAN, send 200 oxygen cylinders to Manaus in their
      planes. attitude of man of true, bravo! O
    sentences:
      - The photo of Pedro Sánchez «enjoying while Gran Canaria burns»
      - Havan owner Luciano Hang donated 200 oxygen cylinders to Manaus
      - Video of the show in Shanghai staged by robots made in China
  - source_sentence: '"PERSEVERANCE" SEND THE FIRST COLOR IMAGES FROM THE SURFACE OF MARS'
    sentences:
      - If an election has 51% of the votes cast, the election is annulled.
      - >-
        This video shows Indian Air Force attack helicopters flying over Pangong
        Lake in Ladakh.
      - The first video images of Mars from the Perseverance rover
  - source_sentence: >-
      SPEECH BY PEDRO CASTILLO, IT WAS BASED ON THE HATE OF SPAIN OF A PAST
      PRE-HISPANIC THAT I ONLY KNOW EXPLAINS FROM THE MOST ABSOLUTE IGNORANCE
      AND STUPIDITY" KING OF SPAINIn fact, between the president of Colombia,
      Duque, and the king of Spain, the most regretful of having come to the
      inauguration of the clown with a hat is the latter.
    sentences:
      - >-
        Felipe VI said that Pedro Castillo's speech is explained from ignorance
        and stupidity
      - >-
        "Population poorly tolerated quarantine and social distancing measures
        during the Spanish flu, when the first deconfinement took place,
        abandoning all precautionary measures"
      - >-
        Genuine photo of Philippine lawmaker Sarah Elago supporting mandatory
        military training in schools
  - source_sentence: >-
      Australia Day has nothing to do with Captain Cook or Botany Bay The
      Landing of Captain Cook at the site of Sydney happened on the 28th of
      April 1770 - NOT on the 26th of January 1770. The first fleet arrived in
      Australia on 18 January 1788 and landed at Botany Bay on 20 January 1788.
      AUSTRALIA DAY CELEBRATES THE DAY ALL AUSTRALIANS STOPPED BEING BRITISH
      CITIZENS AND BECAME AUSTRALIAN CITIZENS IN 1949. Facts about Australia Day
      Our Education system and the popular press is not competently advising our
      children !! Twisting the truth a bit. Don't expect the media to educate
      you, that's not part of their agenda. Australia Day does not celebrate the
      arrival of the first fleet or the invasion of anything. The First Fleet
      arrived in Botany Bay on the 18th of January. However, Captain Cook's
      landing was included in Australia Day celebrations as a reminder of a
      significant historical event. Since the extravagant bicentenary
      celebrations of 1988, when Sydney-siders decided Captain Cook's landing
      should become the focus of the Australia Day commemoration, the importance
      of this date for all Australians has begun to fade. Now, a generation
      later, it's all but lost. This is because our politicians and educators
      have not been doing a good job promoting the day. Our politicians have not
      been advertising the real reason for Australia Day, and our educators have
      not been teaching our children the importance of the 26th of January to
      all Australians. The media, as usual, is happy to twist the truth for the
      sake of controversy. In recent years, the media has helped fan the flames
      of discontent among the Aboriginal community. Many are now so offended by
      what they see as a celebration of the beginning of the darkest days of
      Aboriginal history, they want the date changed. Various local Councils are
      seeking to remove themselves from Australia Day celebrations, even
      refusing to participate in citizenship ceremonies, and calls are going out
      to have Australia Day on a different day. The big question is, why has the
      Government allowed this misconception to continue? Captain Cook didn't
      land on the 26th of January. So changing the date of any celebration of
      Captain Cook's landing would not have any impact on Australia Day, but
      maybe it would clear the way for the truth about Australia Day. The
      reality is, the Aborigines in this country suffered under the hands of
      British colonialism. This is as much Australia's history as the landing of
      the first fleet, and both should be remembered, equally. Both should be
      taught, side by side, in our schools. Australians of today reject what was
      done under British governance to the Aborigines. We reject what was done
      under British governance to the Irish and many other cultures around the
      world. So, after the horrors of WWII, we decided to fix it. We became our
      own people. On the 26th of January 1949, the Australian nationality came
      into existence when the Nationality and Citizenship Act 1948 was enacted.
      That was the day we were first called Australians and allowed to travel
      with Passports as Australians. Under the Nationality Act 1920 (Cth), all
      Aborigines and Torres Strait Islanders born after January 1, 1921, gained
      the status of British subjects. In 1949, therefore, they automatically
      became Australian citizens under the Nationality and Citizenship Act 1948.
      Before that special date, all people living in Australia, including
      Aborigines born after 1921, were called 'British Subjects' and forced to
      travel on British Passports and fight in British wars. We all became
      Australians on the same day! This is why we celebrate Australia Day on the
      26th of January! This was the day Australians became free to make our own
      decisions about which wars we would fight and how our citizens would be
      treated. It was the day Aborigines were declared Australians. Until this
      date, Aborigines were not protected by law. For the first time since
      Cook's landing, this new Act gave Aboriginal Australians by inference and
      precedent the full protection of Australian law. Because of this Act, the
      government became free to help Aborigines, and since that day much has
      been done to assist Aboriginal Australians, including saying 'sorry' for
      the previous atrocities done before this law came into being. This was a
      great day for all Australians! This is why the 26th of January is the day
      new Australians receive their citizenship. It is a day which celebrates
      the implementation of the Nationality and Citizenship Act of 1948 - the
      Act which gave freedom and protection to the first Australians and gives
      all Australians, old and new, the right to live under the protection of
      Australian Law, united as one nation. Now, isn't that cause for
      celebration? Education is key! There is a great need for education on the
      real reason we celebrate Australia Day on the 26th of January. This reason
      needs to be advertised and taught in schools. We all need to remember this
      one very special day in Australia's history, when freedom came to all
      Australians. What was achieved that day is something for which all
      Australians can be proud! We need to remember both the good and the bad in
      our history, but the emphasis must be the freedom and unity all
      Australians now have, because of what was done on the 26th of January
      1949, to allow all of us to live without fear in a land of peace. Isn't it
      time all Australians were taught the real reason we celebrate Australia
      Day on Jan 26th?
    sentences:
      - >-
        Australia Day is commemorated on the day when Australian citizenship law
        passed
      - >-
        Sri Lankan Defense Secretary Kamal Gunarathne praised ex-minister Rishad
        Bathiudeen in speech
      - The corona virus is possibly caused due to the use of rhino horn
  - source_sentence: This is how the buildings moved this morning in Mexico City
    sentences:
      - The video captures the earthquake of June 23, 2020
      - Photo shows bombing in Gaza in January 2022
      - Photo shows former South African president Jacob Zuma in prison
pipeline_tag: sentence-similarity
library_name: sentence-transformers
---

SentenceTransformer based on Lajavaness/bilingual-embedding-small

This is a sentence-transformers model finetuned from Lajavaness/bilingual-embedding-small. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: Lajavaness/bilingual-embedding-small
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity
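
These properties can be checked directly on the loaded model. A minimal sketch, assuming the model has been loaded as `model` as shown in the Usage section below:

print(model.max_seq_length)                      # 512
print(model.get_sentence_embedding_dimension())  # 384
print(model.similarity_fn_name)                  # "cosine"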

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BilingualModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
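
The Pooling module above uses mean pooling over token embeddings (pooling_mode_mean_tokens: True), masked so that padding positions are ignored. Below is a minimal sketch of that operation on plain tensors; the helper name mean_pool and the toy shapes are illustrative, not part of the library:

import torch

def mean_pool(token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    # token_embeddings: (batch, seq_len, 384); attention_mask: (batch, seq_len), 1 for real tokens
    mask = attention_mask.unsqueeze(-1).float()    # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(dim=1)  # sum over non-padding positions
    counts = mask.sum(dim=1).clamp(min=1e-9)       # number of real tokens per sentence
    return summed / counts                         # (batch, 384) sentence embeddings

# Toy check with random data
print(mean_pool(torch.randn(2, 10, 384), torch.ones(2, 10)).shape)  # torch.Size([2, 384])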

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")  # placeholder: replace with this repository's model id
# Run inference
sentences = [
    'This is how the buildings moved this morning in Mexico City',
    'The video captures the earthquake of June 23, 2020',
    'Photo shows bombing in Gaza in January 2022',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
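
Because the training pairs match social-media posts to fact-checked claims, a natural use is retrieving the closest claim for a new post. A small sketch continuing from the snippet above (the post and candidate claims are reused from the widget examples; nothing here is specific to this model beyond the loaded `model` object):

# Rank candidate claims for a post by cosine similarity
post = "This is how the buildings moved this morning in Mexico City"
claims = [
    "The video captures the earthquake of June 23, 2020",
    "Photo shows bombing in Gaza in January 2022",
    "Photo shows former South African president Jacob Zuma in prison",
]
scores = model.similarity(model.encode(post), model.encode(claims))  # shape [1, 3]
for idx in scores[0].argsort(descending=True).tolist():
    print(f"{float(scores[0, idx]):.4f}  {claims[idx]}")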

Training Details

Training Dataset

Unnamed Dataset

  • Size: 25,743 training samples
  • Columns: sentence_0, sentence_1, and label
  • Approximate statistics based on the first 1000 samples:
    • sentence_0: string; min 2 tokens, mean 110.17 tokens, max 512 tokens
    • sentence_1: string; min 6 tokens, mean 19.45 tokens, max 190 tokens
    • label: float; min 1.0, mean 1.0, max 1.0
  • Samples:
    • sentence_0: best music k.m KOSE CELLIE HINS GUINOT SKIN CARE KWhat people fear most is not being physically disabled, but giving up on themselves. There are still many beautiful things in life to aspire to! This stunning performance, known as the American spirit, brought tears to the eyes of 10,000 spectators. Male dancer Babo has been blind since childhood due to a fire in his home. In order to protect him, his mother held him tightly in her arms and jumped from the 7th floor. The mother died as a result, and the little baby became blind due to bleeding from the fundus. His mother was an ice skater before he died, and Babo also had a soft spot for ice skating. Although he couldn't see anything, he still pursued dance enthusiastically. He danced the famous tango "La Cumparsita" with his partner at the World Figure Skating Championships in Helsinki! 1. His ears are like bats that can measure the sound and distance around him. 2. The female dancer is very amazing. She danced with him and led him for...
      sentence_1: Performance by a blind American ice dancer
      label: 1.0
    • sentence_0: Photo from 2016. "Good" times when health was "fine" and the press did not report anything about. Bunch of Hypocrites...Let's go fight my people... . left right not army above all
      sentence_1: Photo of a hospital in 2016. Good times when health was "good" and the press didn't report anything about it
      label: 1.0
    • sentence_0: Haifa Oh Tel Aviv-Yafo Oh N WEST BANK Jerusalem is GAZA STRIPE Be'er Sheva Israel 65 65 35 35 15 M5 10 40Google and Apple maps have officially removed Palestine from the World Maps. Today Palestine was erased from the maps tomorrow Palestine will be erased from the world. PUT PALESTINE BACK ON THE MAP. Please unite now Pakistanio. Enemy is very strong if we are divided. Think just about Pakistan. Support each other, support Pakistan and support your leadership.
      sentence_1: Google and Apple removed Palestine from its maps
      label: 1.0
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • num_train_epochs: 1
  • multi_dataset_batch_sampler: round_robin
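
For reference, the following is a minimal sketch of how a model like this one could be fine-tuned with the loss and non-default hyperparameters listed on this card. The example pairs and output_dir are placeholders, the column names follow the dataset description above (the constant label column of 1.0 is not needed by MultipleNegativesRankingLoss and is omitted here), and trust_remote_code is passed on the assumption that the custom BilingualModel code requires it:

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss

# Base model; trust_remote_code allows loading the custom BilingualModel implementation
model = SentenceTransformer("Lajavaness/bilingual-embedding-small", trust_remote_code=True)

# Placeholder (post, claim) pairs in the sentence_0 / sentence_1 format described above
train_dataset = Dataset.from_dict({
    "sentence_0": ["post text 1", "post text 2"],
    "sentence_1": ["matched claim 1", "matched claim 2"],
})

# Loss configuration from this card: scale=20.0 with the default cosine similarity
loss = MultipleNegativesRankingLoss(model, scale=20.0)

args = SentenceTransformerTrainingArguments(
    output_dir="output",            # placeholder
    num_train_epochs=1,             # non-default hyperparameter listed above
    per_device_train_batch_size=8,  # per_device_train_batch_size from the card
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()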

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 8
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin

Training Logs

Epoch Step Training Loss
0.1554 500 0.1021
0.3108 1000 0.0732
0.4661 1500 0.0781
0.6215 2000 0.0762
0.7769 2500 0.0763
0.9323 3000 0.0739
0.1554 500 0.0474
0.3108 1000 0.0478
0.4661 1500 0.0558
0.6215 2000 0.0542
0.7769 2500 0.0457
0.9323 3000 0.0395

Framework Versions

  • Python: 3.11.11
  • Sentence Transformers: 3.4.1
  • Transformers: 4.48.3
  • PyTorch: 2.5.1+cu124
  • Accelerate: 1.3.0
  • Datasets: 3.3.1
  • Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}