---
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:8522
  - loss:DenoisingAutoEncoderLoss
base_model: sentence-transformers/all-roberta-large-v1
widget:
  - source_sentence: >-
      This . A engineer and go a trip walking the when a The physicist the
      distance of the the drop bullet his rifle fires the deer to . engineer his
      . to account for he rifle licks finger the speed and of fires deer 5 right
      . statistician "got!"
    sentences:
      - >-
        This is a mean joke.

        A physicist, an engineer, and a statistician go on a hunting trip, they
        are walking through the woods when they spot a deer in a clearing. The
        physicist calculates the distance of the target, the velocity and drop
        of the bullet, adjusts his rifle and fires, missing the deer 5 feet to
        the left. The engineer rolls his eyes. 'You forgot to account for wind.
        Give it here', he snatches the rifle, licks his finger and estimates the
        speed and direction of the wind and fires, missing the deer 5 feet to
        the right. Suddenly, the statistician claps his hands and yells "We got
        him!"
      - |-
        While driving to work, robbers jumped into my car and stole everything.
        They were pirates of the car I be in.
      - >-
        Driving and trying to read twitter, I just ran over a poodle.
        Unfortunately I drive a Yaris. My car got a dent and the poodle got
        annoyed.
  - source_sentence: ': the love?? They.'
    sentences:
      - I have a super hero joke Fantastic four
      - "Monroe: What did the trailer and the truck do after they fell in love?\nAmanda: What?\nMonroe: They got\_hitched."
      - |-
        JOSIAH: What is a lawn mower’s favorite kind of music?
        TIM: I’m not sure.
        JOSIAH: Bluegrass.
  - source_sentence: 'JAYDEN What panda ’ s: JAYDEN: Bam-BOO!'
    sentences:
      - >-
        BlackBerry and Apple have come together to create a something for ladies
        who have trouble listening. It's been called the Black-i.
      - Where do you put the Duke? In the duke box!
      - "JAYDEN: What is a panda’s favorite\_Halloween food?\nCAYDEN: What?\nJAYDEN: Bam-BOO!"
  - source_sentence: we should be the time expand language, not it instead of 'probababably
    sentences:
      - >-
        "Don't dip your pen in company ink." - HR training seminar explaining
        why I shouldn't sleep with the receptionist...I think.
      - >-
        we should be using all the time technology frees up to expand language,
        not shorten it. instead of 'prolly' try 'probababably.'
      - If you like internet jokes, you should see my online bank account.
  - source_sentence: yoga What the to when she him Nahimastay
    sentences:
      - |-
        CRESENCIO: Why do turkeys eat so little?
        MAX: I don’t know.
        CRESENCIO: Because they are always stuffed.
      - I'm really sick of making my dog a birthday cake every 52 days.
      - >-
        Redneck yoga. What did the redneck say to the yoga instructor when she
        asked him to leave the class? Nahimastay
pipeline_tag: sentence-similarity
library_name: sentence-transformers
---

# SentenceTransformer based on sentence-transformers/all-roberta-large-v1

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [sentence-transformers/all-roberta-large-v1](https://huggingface.co/sentence-transformers/all-roberta-large-v1). It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description

- **Model Type:** Sentence Transformer
- **Base model:** [sentence-transformers/all-roberta-large-v1](https://huggingface.co/sentence-transformers/all-roberta-large-v1)
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 1024 dimensions
- **Similarity Function:** Cosine Similarity

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: RobertaModel
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```
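
The printout above corresponds to a RoBERTa-large encoder followed by CLS-token pooling. As a rough sketch (not the author's actual build script), an equivalent untrained module stack could be assembled by hand like this:

```python
from sentence_transformers import SentenceTransformer, models

# Rebuild the module stack shown above: a RoBERTa transformer
# (max_seq_length=512) followed by CLS-token pooling to 1024 dimensions.
word_embedding = models.Transformer(
    "sentence-transformers/all-roberta-large-v1",
    max_seq_length=512,
)
pooling = models.Pooling(
    word_embedding.get_word_embedding_dimension(),  # 1024 for RoBERTa-large
    pooling_mode="cls",
)
model = SentenceTransformer(modules=[word_embedding, pooling])
print(model)  # should mirror the architecture printout above
```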

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.

```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("SeppeV/roberta_TSDAE")
# Run inference
sentences = [
    'yoga What the to when she him Nahimastay',
    'Redneck yoga. What did the redneck say to the yoga instructor when she asked him to leave the class? Nahimastay',
    "I'm really sick of making my dog a birthday cake every 52 days.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
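
Beyond pairwise similarity, the same embeddings can drive semantic search. Here is a minimal sketch using the library's `util.semantic_search` helper; the corpus and query below are made-up examples, not data from this model's training set:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("SeppeV/roberta_TSDAE")

# Hypothetical toy corpus; any list of strings works.
corpus = [
    "Why did the scarecrow win an award? He was outstanding in his field.",
    "I told my computer a joke about UDP, but I'm not sure it got it.",
]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)

# Embed a query and retrieve the closest corpus entry by cosine similarity.
query_embedding = model.encode("a pun about farming", convert_to_tensor=True)
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=1)
print(hits[0])  # e.g. [{'corpus_id': 0, 'score': ...}]
```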

## Training Details

### Training Dataset

#### Unnamed Dataset

- Size: 8,522 training samples
- Columns: `sentence_0` and `sentence_1`
- Approximate statistics based on the first 1000 samples:

  |         | sentence_0 | sentence_1 |
  |:--------|:-----------|:-----------|
  | type    | string     | string     |
  | details | min: 3 tokens<br>mean: 13.95 tokens<br>max: 83 tokens | min: 9 tokens<br>mean: 33.15 tokens<br>max: 231 tokens |

- Samples:

  | sentence_0 | sentence_1 |
  |:-----------|:-----------|
  | <code>.... recently changed sound of my clock to Justin Bieber Baby" I wake up 5 earlier do to to it.</code> | <code>Justin Bieber.... I have recently changed the sound of my alarm clock to "Justin Bieber - Baby". Now I wake up 5 minutes earlier every day, so I don't have to listen to it.</code> |
  | <code>A got yesterday . joke be funny it had a tit</code> | <code>A woman got breast implants made of wood yesterday.<br>This joke would be funny if it had a punchline<br><br>Wooden tit</code> |
  | <code>TIL unvaccinated children are less likely autistic Because they more</code> | <code>TIL unvaccinated children are less likely to be autistic<br>Because they are more likely to be dead</code> |

- Loss: DenoisingAutoEncoderLoss
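
The pairs above are deletion-noised inputs (`sentence_0`) alongside their original sentences (`sentence_1`), which is the input format `DenoisingAutoEncoderLoss` expects. As a hedged sketch of how such a dataset, this loss, and the hyperparameters listed below fit together with the Trainer API this card was generated from (the toy rows and `output_dir` are placeholders, not the author's actual data or settings):

```python
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import DenoisingAutoEncoderLoss

model = SentenceTransformer("sentence-transformers/all-roberta-large-v1")

# Two-column dataset: noised sentence first, original second,
# mirroring the sentence_0 / sentence_1 pairs above (toy rows only).
train_dataset = Dataset.from_dict({
    "sentence_0": ["quick fox over lazy dog"],
    "sentence_1": ["The quick brown fox jumps over the lazy dog."],
})

# A decoder tied to the encoder learns to reconstruct the original
# sentence from the pooled embedding of its noised version.
loss = DenoisingAutoEncoderLoss(model, tie_encoder_decoder=True)

args = SentenceTransformerTrainingArguments(
    output_dir="roberta_TSDAE",     # placeholder path
    num_train_epochs=1,             # matches the hyperparameters below
    per_device_train_batch_size=8,  # matches the hyperparameters below
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()
```

When only raw sentences are available, `sentence_transformers.datasets.DenoisingAutoEncoderDataset` can generate such noised pairs on the fly by randomly deleting tokens.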

### Training Hyperparameters

#### Non-Default Hyperparameters

- `num_train_epochs`: 1
- `multi_dataset_batch_sampler`: round_robin

#### All Hyperparameters

<details><summary>Click to expand</summary>
- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: no
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 8
- `per_device_eval_batch_size`: 8
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1
- `num_train_epochs`: 1
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.0
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: round_robin

</details>

### Training Logs

| Epoch  | Step | Training Loss |
|:------:|:----:|:-------------:|
| 0.4690 | 500  | 7.4675        |
| 0.9381 | 1000 | 6.8434        |

### Framework Versions

- Python: 3.10.16
- Sentence Transformers: 3.4.1
- Transformers: 4.49.0
- PyTorch: 2.6.0
- Accelerate: 1.4.0
- Datasets: 3.3.2
- Tokenizers: 0.21.0

## Citation

### BibTeX

#### Sentence Transformers

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### DenoisingAutoEncoderLoss

```bibtex
@inproceedings{wang-2021-TSDAE,
    title = "TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning",
    author = "Wang, Kexin and Reimers, Nils and Gurevych, Iryna",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2021",
    month = nov,
    year = "2021",
    address = "Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
    pages = "671--688",
    url = "https://arxiv.org/abs/2104.06979",
}
```