gte_ISB / README.md
metadata
base_model: Alibaba-NLP/gte-large-en-v1.5
datasets: []
language: []
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:281362
  - loss:CachedMultipleNegativesRankingLoss
widget:
  - source_sentence: >-
      pendaflex surehook extra capacity reinforced hanging folders letter size
      assorted colors total of 20 folders per box 6152x2 asstus files rasps hand
      tools depotus pendaflex surehook extra capacity reinforced hanging folders
      feature longer plastic hooks with builtin tension springs that help keep
      files on track the pressboardreinforced insert allows them to expand 2 for
      up to 400 sheets durable 11 pt covers wont buckle under pressure includes
      an 85 x 11 printerready label sheet and clear tabs for placement in 5
      positions paper made from 10 recycled fiber with 10 postconsumer fiber
      letter size assorted colors blue yellow red bright green orange 20 per
      boxus pendaflexus 6152x2asst officeproducts
    sentences:
      - >-
        pendaflex surehook extra capacity reinforced hanging folders letter size
        assorted colors total of 20 folders per box 6152x2 asstus files rasps
        hand tools depotus pendaflex surehook extra capacity reinforced hanging
        folders feature longer plastic hooks with builtin tension springs that
        help keep files on track the pressboardreinforced insert allows them to
        expand 2 for up to 400 sheets durable 11 pt covers wont buckle under
        pressure includes an 85 x 11 printerready label sheet and clear tabs for
        placement in 5 positions paper made from 10 recycled fiber with 10
        postconsumer fiber letter size assorted colors blue yellow red bright
        green orange 20 per boxus pendaflexus 6152x2asst officeproducts
      - >-
        nissan cwtwb1u789 factory oem key fob keyless entry remote alarm replace
        null nissan 731015045730 798426001120 automotive
      - >-
        mf digital scribe cddvdbd print station picojet2 300disc capacity
        station picojet2 capacity cd solutions mf digital scribe cddvdbd print
        station picojet2 300disc capacity mf digital 739410810199 739410811042
        739410531797 computersandaccessories
  - source_sentence: >-
      32020q50k 762148137726 20w 29w bulbscom compact fluorescent cfl plugin
      light bulbs with a gx32d2 base tcp 20w 2 pin bright white quad double twin
      tube cfl bulb pack of 5 tcp brand 32020q50k toolsandhomeimprovement
    sentences:
      - >-
        refrigerant hoseac compressor ac condenserf 8e3285 001 gb 6cylinder bke
        asn diesel eng amm ake avf 2002 audi a4avant canada market fuel bdv avj
        aym bdg bfcgb exhaust bcz bfb cooling ac condenser fluid container with
        connecting parts 4cylinder alz bbj amb alt bex awx avb model data
        akeaymbczbdgbaubdhbfc period 0103 1204 gb 8e0260701bc automotive
      - >-
        32020q50k 762148137726 double twintube 2pin cfl light bulbs with a
        gx32d2 base bulbscom tcp 20w 2 pin bright white quad double twin tube
        cfl bulb pack of 5 tcp brand 32020q50k toolsandhomeimprovement
      - >-
        nylon lock nut 6 x 1 mm 10 hex 1991 bmw 325i base convertible
        miscellaneous hardware page 1 note zinc plated steel auveco 11053m769
        automotive
  - source_sentence: >-
      10 12 mm stainless steel jubilee clip hose clampus daf 66 carburetor parts
      and service kits us stainless steel jubilee clip hose clamp for 6 mm
      flexible fuel line this hose clamp or jubilee clip is perfectly suited for
      our 6 mm internal diameter fuel line as well as our green italian style
      transparent fuel line neatly finished the stainless steel grade 304 makes
      it the perfect compliment to our textile covered fuel hose it is rated for
      between 10 and 12 mm internal diameter ie 10 12 mm external hoseline
      diameter can be tightened using a 7 mm socket or flat head
      screwdriverexcellently suited for use with our gates flexi driver toolus
      jubilee1012 toolsandhomeimprovement
    sentences:
      - >-
        stance orange harley davidson the shield socks m556d16tshlg black brand
        stomper boots bb9038 the stomper lace up work boot is the perfect allday
        strong enough for kickstarting yet still comfortable walking thanks to
        d30 insole mens leather motorcycle reinforced toe heel and ankles insole
        vulcanized rubber sole hipora waterproof barrier included items 2 black
        boots made with upper coated lining inner sole textile outer other man
        made material stance 3437434l automotive age adult color black color
        primary black gender mens type boot cruisertouring us size 8 12 units
        pair weight 000 lbs age adult color black color primary black gender
        mens type boot cruisertouring us size 8 12 units pair weight 000 lbs
      - >-
        datalogic accessories for readers codbc4030bkbt datalogic cod104538
        datalogic bc4030bt basecharger rs232usbkbwwand emulation multiinterface
        black datalogic bc4030bkbt computersandaccessories
      - >-
        10 12 mm stainless steel jubilee clip hose clampus daf 66 carburetor
        parts and service kits us stainless steel jubilee clip hose clamp for 6
        mm flexible fuel line this hose clamp or jubilee clip is perfectly
        suited for our 6 mm internal diameter fuel line as well as our green
        italian style transparent fuel line neatly finished the stainless steel
        grade 304 makes it the perfect compliment to our textile covered fuel
        hose it is rated for between 10 and 12 mm internal diameter ie 10 12 mm
        external hoseline diameter can be tightened using a 7 mm socket or flat
        head screwdriverexcellently suited for use with our gates flexi driver
        toolus jubilee1012 toolsandhomeimprovement
  - source_sentence: >-
      nokya heavy duty headlight harnesses 9496 mercedes s600 w140 2 door
      h4hb29003 nokya brand or otherwise each set consists of 2 harnesses as
      complete upgrades a precautionary measure against harness plug burnouts
      which can permanently damage your mercedes s600 headlight housings these
      heavy duty headlight highlow beam h4 wire harnesses also help to handle
      the increased demands aftermarket bulbs offers these harnesses cheap and
      relatively easy upgrade stock electrical system they work replacements for
      damaged plugs lighting are not designed extended periods use operation in
      adverse severe conditions to address this these have been be plugged into
      s600s wiring aftermarket nok91112pcs automotive
    sentences:
      - >-
        nokya heavy duty headlight harnesses 9496 mercedes s600 w140 2 door
        h4hb29003 nokya brand or otherwise each set consists of 2 harnesses as
        complete upgrades a precautionary measure against harness plug burnouts
        which can permanently damage your mercedes s600 headlight housings these
        heavy duty headlight highlow beam h4 wire harnesses also help to handle
        the increased demands aftermarket bulbs offers these harnesses cheap and
        relatively easy upgrade stock electrical system they work replacements
        for damaged plugs lighting are not designed extended periods use
        operation in adverse severe conditions to address this these have been
        be plugged into s600s wiring aftermarket nok91112pcs automotive
      - >-
        tektronix black toner cartridge for phaser 6100 3000pages product data
        tektronix 3000pages laser toner cartridges 106r00679 long product name
        tektronix black toner cartridge for phaser 6100 3000pages the short
        editorial description of tektronix black toner cartridge for phaser 6100
        3000pages black toner cartridge for phaser 6100 more short summary
        description tektronix black toner cartridge for phaser 6100 3000pages
        this short summary of the tektronix black toner cartridge for phaser
        6100 3000pages datasheet is autogenerated and uses the product title and
        the first six key specs tektronix black toner cartridge for phaser 6100
        3000 pages long summary description tektronix black toner cartridge for
        phaser 6100 3000pages tektronix black toner cartridge for phaser 6100
        page yield 3000 pages this is an autogenerated long summary of tektronix
        black toner cartridge for phaser 6100 3000pages based on the first three
        specs of the first five spec groups tektronix 106r00679 officeproducts
      - >-
        hose clamp 53558 mm range 12 width spring type 1990 bmw 325i base coupe
        cooling system miscellaneous page 1 mubea sc5355812m219 automotive
  - source_sentence: >-
      b260iunvhp000i 768386242246 20 30 where number of lamps is 1 linear
      fluorescent ballasts and by wattage x 75w 99w bulbscom universal
      electronic ballast 120v to 277v for 2 f96t12 universal brand
      b260iunvhp000i toolsandhomeimprovement
    sentences:
      - >-
        b260iunvhp000i 768386242246 10 50 where length is 10 under 18 bulbscom
        electronic t12 linear fluorescent ballasts universal electronic ballast
        120v to 277v for 2 f96t12 universal brand b260iunvhp000i
        toolsandhomeimprovement
      - >-
        danze 24 double towel bar danze products at efaucetscom towel bars
        bathroom accessories danze 24 double towel bar parma collection solid
        brass construction easy to install mounting hardware included matching
        faucet collection d446612bn toolsandhomeimprovement
      - >-
        greeting cards gestures of enlightenment set of 12 12 meaningful gift
        ideas browse tharpa a series of 12 greeting cards with original artwork
        conveying the beauty and power of the buddhas gestures mark that special
        occasion with a card that is exquisite original and auspicious send good
        wishes for long life wisdom and good fortune with amitayus or white
        taras gesture for example or comfort someone who is suffering by sending
        medicine buddhas card see below for a full description of each card and
        click on the thumbnail images to view each design make your good wishes
        auspicious and meaningful blank inside for your own message dimensions 7
        x 5 178cm x 136cm fits a standard photo frame envelope provided sold
        singly or as a set of 12 save 7 order the full set below or click here
        to purchase cards individually 5055278111182 officeproducts

SentenceTransformer based on Alibaba-NLP/gte-large-en-v1.5

This is a sentence-transformers model finetuned from Alibaba-NLP/gte-large-en-v1.5. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: Alibaba-NLP/gte-large-en-v1.5
  • Maximum Sequence Length: 8192 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity
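
As a rough illustration of the similarity function listed above, the following sketch computes pairwise cosine similarity with NumPy. The random vectors are stand-ins for real model outputs, and the helper name `cosine_similarity_matrix` is hypothetical, not part of the library.

```python
import numpy as np


def cosine_similarity_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_unit @ b_unit.T


# Random stand-ins for embeddings: 3 texts, 1024 dimensions each (assumption).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(3, 1024))

scores = cosine_similarity_matrix(embeddings, embeddings)
print(scores.shape)  # (3, 3)
```

Each vector has similarity 1.0 with itself, so the diagonal of `scores` is all ones; off-diagonal entries fall between -1 and 1.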

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: NewModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
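
The Pooling module above has only `pooling_mode_cls_token` enabled, so the sentence embedding is the hidden state of the first token. A minimal NumPy sketch of that step, with mean pooling shown for contrast, might look like this. The array shapes and helper names are illustrative assumptions; the real module operates on the transformer's 1024-dimensional outputs.

```python
import numpy as np


def cls_pool(token_embeddings: np.ndarray) -> np.ndarray:
    """Keep the first-token ([CLS]) vector of every sequence.

    token_embeddings has shape (batch, seq_len, hidden).
    """
    return token_embeddings[:, 0, :]


def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average the token vectors, ignoring padded positions (for contrast only)."""
    mask = attention_mask[:, :, None].astype(token_embeddings.dtype)
    return (token_embeddings * mask).sum(axis=1) / mask.sum(axis=1)


# Toy input: 2 sequences, 4 tokens, hidden size 8 (hypothetical sizes).
rng = np.random.default_rng(0)
tokens = rng.normal(size=(2, 4, 8))
mask = np.array([[1, 1, 1, 1], [1, 1, 0, 0]])

cls_vectors = cls_pool(tokens)
mean_vectors = mean_pool(tokens, mask)
print(cls_vectors.shape, mean_vectors.shape)  # (2, 8) (2, 8)
```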

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

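# Download from the 🤗 Hub. Replace the placeholder below with this model's Hub ID.
# trust_remote_code=True is needed because the base model ships custom modeling code (NewModel).
model = SentenceTransformer("sentence_transformers_model_id", trust_remote_code=True)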
# Run inference
sentences = [
    'b260iunvhp000i 768386242246 20 30 where number of lamps is 1 linear fluorescent ballasts and by wattage x 75w 99w bulbscom universal electronic ballast 120v to 277v for 2 f96t12 universal brand b260iunvhp000i toolsandhomeimprovement',
    'b260iunvhp000i 768386242246 10 50 where length is 10 under 18 bulbscom electronic t12 linear fluorescent ballasts universal electronic ballast 120v to 277v for 2 f96t12 universal brand b260iunvhp000i toolsandhomeimprovement',
    'danze 24 double towel bar danze products at efaucetscom towel bars bathroom accessories danze 24 double towel bar parma collection solid brass construction easy to install mounting hardware included matching faucet collection d446612bn toolsandhomeimprovement',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
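
Once a similarity matrix is available, a common next step is to find the closest other item for each input, for example to match product offers. The sketch below assumes the matrix has been converted to NumPy first; the toy values and the helper name `most_similar` are hypothetical.

```python
import numpy as np


def most_similar(similarities: np.ndarray) -> np.ndarray:
    """For each row, return the index of the highest-scoring *other* item."""
    sims = np.array(similarities, dtype=float)  # copy so the input is untouched
    np.fill_diagonal(sims, -np.inf)             # exclude self-matches
    return sims.argmax(axis=1)


# Toy similarity matrix standing in for real model output (values are made up).
toy_similarities = np.array([
    [1.0, 0.9, 0.2],
    [0.9, 1.0, 0.3],
    [0.2, 0.3, 1.0],
])

nearest = most_similar(toy_similarities)
print(nearest)  # [1 0 1]
```

Items 0 and 1 pick each other, while item 2 falls back to its best remaining candidate; in practice a score threshold would usually be applied before treating a pair as a match.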

Training Details

Training Dataset

Unnamed Dataset

  • Size: 281,362 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
    column    type    min        mean          max
    anchor    string  20 tokens  77.79 tokens  1149 tokens
    positive  string  19 tokens  82.69 tokens  1149 tokens
  • Samples:
    1. anchor: clever lever extra giga punch scallop circle 35 inches clever wholesale darice this clever lever extra giga punch produces a clearcut scallop circle the craft punch is ideal for embellishing scrapbooks greeting cards invitations programs and many more paper crafts the scalloped circle is 35 inches in size 1 craft punch per package lvxgcp65 officeproducts
       positive: clever lever extra giga punch scallop circle 35 inches clever wholesale darice this clever lever extra giga punch produces a clearcut scallop circle the craft punch is ideal for embellishing scrapbooks greeting cards invitations programs and many more paper crafts the scalloped circle is 35 inches in size 1 craft punch per package lvxgcp65 officeproducts
    2. anchor: strut front right shocks springs page 1 2002 bmw 325i base sedan suspension genuine bmw 31312282460boe automotive
       positive: strut front right shocks springs page 1 2002 bmw 325i base sedan suspension note only for cars with sport suspension and m sport package sachs 31312282460m10 automotive
    3. anchor: herrold 40 drawer chest in dark walnutmango wood 792977257388 arreton 46quote in washed white oakantique brass sale home lighting fixtures lamps more online symbolizing achievement and rank the shield shape of this six drawer chest bears both historical and design significance built with craftsmens detail from dark walnutstained mango wood and mahogany veneers chest features curved sides smooth uttermost 25738upc792977257388 toolsandhomeimprovement
       positive: herrold 40 drawer chest in dark walnutmango wood 792977257388 malthus 31quote in aged parchmentreclaimed mahogany sale home lighting fixtures lamps more online symbolizing achievement and rank the shield shape of this six drawer chest bears both historical and design significance built with craftsmens detail from dark walnutstained mango wood and mahogany veneers chest features curved sides smooth uttermost 25738upc792977257388 toolsandhomeimprovement
  • Loss: CachedMultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
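
To make the two parameters above concrete, here is a minimal NumPy sketch of the multiple-negatives ranking objective: cosine similarities between every anchor and every positive in a batch are multiplied by `scale`, and each anchor is trained to pick its own positive over the others via cross-entropy. This is an illustration under simplifying assumptions, not the library's implementation; in particular it omits the gradient caching that the "Cached" variant adds to reduce memory use, and the function name `mnr_loss` is hypothetical.

```python
import numpy as np


def mnr_loss(anchors: np.ndarray, positives: np.ndarray, scale: float = 20.0) -> float:
    """In-batch negatives loss: row i of anchors should match row i of positives."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = scale * (a @ p.T)                             # scaled cos_sim, (batch, batch)
    logits = logits - logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))             # cross-entropy on the diagonal


# Toy batch of 4 orthogonal "embeddings" (made-up data).
batch = np.eye(4)
aligned_loss = mnr_loss(batch, batch)                        # every anchor matches its positive
shuffled_loss = mnr_loss(batch, np.roll(batch, 1, axis=0))   # every positive is misplaced
print(aligned_loss < shuffled_loss)  # True
```

With a scale of 20, a well-separated batch drives the loss very close to zero, which is consistent with the small loss values reported in the training logs.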
    

Evaluation Dataset

Unnamed Dataset

  • Size: 70,341 evaluation samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
    column    type    min        mean          max
    anchor    string  18 tokens  82.79 tokens  1154 tokens
    positive  string  18 tokens  82.61 tokens  1154 tokens
  • Samples:
    1. anchor: retro 70s furniture set armchairs chairs and vector image furniture images over 41 000 retro 70s furniture set armchairs chairs and sofas vector illustration eps 8 vector image 14149273 officeproducts
       positive: retro 70s furniture set armchairs chairs and vector image setting images over 12 million retro 70s furniture set armchairs chairs and sofas vector illustration eps 8 vector image 14149273 officeproducts
    2. anchor: hp designjet 70 cartridges for ink jet printers quillcom ink volume 130 mlthis cartridge is not compatible with hp designjet t620 24in photo printer hp photosmart pro b9180 printer hp photosmart pro b8850 photo printer hp photosmart pro b8800 photo printerfaderesistant color provides superior results and brilliant truetolife images that last for generations 901680441 officeproducts
       positive: hp designjet z2100 44 in cartridges for ink jet printers quillcom ink volume 130 mlthis cartridge is not compatible with hp designjet t620 24in photo printer hp photosmart pro b9180 printer hp photosmart pro b8850 photo printer hp photosmart pro b8800 photo printerfaderesistant color provides superior results and brilliant truetolife images that last for generations 901680441 officeproducts
    3. anchor: suspension strut assembly shocks springs page 1 1996 bmw 318i base convertible suspension note front left w sport suspension front left bilstein touring class 22172518int automotive
       positive: suspension strut assembly shocks springs page 1 1997 bmw 318is base coupe suspension note front left w sport suspension front left bilstein touring class 22172518int automotive
  • Loss: CachedMultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • learning_rate: 1e-05
  • num_train_epochs: 2
  • warmup_ratio: 0.1
  • fp16: True
  • auto_find_batch_size: True
  • batch_sampler: no_duplicates
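
The `learning_rate` and `warmup_ratio` values above, combined with the linear scheduler listed under All Hyperparameters, imply a learning rate that ramps up over the first 10% of steps and then decays linearly to zero. The sketch below reproduces that shape in plain Python; the function name and the `total_steps=1000` example value are illustrative assumptions, not values from the training run.

```python
import math


def linear_warmup_lr(step: int, total_steps: int,
                     base_lr: float = 1e-5, warmup_ratio: float = 0.1) -> float:
    """Learning rate at a given optimizer step for linear warmup then linear decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Ramp from 0 up to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Decay from base_lr down to 0 over the remaining steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Hypothetical run of 1000 steps, sampled at a few points.
schedule = {step: linear_warmup_lr(step, total_steps=1000) for step in (0, 50, 100, 550, 1000)}
for step, lr in schedule.items():
    print(step, lr)
```

The peak of `1e-05` is reached exactly when warmup ends (step 100 in this toy run), and the rate is back at zero on the final step.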

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 8
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 1e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 2
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: True
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • eval_use_gather_object: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch   Step   Training Loss  Validation Loss
0.1990  7000   0.0072         0.0020
0.3981  14000  0.0019         0.0020
0.5971  21000  0.0034         0.0016
0.7961  28000  0.001          0.0013
0.9951  35000  0.0012         0.0010
1.1942  42000  0.0009         0.0008
1.3932  49000  0.0005         0.0009
1.5922  56000  0.0004         0.0007
1.7912  63000  0.0003         0.0007
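
The Epoch and Step columns can be cross-checked against the dataset size with a little arithmetic. The sketch below assumes a single device, that the batch size of 8 from `per_device_train_batch_size` was actually used (`auto_find_batch_size` could have lowered it), and that the last partial batch is kept (`dataloader_drop_last: False`).

```python
import math

train_samples = 281_362   # training set size from this card
batch_size = 8            # per_device_train_batch_size (assumed to be the effective batch size)

# Number of optimizer steps in one pass over the data, keeping the last partial batch.
steps_per_epoch = math.ceil(train_samples / batch_size)
print(steps_per_epoch)  # 35171

# Fractional epoch reached at a few of the logged steps.
logged_epochs = {step: round(step / steps_per_epoch, 4) for step in (7000, 35000, 63000)}
print(logged_epochs)
```

Under these assumptions the computed values agree with the logged epochs (for example, step 35000 lands at epoch 0.9951), which suggests the batch size of 8 was in effect for the whole run.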

Framework Versions

  • Python: 3.10.13
  • Sentence Transformers: 3.0.1
  • Transformers: 4.44.0
  • PyTorch: 2.2.1
  • Accelerate: 0.33.0
  • Datasets: 2.21.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

CachedMultipleNegativesRankingLoss

@misc{gao2021scaling,
    title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup}, 
    author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
    year={2021},
    eprint={2101.06983},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}