metadata
base_model: Alibaba-NLP/gte-large-en-v1.5
datasets: []
language: []
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:281362
- loss:CachedMultipleNegativesRankingLoss
widget:
- source_sentence: >-
pendaflex surehook extra capacity reinforced hanging folders letter size
assorted colors total of 20 folders per box 6152x2 asstus files rasps hand
tools depotus pendaflex surehook extra capacity reinforced hanging folders
feature longer plastic hooks with builtin tension springs that help keep
files on track the pressboardreinforced insert allows them to expand 2 for
up to 400 sheets durable 11 pt covers wont buckle under pressure includes
an 85 x 11 printerready label sheet and clear tabs for placement in 5
positions paper made from 10 recycled fiber with 10 postconsumer fiber
letter size assorted colors blue yellow red bright green orange 20 per
boxus pendaflexus 6152x2asst officeproducts
sentences:
- >-
pendaflex surehook extra capacity reinforced hanging folders letter size
assorted colors total of 20 folders per box 6152x2 asstus files rasps
hand tools depotus pendaflex surehook extra capacity reinforced hanging
folders feature longer plastic hooks with builtin tension springs that
help keep files on track the pressboardreinforced insert allows them to
expand 2 for up to 400 sheets durable 11 pt covers wont buckle under
pressure includes an 85 x 11 printerready label sheet and clear tabs for
placement in 5 positions paper made from 10 recycled fiber with 10
postconsumer fiber letter size assorted colors blue yellow red bright
green orange 20 per boxus pendaflexus 6152x2asst officeproducts
- >-
nissan cwtwb1u789 factory oem key fob keyless entry remote alarm replace
null nissan 731015045730 798426001120 automotive
- >-
mf digital scribe cddvdbd print station picojet2 300disc capacity
station picojet2 capacity cd solutions mf digital scribe cddvdbd print
station picojet2 300disc capacity mf digital 739410810199 739410811042
739410531797 computersandaccessories
- source_sentence: >-
32020q50k 762148137726 20w 29w bulbscom compact fluorescent cfl plugin
light bulbs with a gx32d2 base tcp 20w 2 pin bright white quad double twin
tube cfl bulb pack of 5 tcp brand 32020q50k toolsandhomeimprovement
sentences:
- >-
refrigerant hoseac compressor ac condenserf 8e3285 001 gb 6cylinder bke
asn diesel eng amm ake avf 2002 audi a4avant canada market fuel bdv avj
aym bdg bfcgb exhaust bcz bfb cooling ac condenser fluid container with
connecting parts 4cylinder alz bbj amb alt bex awx avb model data
akeaymbczbdgbaubdhbfc period 0103 1204 gb 8e0260701bc automotive
- >-
32020q50k 762148137726 double twintube 2pin cfl light bulbs with a
gx32d2 base bulbscom tcp 20w 2 pin bright white quad double twin tube
cfl bulb pack of 5 tcp brand 32020q50k toolsandhomeimprovement
- >-
nylon lock nut 6 x 1 mm 10 hex 1991 bmw 325i base convertible
miscellaneous hardware page 1 note zinc plated steel auveco 11053m769
automotive
- source_sentence: >-
10 12 mm stainless steel jubilee clip hose clampus daf 66 carburetor parts
and service kits us stainless steel jubilee clip hose clamp for 6 mm
flexible fuel line this hose clamp or jubilee clip is perfectly suited for
our 6 mm internal diameter fuel line as well as our green italian style
transparent fuel line neatly finished the stainless steel grade 304 makes
it the perfect compliment to our textile covered fuel hose it is rated for
between 10 and 12 mm internal diameter ie 10 12 mm external hoseline
diameter can be tightened using a 7 mm socket or flat head
screwdriverexcellently suited for use with our gates flexi driver toolus
jubilee1012 toolsandhomeimprovement
sentences:
- >-
stance orange harley davidson the shield socks m556d16tshlg black brand
stomper boots bb9038 the stomper lace up work boot is the perfect allday
strong enough for kickstarting yet still comfortable walking thanks to
d30 insole mens leather motorcycle reinforced toe heel and ankles insole
vulcanized rubber sole hipora waterproof barrier included items 2 black
boots made with upper coated lining inner sole textile outer other man
made material stance 3437434l automotive age adult color black color
primary black gender mens type boot cruisertouring us size 8 12 units
pair weight 000 lbs age adult color black color primary black gender
mens type boot cruisertouring us size 8 12 units pair weight 000 lbs
- >-
datalogic accessories for readers codbc4030bkbt datalogic cod104538
datalogic bc4030bt basecharger rs232usbkbwwand emulation multiinterface
black datalogic bc4030bkbt computersandaccessories
- >-
10 12 mm stainless steel jubilee clip hose clampus daf 66 carburetor
parts and service kits us stainless steel jubilee clip hose clamp for 6
mm flexible fuel line this hose clamp or jubilee clip is perfectly
suited for our 6 mm internal diameter fuel line as well as our green
italian style transparent fuel line neatly finished the stainless steel
grade 304 makes it the perfect compliment to our textile covered fuel
hose it is rated for between 10 and 12 mm internal diameter ie 10 12 mm
external hoseline diameter can be tightened using a 7 mm socket or flat
head screwdriverexcellently suited for use with our gates flexi driver
toolus jubilee1012 toolsandhomeimprovement
- source_sentence: >-
nokya heavy duty headlight harnesses 9496 mercedes s600 w140 2 door
h4hb29003 nokya brand or otherwise each set consists of 2 harnesses as
complete upgrades a precautionary measure against harness plug burnouts
which can permanently damage your mercedes s600 headlight housings these
heavy duty headlight highlow beam h4 wire harnesses also help to handle
the increased demands aftermarket bulbs offers these harnesses cheap and
relatively easy upgrade stock electrical system they work replacements for
damaged plugs lighting are not designed extended periods use operation in
adverse severe conditions to address this these have been be plugged into
s600s wiring aftermarket nok91112pcs automotive
sentences:
- >-
nokya heavy duty headlight harnesses 9496 mercedes s600 w140 2 door
h4hb29003 nokya brand or otherwise each set consists of 2 harnesses as
complete upgrades a precautionary measure against harness plug burnouts
which can permanently damage your mercedes s600 headlight housings these
heavy duty headlight highlow beam h4 wire harnesses also help to handle
the increased demands aftermarket bulbs offers these harnesses cheap and
relatively easy upgrade stock electrical system they work replacements
for damaged plugs lighting are not designed extended periods use
operation in adverse severe conditions to address this these have been
be plugged into s600s wiring aftermarket nok91112pcs automotive
- >-
tektronix black toner cartridge for phaser 6100 3000pages product data
tektronix 3000pages laser toner cartridges 106r00679 long product name
tektronix black toner cartridge for phaser 6100 3000pages the short
editorial description of tektronix black toner cartridge for phaser 6100
3000pages black toner cartridge for phaser 6100 more short summary
description tektronix black toner cartridge for phaser 6100 3000pages
this short summary of the tektronix black toner cartridge for phaser
6100 3000pages datasheet is autogenerated and uses the product title and
the first six key specs tektronix black toner cartridge for phaser 6100
3000 pages long summary description tektronix black toner cartridge for
phaser 6100 3000pages tektronix black toner cartridge for phaser 6100
page yield 3000 pages this is an autogenerated long summary of tektronix
black toner cartridge for phaser 6100 3000pages based on the first three
specs of the first five spec groups tektronix 106r00679 officeproducts
- >-
hose clamp 53558 mm range 12 width spring type 1990 bmw 325i base coupe
cooling system miscellaneous page 1 mubea sc5355812m219 automotive
- source_sentence: >-
b260iunvhp000i 768386242246 20 30 where number of lamps is 1 linear
fluorescent ballasts and by wattage x 75w 99w bulbscom universal
electronic ballast 120v to 277v for 2 f96t12 universal brand
b260iunvhp000i toolsandhomeimprovement
sentences:
- >-
b260iunvhp000i 768386242246 10 50 where length is 10 under 18 bulbscom
electronic t12 linear fluorescent ballasts universal electronic ballast
120v to 277v for 2 f96t12 universal brand b260iunvhp000i
toolsandhomeimprovement
- >-
danze 24 double towel bar danze products at efaucetscom towel bars
bathroom accessories danze 24 double towel bar parma collection solid
brass construction easy to install mounting hardware included matching
faucet collection d446612bn toolsandhomeimprovement
- >-
greeting cards gestures of enlightenment set of 12 12 meaningful gift
ideas browse tharpa a series of 12 greeting cards with original artwork
conveying the beauty and power of the buddhas gestures mark that special
occasion with a card that is exquisite original and auspicious send good
wishes for long life wisdom and good fortune with amitayus or white
taras gesture for example or comfort someone who is suffering by sending
medicine buddhas card see below for a full description of each card and
click on the thumbnail images to view each design make your good wishes
auspicious and meaningful blank inside for your own message dimensions 7
x 5 178cm x 136cm fits a standard photo frame envelope provided sold
singly or as a set of 12 save 7 order the full set below or click here
to purchase cards individually 5055278111182 officeproducts
SentenceTransformer based on Alibaba-NLP/gte-large-en-v1.5
This is a sentence-transformers model finetuned from Alibaba-NLP/gte-large-en-v1.5. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: Alibaba-NLP/gte-large-en-v1.5
- Maximum Sequence Length: 8192 tokens
- Output Dimensionality: 1024 dimensions
- Similarity Function: Cosine Similarity
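As a reminder of what the similarity function measures, here is a minimal pure-Python sketch of cosine similarity. The helper name `cosine_similarity` and the toy 3-dimensional vectors are illustrative assumptions; real embeddings from this model have 1024 dimensions.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy vectors standing in for 1024-dimensional embeddings.
a = [1.0, 0.0, 0.0]
b = [2.0, 0.0, 0.0]  # same direction as a, different length
c = [0.0, 1.0, 0.0]  # orthogonal to a

print(cosine_similarity(a, b))  # 1.0
print(cosine_similarity(a, c))  # 0.0
```

Because the score depends only on direction, vector length does not affect it.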
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: NewModel
(1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
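The Pooling module above has `pooling_mode_cls_token` set to True, so the sentence embedding is taken from the first token rather than averaged over all tokens. The sketch below contrasts the two on toy data; the function names and the 2-dimensional token vectors are illustrative assumptions, not the library's internals.

```python
def cls_pool(token_embeddings):
    """Use the first token's vector (the [CLS] position) as the sentence embedding."""
    return list(token_embeddings[0])

def mean_pool(token_embeddings):
    """Average all token vectors, shown only for contrast with CLS pooling."""
    n = len(token_embeddings)
    dim = len(token_embeddings[0])
    return [sum(tok[d] for tok in token_embeddings) / n for d in range(dim)]

# Three toy token vectors; the real model produces 1024-dimensional ones.
tokens = [
    [1.0, 2.0],  # [CLS]
    [3.0, 4.0],
    [5.0, 6.0],
]

print(cls_pool(tokens))   # [1.0, 2.0]
print(mean_pool(tokens))  # [3.0, 4.0]
```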
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
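# Download from the 🤗 Hub (replace the placeholder with this model's Hub ID)
# The base model ships custom modeling code (NewModel), so trust_remote_code=True is likely required
model = SentenceTransformer("sentence_transformers_model_id", trust_remote_code=True)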
# Run inference
sentences = [
'b260iunvhp000i 768386242246 20 30 where number of lamps is 1 linear fluorescent ballasts and by wattage x 75w 99w bulbscom universal electronic ballast 120v to 277v for 2 f96t12 universal brand b260iunvhp000i toolsandhomeimprovement',
'b260iunvhp000i 768386242246 10 50 where length is 10 under 18 bulbscom electronic t12 linear fluorescent ballasts universal electronic ballast 120v to 277v for 2 f96t12 universal brand b260iunvhp000i toolsandhomeimprovement',
'danze 24 double towel bar danze products at efaucetscom towel bars bathroom accessories danze 24 double towel bar parma collection solid brass construction easy to install mounting hardware included matching faucet collection d446612bn toolsandhomeimprovement',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 1024)
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
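One way to read the resulting similarity matrix is to find, for each sentence, the other sentence it scores highest against. The sketch below does this on a nested list; the helper name `most_similar_other` and the scores in `toy` are made up for illustration and are not output from this model.

```python
def most_similar_other(similarities):
    """For each row i, return the index j != i with the highest score."""
    best = []
    for i, row in enumerate(similarities):
        candidates = [(score, j) for j, score in enumerate(row) if j != i]
        best.append(max(candidates)[1])
    return best

# Illustrative scores only: rows 0 and 1 are near-duplicates, row 2 is unrelated.
toy = [
    [1.00, 0.90, 0.10],
    [0.90, 1.00, 0.20],
    [0.10, 0.20, 1.00],
]

print(most_similar_other(toy))  # [1, 0, 1]
```

The tensor returned by `model.similarity` can be turned into a nested list with `.tolist()` before being passed to a helper like this.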
Training Details
Training Dataset
Unnamed Dataset
- Size: 281,362 training samples
- Columns: anchor and positive
- Approximate statistics based on the first 1000 samples:

| | anchor | positive |
|---|---|---|
| type | string | string |
| details | min: 20 tokens, mean: 77.79 tokens, max: 1149 tokens | min: 19 tokens, mean: 82.69 tokens, max: 1149 tokens |
- Samples:

| anchor | positive |
|---|---|
| clever lever extra giga punch scallop circle 35 inches clever wholesale darice this clever lever extra giga punch produces a clearcut scallop circle the craft punch is ideal for embellishing scrapbooks greeting cards invitations programs and many more paper crafts the scalloped circle is 35 inches in size 1 craft punch per package lvxgcp65 officeproducts | clever lever extra giga punch scallop circle 35 inches clever wholesale darice this clever lever extra giga punch produces a clearcut scallop circle the craft punch is ideal for embellishing scrapbooks greeting cards invitations programs and many more paper crafts the scalloped circle is 35 inches in size 1 craft punch per package lvxgcp65 officeproducts |
| strut front right shocks springs page 1 2002 bmw 325i base sedan suspension genuine bmw 31312282460boe automotive | strut front right shocks springs page 1 2002 bmw 325i base sedan suspension note only for cars with sport suspension and m sport package sachs 31312282460m10 automotive |
| herrold 40 drawer chest in dark walnutmango wood 792977257388 arreton 46quote in washed white oakantique brass sale home lighting fixtures lamps more online symbolizing achievement and rank the shield shape of this six drawer chest bears both historical and design significance built with craftsmens detail from dark walnutstained mango wood and mahogany veneers chest features curved sides smooth uttermost 25738upc792977257388 toolsandhomeimprovement | herrold 40 drawer chest in dark walnutmango wood 792977257388 malthus 31quote in aged parchmentreclaimed mahogany sale home lighting fixtures lamps more online symbolizing achievement and rank the shield shape of this six drawer chest bears both historical and design significance built with craftsmens detail from dark walnutstained mango wood and mahogany veneers chest features curved sides smooth uttermost 25738upc792977257388 toolsandhomeimprovement |

- Loss: CachedMultipleNegativesRankingLoss with these parameters: { "scale": 20.0, "similarity_fct": "cos_sim" }
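To make the loss parameters concrete, the sketch below computes a multiple-negatives ranking loss in pure Python: for each anchor, its own positive is the correct class and every other positive in the batch acts as a negative, with cosine similarities multiplied by `scale` before the softmax. This is a simplified illustration under those assumptions, not the library's implementation. The cached variant is designed to give the same loss value while processing the batch in smaller chunks to save memory, and that part is omitted here.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def mnr_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the target for anchors[i]."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, p) for p in positives]
        # Numerically stable log-sum-exp over the in-batch candidates.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)

# Toy 2-dimensional embeddings: each anchor matches its own positive exactly.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = mnr_loss(anchors, [[1.0, 0.0], [0.0, 1.0]])
swapped = mnr_loss(anchors, [[0.0, 1.0], [1.0, 0.0]])

print(aligned < swapped)  # True
```

With `scale` set to 20.0, a perfectly aligned batch gives a loss near zero, while mismatched pairs are penalized by roughly the scale value.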
Evaluation Dataset
Unnamed Dataset
- Size: 70,341 evaluation samples
- Columns: anchor and positive
- Approximate statistics based on the first 1000 samples:

| | anchor | positive |
|---|---|---|
| type | string | string |
| details | min: 18 tokens, mean: 82.79 tokens, max: 1154 tokens | min: 18 tokens, mean: 82.61 tokens, max: 1154 tokens |
- Samples:

| anchor | positive |
|---|---|
| retro 70s furniture set armchairs chairs and vector image furniture images over 41 000 retro 70s furniture set armchairs chairs and sofas vector illustration eps 8 vector image 14149273 officeproducts | retro 70s furniture set armchairs chairs and vector image setting images over 12 million retro 70s furniture set armchairs chairs and sofas vector illustration eps 8 vector image 14149273 officeproducts |
| hp designjet 70 cartridges for ink jet printers quillcom ink volume 130 mlthis cartridge is not compatible with hp designjet t620 24in photo printer hp photosmart pro b9180 printer hp photosmart pro b8850 photo printer hp photosmart pro b8800 photo printerfaderesistant color provides superior results and brilliant truetolife images that last for generations 901680441 officeproducts | hp designjet z2100 44 in cartridges for ink jet printers quillcom ink volume 130 mlthis cartridge is not compatible with hp designjet t620 24in photo printer hp photosmart pro b9180 printer hp photosmart pro b8850 photo printer hp photosmart pro b8800 photo printerfaderesistant color provides superior results and brilliant truetolife images that last for generations 901680441 officeproducts |
| suspension strut assembly shocks springs page 1 1996 bmw 318i base convertible suspension note front left w sport suspension front left bilstein touring class 22172518int automotive | suspension strut assembly shocks springs page 1 1997 bmw 318is base coupe suspension note front left w sport suspension front left bilstein touring class 22172518int automotive |

- Loss: CachedMultipleNegativesRankingLoss with these parameters: { "scale": 20.0, "similarity_fct": "cos_sim" }
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: steps
- learning_rate: 1e-05
- num_train_epochs: 2
- warmup_ratio: 0.1
- fp16: True
- auto_find_batch_size: True
- batch_sampler: no_duplicates
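The learning rate and warmup ratio above interact with the linear scheduler listed under All Hyperparameters. The following sketch shows how a linear warmup followed by linear decay would behave under those settings. The function name `linear_schedule_lr` and the round `total_steps=1000` are illustrative assumptions, not values from this training run.

```python
import math

def linear_schedule_lr(step, total_steps, base_lr=1e-05, warmup_ratio=0.1):
    """Learning rate at `step`: linear warmup from 0, then linear decay to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = total_steps - step
    return base_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))

# Hypothetical run of 1000 optimizer steps, so warmup covers the first 100.
for step in (0, 50, 100, 550, 1000):
    print(step, linear_schedule_lr(step, total_steps=1000))
```

The rate peaks at `learning_rate` when warmup ends and falls to zero at the final step.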
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 8
- per_device_eval_batch_size: 8
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 1e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 2
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: True
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: False
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: True
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- eval_use_gather_object: False
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional
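The `batch_sampler: no_duplicates` setting matters for an in-batch-negatives loss: if the same text appears twice in one batch, a true match can end up being treated as a negative. The sketch below shows one greedy way to build such batches. It is an illustration of the idea only, with a hypothetical helper name, and is not how the library's sampler is implemented.

```python
def no_duplicate_batches(pairs, batch_size):
    """Greedily group (anchor, positive) pairs so no text repeats within a batch."""
    remaining = list(pairs)
    batches = []
    while remaining:
        batch, seen, leftover = [], set(), []
        for anchor, positive in remaining:
            texts = {anchor, positive}
            if len(batch) < batch_size and not (texts & seen):
                batch.append((anchor, positive))
                seen |= texts
            else:
                # Defer pairs that would repeat a text or overflow the batch.
                leftover.append((anchor, positive))
        batches.append(batch)
        remaining = leftover
    return batches

# Two pairs share the anchor "a", so they must land in different batches.
pairs = [("a", "a1"), ("a", "a2"), ("b", "b1")]
print(no_duplicate_batches(pairs, batch_size=2))
# [[('a', 'a1'), ('b', 'b1')], [('a', 'a2')]]
```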
Training Logs
| Epoch | Step | Training Loss | Evaluation Loss |
|---|---|---|---|
| 0.1990 | 7000 | 0.0072 | 0.0020 |
| 0.3981 | 14000 | 0.0019 | 0.0020 |
| 0.5971 | 21000 | 0.0034 | 0.0016 |
| 0.7961 | 28000 | 0.001 | 0.0013 |
| 0.9951 | 35000 | 0.0012 | 0.0010 |
| 1.1942 | 42000 | 0.0009 | 0.0008 |
| 1.3932 | 49000 | 0.0005 | 0.0009 |
| 1.5922 | 56000 | 0.0004 | 0.0007 |
| 1.7912 | 63000 | 0.0003 | 0.0007 |
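The Epoch column can be cross-checked against the Step column with a little arithmetic. The sketch below assumes a single device, an effective batch size of 8 (the listed `per_device_train_batch_size`, with `auto_find_batch_size` not having reduced it), and a final partial batch that is kept. Under those assumptions the computed fractions match the logged values.

```python
import math

TRAIN_SAMPLES = 281_362  # from the Training Dataset section
BATCH_SIZE = 8           # assumed effective batch size (see lead-in)

# dataloader_drop_last is False, so the last partial batch counts as a step.
steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)

def epoch_at(step):
    """Fractional epoch reached after `step` optimizer steps."""
    return round(step / steps_per_epoch, 4)

print(steps_per_epoch)  # 35171
print(epoch_at(7000))   # 0.199
print(epoch_at(63000))  # 1.7912
```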
Framework Versions
- Python: 3.10.13
- Sentence Transformers: 3.0.1
- Transformers: 4.44.0
- PyTorch: 2.2.1
- Accelerate: 0.33.0
- Datasets: 2.21.0
- Tokenizers: 0.19.1
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
CachedMultipleNegativesRankingLoss
@misc{gao2021scaling,
title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
year={2021},
eprint={2101.06983},
archivePrefix={arXiv},
primaryClass={cs.LG}
}