SentenceTransformer based on zacbrld/MNLP_M2_document_encoder
This is a sentence-transformers model finetuned from zacbrld/MNLP_M2_document_encoder. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: zacbrld/MNLP_M2_document_encoder
- Maximum Sequence Length: 256 tokens
- Output Dimensionality: 384 dimensions
- Similarity Function: Cosine Similarity
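These values are stored with the saved model, so they can be checked directly after loading; a minimal sketch using the Sentence Transformers API (model id taken from this card):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("zacbrld/MNLP_M2_document_encoder")

print(model.max_seq_length)                      # 256 (longer inputs are truncated)
print(model.get_sentence_embedding_dimension())  # 384
print(model.similarity_fn_name)                  # "cosine"
```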
Model Sources
- Documentation: Sentence Transformers Documentation (https://www.sbert.net)
- Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
- Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
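The stack reads top to bottom: a BertModel produces token embeddings, the Pooling module averages them (pooling_mode_mean_tokens: True), and Normalize() rescales the result to unit length, so cosine similarity reduces to a dot product. As an illustration only, not the recommended loading path, roughly the same forward pass can be written with plain transformers; the attention-mask-weighted mean below is a sketch of the mean-pooling mode:

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("zacbrld/MNLP_M2_document_encoder")
encoder = AutoModel.from_pretrained("zacbrld/MNLP_M2_document_encoder")

batch = tokenizer(["an example sentence"], padding=True, truncation=True,
                  max_length=256, return_tensors="pt")
with torch.no_grad():
    token_embeddings = encoder(**batch).last_hidden_state  # [1, seq_len, 384]

# Mean pooling: average token embeddings, ignoring padding positions.
mask = batch["attention_mask"].unsqueeze(-1).float()
embedding = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)

# Normalize(): unit-length output, so cosine similarity equals dot product.
embedding = F.normalize(embedding, p=2, dim=1)
print(embedding.shape)  # torch.Size([1, 384])
```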
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("zacbrld/MNLP_M2_document_encoder")
# Run inference
sentences = [
'Burray and The Barriers\nUndiscovered Scotland: The Churchill Barriers\nOur Past History: The Churchill Barriers Archived 17 December 2006 at the Wayback Machine\nOkneypics.com: photos of the barrier Archived 15 May 2008 at the Wayback Machine',
'Burray and The Barriers\nUndiscovered Scotland: The Churchill Barriers\nOur Past History: The Churchill Barriers Archived 17 December 2006 at the Wayback Machine\nOkneypics.com: photos of the barrier Archived 15 May 2008 at the Wayback Machine',
'Cerebellar Purkinje neurons have been proposed to have two distinct bursting modes: dendritically driven, by dendritic Ca2+ spikes, and somatically driven, wherein the persistent Na+ current is the burst initiator and the SK K+ current is the burst terminator. Purkinje neurons may utilise these bursting forms in information coding to the deep cerebellar nuclei.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
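Continuing from the snippet above, the same embeddings support a simple semantic-search pattern; the query and corpus strings here are made up for illustration:

```python
# Hypothetical corpus and query (illustrative strings only).
corpus = [
    "Purkinje neurons exhibit two distinct bursting modes.",
    "The Churchill Barriers connect the southern Orkney islands.",
    "Dark energy is invoked to explain accelerated cosmic expansion.",
]
query = "What explains the accelerated expansion of the universe?"

corpus_embeddings = model.encode(corpus)
query_embedding = model.encode([query])

# Cosine similarities between the query and each corpus entry: shape [1, 3].
scores = model.similarity(query_embedding, corpus_embeddings)
best = scores.argmax().item()
print(corpus[best], float(scores[0, best]))
```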
Training Details
Training Dataset
Unnamed Dataset
- Size: 5,489 training samples
- Columns: sentence_0 and sentence_1
- Approximate statistics based on the first 1000 samples:
 | sentence_0 | sentence_1 |
---|---|---|
type | string | string |
details | min: 34 tokens, mean: 144.23 tokens, max: 256 tokens | min: 34 tokens, mean: 144.23 tokens, max: 256 tokens |
- Samples (in each pair, sentence_0 and sentence_1 are identical):
  1. "In related work, Smoller, Temple, and Vogler propose that this shockwave may have resulted in our part of the universe having a lower density than that surrounding it, causing the accelerated expansion normally attributed to dark energy. They also propose that this related theory could be tested: a universe with dark energy should give a figure for the cubic correction to redshift versus luminosity C = −0.180 at a = a whereas for Smoller, Temple, and Vogler's alternative C should be positive rather than negative. They give a more precise calculation for their wave model alternative as: the cubic correction to redshift versus luminosity at a = a is C = 0.359."
  2. "Evolution is a central organizing concept in biology. It is the change in heritable characteristics of populations over successive generations. In artificial selection, animals were selectively bred for specific traits. Given that traits are inherited, populations contain a varied mix of traits, and reproduction is able to increase any population, Darwin argued that in the natural world, it was nature that played the role of humans in selecting for specific traits. Darwin inferred that individuals who possessed heritable traits better adapted to their environments are more likely to survive and produce more offspring than other individuals. He further inferred that this would lead to the accumulation of favorable traits over successive generations, thereby increasing the match between the organisms and their environment."
  3. "The total number of engineers employed in the U.S. in 2015 was roughly 1.6 million. Of these, 278,340 were mechanical engineers (17.28%), the largest discipline by size. In 2012, the median annual income of mechanical engineers in the U.S. workforce was $80,580. The median income was highest when working for the government ($92,030), and lowest in education ($57,090). In 2014, the total number of mechanical engineering jobs was projected to grow 5% over the next decade. As of 2009, the average starting salary was $58,800 with a bachelor's degree."
- Loss: MultipleNegativesRankingLoss with these parameters:
  { "scale": 20.0, "similarity_fct": "cos_sim" }
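MultipleNegativesRankingLoss treats each (sentence_0, sentence_1) pair as a positive and the other in-batch sentence_1 entries as negatives, then applies cross-entropy over scaled cosine similarities. A minimal PyTorch sketch of that computation, not the library's implementation, using the scale value above:

```python
import torch
import torch.nn.functional as F

def mnr_loss(anchors: torch.Tensor, positives: torch.Tensor, scale: float = 20.0):
    """Sketch of MultipleNegativesRankingLoss: anchors[i] should match
    positives[i]; every other positives[j] acts as an in-batch negative."""
    # Cosine similarity matrix: normalize, then all pairwise dot products.
    sims = F.normalize(anchors, dim=1) @ F.normalize(positives, dim=1).T
    # The correct match for row i is column i; the scale sharpens the softmax.
    labels = torch.arange(anchors.size(0), device=anchors.device)
    return F.cross_entropy(sims * scale, labels)

# Toy check with random 384-dim "embeddings" (illustrative only).
print(mnr_loss(torch.randn(16, 384), torch.randn(16, 384)))
```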
Training Hyperparameters
Non-Default Hyperparameters
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- num_train_epochs: 5
- multi_dataset_batch_sampler: round_robin
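A sketch of how these settings plug into the Sentence Transformers v3 training API; the two-row dataset is a hypothetical stand-in for the unnamed 5,489-pair dataset described above, and the output path is made up:

```python
from datasets import Dataset
from sentence_transformers import (SentenceTransformer, SentenceTransformerTrainer,
                                   SentenceTransformerTrainingArguments)
from sentence_transformers.losses import MultipleNegativesRankingLoss

model = SentenceTransformer("zacbrld/MNLP_M2_document_encoder")

# Hypothetical stand-in for the unnamed (sentence_0, sentence_1) dataset.
train_dataset = Dataset.from_dict({
    "sentence_0": ["first passage ...", "second passage ..."],
    "sentence_1": ["first passage ...", "second passage ..."],
})

args = SentenceTransformerTrainingArguments(
    output_dir="mnlp_m2_encoder_finetune",      # hypothetical path
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=5,
    multi_dataset_batch_sampler="round_robin",
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=MultipleNegativesRankingLoss(model, scale=20.0),
)
trainer.train()
```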
All Hyperparameters
Click to expand
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: no
- prediction_loss_only: True
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1
- num_train_epochs: 5
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.0
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- tp_size: 0
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: round_robin
Training Logs
Epoch | Step | Training Loss |
---|---|---|
1.4535 | 500 | 0.0002 |
2.9070 | 1000 | 0.0 |
4.3605 | 1500 | 0.0007 |
Framework Versions
- Python: 3.10.11
- Sentence Transformers: 3.4.1
- Transformers: 4.51.3
- PyTorch: 2.6.0
- Accelerate: 1.7.0
- Datasets: 3.6.0
- Tokenizers: 0.21.1
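To approximate this environment, the versions above can be pinned at install time (torch is the PyPI package name for PyTorch):

pip install sentence-transformers==3.4.1 transformers==4.51.3 torch==2.6.0 accelerate==1.7.0 datasets==3.6.0 tokenizers==0.21.1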
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MultipleNegativesRankingLoss
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}