SentenceTransformer based on allenai/specter2_base
This is a sentence-transformers model finetuned from allenai/specter2_base. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: allenai/specter2_base
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
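Scores reported by this model are plain cosine similarities in [-1, 1]. The tiny sketch below (with random vectors standing in for real 768-dimensional embeddings) only illustrates the underlying computation; the Usage section shows the intended API.

```python
import torch
import torch.nn.functional as F

# Random stand-ins for two 768-dimensional sentence embeddings.
a, b = torch.randn(768), torch.randn(768)

# Cosine similarity = dot product of the L2-normalized vectors.
cosine = torch.dot(F.normalize(a, dim=0), F.normalize(b, dim=0))
print(float(cosine), float(F.cosine_similarity(a, b, dim=0)))  # the two values agree
```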
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'BertModel'})
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
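Module (0) produces per-token embeddings and module (1) mean-pools them (pooling_mode_mean_tokens: True) into a single 768-dimensional sentence embedding. The sketch below reproduces that pooling step by hand purely for illustration; the example text is made up, and in normal use model.encode() performs all of this internally.

```python
import torch
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("jagadeesh/zeiss-re-1757437055")
texts = ["ZEISS Airyscan improves confocal resolution."]  # placeholder input

features = model.tokenize(texts)
features = {key: value.to(model.device) for key, value in features.items()}
with torch.no_grad():
    features = model[0](features)  # Transformer module adds 'token_embeddings'

token_embeddings = features["token_embeddings"]          # (batch, seq_len, 768)
mask = features["attention_mask"].unsqueeze(-1).float()  # zeros out padding tokens
sentence_embeddings = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)

print(sentence_embeddings.shape)  # torch.Size([1, 768]); should match model.encode(texts)
```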
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("jagadeesh/zeiss-re-1757437055")
# Run inference
sentences = [
'We previously demonstrated that neural stem/progenitor cells (NSPCs) were induced within and around the ischemic areas in a mouse model of ischemic stroke. These injury/ischemia-induced NSPCs (iNSPCs) differentiated to electrophysiologically functional neurons in vitro, indicating the presence of a self-repair system following injury. However, during the healing process after stroke, ischemic areas were gradually occupied by inflammatory cells, mainly microglial cells/macrophages (MGs/MΦs), and neurogenesis rarely occurred within and around the ischemic areas. Therefore, to achieve neural regeneration by utilizing endogenous iNSPCs, regulation of MGs/MΦs after an ischemic stroke might be necessary. To test this hypothesis, we used iNSPCs isolated from the ischemic areas after a stroke in our mouse model to investigate the role of MGs/MΦs in iNSPC regulation. In coculture experiments, we show that the presence of MGs/MΦs significantly reduces not only the proliferation but also the differentiation of iNSPCs toward neuronal cells, thereby preventing neurogenesis. These effects, however, are mitigated by MG/MΦ depletion using clodronate encapsulated in liposomes. Additionally, gene ontology analysis reveals that proliferation and neuronal differentiation are negatively regulated in iNSPCs cocultured with MGs/MΦs. These results indicate that MGs/MΦs negatively impact neurogenesis via iNSPCs, suggesting that the regulation of MGs/MΦs is essential to achieve iNSPC-based neural regeneration following an ischemic stroke.',
"ZEISS Airyscan is an advanced imaging technology that enhances traditional confocal microscopy by using a 32-channel detector to capture more light with higher resolution and sensitivity. Unlike standard confocal systems that rely on a single pinhole, Airyscan collects the entire Airy disk pattern and reconstructs images for super-resolution clarity, down to 120 nm laterally. This results in significantly improved signal-to-noise ratio and reduced photodamage, making it ideal for detailed imaging of live cells and biological samples. It's compatible with ZEISS LSM systems like the LSM 880 and 900, offering researchers a powerful tool for high-precision fluorescence microscopy",
"ZEISS Airyscan is an advanced imaging technology that enhances traditional confocal microscopy by using a 32-channel detector to capture more light with higher resolution and sensitivity. Unlike standard confocal systems that rely on a single pinhole, Airyscan collects the entire Airy disk pattern and reconstructs images for super-resolution clarity, down to 120 nm laterally. This results in significantly improved signal-to-noise ratio and reduced photodamage, making it ideal for detailed imaging of live cells and biological samples. It's compatible with ZEISS LSM systems like the LSM 880 and 900, offering researchers a powerful tool for high-precision fluorescence microscopy",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.6165, 0.6165],
# [0.6165, 1.0000, 1.0000],
# [0.6165, 1.0000, 1.0000]])
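Beyond pairwise similarity, the same embeddings can drive semantic search. The snippet below is a small self-contained sketch: the corpus texts and the query are invented placeholders, and util.semantic_search simply ranks corpus entries by cosine similarity to the query embedding.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("jagadeesh/zeiss-re-1757437055")

# Placeholder corpus and query, not taken from the training or evaluation data.
corpus = [
    "ZEISS Airyscan enhances confocal microscopy with a 32-channel detector.",
    "The ZEISS Stemi 508 is an apochromatic stereo microscope with an 8:1 zoom range.",
    "Amorphous potassium sodium niobate films were synthesized by magnetron sputtering.",
]
query = "Which instrument is suited for super-resolution fluorescence imaging?"

corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)

# Returns the top-k corpus entries ranked by cosine similarity.
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(f"{hit['score']:.3f}  {corpus[hit['corpus_id']]}")
```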
Evaluation
Metrics
Information Retrieval
- Dataset: ir-eval
- Evaluated with InformationRetrievalEvaluator
Metric | Value |
---|---|
cosine_accuracy@1 | 0.0 |
cosine_accuracy@3 | 0.0 |
cosine_accuracy@5 | 0.0 |
cosine_accuracy@10 | 0.0 |
cosine_precision@1 | 0.0 |
cosine_precision@3 | 0.0 |
cosine_precision@5 | 0.0 |
cosine_precision@10 | 0.0 |
cosine_recall@1 | 0.0 |
cosine_recall@3 | 0.0 |
cosine_recall@5 | 0.0 |
cosine_recall@10 | 0.0 |
cosine_ndcg@10 | 0.0 |
cosine_mrr@10 | 0.0 |
cosine_map@100 | 0.0 |
Information Retrieval
- Dataset: ir-eval
- Evaluated with InformationRetrievalEvaluator
Metric | Value |
---|---|
cosine_accuracy@1 | 0.1421 |
cosine_accuracy@3 | 0.3124 |
cosine_accuracy@5 | 0.4525 |
cosine_accuracy@10 | 0.6699 |
cosine_precision@1 | 0.1421 |
cosine_precision@3 | 0.1041 |
cosine_precision@5 | 0.0905 |
cosine_precision@10 | 0.067 |
cosine_recall@1 | 0.1421 |
cosine_recall@3 | 0.3124 |
cosine_recall@5 | 0.4525 |
cosine_recall@10 | 0.6699 |
cosine_ndcg@10 | 0.3647 |
cosine_mrr@10 | 0.2722 |
cosine_map@100 | 0.2917 |
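Both tables above come from sentence_transformers' InformationRetrievalEvaluator run on the ir-eval dataset. The sketch below shows how such an evaluation is typically wired up; the queries, corpus, and relevance judgments here are tiny placeholders, since the actual evaluation data is not included in this repository.

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

model = SentenceTransformer("jagadeesh/zeiss-re-1757437055")

# Placeholder data: query id -> text, doc id -> text, query id -> set of relevant doc ids.
queries = {"q1": "super-resolution confocal imaging of live cells"}
corpus = {
    "d1": "ZEISS Airyscan collects the entire Airy disk pattern for super-resolution clarity.",
    "d2": "The ZEISS Stemi 508 is an apochromatic stereo microscope with an 8:1 zoom range.",
    "d3": "ZEISS Sigma FE-SEMs enable nanoscale imaging and analysis of materials.",
}
relevant_docs = {"q1": {"d1"}}

evaluator = InformationRetrievalEvaluator(
    queries=queries,
    corpus=corpus,
    relevant_docs=relevant_docs,
    name="ir-eval",
)
results = evaluator(model)
print(results)  # e.g. {'ir-eval_cosine_accuracy@1': ..., 'ir-eval_cosine_ndcg@10': ..., ...}
```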
Training Details
Training Dataset
Unnamed Dataset
- Size: 17,793 training samples
- Columns: anchor and positive
- Approximate statistics based on the first 1000 samples:
| anchor | positive |
---|---|---|
type | string | string |
details | min: 2 tokens, mean: 283.9 tokens, max: 512 tokens | min: 91 tokens, mean: 355.19 tokens, max: 512 tokens |
- Samples:
anchor | positive |
---|---|
Nutrition and resilience are linked, though it is not yet clear how diet confers stress resistance or the breadth of stressors that it can protect against. We have previously shown that transiently restricting an essential amino acid can protect Drosophila melanogaster against nicotine poisoning. Here, we sought to characterize the nature of this dietary-mediated protection and determine whether it was sex, amino acid and/or nicotine specific. When we compared between sexes, we found that isoleucine deprivation increases female, but not male, nicotine resistance. Surprisingly, we found that this protection afforded to females was not replicated by dietary protein restriction and was instead specific to individual amino acid restriction. To understand whether these beneficial effects of diet were specific to nicotine or were generalizable across stressors, we pre-treated flies with amino acid restriction diets and exposed them to other types of stress. We found that some of the diets th... | the zeiss stemi 508 is an apochromatic stereo microscope with an 8:1 zoom range, designed for high-contrast, color-accurate three-dimensional observation and documentation of diverse samples. its ergonomic design and robust mechanics support demanding applications in laboratory and industrial settings. key research and application areas: - biological research: suitable for observing the development and growth of model organisms like spider crabs, chicken, mouse, or zebrafish, including the evaluation, sorting, selection, or dissection of eggs, larvae, or embryos. it is also used in botany to observe changes in plant organs, diseases, and root development, in entomology for insect observation, documentation, and identification, in marine biology to study the life and reproduction of fish, and in parasitology for detecting and identifying the spread of parasites. the microscope is valuable for forensic analysis of ammunition parts, tool marks, documents, fibers, coatings, glass, textiles... |
The controlled supply of bioactive molecules is a subject of debate in animal nutrition. The release of bioactive molecules in the target organ, in this case the intestine, results in improved feed, as well as having a lower environmental impact. However, the degradation of bioactive molecules' in transit in the gastrointestinal passage is still an unresolved issue. This paper discusses the feasibility of a simple and cost-effective procedure to bypass the degradation problem. A solid/liquid adsorption procedure was applied, and the operating parameters (pH, reaction time, and LY initial concentration) were studied. Lysozyme is used in this work as a representative bioactive molecule, while Adsorbo ® , a commercial mixture of clay minerals and zeolites which meets current feed regulations, is used as the carrier. A maximum LY loading of 32 mg LY /g AD (LY(32)-AD) was obtained, with fixing pH in the range 7.5-8, initial LY content at 37.5 mg LY /g AD , and reaction time at 30 min. A ful... | the zeiss evo family of scanning electron microscopes offers a modular and versatile platform for a wide range of scientific and industrial investigations, combining high-performance imaging and analysis with intuitive operation for users of varying experience levels. key research and application areas: - materials science: characterizing the morphology, structure, and composition of diverse materials, including metals, composites, polymers, ceramics, and coatings, for research and development. this includes investigating surface structures, fractures, inclusions, and grain boundaries. the evo supports advanced material analysis through techniques like energy dispersive spectroscopy (eds) and electron backscatter diffraction (ebsd). - life sciences: enabling the examination of biological specimens in their native or near-native hydrated states using variable and extended pressure modes. applications include imaging cells, tissues, plants, and microorganisms for structural and morpholog... |
Amorphous potassium sodium niobate (KNN) films were synthesized at 300 °C through the radio frequency magnetron sputtering method and subsequently crystallized by post-annealing at 700 °C in various alkali element atmospheres (Na and K). The as-deposited film is notably deficient in alkali metal elements, particularly K, whereas the loss of alkali elements in the films can be replenished through annealing in an alkali element atmosphere. By adjusting the molar ratio of Na and K in the annealing atmosphere, the ratio of Na/K in the resultant film varied, consequently suggesting the efficiency of this method on composition regulation of KNN films. Meanwhile, we also found that the physical characteristics of the films also underwent differences with the change of an annealing atmosphere. The films annealed in a high Na atmosphere exhibit large dielectric losses with limited piezoelectric vibration behavior, while annealing in a high K atmosphere reduces the dielectric losses and enhances... | the zeiss sigma family of field emission scanning electron microscopes (fe-sems) offers versatile solutions for high-quality imaging and advanced analytical microscopy across a multitude of scientific and industrial domains. these instruments are engineered for reliable, high-end nano-analysis, combining fe-sem technology with an intuitive user experience to enhance productivity. key research and application areas: - advancing materials science: facilitating the development and understanding of novel materials by enabling the investigation of micro- and nanoscale structures. this includes characterizing metals, alloys, polymers, catalysts, and coatings for various applications such as electronics and energy. - driving innovation in nanoscience and nanomaterials: providing capabilities for the analysis of nanoparticles, thin films, 2d materials (like graphene and mos2), and other nanostructures to understand their properties and potential applications. - supporting energy research: enab... |
- Loss: MultipleNegativesRankingLoss with these parameters: { "scale": 20.0, "similarity_fct": "cos_sim", "gather_across_devices": false }
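Each training sample thus pairs a scientific abstract (anchor) with a ZEISS product description (positive), and MultipleNegativesRankingLoss treats the other positives in the batch as negatives. The sketch below shows how such a dataset and loss can be assembled; the two rows are invented stand-ins, not actual training data.

```python
from datasets import Dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MultipleNegativesRankingLoss

# Placeholder anchor/positive pairs mimicking the structure described above.
train_dataset = Dataset.from_dict({
    "anchor": [
        "Neural stem/progenitor cells are induced around ischemic areas after stroke.",
        "Amorphous KNN films were crystallized by post-annealing in alkali atmospheres.",
    ],
    "positive": [
        "ZEISS Airyscan enables high-resolution confocal imaging of biological samples.",
        "ZEISS Sigma FE-SEMs support nanoscale characterization of thin films.",
    ],
})

model = SentenceTransformer("allenai/specter2_base")

# In-batch negatives: every other positive in the batch acts as a negative for an anchor,
# scored with cosine similarity scaled by 20.0, matching the parameters listed above.
loss = MultipleNegativesRankingLoss(model, scale=20.0)
```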
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: steps
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- learning_rate: 1e-05
- num_train_epochs: 5
- warmup_ratio: 0.1
- fp16: True
- batch_sampler: no_duplicates
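A trainer configured with these non-default values could look like the sketch below, reusing the model, loss, and train_dataset from the previous snippet. The output_dir is a made-up path, fp16 assumes a CUDA GPU, and eval_strategy="steps" would normally be paired with a held-out eval_dataset or an evaluator; the training set is reused here purely as a placeholder.

```python
from sentence_transformers import SentenceTransformerTrainer, SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="zeiss-re",                      # illustrative output path
    num_train_epochs=5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    learning_rate=1e-5,
    warmup_ratio=0.1,
    fp16=True,                                  # assumes a CUDA-capable GPU
    eval_strategy="steps",
    batch_sampler=BatchSamplers.NO_DUPLICATES,  # avoids duplicate texts within a batch
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=train_dataset,  # placeholder; a held-out split would be used in practice
    loss=loss,
)
trainer.train()
```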
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 1e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 5
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: True
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- parallelism_config: None
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch_fused
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- hub_revision: None
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- liger_kernel_config: None
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional
- router_mapping: {}
- learning_rate_mapping: {}
Training Logs
Epoch | Step | Training Loss | ir-eval_cosine_ndcg@10 |
---|---|---|---|
0.0898 | 100 | 2.488 | 0.0 |
0.1797 | 200 | 2.321 | 0.0 |
0.2695 | 300 | 2.0777 | 0.0 |
0.3594 | 400 | 1.833 | 0.0 |
0.4492 | 500 | 1.7474 | 0.0 |
-1 | -1 | - | 0.2969 |
0.0898 | 100 | 2.2378 | 0.3023 |
0.1797 | 200 | 2.1268 | 0.3196 |
0.2695 | 300 | 1.8964 | 0.3541 |
0.3594 | 400 | 1.6197 | 0.3123 |
0.4492 | 500 | 1.493 | 0.3086 |
0.5391 | 600 | 1.4507 | 0.3146 |
0.6289 | 700 | 1.6187 | 0.2985 |
0.7188 | 800 | 1.4818 | 0.3412 |
0.8086 | 900 | 1.3241 | 0.2945 |
0.8985 | 1000 | 1.3055 | 0.2161 |
0.9883 | 1100 | 1.2704 | 0.2712 |
1.0782 | 1200 | 2.009 | 0.3143 |
1.1680 | 1300 | 2.0103 | 0.3403 |
1.2579 | 1400 | 1.8953 | 0.3408 |
1.3477 | 1500 | 1.662 | 0.3409 |
1.4376 | 1600 | 1.656 | 0.3073 |
1.5274 | 1700 | 1.537 | 0.2792 |
1.6173 | 1800 | 1.4893 | 0.2730 |
1.7071 | 1900 | 1.3447 | 0.2537 |
1.7969 | 2000 | 1.2444 | 0.2496 |
1.8868 | 2100 | 1.1493 | 0.2314 |
1.9766 | 2200 | 1.26 | 0.2753 |
2.0665 | 2300 | 1.7302 | 0.3514 |
2.1563 | 2400 | 1.7719 | 0.3546 |
2.2462 | 2500 | 1.7208 | 0.3366 |
2.3360 | 2600 | 1.4715 | 0.3387 |
2.4259 | 2700 | 1.45 | 0.2974 |
2.5157 | 2800 | 1.3878 | 0.3084 |
2.6056 | 2900 | 1.3184 | 0.2915 |
2.6954 | 3000 | 1.2562 | 0.2917 |
2.7853 | 3100 | 1.119 | 0.2940 |
2.8751 | 3200 | 1.1307 | 0.2989 |
2.9650 | 3300 | 1.1421 | 0.3081 |
3.0548 | 3400 | 1.4917 | 0.3402 |
3.1447 | 3500 | 1.5628 | 0.3392 |
3.2345 | 3600 | 1.4621 | 0.3684 |
3.3243 | 3700 | 1.342 | 0.3601 |
3.4142 | 3800 | 1.3052 | 0.3222 |
3.5040 | 3900 | 1.2133 | 0.3566 |
3.5939 | 4000 | 1.248 | 0.3631 |
3.6837 | 4100 | 1.2261 | 0.3558 |
3.7736 | 4200 | 0.978 | 0.3428 |
3.8634 | 4300 | 0.9916 | 0.3545 |
3.9533 | 4400 | 1.0824 | 0.3492 |
4.0431 | 4500 | 1.2055 | 0.3418 |
4.1330 | 4600 | 1.404 | 0.3481 |
4.2228 | 4700 | 1.3775 | 0.3613 |
4.3127 | 4800 | 1.2128 | 0.3579 |
4.4025 | 4900 | 1.168 | 0.3625 |
4.4924 | 5000 | 1.1061 | 0.3600 |
4.5822 | 5100 | 1.1213 | 0.3658 |
4.6721 | 5200 | 1.0396 | 0.3603 |
4.7619 | 5300 | 0.9766 | 0.3702 |
4.8518 | 5400 | 0.9143 | 0.3618 |
4.9416 | 5500 | 0.9728 | 0.3647 |
Framework Versions
- Python: 3.11.11
- Sentence Transformers: 5.1.0
- Transformers: 4.56.1
- PyTorch: 2.8.0.dev20250319+cu128
- Accelerate: 1.10.1
- Datasets: 3.6.0
- Tokenizers: 0.22.0
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MultipleNegativesRankingLoss
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}