SentenceTransformer based on Alibaba-NLP/gte-multilingual-base

This is a sentence-transformers model finetuned from Alibaba-NLP/gte-multilingual-base on the offshore_energy dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Type: Sentence Transformer
Base model: Alibaba-NLP/gte-multilingual-base
Maximum Sequence Length: 8192 tokens
Output Dimensionality: 768 dimensions
Similarity Function: Cosine Similarity
Training Dataset:
- offshore_energy

Model Sources

Documentation: Sentence Transformers Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Sentence Transformers on Hugging Face

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False, 'architecture': 'NewModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
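
The last two modules above can be read as "take the first token's vector, then scale it to unit length". The following is a minimal pure-Python sketch of that idea, not the library's implementation (which operates on batched tensors). The function name `cls_pool_and_normalize` and the 4-dimensional toy vectors are illustrative assumptions; the real model produces 768-dimensional vectors.

```python
import math


def cls_pool_and_normalize(token_embeddings):
    """Take the first ([CLS]) token vector and scale it to unit L2 norm."""
    cls_vector = token_embeddings[0]  # mirrors pooling_mode_cls_token: True
    norm = math.sqrt(sum(x * x for x in cls_vector))
    if norm == 0.0:
        return list(cls_vector)  # avoid division by zero for an all-zero vector
    return [x / norm for x in cls_vector]  # mirrors the Normalize() module


# Toy input: 3 tokens with 4-dimensional vectors (hypothetical values).
toy_tokens = [
    [3.0, 0.0, 4.0, 0.0],  # [CLS] token
    [1.0, 1.0, 1.0, 1.0],
    [0.5, 2.0, 0.0, 1.0],
]
sentence_embedding = cls_pool_and_normalize(toy_tokens)
print(sentence_embedding)
```

Because the output has unit length, the cosine similarity of two embeddings equals their dot product.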

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
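# trust_remote_code=True is likely needed: the base model uses the custom 'NewModel' architecture
model = SentenceTransformer("Sampath1987/EnergyEmbed-v1", trust_remote_code=True)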
# Run inference
sentences = [
    'What role did anti-collision analysis play in the drilling of the dual lateral well?',
    'This paper aims to analyze the impact of appraising and developing marginal fields with multiple stacked reservoirs which is quite challenging in terms of techno commercial value. The development of such marginal reservoirs using conventional single horizontal wells drilling and completion is uneconomical. Therefore, it was necessary to engineer a solution that can enhance the commercial value of the project by reducing CAPEX and OPEX. This paper will present the first comprehensive business case, where multiple stacked reservoirs with marginal reserves were studied to produce independently using multilateral completions, granting full accessibility of the laterals while achieving production monitoring and reservoir surveillance.',
    "The most common challenge in horizontal drilling is depth uncertainty which can be due to poor seismic data or interpretation. It is arguable that a successful landing of the wellbore in the reservoir optimally and within the desired zone is the most challenging in most geosteering operation. The presence of fluid contacts such as oil-water-contact (OWC) and gas-oil-contact (GOC) complicates the whole drilling process, most especially if these fluid contacts are not well defined or known. Additionally, the ability to map the boundaries of the reservoir as the BHA drills the lateral section is an added advantage to remaining within the desired reservoir section.\nThe success of any reservoir navigation service where seismic uncertainty at the reservoir top is high will rely largely on how effective the geosteering system is and how the geosteering engineer is able to react promptly to changes while landing the well in the reservoir and drilling the lateral section with without exiting the reservoir.\nReservoir Navigation Service (RNS) provides the means for the drilling near horizontal or horizontal wells for the purpose of increasing hydrocarbon extraction from the earth's subsurface. This involves the use of a pre-defined bottom hole assembly (BHA) with inbuilt downhole logging while drilling (LWD) and measurement while drilling (MWD) sensors. The measurements from these downhole sensors are uplinked to the surface of the wellbore where they are converted to meaningful petrophysical data. The goal is to use the downhole petrophysical data such as gamma ray, propagation resistivity and so on, to update an existing pre-well geological model of a section of the earth in such a way that the final result depicts the true model picture of the earth subsurface.\nThis paper focuses on using well CBH-44L to showcase how the use of real-time distance-to-boundary (D2B) measurement from a deep reading azimuthal propagation resistivity tool is use to correct for depth uncertainty in seismic, thereby, improving the chance of successfully landing and drilling a horizontal well.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.3074, 0.1837],
#         [0.3074, 1.0000, 0.1640],
#         [0.1837, 0.1640, 1.0000]])
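
In the example above, the first sentence is a question and the other two are passages, so row 0 of the similarity matrix (0.3074 vs. 0.1837) already ranks the first passage higher. The sketch below shows the same ranking step in pure Python on hypothetical 3-dimensional vectors that stand in for `model.encode(...)` output. The helper names `cosine` and `rank_passages` are illustrative and not part of the Sentence Transformers API.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_passages(query_vec, passage_vecs):
    """Return (passage_index, score) pairs sorted from most to least similar."""
    scores = [cosine(query_vec, p) for p in passage_vecs]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order]


# Hypothetical toy embeddings; real embeddings from this model have 768 dimensions.
query = [1.0, 0.0, 0.0]
passages = [
    [0.0, 1.0, 0.0],
    [0.8, 0.6, 0.0],
    [0.3, 0.0, 0.9],
]
ranking = rank_passages(query, passages)
for index, score in ranking:
    print(index, round(score, 4))
```

For large corpora, a vector index or a batched matrix product would usually replace the Python loop.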

Evaluation

Metrics

Triplet

Dataset: ai-job-validation
Evaluated with TripletEvaluator

Metric	Value
cosine_accuracy	0.8223
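
As a rough guide to reading this number: cosine accuracy is understood here as the fraction of (anchor, positive, negative) triplets in which the anchor is closer to the positive than to the negative under cosine similarity. The sketch below computes that fraction on hypothetical 2-dimensional vectors. It assumes a zero margin and is not the TripletEvaluator implementation itself.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def triplet_cosine_accuracy(triplets):
    """Share of triplets where cos(anchor, positive) > cos(anchor, negative)."""
    correct = sum(1 for a, p, n in triplets if cosine(a, p) > cosine(a, n))
    return correct / len(triplets)


# Hypothetical toy triplets: (anchor, positive, negative).
toy_triplets = [
    ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]),   # positive is closer: correct
    ([0.0, 1.0], [0.1, 0.9], [1.0, 0.0]),   # positive is closer: correct
    ([1.0, 1.0], [1.0, -1.0], [1.0, 0.9]),  # negative is closer: incorrect
    ([1.0, 0.0], [0.7, 0.7], [-1.0, 0.0]),  # positive is closer: correct
]
accuracy = triplet_cosine_accuracy(toy_triplets)
print(accuracy)
```

Under this reading, a value of 0.8223 means the positive outranked the negative for roughly 82% of the evaluation triplets.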

Training Details

Training Dataset

offshore_energy

Dataset: offshore_energy at 0ebbfc6
Size: 89,129 training samples
Columns: anchor, positive, and negative

Approximate statistics based on the first 1000 samples:

	anchor	positive	negative
type	string	string	string
details	min: 12 tokens, mean: 24.68 tokens, max: 77 tokens	min: 37 tokens, mean: 437.61 tokens, max: 983 tokens	min: 28 tokens, mean: 410.96 tokens, max: 1188 tokens

Samples:

anchor	positive	negative
`What is the significance of end point relative permeability of the oil phase in the productivity of oil reservoirs below bubble point pressure?`	In contrast with what is followed for Offshore Oil Operations the majority of the Onshore Oil Operations in the world do not have a Minimum and Mandatory required HSE training program for all personnel including contractors and subcontractors. A comparison is drawn between the Minimum and Mandatory HSE Training Programmes applied offshore in developed areas, mainly North Sea and Gulf of Mexico and the benefits that similar programs can bring to the ME onshore oil operations are addressed by estimating the risk reduction and potential economic benefits. The applicability of such Minimum and Mandatory HSE Training Programs is analyzed against the scenario of heavy utilization of contractors and subcontractors with different approach and standards in HSE training and also the increasing complexity of the onshore oil operations An estimation of how many lives can potentially be saved by the introduction of such programs is provided in global and generic terms. The HR Impact, in different a...	The knowledge of relative permeability is key in oil production mechanism as it affects multiphase flow which is vital to producible reserves in petroleum reservoirs. In this study, the impact of altering end point saturation on relative permeability curve and how it influences oil recovery was investigated on field X in Niger Delta, Nigeria. The saturation end points obtained after a simulation study was used as a start point to predict oil production. These end points saturation of water and oil were altered and varied according to facies. The eclipse simulation tool was used in conducting the prediction runs. The result obtained showed wide variation from actual production forecast (i.e. ≥ 25%) when end points were varied with no guided limit from experimental data. This study reveals the need for an accurate determination of residual oil saturation as it was seen to have an impact on forecast and history match.
`What role does the effective coefficient of discharge (Kd) play in calculating the required effective discharge area?`	96 API S TANDARD 520, P ART I—S IZING AND S ELECTION B.2.3.3 Using the theoretical mass flux obtained from numerical integration above, one may determine the required effective discharge area: In USC units: Q × ρ 1 × sec gal G × 60 × 7 4805 . min ft 3 A = W = Q × ρ × 1 G × K d 60 sec × 7 4805 . gal G × K d 60 × 7 4805 . d 3 528 62 2 × . 1 2 2 A = × = 0 0148 ft . = 2 135 in. . (B.8) 60 7 4805 × . 7 592 14 0 65, . × . In SI units: Q ×ρ 1 × sec liter G × 60 min × 1 000, m 3 A = W = Q ×ρ × 1 G × K d 60 sec × 1 000, liter 3 G × K , A = 2 000, × 996 9 . × 1 = 1 379 . × 10 − 3 m 2 = 1 379 mm, 2 (B.9) 60 × 1 000, 37 068, × 0 65 . where G is the theoretical mass flux through the nozzle, lb/s·ft [2] (kg/s·m [2] ); W is the required relief rate, lb/s (kg/s); Q is the required relief rate, gal/min (L/min); ρ = 1 v is the fluid density, lb/ft [3] (kg/m [3] ); K d is the effective coefficient of discharge...	S IZING, S ELECTION, AND I NSTALLATION OF P RESSURE - RELIEVING D EVICES 59 5.6.3 Sizing for Critical Flow 5.6.3.1 General 5.6.3.1.1 Pressure-relief devices in gas or vapor service that operate at critical flow conditions (see 5.6.2) may be sized using Equation (2) through Equation (7). Each of the equations may be used to calculate the effective discharge area, A, required to achieve a required flow rate through a pressure-relief device. A PRV that has an effective discharge area equal to or greater than the calculated value of A is then chosen for the application. In USC units: A = (2) A = (3) 6 32 . CK P K K d 1 b c A = (4) 1 175 . CK P K K 1 175 . CK P K K d 1 b c . In SI units: A = (5) A = (6) CK P K K d 1 b c A = CK P K K = (7) d 1 b c where A is the required effective discharge area of the device, in. [2] (mm [2] ) (see 3.20); W is the required flow through the device, lb/h (kg/h); _C...
`How many swellable packers were required to be run in the horizontal hole part for the AICV trial, and what was the purpose of this requirement?`	Removing fluid from a wellbore column, allowing a well to flow initially, or bringing a previous well back online, nitrogen lifting is commonly used in north Iraq wells. Due to the inability of coiled tubing units to be delivered on time and their high cost, operators are forced to seek for an alternative method of unloading drilling fluid. A hydraulic Jet Pump is a technology used to complete the task. A newly drilled well DB-H was chosen, and the drilling fluid volume calculated was 12,000 bbl. to pump to the surface and begin production, assuming nonstop operation between unloading and producing. The deployment of the hydraulic lift Jet Pump for both stages was planned. Well data from the operator was collected, the process design was initiated, and Jet Evaluation Modeling Software (JEMS) was used to run the design models. A Proper pump size was set up based on available data to meet operator expectations. A Reverse Circulating Jet Pump (RCJP) was chosen to be installed inside a Sli...	This development, predominantly from four artificial islands, of a giant offshore field in the United Arab Emirates (UAE) requires lateral compartmentalization with open hole packers of the 6 5/8" horizontal lower completions with lateral lengths greater than 16,000ft and total well lengths greater than 30,000ft MD. Swell Packer technology has enabled cost effective compartmentalization in horizontal laterals and is the preferred OH packer solution for the development. Deploying swell packers is regarded as being a simple solution to compartmentalizing any lateral where typically the deployment fluid differs from the fluids in which it will swell in; this application prevents the elastomer from swelling during deployment and swelling upon contact with produced or injected fluids. The use of an extended delayed oil swell packer with no delay systems in this particular application enables the packers to be deployed in a Non Aqueous Reservoir Drill in Fluid (RDFNAF) where the packer is re...

Loss: MultipleNegativesRankingLoss with these parameters:

{
    "scale": 20.0,
    "similarity_fct": "cos_sim",
    "gather_across_devices": false
}
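
To make these parameters concrete: with this loss, each anchor is scored against every positive and negative in the batch, the cosine scores are multiplied by `scale` (20.0), and a cross-entropy loss pushes the anchor's own positive to the top. The following is a simplified pure-Python sketch of that computation on a single device. It is not the library's implementation, and the function name `mnr_loss` and the toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mnr_loss(anchors, positives, negatives, scale=20.0):
    """Mean cross-entropy over anchors; the target for anchor i is positive i."""
    candidates = positives + negatives  # in-batch positives act as extra negatives
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, c) for c in candidates]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)


# Hypothetical 2-d toy batch of two triplets.
anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[1.0, 0.0], [0.0, 1.0]]
negatives = [[-1.0, 0.0], [0.0, -1.0]]

good_loss = mnr_loss(anchors, positives, negatives)       # positives aligned with anchors
bad_loss = mnr_loss(anchors, positives[::-1], negatives)  # positives swapped between anchors
print(good_loss, bad_loss)
```

A larger `scale` sharpens the softmax, so small differences in cosine similarity translate into larger differences in loss.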

Evaluation Dataset

offshore_energy

Dataset: offshore_energy at 0ebbfc6
Size: 11,141 evaluation samples
Columns: anchor, positive, and negative

Approximate statistics based on the first 1000 samples:

	anchor	positive	negative
type	string	string	string
details	min: 12 tokens, mean: 24.37 tokens, max: 53 tokens	min: 38 tokens, mean: 428.35 tokens, max: 978 tokens	min: 29 tokens, mean: 405.3 tokens, max: 1111 tokens

Samples:

anchor	positive	negative
`How does partial jacket construction differ for vessels that cannot use staybolt construction?`	9-7 – 9-10 ASME BPVC.VIII.1-2019 Figure 9-7 (2) Partial jackets that by virtue of their service or configuration do not lend themselves to staybolt construction may be fabricated by other means providing they are designed using appropriate stress values and are proof tested in accordance with UG-101(p). 444 9-8 FABRICATION (a) Fabrication of vessels shall be in accordance with applicable Parts of Subsection A and Subsection B, Part UW. The requirements of UW-13(e) do not apply to closure rings. (b) This Appendix covers fabrication of jacketed vessels by welding. Other methods of fabrication are permitted, provided the requirements of applicable parts of this Di vision are met. (c) Where only the inner vessel is subjected to lethal service, the requirements of UW-2 shall apply only to welds in the inner vessel and those welds attaching the jacket to the inner vessel. Welds attaching the jacket to the inner vessel need not be radiographed and may b...	9-5 – 9-7 ASME BPVC.VIII.1-2019 ‐ ‐ (g 5), and (g 6), may be used on any of the types of jacketed vessels shown in Figure 9-2 where t rj does not exceed [5] / 8 in. (16 mm). (7) Closures shown in Figure 9-5, sketch (h) used on Type 3 jacketed vessels shown in Figure 9-2 shall have attachment welds in accordance with Figure 9-5, sketch ‐ ‐ (i 1) or (i 2). This construction is limited to jackets where t rj does not exceed [5] / 8 in. (16 mm). (8) Closures for conical or toriconical jackets shown in Figure 9-5, sketches (k) and (l) shall comply with the requirements for Type 2 jacketed vessels shown in Figure 9-2. (d) Any radial welds in closure members shall be buttwelded joints penetrating through the full thickness of the member and shall be ground flush where attachment welds are to be made. (e) Where the inner vessel must meet the requirements of UW-2, the attachment welds of the jacket to the inner vessel need not be welded for their full thickness no...
`What dimensions must fins and studs conform to as stipulated in Section 17.4.4?`	17.4 Examination of other components 17.4.1 Examination of heater steelwork shall be in accordance with the structural design code. 17.4.2 Refractory linings shall be examined throughout for thickness variations during application and for cracks after curing. Thickness tolerance is limited to a range of minus 6 mm (1/4 in) to plus 13 mm (1/2 in). Cracks which are 3 mm (1/8 in) or greater in width and penetrate more than 50 % of the castable thickness shall be repaired. Repairs shall be made by chipping out the unsound refractory to the backup layer interface or casing and exposing a minimum of three tieback anchors, or to the sound metal, making a joint between sound refractory that has a minimum slope of 25 mm (1 in) to the base metal (dove-tail construction) and then gunning, casting or hand-packing the area to be repaired. 17.4.3 Finned extended surface shall be examined to ensure fins are perpendicular to the tube within 15°. The maximum discontinuity of the w...	16.1 -112 STEEL ANCHORS [Sect. I8. 3e. Detailing Requirements in Composite Components Steel anchors in composite components shall meet the following requirements: (a) Minimum concrete cover to steel anchors shall be in accordance with ACI 318 provisions for concrete protection of headed shear stud reinforcement. (b) Minimum center-to-center spacing of steel headed stud anchors shall be four diameters in any direction. (c) The maximum center-to-center spacing of steel headed stud anchors shall not exceed 32 times the shank diameter. (d) The maximum center-to-center spacing of steel channel anchors shall be 24 in. (600 mm). User Note: Detailing requirements provided in this section are absolute limits. See Sections I8.3a, I8.3b and I8.3c for additional limitations required to preclude edge and group effect considerations. Specification for Structural Steel Buildings, July 7, 2016 A MERICAN I NSTITUTE OF S TEEL C ONSTRUCTION
`What are some common mistakes in oil and gas project execution that lead to financial losses?`	Dozens of deepwater wells have been drilled in western South China Sea with about 30 percent have characteristics of high temperature and high pressure, which brought a series of difficulties and challenges to field operations. After incorporating the analysis of engineering and geological environment for deepwater HTHP wells in Lingshui block of western South China Sea, it is suggested that the solution of drilling problems for deepwater HTHP wells should start from drilling fluid. Several major technical problems are required to be addressed by drilling fluid, such as co-exist of low temperature and high temperature that lead to difficulty of drilling fluid maintenance and narrow density margin caused by deepwater and high pressure. Based on the above problems, combining with geological features of HTHP wells, researchers developed a novel water based drilling fluid system compatible with deepwater HTHP wells in Lingshui block on the basis of conventional HEM drilling fluid and furth...	The lack of availability of required skills and experience in most if not all parts of the oil and gas value chain is well documented so, rather than trying to make the case, we will summarise the challenge thus: the industry in all parts of the world can't find the capability it needs to safely get its work done in the timeframes it would like. However or wherever the situation is measured, the consequence is that in days when the oil price might suggest that the industry has "never had it so good", many companies are falling seriously short of stakeholder expectations with projects of all types not being completed as planned or failing to deliver anticipated returns. Close to home we see producers consistently missing quarterly production targets and a seemingly constant downgrading of forecasts and year-on-year plans. This leads to a constant stream of bad news and criticism in the media, greater stress through all levels of management and an inevitable "knee jerk" towards a more sh...

Loss: MultipleNegativesRankingLoss with these parameters:

{
    "scale": 20.0,
    "similarity_fct": "cos_sim",
    "gather_across_devices": false
}

Training Hyperparameters

Non-Default Hyperparameters

eval_strategy: steps
per_device_train_batch_size: 16
per_device_eval_batch_size: 16
learning_rate: 2e-05
num_train_epochs: 5
warmup_ratio: 0.1
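
These settings, together with `lr_scheduler_type: linear` from the full list, imply a learning rate that ramps up linearly over the first 10% of steps and then decays linearly to zero. The sketch below reproduces that shape in pure Python. It assumes a single device and no gradient accumulation, so that one epoch is ceil(89,129 / 16) = 5,571 steps, which agrees with the training logs (step 1000 at epoch 0.1795). The function name `linear_schedule_lr` is hypothetical and this is not the Transformers scheduler itself.

```python
import math


def linear_schedule_lr(step, total_steps, base_lr=2e-05, warmup_ratio=0.1):
    """Learning rate at a given optimizer step: linear warmup, then linear decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))
    return base_lr * remaining


# Assumption: single device, batch size 16, gradient_accumulation_steps 1.
steps_per_epoch = math.ceil(89_129 / 16)  # 5571
total_steps = steps_per_epoch * 5         # num_train_epochs: 5

for step in (0, 1000, math.ceil(total_steps * 0.1), 14000, total_steps):
    print(step, linear_schedule_lr(step, total_steps))
```

Under these assumptions the peak rate of 2e-05 is reached around step 2,786, before the end of the first epoch.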

All Hyperparameters

overwrite_output_dir: False
do_predict: False
eval_strategy: steps
prediction_loss_only: True
per_device_train_batch_size: 16
per_device_eval_batch_size: 16
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 2e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 5
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: {}
warmup_ratio: 0.1
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: False
fp16: False
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: False
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: None
hub_always_push: False
hub_revision: None
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
include_for_metrics: []
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
liger_kernel_config: None
eval_use_gather_object: False
average_tokens_across_devices: False
prompts: None
batch_sampler: batch_sampler
multi_dataset_batch_sampler: proportional
router_mapping: {}
learning_rate_mapping: {}

Training Logs

Epoch	Step	Training Loss	Validation Loss	ai-job-validation_cosine_accuracy
0.1795	1000	-	1.1970	0.6482
0.3590	2000	-	1.1165	0.6762
0.5385	3000	-	1.0740	0.6986
0.7180	4000	-	1.0460	0.7152
0.8975	5000	1.2294	1.0200	0.7252
1.0770	6000	-	1.0162	0.7259
1.2565	7000	-	0.9827	0.7445
1.4360	8000	-	0.9690	0.7592
1.6155	9000	-	0.9499	0.7590
1.7950	10000	0.9515	0.9396	0.7673
1.9745	11000	-	0.9297	0.7617
2.1540	12000	-	0.9290	0.7770
2.3335	13000	-	0.9128	0.7862
2.5130	14000	-	0.9076	0.7846
2.6925	15000	0.7440	0.8964	0.7815
2.8720	16000	-	0.8777	0.7990
3.0515	17000	-	0.8798	0.7966
3.2310	18000	-	0.8713	0.8026
3.4105	19000	-	0.8658	0.8062
3.5900	20000	0.5671	0.8513	0.8055
3.7695	21000	-	0.8387	0.8143
3.9490	22000	-	0.8295	0.8144
4.1285	23000	-	0.8327	0.8192
4.3080	24000	-	0.8332	0.8189
4.4875	25000	0.4463	0.8267	0.8192
4.6670	26000	-	0.8236	0.8208
4.8465	27000	-	0.8205	0.8223

Framework Versions

Python: 3.10.12
Sentence Transformers: 5.1.0
Transformers: 4.53.3
PyTorch: 2.8.0+cu128
Accelerate: 1.9.0
Datasets: 4.0.0
Tokenizers: 0.21.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Safetensors

Model size: 0.3B params
Tensor type: F32
