SentenceTransformer

This is a sentence-transformers model trained on 12,683 labeled Japanese sentences (see Training Details below). It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Model Size: 111M parameters (F32)

Model Sources

  • Documentation: Sentence Transformers Documentation (https://www.sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
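
The Pooling module is configured with pooling_mode_cls_token: True, so the sentence embedding is simply the 768-dimensional final hidden state of the [CLS] token produced by the underlying BertModel. The sketch below reproduces that pooling with the plain transformers API purely for illustration; the model ID is the one published above, and the SentenceTransformer usage in the next section remains the recommended path.

import torch
from transformers import AutoTokenizer, AutoModel

# Illustration of CLS-token pooling; prefer the SentenceTransformer API below.
model_id = "Detomo/cl-nagoya-sup-simcse-ja-nss-v_1_0_6"
tokenizer = AutoTokenizer.from_pretrained(model_id)
encoder = AutoModel.from_pretrained(model_id)

inputs = tokenizer(
    ["科目:ユニット及びその他。名称:テラス床再生木デッキ。"],
    padding=True, truncation=True, max_length=512, return_tensors="pt",
)
with torch.no_grad():
    hidden = encoder(**inputs).last_hidden_state  # (batch, seq_len, 768)
embedding = hidden[:, 0]                          # CLS pooling -> (batch, 768)
print(embedding.shape)                            # torch.Size([1, 768])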

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Detomo/cl-nagoya-sup-simcse-ja-nss-v_1_0_6")
# Run inference
sentences = [
    '科目:ユニット及びその他。名称:テラス床再生木デッキ。',
    '科目:ユニット及びその他。名称:駐車ゾーンサイン。',
    '科目:ユニット及びその他。名称:#階 MWC、WWC他姿見鏡。',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
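
The similarity matrix can also be used for lightweight semantic search over the encoded sentences. The sketch below ranks the corpus against a new query and continues from the snippet above; the query string itself is a made-up example.

# Continues from the snippet above; the query text is a made-up example.
query = "科目:ユニット及びその他。名称:テラス床木デッキ。"
query_embedding = model.encode([query])

# Cosine similarity between the query and each corpus sentence, shape [1, 3]
scores = model.similarity(query_embedding, embeddings)
best = scores.argmax().item()
print(sentences[best], scores[0, best].item())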

Training Details

Training Dataset

Unnamed Dataset

  • Size: 12,683 training samples
  • Columns: sentence and label
  • Approximate statistics based on the first 1000 samples:
    • sentence: type string; min: 11 tokens; mean: 18.16 tokens; max: 54 tokens
    • label: type int; per-class share (classes 0 to 204) listed below:
    • 0: ~0.30%
    • 1: ~0.30%
    • 2: ~0.30%
    • 3: ~0.30%
    • 4: ~0.30%
    • 5: ~0.30%
    • 6: ~0.30%
    • 7: ~0.30%
    • 8: ~0.30%
    • 9: ~0.30%
    • 10: ~0.30%
    • 11: ~0.30%
    • 12: ~1.10%
    • 13: ~0.30%
    • 14: ~0.30%
    • 15: ~0.30%
    • 16: ~0.30%
    • 17: ~0.30%
    • 18: ~0.30%
    • 19: ~0.30%
    • 20: ~0.30%
    • 21: ~0.30%
    • 22: ~0.30%
    • 23: ~0.40%
    • 24: ~0.30%
    • 25: ~0.30%
    • 26: ~0.30%
    • 27: ~0.90%
    • 28: ~0.30%
    • 29: ~0.40%
    • 30: ~0.30%
    • 31: ~1.10%
    • 32: ~0.30%
    • 33: ~0.30%
    • 34: ~0.30%
    • 35: ~0.30%
    • 36: ~0.30%
    • 37: ~0.30%
    • 38: ~0.30%
    • 39: ~0.30%
    • 40: ~0.30%
    • 41: ~0.30%
    • 42: ~0.30%
    • 43: ~0.30%
    • 44: ~0.30%
    • 45: ~0.30%
    • 46: ~0.30%
    • 47: ~0.30%
    • 48: ~0.30%
    • 49: ~0.40%
    • 50: ~0.30%
    • 51: ~0.30%
    • 52: ~0.30%
    • 53: ~0.60%
    • 54: ~0.30%
    • 55: ~0.30%
    • 56: ~0.30%
    • 57: ~0.30%
    • 58: ~0.30%
    • 59: ~0.30%
    • 60: ~0.30%
    • 61: ~0.30%
    • 62: ~0.30%
    • 63: ~0.30%
    • 64: ~0.30%
    • 65: ~0.30%
    • 66: ~0.30%
    • 67: ~0.30%
    • 68: ~0.30%
    • 69: ~0.30%
    • 70: ~0.30%
    • 71: ~0.30%
    • 72: ~0.50%
    • 73: ~0.30%
    • 74: ~0.30%
    • 75: ~0.30%
    • 76: ~0.30%
    • 77: ~0.30%
    • 78: ~0.30%
    • 79: ~0.30%
    • 80: ~0.30%
    • 81: ~0.30%
    • 82: ~0.30%
    • 83: ~0.30%
    • 84: ~0.30%
    • 85: ~0.30%
    • 86: ~0.30%
    • 87: ~0.30%
    • 88: ~0.80%
    • 89: ~0.30%
    • 90: ~0.30%
    • 91: ~0.30%
    • 92: ~0.30%
    • 93: ~0.30%
    • 94: ~0.30%
    • 95: ~0.30%
    • 96: ~0.30%
    • 97: ~0.50%
    • 98: ~0.30%
    • 99: ~0.30%
    • 100: ~0.30%
    • 101: ~0.30%
    • 102: ~0.80%
    • 103: ~0.60%
    • 104: ~0.50%
    • 105: ~0.30%
    • 106: ~0.30%
    • 107: ~16.50%
    • 108: ~0.30%
    • 109: ~0.30%
    • 110: ~0.30%
    • 111: ~0.30%
    • 112: ~0.30%
    • 113: ~0.30%
    • 114: ~0.30%
    • 115: ~0.30%
    • 116: ~0.50%
    • 117: ~0.30%
    • 118: ~0.30%
    • 119: ~0.30%
    • 120: ~0.30%
    • 121: ~0.30%
    • 122: ~0.30%
    • 123: ~0.30%
    • 124: ~0.30%
    • 125: ~0.70%
    • 126: ~0.30%
    • 127: ~0.30%
    • 128: ~0.30%
    • 129: ~0.40%
    • 130: ~2.10%
    • 131: ~2.10%
    • 132: ~0.30%
    • 133: ~0.30%
    • 134: ~0.50%
    • 135: ~0.50%
    • 136: ~0.50%
    • 137: ~0.40%
    • 138: ~0.30%
    • 139: ~0.30%
    • 140: ~0.30%
    • 141: ~0.30%
    • 142: ~0.30%
    • 143: ~0.30%
    • 144: ~0.30%
    • 145: ~0.30%
    • 146: ~0.30%
    • 147: ~0.30%
    • 148: ~0.30%
    • 149: ~0.30%
    • 150: ~0.30%
    • 151: ~0.30%
    • 152: ~0.30%
    • 153: ~0.30%
    • 154: ~0.50%
    • 155: ~0.30%
    • 156: ~0.40%
    • 157: ~0.30%
    • 158: ~0.30%
    • 159: ~0.30%
    • 160: ~0.30%
    • 161: ~0.30%
    • 162: ~0.30%
    • 163: ~0.30%
    • 164: ~0.30%
    • 165: ~0.30%
    • 166: ~0.30%
    • 167: ~0.30%
    • 168: ~0.30%
    • 169: ~0.40%
    • 170: ~0.30%
    • 171: ~0.30%
    • 172: ~0.30%
    • 173: ~0.30%
    • 174: ~0.30%
    • 175: ~0.30%
    • 176: ~0.70%
    • 177: ~0.30%
    • 178: ~0.30%
    • 179: ~0.30%
    • 180: ~0.30%
    • 181: ~1.30%
    • 182: ~0.30%
    • 183: ~0.40%
    • 184: ~0.30%
    • 185: ~0.30%
    • 186: ~0.30%
    • 187: ~1.50%
    • 188: ~0.30%
    • 189: ~0.30%
    • 190: ~0.30%
    • 191: ~0.30%
    • 192: ~0.30%
    • 193: ~0.30%
    • 194: ~0.30%
    • 195: ~1.60%
    • 196: ~0.30%
    • 197: ~0.30%
    • 198: ~7.20%
    • 199: ~0.30%
    • 200: ~1.00%
    • 201: ~0.30%
    • 202: ~0.30%
    • 203: ~0.30%
    • 204: ~0.90%
  • Samples:
    • sentence: 科目:コンクリート。名称:免震基礎天端グラウト注入。 | label: 0
    • sentence: 科目:コンクリート。名称:免震基礎天端グラウト注入。 | label: 0
    • sentence: 科目:コンクリート。名称:免震基礎天端グラウト注入。 | label: 0
  • Loss: sentence_transformer_lib.custom_batch_all_trip_loss.CustomBatchAllTripletLoss
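
CustomBatchAllTripletLoss is not part of the public Sentence Transformers library; judging by its name and the citation below, it follows the batch-all triplet strategy of Hermans et al. (2017), which forms every valid (anchor, positive, negative) triplet inside a batch and averages the hinge losses that are still active. The PyTorch sketch below implements only the standard batch-all formulation; the margin value and the Euclidean distance are assumptions, and the custom variant used for this model may differ.

import torch


def batch_all_triplet_loss(embeddings: torch.Tensor, labels: torch.Tensor,
                           margin: float = 5.0) -> torch.Tensor:
    """Standard batch-all triplet loss; margin and distance are assumptions."""
    # Pairwise Euclidean distances, shape (batch, batch)
    dist = torch.cdist(embeddings, embeddings, p=2)

    # Anchor/positive pairs share a label (excluding self-pairs); negatives do not
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    eye = torch.eye(len(labels), dtype=torch.bool, device=labels.device)
    pos_mask = same & ~eye
    neg_mask = ~same

    # loss[a, p, n] = d(a, p) - d(a, n) + margin for every triplet in the batch
    loss = dist.unsqueeze(2) - dist.unsqueeze(1) + margin
    valid = pos_mask.unsqueeze(2) & neg_mask.unsqueeze(1)
    loss = torch.relu(loss) * valid

    # "Batch-all" reduction: average over triplets still violating the margin
    num_active = (loss > 1e-16).sum().clamp(min=1)
    return loss.sum() / num_active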

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 512
  • per_device_eval_batch_size: 512
  • learning_rate: 1e-05
  • weight_decay: 0.01
  • num_train_epochs: 250
  • warmup_ratio: 0.2
  • fp16: True
  • batch_sampler: group_by_label
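
A minimal sketch of how these hyperparameters could be wired together with the SentenceTransformerTrainer API is shown below. The base checkpoint, output directory, and one-row dataset are placeholders, and the built-in BatchAllTripletLoss stands in for the custom loss, so treat this as an outline rather than the exact training script.

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import BatchAllTripletLoss
from sentence_transformers.training_args import BatchSamplers

# Assumed base checkpoint (inferred from the model name); the real 12,683-sample
# dataset with "sentence" and "label" columns replaces this one-row placeholder.
model = SentenceTransformer("cl-nagoya/sup-simcse-ja-base")
train_dataset = Dataset.from_dict({
    "sentence": ["科目:コンクリート。名称:免震基礎天端グラウト注入。"],
    "label": [0],
})

args = SentenceTransformerTrainingArguments(
    output_dir="output",               # placeholder
    per_device_train_batch_size=512,
    per_device_eval_batch_size=512,
    learning_rate=1e-5,
    weight_decay=0.01,
    num_train_epochs=250,
    warmup_ratio=0.2,
    fp16=True,
    batch_sampler=BatchSamplers.GROUP_BY_LABEL,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=BatchAllTripletLoss(model),   # stand-in for CustomBatchAllTripletLoss
)
trainer.train()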

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 512
  • per_device_eval_batch_size: 512
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 1e-05
  • weight_decay: 0.01
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 250
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.2
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • tp_size: 0
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: group_by_label
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss
2.16 50 0.0584
4.32 100 0.0591
6.48 150 0.0675
8.64 200 0.0637
10.8 250 0.0637
13.04 300 0.0647
15.2 350 0.0656
17.36 400 0.0578
19.52 450 0.0585
21.68 500 0.0546
23.84 550 0.0523
26.08 600 0.0563
28.24 650 0.0526
30.4 700 0.0532
32.56 750 0.0546
34.72 800 0.0483
36.88 850 0.0566
39.12 900 0.0482
41.28 950 0.0508
43.44 1000 0.05
45.6 1050 0.0471
47.76 1100 0.0502
49.92 1150 0.0477
52.16 1200 0.0429
54.32 1250 0.0415
56.48 1300 0.0433
58.64 1350 0.0489
60.8 1400 0.0494
63.04 1450 0.0412
65.2 1500 0.0447
67.36 1550 0.0379
69.52 1600 0.0401
71.68 1650 0.0449
73.84 1700 0.0377
76.08 1750 0.0375
78.24 1800 0.0394
80.4 1850 0.0392
82.56 1900 0.0404
84.72 1950 0.0392
86.88 2000 0.0427
89.12 2050 0.0357
91.28 2100 0.0339
93.44 2150 0.0443
95.6 2200 0.0405
97.76 2250 0.0362
99.92 2300 0.0323
102.16 2350 0.0335
104.32 2400 0.0408
106.48 2450 0.034
108.64 2500 0.0383
110.8 2550 0.0299
113.04 2600 0.0306
115.2 2650 0.0351
117.36 2700 0.0322
119.52 2750 0.041
121.68 2800 0.0292
123.84 2850 0.027
126.08 2900 0.0323
128.24 2950 0.0355
130.4 3000 0.0366
132.56 3050 0.0312
134.72 3100 0.0279
136.88 3150 0.0306
139.12 3200 0.0245
141.28 3250 0.0325
143.44 3300 0.0356
145.6 3350 0.0362
147.76 3400 0.0287
149.92 3450 0.0339
1.6389 50 0.0386
3.5278 100 0.0366
5.4167 150 0.0364
7.3056 200 0.0394
9.1944 250 0.0387
11.0833 300 0.0407
12.7222 350 0.0392
14.6111 400 0.0395
16.5 450 0.0393
18.3889 500 0.0361
20.2778 550 0.0347
22.1667 600 0.0346
24.0556 650 0.0371
25.6944 700 0.0411
27.5833 750 0.0329
29.4722 800 0.0337
31.3611 850 0.0325
33.25 900 0.034
35.1389 950 0.0352
37.0278 1000 0.0305
38.6667 1050 0.0311
40.5556 1100 0.0314
42.4444 1150 0.0307
44.3333 1200 0.0324
46.2222 1250 0.0355
48.1111 1300 0.0306
49.75 1350 0.027
51.6389 1400 0.0282
53.5278 1450 0.0318
55.4167 1500 0.0314
57.3056 1550 0.0323
59.1944 1600 0.0286
61.0833 1650 0.0338
62.7222 1700 0.0287
64.6111 1750 0.0309
66.5 1800 0.0287
68.3889 1850 0.028
70.2778 1900 0.026
72.1667 1950 0.0269
74.0556 2000 0.0295
75.6944 2050 0.0257
77.5833 2100 0.0261
79.4722 2150 0.0304
81.3611 2200 0.0265
83.25 2250 0.0274
85.1389 2300 0.0276
87.0278 2350 0.0325
88.6667 2400 0.0233
90.5556 2450 0.0212
92.4444 2500 0.0243
94.3333 2550 0.0288
96.2222 2600 0.026
98.1111 2650 0.029
99.75 2700 0.0228
101.6389 2750 0.0265
103.5278 2800 0.017
105.4167 2850 0.026
107.3056 2900 0.0257
109.1944 2950 0.0237
111.0833 3000 0.0261
112.7222 3050 0.0204
114.6111 3100 0.0186
116.5 3150 0.0206
118.3889 3200 0.0233
120.2778 3250 0.0235
122.1667 3300 0.0232
124.0556 3350 0.0194
125.6944 3400 0.0242
127.5833 3450 0.0234
129.4722 3500 0.023
131.3611 3550 0.0187
133.25 3600 0.0208
135.1389 3650 0.0201
137.0278 3700 0.024
138.6667 3750 0.0255
140.5556 3800 0.0201
142.4444 3850 0.0231
144.3333 3900 0.0199
146.2222 3950 0.018
148.1111 4000 0.0228
149.75 4050 0.0204
151.6389 4100 0.025
153.5278 4150 0.0163
155.4167 4200 0.0157
157.3056 4250 0.0189
159.1944 4300 0.0176
161.0833 4350 0.03
162.7222 4400 0.0197
164.6111 4450 0.0207
166.5 4500 0.0189
168.3889 4550 0.0132
170.2778 4600 0.0178
172.1667 4650 0.0216
174.0556 4700 0.0174
175.6944 4750 0.0229
177.5833 4800 0.0181
179.4722 4850 0.0161
181.3611 4900 0.0236
183.25 4950 0.0185
185.1389 5000 0.02
187.0278 5050 0.0147
188.6667 5100 0.0203
190.5556 5150 0.0159
192.4444 5200 0.0133
194.3333 5250 0.0192
196.2222 5300 0.0162
198.1111 5350 0.0183
199.75 5400 0.015
201.6389 5450 0.0145
203.5278 5500 0.017
205.4167 5550 0.0219
207.3056 5600 0.0195
209.1944 5650 0.0186
211.0833 5700 0.0142
212.7222 5750 0.0191
214.6111 5800 0.0167
216.5 5850 0.013
218.3889 5900 0.0154
220.2778 5950 0.0135
222.1667 6000 0.0139
224.0556 6050 0.0203
225.6944 6100 0.0169
227.5833 6150 0.0146
229.4722 6200 0.0206
231.3611 6250 0.0149
233.25 6300 0.014
235.1389 6350 0.0174
237.0278 6400 0.0191
238.6667 6450 0.0137
240.5556 6500 0.0125
242.4444 6550 0.0081
244.3333 6600 0.0145
246.2222 6650 0.0116
248.1111 6700 0.0154
249.75 6750 0.0179

Framework Versions

  • Python: 3.11.12
  • Sentence Transformers: 3.4.1
  • Transformers: 4.51.3
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.6.0
  • Datasets: 3.6.0
  • Tokenizers: 0.21.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

CustomBatchAllTripletLoss

@misc{hermans2017defense,
    title={In Defense of the Triplet Loss for Person Re-Identification},
    author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
    year={2017},
    eprint={1703.07737},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}