JobBGE-small-en-v1.5: Job-to-Job Matching with BAAI/bge-small-en-v1.5

Top-performing model on TalentCLEF 2025 Task A. Use it for multilingual job title matching across English, Spanish, German, and Chinese.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-small-en-v1.5
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity
  • Training Datasets:
    • full_en
    • full_de
    • full_es
    • full_zh
    • mix

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
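
The architecture above is a BertModel followed by CLS-token pooling and L2 normalization. As a sanity check, a minimal sketch with the plain transformers library should roughly reproduce the same embeddings (it assumes the checkpoint is hosted as pj-mathematician/JobBGE-small-en-v1.5, the repository this card belongs to):

from transformers import AutoTokenizer, AutoModel
import torch
import torch.nn.functional as F

model_id = "pj-mathematician/JobBGE-small-en-v1.5"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

titles = ["data scientist", "machine learning engineer"]
batch = tokenizer(titles, padding=True, truncation=True, max_length=512, return_tensors="pt")

with torch.no_grad():
    outputs = model(**batch)

# CLS pooling: take the first token's hidden state, then L2-normalize,
# mirroring the Pooling(cls) + Normalize() modules shown above.
embeddings = F.normalize(outputs.last_hidden_state[:, 0], p=2, dim=1)
print(embeddings.shape)  # torch.Size([2, 384])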

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("pj-mathematician/JobBGE-small-en-v1.5")
# Run inference
sentences = [
    'Volksvertreter',     # elected representative
    'Parlamentarier',     # member of parliament
    'Oberbürgermeister',  # lord mayor
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
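
In a job-matching setting you typically embed one query title and a pool of candidate titles, then rank the candidates by cosine similarity. A minimal sketch (the query and candidate titles below are made up for illustration):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("pj-mathematician/JobBGE-small-en-v1.5")

query = "software developer"
candidates = ["software engineer", "desarrollador de software", "Softwareentwickler", "nurse", "软件工程师"]

query_emb = model.encode([query])
candidate_embs = model.encode(candidates)

# Cosine similarities between the query and every candidate title
scores = model.similarity(query_emb, candidate_embs)[0]
for title, score in sorted(zip(candidates, scores.tolist()), key=lambda x: x[1], reverse=True):
    print(f"{score:.3f}  {title}")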

Evaluation

Metrics

Information Retrieval

| Metric               | full_en | full_es | full_de | full_zh | mix_es | mix_de | mix_zh |
|:---------------------|--------:|--------:|--------:|--------:|-------:|-------:|-------:|
| cosine_accuracy@1    | 0.6571  | 0.1243  | 0.2956  | 0.3495  | 0.4113 | 0.2943 | 0.0971 |
| cosine_accuracy@20   | 0.9905  | 1.0     | 0.9212  | 0.7379  | 0.7613 | 0.65   | 0.3586 |
| cosine_accuracy@50   | 0.9905  | 1.0     | 0.9655  | 0.8252  | 0.8523 | 0.7608 | 0.4901 |
| cosine_accuracy@100  | 0.9905  | 1.0     | 0.9754  | 0.8544  | 0.9121 | 0.8508 | 0.6002 |
| cosine_accuracy@150  | 0.9905  | 1.0     | 0.9852  | 0.9029  | 0.9418 | 0.8898 | 0.6613 |
| cosine_accuracy@200  | 0.9905  | 1.0     | 0.9852  | 0.9417  | 0.9548 | 0.9204 | 0.7062 |
| cosine_precision@1   | 0.6571  | 0.1243  | 0.2956  | 0.3495  | 0.4113 | 0.2943 | 0.0971 |
| cosine_precision@20  | 0.5024  | 0.4897  | 0.4246  | 0.1733  | 0.0892 | 0.0731 | 0.0314 |
| cosine_precision@50  | 0.308   | 0.3179  | 0.2814  | 0.0944  | 0.0418 | 0.0361 | 0.0185 |
| cosine_precision@100 | 0.1863  | 0.1986  | 0.1801  | 0.0589  | 0.0229 | 0.0206 | 0.0116 |
| cosine_precision@150 | 0.1322  | 0.1469  | 0.1362  | 0.0458  | 0.0159 | 0.0147 | 0.0087 |
| cosine_precision@200 | 0.103   | 0.1179  | 0.1105  | 0.0385  | 0.0122 | 0.0116 | 0.0071 |
| cosine_recall@1      | 0.068   | 0.0031  | 0.0111  | 0.0273  | 0.1565 | 0.1109 | 0.0329 |
| cosine_recall@20     | 0.5385  | 0.3221  | 0.2614  | 0.1766  | 0.6594 | 0.5344 | 0.2091 |
| cosine_recall@50     | 0.726   | 0.4638  | 0.3835  | 0.2393  | 0.7705 | 0.6585 | 0.3054 |
| cosine_recall@100    | 0.8329  | 0.5438  | 0.4677  | 0.2863  | 0.8472 | 0.7525 | 0.3835 |
| cosine_recall@150    | 0.8745  | 0.5825  | 0.5183  | 0.3287  | 0.8825 | 0.8026 | 0.4309 |
| cosine_recall@200    | 0.9057  | 0.6147  | 0.5517  | 0.3631  | 0.9051 | 0.8418 | 0.4715 |
| cosine_ndcg@1        | 0.6571  | 0.1243  | 0.2956  | 0.3495  | 0.4113 | 0.2943 | 0.0971 |
| cosine_ndcg@20       | 0.6845  | 0.5385  | 0.4601  | 0.2468  | 0.5117 | 0.3919 | 0.1385 |
| cosine_ndcg@50       | 0.704   | 0.5012  | 0.4229  | 0.2394  | 0.542  | 0.4256 | 0.1656 |
| cosine_ndcg@100      | 0.7589  | 0.5147  | 0.4371  | 0.2619  | 0.5588 | 0.4462 | 0.1835 |
| cosine_ndcg@150      | 0.7774  | 0.5348  | 0.4629  | 0.2787  | 0.5656 | 0.4561 | 0.1931 |
| cosine_ndcg@200      | 0.7893  | 0.5505  | 0.4797  | 0.2919  | 0.5697 | 0.4632 | 0.2007 |
| cosine_mrr@1         | 0.6571  | 0.1243  | 0.2956  | 0.3495  | 0.4113 | 0.2943 | 0.0971 |
| cosine_mrr@20        | 0.8103  | 0.5515  | 0.4896  | 0.4485  | 0.4979 | 0.3779 | 0.1522 |
| cosine_mrr@50        | 0.8103  | 0.5515  | 0.4909  | 0.4515  | 0.501  | 0.3815 | 0.1564 |
| cosine_mrr@100       | 0.8103  | 0.5515  | 0.4911  | 0.4519  | 0.5018 | 0.3827 | 0.158  |
| cosine_mrr@150       | 0.8103  | 0.5515  | 0.4912  | 0.4523  | 0.5021 | 0.3831 | 0.1585 |
| cosine_mrr@200       | 0.8103  | 0.5515  | 0.4912  | 0.4525  | 0.5021 | 0.3832 | 0.1588 |
| cosine_map@1         | 0.6571  | 0.1243  | 0.2956  | 0.3495  | 0.4113 | 0.2943 | 0.0971 |
| cosine_map@20        | 0.5418  | 0.4028  | 0.3236  | 0.147   | 0.4264 | 0.3097 | 0.0875 |
| cosine_map@50        | 0.5327  | 0.3422  | 0.2644  | 0.1267  | 0.4338 | 0.3174 | 0.093  |
| cosine_map@100       | 0.5657  | 0.3395  | 0.2576  | 0.1326  | 0.436  | 0.3199 | 0.095  |
| cosine_map@150       | 0.5734  | 0.3478  | 0.2669  | 0.1352  | 0.4366 | 0.3207 | 0.0957 |
| cosine_map@200       | 0.5772  | 0.3534  | 0.2722  | 0.1368  | 0.4368 | 0.3212 | 0.0961 |
| cosine_map@500       | 0.5814  | 0.3631  | 0.2833  | 0.1407  | 0.4373 | 0.3219 | 0.0971 |
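
These numbers come from Sentence Transformers' information-retrieval evaluation over per-language query/corpus splits. A minimal sketch of how such metrics can be computed with InformationRetrievalEvaluator (the toy queries, corpus, and relevance judgments below are placeholders, not the TalentCLEF data; the card reports k values up to 200, plus map@500, on the full corpora):

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

model = SentenceTransformer("pj-mathematician/JobBGE-small-en-v1.5")

# Toy data: query id -> job title, corpus id -> job title,
# and query id -> set of relevant corpus ids.
queries = {"q1": "software developer"}
corpus = {"c1": "software engineer", "c2": "nurse"}
relevant_docs = {"q1": {"c1"}}

evaluator = InformationRetrievalEvaluator(
    queries=queries,
    corpus=corpus,
    relevant_docs=relevant_docs,
    accuracy_at_k=[1, 2],
    precision_recall_at_k=[1, 2],
    ndcg_at_k=[1, 2],
    mrr_at_k=[1, 2],
    map_at_k=[1, 2],
    name="toy_full_en",
)
results = evaluator(model)
print(results)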

Training Details

Training Datasets

full_en

  • Dataset: full_en
  • Size: 28,880 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
    |         | anchor                                          | positive                                         |
    |:--------|:------------------------------------------------|:-------------------------------------------------|
    | type    | string                                          | string                                           |
    | details | min: 3 tokens, mean: 5.0 tokens, max: 10 tokens | min: 3 tokens, mean: 5.01 tokens, max: 13 tokens |
  • Samples:
    | anchor                      | positive                     |
    |:----------------------------|:-----------------------------|
    | air commodore               | flight lieutenant            |
    | command and control officer | flight officer               |
    | air commodore               | command and control officer  |
  • Loss: GISTEmbedLoss with these parameters:
    {'guide': SentenceTransformer(
      (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: BertModel 
      (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
      (2): Normalize()
    ), 'temperature': 0.01, 'margin_strategy': 'absolute', 'margin': 0.0}
    
full_de

  • Dataset: full_de
  • Size: 23,023 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
    |         | anchor                                            | positive                                           |
    |:--------|:--------------------------------------------------|:---------------------------------------------------|
    | type    | string                                            | string                                             |
    | details | min: 3 tokens, mean: 11.05 tokens, max: 45 tokens | min: 3 tokens, mean: 11.43 tokens, max: 45 tokens  |
  • Samples:
    | anchor               | positive                                 |
    |:---------------------|:-----------------------------------------|
    | Staffelkommandantin  | Kommodore                                |
    | Luftwaffenoffizierin | Luftwaffenoffizier/Luftwaffenoffizierin  |
    | Staffelkommandantin  | Luftwaffenoffizierin                     |
  • Loss: GISTEmbedLoss with these parameters:
    {'guide': SentenceTransformer(
      (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: BertModel 
      (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
      (2): Normalize()
    ), 'temperature': 0.01, 'margin_strategy': 'absolute', 'margin': 0.0}
    
full_es

  • Dataset: full_es
  • Size: 20,724 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
    |         | anchor                                            | positive                                           |
    |:--------|:--------------------------------------------------|:---------------------------------------------------|
    | type    | string                                            | string                                             |
    | details | min: 3 tokens, mean: 12.95 tokens, max: 50 tokens | min: 3 tokens, mean: 12.57 tokens, max: 50 tokens  |
  • Samples:
    | anchor                 | positive                      |
    |:-----------------------|:------------------------------|
    | jefe de escuadrón      | instructor                    |
    | comandante de aeronave | instructor de simulador       |
    | instructor             | oficial del Ejército del Aire |
  • Loss: GISTEmbedLoss with these parameters:
    {'guide': SentenceTransformer(
      (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: BertModel 
      (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
      (2): Normalize()
    ), 'temperature': 0.01, 'margin_strategy': 'absolute', 'margin': 0.0}
    
full_zh

  • Dataset: full_zh
  • Size: 30,401 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
    |         | anchor                                           | positive                                         |
    |:--------|:-------------------------------------------------|:-------------------------------------------------|
    | type    | string                                           | string                                           |
    | details | min: 4 tokens, mean: 8.36 tokens, max: 20 tokens | min: 4 tokens, mean: 8.95 tokens, max: 27 tokens |
  • Samples:
    | anchor   | positive       |
    |:---------|:---------------|
    | 技术总监 | 技术和运营总监 |
    | 技术总监 | 技术主管       |
    | 技术总监 | 技术艺术总监   |
  • Loss: GISTEmbedLoss with these parameters:
    {'guide': SentenceTransformer(
      (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: BertModel 
      (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
      (2): Normalize()
    ), 'temperature': 0.01, 'margin_strategy': 'absolute', 'margin': 0.0}
    
mix

  • Dataset: mix
  • Size: 21,760 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
    |         | anchor                                           | positive                                           |
    |:--------|:-------------------------------------------------|:----------------------------------------------------|
    | type    | string                                           | string                                             |
    | details | min: 2 tokens, mean: 5.65 tokens, max: 14 tokens | min: 2 tokens, mean: 10.08 tokens, max: 30 tokens  |
  • Samples:
    | anchor                       | positive                                            |
    |:-----------------------------|:----------------------------------------------------|
    | technical manager            | Technischer Direktor für Bühne, Film und Fernsehen  |
    | head of technical            | directora técnica                                   |
    | head of technical department | 技术艺术总监                                        |
  • Loss: GISTEmbedLoss with these parameters:
    {'guide': SentenceTransformer(
      (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: BertModel 
      (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
      (2): Normalize()
    ), 'temperature': 0.01, 'margin_strategy': 'absolute', 'margin': 0.0}
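
All five datasets are trained with GISTEmbedLoss, which uses a frozen guide model to down-weight in-batch negatives that are actually similar to the anchor. A minimal sketch of the loss construction (the card does not name the guide checkpoint; paraphrase-multilingual-MiniLM-L12-v2 is used here only as a placeholder matching the printed 384-dimensional, mean-pooling, 128-token guide architecture):

from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import GISTEmbedLoss

model = SentenceTransformer("BAAI/bge-small-en-v1.5")
# Placeholder guide model: the card only shows the guide's architecture, not its name.
guide = SentenceTransformer("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")

# Parameters mirror the loss dump above; margin_strategy/margin require a recent
# sentence-transformers release (the card lists v4.1.0).
loss = GISTEmbedLoss(
    model=model,
    guide=guide,
    temperature=0.01,
    margin_strategy="absolute",
    margin=0.0,
)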
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 128
  • gradient_accumulation_steps: 2
  • num_train_epochs: 5
  • warmup_ratio: 0.05
  • log_on_each_node: False
  • fp16: True
  • dataloader_num_workers: 4
  • ddp_find_unused_parameters: True
  • batch_sampler: no_duplicates

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 128
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 2
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 5
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.05
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: False
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: True
  • dataloader_num_workers: 4
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • tp_size: 0
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: True
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
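
As a rough sketch, the hyperparameters above map onto the Sentence Transformers v3+ trainer API as follows (the datasets, guide model, and output path are placeholders rather than the exact training script used for this model):

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import GISTEmbedLoss
from sentence_transformers.training_args import BatchSamplers, MultiDatasetBatchSamplers

model = SentenceTransformer("BAAI/bge-small-en-v1.5")
guide = SentenceTransformer("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")  # placeholder guide

# Tiny placeholder datasets standing in for full_en/full_de/full_es/full_zh/mix.
train_datasets = {
    "full_en": Dataset.from_dict({"anchor": ["air commodore", "command and control officer"],
                                  "positive": ["flight lieutenant", "flight officer"]}),
    "mix": Dataset.from_dict({"anchor": ["technical manager", "head of technical"],
                              "positive": ["Technischer Direktor für Bühne, Film und Fernsehen", "directora técnica"]}),
}
loss = GISTEmbedLoss(model=model, guide=guide, temperature=0.01)
losses = {name: loss for name in train_datasets}

# Subset of the listed hyperparameters; the real run also used eval_strategy="steps"
# with per-language IR evaluators, fp16, and DDP settings.
args = SentenceTransformerTrainingArguments(
    output_dir="output/jobbge-small-en-v1.5",  # placeholder path
    num_train_epochs=5,
    per_device_train_batch_size=128,
    per_device_eval_batch_size=128,
    gradient_accumulation_steps=2,
    warmup_ratio=0.05,
    dataloader_num_workers=4,
    batch_sampler=BatchSamplers.NO_DUPLICATES,
    multi_dataset_batch_sampler=MultiDatasetBatchSamplers.PROPORTIONAL,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_datasets,
    loss=losses,
)
trainer.train()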

Training Logs

| Epoch  | Step | Training Loss | full_en_cosine_ndcg@200 | full_es_cosine_ndcg@200 | full_de_cosine_ndcg@200 | full_zh_cosine_ndcg@200 | mix_es_cosine_ndcg@200 | mix_de_cosine_ndcg@200 | mix_zh_cosine_ndcg@200 |
|:-------|-----:|--------------:|------------------------:|------------------------:|------------------------:|------------------------:|-----------------------:|-----------------------:|-----------------------:|
| -1     | -1   | -             | 0.7322 | 0.4690 | 0.3853 | 0.2723 | 0.3209 | 0.2244 | 0.0919 |
| 0.0021 | 1    | 23.8878       | -      | -      | -      | -      | -      | -      | -      |
| 0.2058 | 100  | 7.2098        | -      | -      | -      | -      | -      | -      | -      |
| 0.4115 | 200  | 4.2635        | 0.7800 | 0.5132 | 0.4268 | 0.2798 | 0.4372 | 0.2996 | 0.1447 |
| 0.6173 | 300  | 4.1931        | -      | -      | -      | -      | -      | -      | -      |
| 0.8230 | 400  | 3.73          | 0.7863 | 0.5274 | 0.4451 | 0.2805 | 0.4762 | 0.3455 | 0.1648 |
| 1.0309 | 500  | 3.3569        | -      | -      | -      | -      | -      | -      | -      |
| 1.2366 | 600  | 3.6464        | 0.7868 | 0.5372 | 0.4540 | 0.2813 | 0.5063 | 0.3794 | 0.1755 |
| 1.4424 | 700  | 3.0772        | -      | -      | -      | -      | -      | -      | -      |
| 1.6481 | 800  | 3.114         | 0.7906 | 0.5391 | 0.4576 | 0.2832 | 0.5221 | 0.4047 | 0.1779 |
| 1.8539 | 900  | 2.9246        | -      | -      | -      | -      | -      | -      | -      |
| 2.0617 | 1000 | 2.7479        | 0.7873 | 0.5423 | 0.4631 | 0.2871 | 0.5323 | 0.4143 | 0.1843 |
| 2.2675 | 1100 | 3.049         | -      | -      | -      | -      | -      | -      | -      |
| 2.4733 | 1200 | 2.6137        | 0.7878 | 0.5418 | 0.4685 | 0.2870 | 0.5470 | 0.4339 | 0.1932 |
| 2.6790 | 1300 | 2.8607        | -      | -      | -      | -      | -      | -      | -      |
| 2.8848 | 1400 | 2.7071        | 0.7889 | 0.5465 | 0.4714 | 0.2891 | 0.5504 | 0.4362 | 0.1944 |
| 3.0926 | 1500 | 2.7012        | -      | -      | -      | -      | -      | -      | -      |
| 3.2984 | 1600 | 2.7423        | 0.7882 | 0.5471 | 0.4748 | 0.2868 | 0.5542 | 0.4454 | 0.1976 |
| 3.5041 | 1700 | 2.5316        | -      | -      | -      | -      | -      | -      | -      |
| 3.7099 | 1800 | 2.6344        | 0.7900 | 0.5498 | 0.4763 | 0.2857 | 0.5639 | 0.4552 | 0.1954 |
| 3.9156 | 1900 | 2.4983        | -      | -      | -      | -      | -      | -      | -      |
| 4.1235 | 2000 | 2.5423        | 0.7894 | 0.5499 | 0.4786 | 0.2870 | 0.5644 | 0.4576 | 0.1974 |
| 4.3292 | 2100 | 2.5674        | -      | -      | -      | -      | -      | -      | -      |
| 4.5350 | 2200 | 2.6237        | 0.7899 | 0.5502 | 0.4802 | 0.2843 | 0.5674 | 0.4607 | 0.1993 |
| 4.7407 | 2300 | 2.3776        | -      | -      | -      | -      | -      | -      | -      |
| 4.9465 | 2400 | 2.1116        | 0.7893 | 0.5505 | 0.4797 | 0.2919 | 0.5697 | 0.4632 | 0.2007 |

Framework Versions

  • Python: 3.11.11
  • Sentence Transformers: 4.1.0
  • Transformers: 4.51.3
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.6.0
  • Datasets: 3.5.0
  • Tokenizers: 0.21.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

GISTEmbedLoss

@misc{solatorio2024gistembed,
    title={GISTEmbed: Guided In-sample Selection of Training Negatives for Text Embedding Fine-tuning},
    author={Aivin V. Solatorio},
    year={2024},
    eprint={2402.16829},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}