Job - Job matching finetuned BAAI/bge-m3

Top-performing model on TalentCLEF 2025 Task A. Use it for multilingual job title matching.

Model Details

Model Description

Model Type: Sentence Transformer
Base model: BAAI/bge-m3
Maximum Sequence Length: 512 tokens
Output Dimensionality: 1024 dimensions
Similarity Function: Cosine Similarity
Training Datasets:
- full_en
- full_de
- full_es
- full_zh
- mix

Model Sources

Documentation: Sentence Transformers Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Sentence Transformers on Hugging Face

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: XLMRobertaModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
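The last two modules in the architecture above can be illustrated without loading the model. The following is a minimal sketch in plain Python, using made-up 4-dimensional token embeddings instead of the real 1024-dimensional ones: it takes the first (CLS) token as the sentence embedding, as `pooling_mode_cls_token: True` indicates, and then scales it to unit length as `Normalize()` does. The helper names `cls_pool`, `l2_normalize` and `dot` are illustrative and are not part of the Sentence Transformers API.

```python
import math

def cls_pool(token_embeddings):
    """Return the embedding of the first (CLS) token of one sequence."""
    return token_embeddings[0]

def l2_normalize(vector):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)

def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

# Toy token embeddings (3 tokens x 4 dims), NOT real model output.
seq_a = [[3.0, 4.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0], [0.0, 2.0, 0.0, 2.0]]
seq_b = [[4.0, 3.0, 0.0, 0.0], [9.0, 9.0, 9.0, 9.0], [5.0, 0.0, 5.0, 0.0]]

emb_a = l2_normalize(cls_pool(seq_a))
emb_b = l2_normalize(cls_pool(seq_b))

# For unit-length vectors the dot product equals the cosine similarity.
cosine = dot(emb_a, emb_b)
print(round(cosine, 4))
```

Because the embeddings are already normalized, cosine similarity reduces to a dot product, which is convenient when scoring against a large index of job titles.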

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("pj-mathematician/JobBGE-m3")
# Run inference
sentences = [
    'Volksvertreter',
    'Parlamentarier',
    'Oberbürgermeister',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 1024)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
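
For job title matching, the similarity matrix is usually turned into a ranked list of candidates per query. A minimal sketch of that step follows, assuming the scores are available as nested Python lists (with the real model, something like `model.similarity(embeddings, embeddings).tolist()` should give that shape). The helper `rank_matches` is hypothetical and the scores below are invented for illustration, not produced by the model.

```python
def rank_matches(titles, similarity_matrix, query_index, top_k=2):
    """Return the top_k titles most similar to titles[query_index], excluding itself."""
    scores = similarity_matrix[query_index]
    candidates = [
        (titles[i], scores[i]) for i in range(len(titles)) if i != query_index
    ]
    # Highest similarity first.
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates[:top_k]

titles = ["Volksvertreter", "Parlamentarier", "Oberbürgermeister"]

# Illustrative scores only, NOT real model output.
toy_similarities = [
    [1.00, 0.80, 0.50],
    [0.80, 1.00, 0.40],
    [0.50, 0.40, 1.00],
]

ranked = rank_matches(titles, toy_similarities, query_index=0)
print(ranked)
```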

Evaluation

Metrics

Information Retrieval

Datasets: full_en, full_es, full_de, full_zh, mix_es, mix_de and mix_zh
Evaluated with InformationRetrievalEvaluator

Metric	full_en	full_es	full_de	full_zh	mix_es	mix_de	mix_zh
cosine_accuracy@1	0.6476	0.1135	0.2956	0.6796	0.7395	0.6927	0.1789
cosine_accuracy@20	0.9905	1.0	0.9852	0.9903	0.9636	0.9641	1.0
cosine_accuracy@50	0.9905	1.0	0.9901	0.9903	0.9828	0.9839	1.0
cosine_accuracy@100	0.9905	1.0	0.9901	0.9903	0.9927	0.9922	1.0
cosine_accuracy@150	0.9905	1.0	0.9901	0.9903	0.9948	0.9932	1.0
cosine_accuracy@200	0.9905	1.0	0.9901	0.9903	0.9964	0.9943	1.0
cosine_precision@1	0.6476	0.1135	0.2956	0.6796	0.7395	0.6927	0.1789
cosine_precision@20	0.5062	0.5668	0.5404	0.4709	0.1249	0.128	0.1544
cosine_precision@50	0.3065	0.3903	0.3828	0.2804	0.0517	0.0533	0.0618
cosine_precision@100	0.1858	0.2525	0.2503	0.1732	0.0263	0.0271	0.0309
cosine_precision@150	0.1325	0.1901	0.1878	0.1239	0.0176	0.0181	0.0206
cosine_precision@200	0.1025	0.1508	0.1503	0.0977	0.0133	0.0136	0.0154
cosine_recall@1	0.0669	0.0035	0.0111	0.0643	0.2854	0.2604	0.0577
cosine_recall@20	0.5392	0.3796	0.3433	0.5119	0.9226	0.9285	1.0
cosine_recall@50	0.72	0.5636	0.534	0.6727	0.9548	0.965	1.0
cosine_recall@100	0.8254	0.6727	0.6499	0.788	0.9705	0.9796	1.0
cosine_recall@150	0.872	0.736	0.7101	0.8329	0.9766	0.9837	1.0
cosine_recall@200	0.9006	0.7698	0.7513	0.8687	0.9811	0.9862	1.0
cosine_ndcg@1	0.6476	0.1135	0.2956	0.6796	0.7395	0.6927	0.1789
cosine_ndcg@20	0.6822	0.6136	0.5648	0.6515	0.8119	0.7967	0.5443
cosine_ndcg@50	0.6975	0.5908	0.5522	0.6599	0.8208	0.8069	0.5443
cosine_ndcg@100	0.752	0.6168	0.5796	0.7157	0.8243	0.8102	0.5443
cosine_ndcg@150	0.7725	0.6489	0.6112	0.7357	0.8255	0.811	0.5443
cosine_ndcg@200	0.7827	0.6653	0.6309	0.7501	0.8262	0.8114	0.5443
cosine_mrr@1	0.6476	0.1135	0.2956	0.6796	0.7395	0.6927	0.1789
cosine_mrr@20	0.8	0.5536	0.5164	0.8217	0.8059	0.7767	0.4002
cosine_mrr@50	0.8	0.5536	0.5166	0.8217	0.8066	0.7774	0.4002
cosine_mrr@100	0.8	0.5536	0.5166	0.8217	0.8067	0.7775	0.4002
cosine_mrr@150	0.8	0.5536	0.5166	0.8217	0.8067	0.7775	0.4002
cosine_mrr@200	0.8	0.5536	0.5166	0.8217	0.8067	0.7775	0.4002
cosine_map@1	0.6476	0.1135	0.2956	0.6796	0.7395	0.6927	0.1789
cosine_map@20	0.5392	0.481	0.4222	0.5012	0.744	0.721	0.3272
cosine_map@50	0.5258	0.4304	0.3791	0.4813	0.7465	0.7238	0.3272
cosine_map@100	0.558	0.4335	0.3829	0.5105	0.7469	0.7242	0.3272
cosine_map@150	0.5666	0.4485	0.3981	0.5184	0.747	0.7243	0.3272
cosine_map@200	0.5695	0.4551	0.4056	0.5228	0.7471	0.7244	0.3272
cosine_map@500	0.5744	0.4677	0.4189	0.5277	0.7472	0.7244	0.3272
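
To help read the table, here is a minimal sketch of how accuracy@k, precision@k, recall@k and MRR@k can be computed for a single query with binary relevance. It follows the standard definitions and assumes these match what `InformationRetrievalEvaluator` computes per query before averaging over all queries; the function `ir_metrics_at_k` and the document IDs are hypothetical.

```python
def ir_metrics_at_k(ranked_ids, relevant_ids, k):
    """Compute accuracy, precision, recall and MRR at cutoff k for one query.

    ranked_ids: document IDs sorted by descending cosine similarity.
    relevant_ids: non-empty set of IDs that are correct matches for the query.
    """
    top_k = ranked_ids[:k]
    hits = [doc_id in relevant_ids for doc_id in top_k]
    num_hits = sum(hits)

    # Reciprocal rank of the first relevant result within the top k, else 0.
    mrr = 0.0
    for rank, is_hit in enumerate(hits, start=1):
        if is_hit:
            mrr = 1.0 / rank
            break

    return {
        "accuracy": 1.0 if num_hits > 0 else 0.0,  # any relevant result in top k
        "precision": num_hits / k,                 # share of top k that is relevant
        "recall": num_hits / len(relevant_ids),    # share of relevant items found
        "mrr": mrr,
    }

# Made-up ranking for one query, NOT taken from the evaluation data.
ranked = ["d3", "d1", "d7", "d2", "d9"]
relevant = {"d1", "d2", "d4"}

at_1 = ir_metrics_at_k(ranked, relevant, k=1)
at_5 = ir_metrics_at_k(ranked, relevant, k=5)
print(at_1)
print(at_5)
```

At k = 1 accuracy, precision and MRR all reduce to "is the top result relevant", which is why those @1 rows in the table are identical. Precision falls as k grows past the number of relevant titles per query, while recall can only rise.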

Training Details

Training Datasets

full_en

Dataset: full_en
Size: 28,880 training samples
Columns: anchor and positive
Approximate statistics based on the first 1000 samples:
	anchor	positive
type	string	string
details	min: 3 tokens, mean: 5.68 tokens, max: 11 tokens	min: 3 tokens, mean: 5.76 tokens, max: 12 tokens

Samples:

anchor	positive
`air commodore`	`flight lieutenant`
`command and control officer`	`flight officer`
`air commodore`	`command and control officer`