Job - Job matching Alibaba-NLP/gte-multilingual-base (v1)

Top-performing model on TalentCLEF 2025 Task A. Use it for multilingual job title matching.

Model Details

Model Description

Model Type: Sentence Transformer
Base model: Alibaba-NLP/gte-multilingual-base
Maximum Sequence Length: 512 tokens
Output Dimensionality: 768 dimensions
Similarity Function: Cosine Similarity
Training Datasets:
- full_en
- full_de
- full_es
- full_zh
- mix

Model Sources

Documentation: Sentence Transformers Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Sentence Transformers on Hugging Face

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: NewModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
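
The last two modules can be illustrated without the library. The following is a minimal sketch, assuming token embeddings are available as plain lists of floats; the helper names `cls_pool` and `l2_normalize` are hypothetical and not part of Sentence Transformers. CLS pooling keeps the first token's vector, and normalization scales it to unit length, so a dot product between two outputs equals their cosine similarity.

```python
import math

def cls_pool(token_embeddings):
    """Return the embedding of the first ([CLS]) token."""
    return list(token_embeddings[0])

def l2_normalize(vector):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]

# Toy 4-dimensional token embeddings for one input (the real model uses 768).
tokens = [
    [3.0, 4.0, 0.0, 0.0],   # [CLS] token
    [1.0, 1.0, 1.0, 1.0],
    [0.0, 2.0, 0.0, 2.0],
]
sentence_embedding = l2_normalize(cls_pool(tokens))
print(sentence_embedding)
```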

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

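# Download from the 🤗 Hub. The base model ships custom modeling code
# (NewModel), so trust_remote_code=True is likely required.
model = SentenceTransformer("pj-mathematician/JobGTE-multilingual-base-v1", trust_remote_code=True)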
# Run inference
sentences = [
    'Volksvertreter',
    'Parlamentarier',
    'Oberbürgermeister',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
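
For job title matching, the similarity scores are typically turned into a ranking of candidate titles for each query. The sketch below assumes unit-length embeddings (as produced by the Normalize module) and uses toy 3-dimensional vectors in place of real `model.encode` output; `rank_candidates` is a hypothetical helper, not a library function.

```python
def rank_candidates(query_embedding, candidate_embeddings, top_k=2):
    """Return (index, score) pairs of the top_k candidates by dot product.

    For unit-length vectors the dot product equals cosine similarity.
    """
    scores = [
        sum(q * c for q, c in zip(query_embedding, candidate))
        for candidate in candidate_embeddings
    ]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order[:top_k]]

# Toy unit vectors standing in for model.encode(...) output.
query = [1.0, 0.0, 0.0]
candidates = [
    [0.0, 1.0, 0.0],   # unrelated title
    [0.6, 0.8, 0.0],   # somewhat related title
    [0.8, 0.6, 0.0],   # closely related title
]
ranking = rank_candidates(query, candidates, top_k=2)
print(ranking)
```

With the real model, the same helper could be fed the embedding of a query title and the embeddings of a list of candidate titles, both obtained from `model.encode`.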

Evaluation

Metrics

Information Retrieval

Datasets: full_en, full_es, full_de, full_zh, mix_es, mix_de and mix_zh
Evaluated with InformationRetrievalEvaluator

Metric	full_en	full_es	full_de	full_zh	mix_es	mix_de	mix_zh
cosine_accuracy@1	0.6571	0.1243	0.2956	0.6602	0.728	0.6703	0.1908
cosine_accuracy@20	0.9905	1.0	0.9704	0.9806	0.96	0.9506	1.0
cosine_accuracy@50	0.9905	1.0	0.9852	0.9903	0.9792	0.9776	1.0
cosine_accuracy@100	0.9905	1.0	0.9852	0.9903	0.9943	0.9865	1.0
cosine_accuracy@150	0.9905	1.0	0.9901	0.9903	0.9958	0.9932	1.0
cosine_accuracy@200	0.9905	1.0	0.9901	0.9903	0.9974	0.9948	1.0
cosine_precision@1	0.6571	0.1243	0.2956	0.6602	0.728	0.6703	0.1908
cosine_precision@20	0.5171	0.5719	0.5084	0.4782	0.1243	0.1252	0.1544
cosine_precision@50	0.316	0.3885	0.3654	0.2895	0.0515	0.0523	0.0618
cosine_precision@100	0.189	0.2517	0.2413	0.1757	0.0263	0.0267	0.0309
cosine_precision@150	0.1338	0.1905	0.1804	0.126	0.0176	0.018	0.0206
cosine_precision@200	0.1043	0.1522	0.1447	0.0982	0.0133	0.0135	0.0154
cosine_recall@1	0.0678	0.0037	0.0111	0.0615	0.2813	0.2524	0.0614
cosine_recall@20	0.547	0.3842	0.3221	0.5108	0.9183	0.9096	1.0
cosine_recall@50	0.74	0.5641	0.5025	0.6923	0.9499	0.9482	1.0
cosine_recall@100	0.8453	0.6742	0.6248	0.8004	0.9701	0.9685	1.0
cosine_recall@150	0.8838	0.7464	0.683	0.8465	0.9768	0.9782	1.0
cosine_recall@200	0.9109	0.7825	0.7216	0.8771	0.9818	0.981	1.0
cosine_ndcg@1	0.6571	0.1243	0.2956	0.6602	0.728	0.6703	0.1908
cosine_ndcg@20	0.6954	0.6139	0.5393	0.654	0.8044	0.7736	0.5474
cosine_ndcg@50	0.715	0.5874	0.5267	0.6707	0.813	0.7844	0.5474
cosine_ndcg@100	0.7679	0.6144	0.5579	0.7234	0.8173	0.7889	0.5474
cosine_ndcg@150	0.7857	0.6499	0.588	0.7438	0.8186	0.7909	0.5474
cosine_ndcg@200	0.797	0.6681	0.6071	0.7554	0.8195	0.7914	0.5474
cosine_mrr@1	0.6571	0.1243	0.2956	0.6602	0.728	0.6703	0.1908
cosine_mrr@20	0.8138	0.5581	0.5104	0.8037	0.7969	0.752	0.4093
cosine_mrr@50	0.8138	0.5581	0.511	0.8041	0.7975	0.7529	0.4093
cosine_mrr@100	0.8138	0.5581	0.511	0.8041	0.7977	0.7531	0.4093
cosine_mrr@150	0.8138	0.5581	0.511	0.8041	0.7977	0.7531	0.4093
cosine_mrr@200	0.8138	0.5581	0.511	0.8041	0.7977	0.7531	0.4093
cosine_map@1	0.6571	0.1243	0.2956	0.6602	0.728	0.6703	0.1908
cosine_map@20	0.5579	0.4799	0.401	0.5087	0.7351	0.6968	0.3298
cosine_map@50	0.5471	0.425	0.3588	0.4926	0.7374	0.6996	0.3298
cosine_map@100	0.5796	0.4302	0.3633	0.5217	0.738	0.7003	0.3298
cosine_map@150	0.5875	0.4459	0.3777	0.5299	0.7381	0.7004	0.3298
cosine_map@200	0.5912	0.4533	0.3848	0.5334	0.7382	0.7005	0.3298
cosine_map@500	0.5953	0.4656	0.3978	0.5386	0.7383	0.7006	0.3298
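
The @k metrics in the table can be read as follows. This is a simplified single-query sketch of the standard definitions, not the InformationRetrievalEvaluator implementation, which averages such values over all queries; `metrics_at_k` and the document IDs are hypothetical.

```python
def metrics_at_k(ranked_ids, relevant_ids, k):
    """Compute accuracy, precision, recall and reciprocal rank at cutoff k."""
    top_k = ranked_ids[:k]
    hits = [doc_id for doc_id in top_k if doc_id in relevant_ids]
    # Reciprocal rank of the first relevant document within the top k, else 0.
    reciprocal_rank = 0.0
    for rank, doc_id in enumerate(top_k, start=1):
        if doc_id in relevant_ids:
            reciprocal_rank = 1.0 / rank
            break
    return {
        "accuracy": 1.0 if hits else 0.0,          # at least one hit in top k
        "precision": len(hits) / k,                # hits divided by the cutoff
        "recall": len(hits) / len(relevant_ids),   # hits divided by all gold matches
        "mrr": reciprocal_rank,
    }

ranked = ["d7", "d2", "d9", "d4", "d1"]   # system ranking for one query
relevant = {"d2", "d4", "d8"}             # gold matches for that query

for k in (1, 3, 5):
    print(k, metrics_at_k(ranked, relevant, k))
```

At k = 1 accuracy, precision and MRR coincide, which matches the identical @1 rows in the table. Precision divides by k, so it necessarily falls at large cutoffs when a query has only a few relevant titles.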

Training Details

Training Datasets

full_en

Dataset: full_en
Size: 28,880 training samples
Columns: anchor and positive
Approximate statistics based on the first 1000 samples:
	anchor	positive
type	string	string
details	min: 3 tokens mean: 5.68 tokens max: 11 tokens	min: 3 tokens mean: 5.76 tokens max: 12 tokens
Samples:
anchor	positive
`air commodore`	`flight lieutenant`
`command and control officer`	`flight officer`
`air commodore`	`command and control officer`