Job - Job matching Alibaba-NLP/gte-multilingual-base (v2)

Top-performing model on TalentCLEF 2025 Task A. Use it for multilingual job title matching.

Model Details

Model Description

Model Type: Sentence Transformer
Base model: Alibaba-NLP/gte-multilingual-base
Maximum Sequence Length: 512 tokens
Output Dimensionality: 768 dimensions
Similarity Function: Cosine Similarity
Training Datasets:
- full_en
- full_de
- full_es
- full_zh
- mix

Model Sources

Documentation: Sentence Transformers Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Sentence Transformers on Hugging Face

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: NewModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
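The last two modules above reduce the per-token transformer output to a single sentence vector: the pooling layer keeps only the first token's vector (`pooling_mode_cls_token: True`), and `Normalize()` scales it to unit length. A minimal sketch of that computation in plain Python follows. The function name and the 4-dimensional toy input are illustrative assumptions, not part of the library.

```python
import math

def cls_pool_and_normalize(token_embeddings):
    """Map per-token vectors [seq_len][dim] to one unit-length sentence vector."""
    # CLS pooling: keep only the first token's vector.
    cls_vector = token_embeddings[0]
    # Normalize(): scale to unit L2 norm (fall back to 1.0 for a zero vector).
    norm = math.sqrt(sum(x * x for x in cls_vector)) or 1.0
    return [x / norm for x in cls_vector]

# Toy stand-in for the transformer output: 3 tokens, 4 dimensions
# (the real model emits 768 dimensions per token).
toy_tokens = [
    [3.0, 0.0, 4.0, 0.0],  # first token, the only one CLS pooling uses
    [1.0, 1.0, 1.0, 1.0],
    [0.0, 2.0, 0.0, 2.0],
]
sentence_vector = cls_pool_and_normalize(toy_tokens)
print(sentence_vector)  # [0.6, 0.0, 0.8, 0.0]
```

Because every output vector has unit length, the cosine similarity between two embeddings reduces to their dot product.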

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

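# Download from the 🤗 Hub
# The base model (NewModel) ships custom modeling code, so loading is
# expected to require trust_remote_code=True.
model = SentenceTransformer(
    "pj-mathematician/JobGTE-multilingual-base-v2",
    trust_remote_code=True,
)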
# Run inference
sentences = [
    'Volksvertreter',
    'Parlamentarier',
    'Oberbürgermeister',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the pairwise cosine similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
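For job title matching, the usual next step is to rank candidate titles against a query title by similarity. The sketch below shows that ranking step in plain Python. The helper `rank_by_cosine` and the 2-dimensional vectors are hypothetical stand-ins for `model.encode(...)` output, so the scores are illustrative only and are not results from this model.

```python
def rank_by_cosine(query_vec, candidate_vecs, titles, top_k=2):
    """Return the top_k (title, score) pairs, best match first.

    Assumes all vectors are unit length, as produced by the model's
    final Normalize() layer, so the dot product equals cosine similarity.
    """
    scores = [sum(q * c for q, c in zip(query_vec, vec)) for vec in candidate_vecs]
    ranked = sorted(zip(titles, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]

# Hypothetical unit vectors standing in for real embeddings.
titles = ["Parlamentarier", "Oberbürgermeister"]
candidate_vecs = [[1.0, 0.0], [0.0, 1.0]]
query_vec = [0.8, 0.6]  # stand-in for the embedding of the query title

for title, score in rank_by_cosine(query_vec, candidate_vecs, titles):
    print(f"{title}: {score:.2f}")
```

With real embeddings, `query_vec` and `candidate_vecs` would come from `model.encode`, and `model.similarity` computes the same scores in one call.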

Evaluation

Metrics

Information Retrieval

Datasets: full_en, full_es, full_de, full_zh, mix_es, mix_de and mix_zh
Evaluated with InformationRetrievalEvaluator

Metric	full_en	full_es	full_de	full_zh	mix_es	mix_de	mix_zh
cosine_accuracy@1	0.6667	0.1243	0.2956	0.6796	0.7088	0.6485	0.7667
cosine_accuracy@20	0.9905	1.0	0.9754	0.9806	0.9553	0.9324	0.9843
cosine_accuracy@50	0.9905	1.0	0.9852	0.9903	0.9802	0.9683	0.9932
cosine_accuracy@100	0.9905	1.0	0.9901	0.9903	0.9901	0.9849	0.9958
cosine_accuracy@150	0.9905	1.0	0.9901	0.9903	0.9938	0.9886	0.9974
cosine_accuracy@200	0.9905	1.0	0.9901	0.9903	0.9958	0.9938	0.9979
cosine_precision@1	0.6667	0.1243	0.2956	0.6796	0.7088	0.6485	0.7667
cosine_precision@20	0.5148	0.5759	0.5103	0.4883	0.1216	0.1209	0.1387
cosine_precision@50	0.32	0.3923	0.3694	0.2963	0.0512	0.0514	0.0581
cosine_precision@100	0.1905	0.2566	0.2397	0.1788	0.0261	0.0265	0.0296
cosine_precision@150	0.1362	0.1928	0.1808	0.1278	0.0175	0.0179	0.0199
cosine_precision@200	0.1054	0.1528	0.1462	0.0999	0.0132	0.0135	0.0149
cosine_recall@1	0.0685	0.0036	0.0111	0.0693	0.2738	0.2436	0.2569
cosine_recall@20	0.5491	0.3853	0.3208	0.5251	0.899	0.8787	0.9157
cosine_recall@50	0.7554	0.566	0.5042	0.7083	0.9459	0.932	0.9583
cosine_recall@100	0.8503	0.6899	0.6173	0.8169	0.9651	0.9596	0.9765
cosine_recall@150	0.8995	0.754	0.6848	0.8613	0.9732	0.9718	0.9834
cosine_recall@200	0.9208	0.7858	0.7253	0.8898	0.9791	0.98	0.9865
cosine_ndcg@1	0.6667	0.1243	0.2956	0.6796	0.7088	0.6485	0.7667
cosine_ndcg@20	0.6952	0.6169	0.5378	0.6681	0.7815	0.7448	0.8002
cosine_ndcg@50	0.723	0.5914	0.5288	0.6857	0.7944	0.7595	0.8125
cosine_ndcg@100	0.7733	0.6235	0.5552	0.7379	0.7986	0.7657	0.8167
cosine_ndcg@150	0.7947	0.6557	0.5888	0.7577	0.8001	0.7682	0.8181
cosine_ndcg@200	0.8039	0.6717	0.6092	0.7697	0.8012	0.7696	0.8187
cosine_mrr@1	0.6667	0.1243	0.2956	0.6796	0.7088	0.6485	0.7667
cosine_mrr@20	0.8183	0.5581	0.5165	0.8159	0.7804	0.7324	0.8422
cosine_mrr@50	0.8183	0.5581	0.5168	0.8163	0.7813	0.7335	0.8425
cosine_mrr@100	0.8183	0.5581	0.5168	0.8163	0.7814	0.7338	0.8425
cosine_mrr@150	0.8183	0.5581	0.5168	0.8163	0.7814	0.7338	0.8425
cosine_mrr@200	0.8183	0.5581	0.5168	0.8163	0.7814	0.7338	0.8426
cosine_map@1	0.6667	0.1243	0.2956	0.6796	0.7088	0.6485	0.7667
cosine_map@20	0.5566	0.4841	0.3984	0.5222	0.7071	0.6646	0.7007
cosine_map@50	0.5534	0.4304	0.3603	0.5083	0.7107	0.6684	0.7046
cosine_map@100	0.5852	0.4374	0.3632	0.5372	0.7113	0.6693	0.7054
cosine_map@150	0.5943	0.4527	0.3782	0.5454	0.7114	0.6695	0.7055
cosine_map@200	0.5976	0.4593	0.3863	0.5495	0.7115	0.6696	0.7056
cosine_map@500	0.6016	0.472	0.3992	0.5542	0.7116	0.6697	0.7057
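To help read the table, here is a sketch of how the `@k` metrics are commonly defined for a single query with binary relevance. It follows the standard textbook definitions. The function name `ir_metrics_at_k` and the document IDs are hypothetical, and the exact implementation inside `InformationRetrievalEvaluator` may differ in details such as tie handling. The reported numbers are averages of such per-query values.

```python
import math

def ir_metrics_at_k(ranked_ids, relevant_ids, k):
    """Per-query retrieval metrics with binary relevance, cut off at rank k."""
    hits = [1 if doc_id in relevant_ids else 0 for doc_id in ranked_ids[:k]]
    num_hits = sum(hits)

    # Reciprocal rank of the first relevant document within the top k.
    mrr = next((1.0 / rank for rank, hit in enumerate(hits, start=1) if hit), 0.0)

    # DCG of this ranking, normalized by the best achievable DCG at k.
    dcg = sum(hit / math.log2(rank + 1) for rank, hit in enumerate(hits, start=1))
    ideal_hits = min(k, len(relevant_ids))
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))

    # Average precision: precision at each rank where a relevant document appears.
    running_hits, precision_sum = 0, 0.0
    for rank, hit in enumerate(hits, start=1):
        if hit:
            running_hits += 1
            precision_sum += running_hits / rank

    return {
        "accuracy": 1.0 if num_hits else 0.0,
        "precision": num_hits / k,
        "recall": num_hits / len(relevant_ids) if relevant_ids else 0.0,
        "mrr": mrr,
        "ndcg": dcg / idcg if idcg else 0.0,
        "map": precision_sum / ideal_hits if ideal_hits else 0.0,
    }

# Hypothetical query: 3 relevant documents, 2 of them retrieved in the top 4.
ranked = ["d7", "d2", "d9", "d4"]
relevant = {"d2", "d4", "d5"}
print(ir_metrics_at_k(ranked, relevant, k=4))
```

Under these definitions, accuracy, precision, NDCG, MRR, and MAP all coincide at k = 1, which is consistent with the identical `@1` rows in the table. Precision@k is also bounded by the number of relevant documents divided by k, so it shrinks as k grows even while recall keeps rising.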

Training Details

Training Datasets

full_en

Dataset: full_en
Size: 28,880 training samples
Columns: anchor and positive
Approximate statistics based on the first 1000 samples:

	anchor	positive
type	string	string
details	min: 3 tokens, mean: 5.68 tokens, max: 11 tokens	min: 3 tokens, mean: 5.76 tokens, max: 12 tokens

Samples:

anchor	positive
`air commodore`	`flight lieutenant`
`command and control officer`	`flight officer`
`air commodore`	`command and control officer`
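As an illustration of the two-column layout, the sample rows can be represented as records with `anchor` and `positive` fields. This is a sketch in plain Python rather than the actual loading code, which the card does not show. The variable names are assumptions.

```python
from collections import defaultdict

# The three sample rows from the table, as (anchor, positive) pairs.
pairs = [
    ("air commodore", "flight lieutenant"),
    ("command and control officer", "flight officer"),
    ("air commodore", "command and control officer"),
]

# Records with the two columns named in the dataset description.
records = [{"anchor": anchor, "positive": positive} for anchor, positive in pairs]

# An anchor may appear in several rows; collect all positives per anchor.
positives_by_anchor = defaultdict(list)
for row in records:
    positives_by_anchor[row["anchor"]].append(row["positive"])

print(dict(positives_by_anchor))
```

Note that the same title can act as an anchor in one row and as a positive in another, as `command and control officer` does here.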