A static embedding model tokenized with dbmdz/bert-base-german-uncased and mainly built on DE/EN-datasets as a base for further experiments.

This is a sentence-transformers model trained on 74 datasets (full list at the bottom). It maps sentences & paragraphs to a 2048-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

For more background on how to build such a model, see the Static Embeddings blog post published by Tom Aarsen in January 2025. It took me until the end of May to start this tiny spare-time experiment.

After some tests with different tokenizers I picked one of the oldest, bert-base-german-uncased by the dbmdz team, because it performed best while giving the smallest model size (~240MB).

99% performance: Unexpectedly, this model reached nearly 99% of the score of e5-base-sts-en-de on the GermanGovServiceRetrieval task in MTEB while taking only about an 80th of the time (0.49 seconds vs. 40.3 seconds).
Matryoshka: This model was trained with a Matryoshka loss, allowing you to truncate the embeddings for faster retrieval at minimal performance costs.
Evaluations: See Evaluations for details on performance on German MTEB, special GermanGovService retrieval, embedding speed, and Matryoshka dimensionality truncation.
Training Script: See base_train.py for the training script used to train this model from scratch (be warned - it is wildly commented).
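To illustrate what truncating a Matryoshka embedding means in practice, here is a minimal pure-Python sketch: keep the first k dimensions, re-normalise, and compare with cosine similarity. The toy 8-dimensional vectors and the helper names `truncate` and `cosine` are made up for illustration and are not part of this model or its training script.

```python
import math

def truncate(vec, dim):
    """Keep the first `dim` values and L2-normalise the result."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm > 0 else head

def cosine(a, b):
    """Cosine similarity of two equally long vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy stand-ins for 2048-dimensional model outputs (hypothetical values).
query = [0.9, 0.1, 0.3, 0.0, 0.05, 0.02, 0.01, 0.03]
doc_close = [0.8, 0.2, 0.25, 0.1, 0.0, 0.04, 0.02, 0.01]
doc_far = [0.0, 0.9, -0.1, 0.4, 0.03, 0.01, 0.05, 0.02]

# Compare similarities at full size and at two truncated sizes.
for dim in (8, 4, 2):
    q = truncate(query, dim)
    sim_close = cosine(q, truncate(doc_close, dim))
    sim_far = cosine(q, truncate(doc_far, dim))
    print(dim, round(sim_close, 3), round(sim_far, 3))
```

With the real model you would not need these helpers: Sentence Transformers accepts a `truncate_dim` argument when constructing `SentenceTransformer(...)`, which cuts the embeddings for you.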

Model Details

Model Description

Model Type: Sentence Transformer
Maximum Sequence Length: inf tokens
Output Dimensionality: 2048 dimensions
Similarity Function: Cosine Similarity
Training Datasets:
- mmarco - German only, filtered and with 3, 2, 1 hard negatives and (leave no sentence behind) the rest as 0 negatives:
  - mmarco_3hn
  - mmarco_2hn
  - mmarco_1hn
  - mmarco_0hn
- deutsche-telekom/wikipedia-22-12-de-dpr - deduplicated and recombined all different wordings of direct and indirect sentences. Built hard negatives, but in the end reverted to the version without hard negatives as it did not really work out.
  - wp-22-12-de
- nthakur/swim-ir-monolingual - German only, deduplicated and in different combinations with and without 3 hard negatives.
  - swim_ir_de
  - swim_ir_de_3hn
  - swim_ir_de_title_3hn
  - swim_ir_de_title
- avemio_triples - thanks to Avemio for this release.
- avemio_pairs - no pairs anymore - now with 3 hard negatives per pair and the rest as pairs:
  - avemio_pairs_3hn
  - avemio_pairs_0hn
- oliverguhr/natural-questions-german - combined English and German sentences with 3 and 1 hard negatives.
  - nq_german_en_de_a_3hn
  - nq_german_en_de_3hn
  - nq_german_3hn
  - nq_german_1hn
- AgentWaller/german-oasst1-qa-format - rebuilt with 3 hard negatives
  - german_oasst1_hn
- germanrag_short
- jphme/slimorca_dedup_german_experimental - after scoring and filtering mined as many hard negatives as possible and left no sentence behind:
  - slimorca_dedup_3hn
  - slimorca_dedup_2hn
  - slimorca_dedup_1hn
  - slimorca_dedup_0hn
- CausalLM/GPT-4-Self-Instruct-German - after scoring and filtering mined 3 hard negatives:
  - german_gpt4_3hn
- german_orca_dpo
- mayflowergmbh/alpaca-gpt4_de - after scoring and filtering mined 3 hard negatives and left no sentence behind (0hn):
  - alpaca_gpt4_3hn
  - alpaca_gpt4_0hn
- argilla/databricks-dolly-15k-curated-multilingual - after scoring and filtering mined 3 hard negatives and left no sentence behind (0hn) - but sometimes only 1 or 2 sentences were left:
  - dolly_context_de_3hn
  - dolly_context_ende_3hn
  - dolly_instructions_de_3hn
  - dolly_instructions_de_0hn
  - dolly_instructions_ende_3hn
  - dolly_responses_de_3hn
  - dolly_responses_de_0hn
  - dolly_responses_ende_3hn
- saf_legal_de
- lavis-nlp/german_legal_sentences - mined 3 hard negatives and left no sentences behind (0hn). Almost no one uses this dataset, but for German law stuff it's very helpful.
  - gls_3hn
  - gls_2hn
  - gls_1hn
  - gls_0hn
- sentence-transformers/parallel-sentences-europarl - after scoring, filtering and mining 3 hard negatives the results were much better - there are many "bad" translations and even empty fields:
  - europarl_3hn
  - europarl_0hn
- sentence-transformers/parallel-sentences-tatoeba - mined 3 hard negatives and left no sentences behind (0hn):
  - tatoeba_3hn
  - tatoeba_0hn
- sentence-transformers/parallel-sentences-wikimatrix - mined 3 hard negatives but did not use the leftover sentence pairs due to low scores:
  - wikimatrix_3hn
- laion/Wikipedia-Abstract - mined 3 hard negatives and left no sentences behind (0hn):
  - wikipedia_abstract_3hn
  - wikipedia_abstract_0hn
- jfeil/GermanDefinitionGeneration-Distillation - built multiple combinations for classifications of long to short, mined 3 hard negatives. Also built a short-word list without hard negatives:
  - wiktionary_gdg_de_3hn
  - wiktionary_gdg_de_short
- wmt24pp - filtered - not sure whether this overlaps with one of the benchmark datasets.
- synthia_de - filtered for "scores".
- deutsche-telekom/ger-backtrans-paraphrase - combined German/English sentences, filtered and mined 3 hard negatives.
  - gbp_3hn
  - gbp_ende_3hn
- PhilipMay/stsb_multi_mt - mined 3 hard negatives for each language version (German/English):
  - stbs_de_3hn
  - stbs_en_3hn
- google-research-datasets/paws-x
  - pawsx_de
  - pawsx_en
- MoritzLaurer/multilingual-NLI-26lang-2mil7 - with max. 3 hard negatives (German only)
  - nli_anli_entail_3hn
  - nli_fever_entail_3hn
  - nli_ling_entail_3hn
  - nli_mnli_entail_3hn
  - nli_wanli_entail_3hn
  - nli_anli_transl_3hn
  - nli_fever_transl_3hn
  - nli_ling_transl_3hn
  - nli_mnli_transl_3hn
  - nli_wanli_transl_3hn
- jinaai/parallel-sentences - with max. 3 hard negatives (with 3 German/English combinations)
  - jina_ai_3en
  - jina_ai_ende
  - jina_ai_dede
- Polyglot-or-Not/Fact-Completion
  - polyglot_de
  - polyglot_en
- Tilde Model - EESC - an almost forgotten corpus built from document texts of the European Economic and Social Committee document portal.
- miracl/miracl-corpus - scored and filtered ('cos_sim_sts_de' > 0.5 and 'cos_sim_sts_de' < 0.85 and 'text_unique_tokens_de' > 6). Mined 3 hard negatives and left no sentence behind.
  - miracl_de_3hn
  - miracl_de_0hn
Languages: de, en
License: eupl-1.2
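The score-based filter quoted for miracl/miracl-corpus and the 3hn/2hn/1hn/0hn split used by several datasets in the list above can be sketched roughly as follows. The column names `cos_sim_sts_de` and `text_unique_tokens_de` come from the filter expression in the list; the row layout, the `negatives` field, and the helper names are assumptions made for illustration and are not taken from base_train.py.

```python
from collections import defaultdict

def keep_row(row):
    """Filter from the miracl entry: mid-range similarity and enough unique tokens."""
    return 0.5 < row["cos_sim_sts_de"] < 0.85 and row["text_unique_tokens_de"] > 6

def bucket_by_negatives(rows, max_negatives=3):
    """Group rows by how many hard negatives were mined, so no row is dropped.

    Rows with three or more negatives land in "3hn" (extra negatives are cut off),
    rows without any negatives land in "0hn" and stay plain pairs.
    """
    buckets = defaultdict(list)
    for row in rows:
        negatives = row.get("negatives", [])[:max_negatives]
        buckets[f"{len(negatives)}hn"].append({**row, "negatives": negatives})
    return dict(buckets)

# Hypothetical rows; real data would come from the scored dataset.
rows = [
    {"anchor": "q1", "positive": "p1", "cos_sim_sts_de": 0.72,
     "text_unique_tokens_de": 12, "negatives": ["n1", "n2", "n3", "n4"]},
    {"anchor": "q2", "positive": "p2", "cos_sim_sts_de": 0.91,
     "text_unique_tokens_de": 15, "negatives": ["n1"]},
    {"anchor": "q3", "positive": "p3", "cos_sim_sts_de": 0.60,
     "text_unique_tokens_de": 9, "negatives": []},
    {"anchor": "q4", "positive": "p4", "cos_sim_sts_de": 0.55,
     "text_unique_tokens_de": 4, "negatives": ["n1", "n2"]},
]

kept = [row for row in rows if keep_row(row)]
buckets = bucket_by_negatives(kept)
print({name: [r["anchor"] for r in members] for name, members in buckets.items()})
```

Each bucket can then be treated as its own training dataset, which is one plausible reading of why names such as miracl_de_3hn and miracl_de_0hn appear side by side.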

Model Sources

Documentation: Sentence Transformers Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Sentence Transformers on Hugging Face

Full Model Architecture

SentenceTransformer(
  (0): StaticEmbedding(
    (embedding): EmbeddingBag(31102, 2048, mode='mean')
  )
)
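As a rough illustration of what this architecture does, the sketch below mimics `EmbeddingBag(..., mode='mean')` in pure Python: each token id selects one row of a lookup table and the sentence vector is the average of those rows. It also estimates the size of the real table from the shape printed above. The toy table, its 4-dimensional vectors, and the assumption of 4 bytes per weight (float32) are mine and are not taken from the model files.

```python
# Toy lookup table: token id -> vector (the real table is 31102 x 2048).
toy_table = {
    0: [0.0, 0.0, 0.0, 0.0],
    1: [1.0, 0.0, 2.0, 0.0],
    2: [0.0, 4.0, 0.0, 2.0],
    3: [2.0, 2.0, 1.0, 1.0],
}

def embed(token_ids, table):
    """Average the rows of `table` selected by `token_ids` (mean pooling)."""
    rows = [table[i] for i in token_ids]
    dim = len(rows[0])
    return [sum(row[d] for row in rows) / len(rows) for d in range(dim)]

sentence_vector = embed([1, 2, 3], toy_table)
print(sentence_vector)  # [1.0, 2.0, 1.0, 1.0]

# Size estimate for the real embedding table, assuming float32 weights.
vocab_size, dim, bytes_per_weight = 31102, 2048, 4
n_params = vocab_size * dim
size_mib = n_params * bytes_per_weight / 1024**2
print(n_params, round(size_mib, 1))  # 63696896 243.0
```

Because the pooling is a plain average, token order does not influence the result, and the estimate of roughly 243 MiB is in the same range as the ~240MB quoted earlier.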

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("MarcGrumpyOlejak/sts-mrl-en-de-base-v1")
# Run inference
queries = [
    "Im April 1928 beschrieb er in seinem Artikel On the Construction of Tables by Interpolation die Verwendung von Lochkartengeräten zum Interpolieren von Datentabellen und verglich dies mit den weniger effizienten und fehleranfälligeren Methoden mit mechanischen Geräten wie den Windradrechnern unter dem Markennamen Brunsviga.",
]
documents = [
    'Im April 1928 beschrieb er in seinem Artikel „On the Construction of Tables by Interpolation“ („Über die Erstellung von Tabellen durch Interpolation“) die Interpolation von Daten in Tabellen mit Hilfe von Lochkarten und verglich diese Methode mit dem uneffizienteren und fehleranfälligeren Verfahren, das mechanische Rechner verwendet.',
    'POLES liefert nicht die direkten makro-ökonomischen Auswirkungen der Minderungsmaßnahmen wie im Stern-Report vorgesehen, erlaubt jedoch eine detaillierte Abschätzung der Kosten im Zusammenhang mit Techniken mit wenig Energieverbrauch oder Nullenergietechniken.',
    'Im Lehrbuch Maschinenelemente – Funktion, Gestaltung und Berechnung von Decker (bisher 19 Auflagen) wird anhand praktischer Anwendungen mit Z88 die Berechnung von Maschinenelementen mit der Finiten-Elemente-Analyse gelehrt.',
]
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
# (1, 2048) (3, 2048)

# Get the similarity scores for the embeddings
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[0.7737, 0.1275, 0.1184]])

Out-of-Scope Use

After several tests, the model turned out not to be very good at reranking. Results on anything "news"-related are also really low, because no openly licensed and commercially usable dataset is available. If you know of an official, free and openly licensed news-based dataset, feel free to contact me.

Evaluation

All steps and evaluations were run locally on my very small hardware, an Nvidia RTX 2070 SUPER (8 GB) - no joke.

This model has been benchmarked mainly on the GermanGovServiceRetrieval task, developed by the Munich city administration. It associates questions with a textual context containing the answer. The idea is to train the model further on German administrative classification datasets. After the first results, the full German MTEB(deu, v1) was also run, as the GermanGovServiceRetrieval test is not part of the German MTEB benchmark. NanoBEIR turned out to be a bit insufficient for testing a bilingual German/English model - but I accidentally outscored static-similarity-mrl-multilingual-v1 by 0.03 points ;)

As for static embeddings built with Model2Vec, I picked alikia2x/jina-embedding-v3-m2v-1024, the largest one I could find at ~1GB.

This model is compared against the excellent e5-base-sts-en-de model made by Daniel Heinz back in 2024 (ca. 1.1GB). The second model for comparisons with dense embeddings is the optimized granite-embedding-107m-multilingual model made by the IBM-Granite-team (ca. 770MB).

Benchmark details

Oops - I forgot to run NanoBEIR on granite-embedding-107m-multilingual - that's for the weekend.

Model	NanoBEIR (NanoBEIR_mean_cosine_ndcg@10)	MTEB GermanGovServiceRetrieval	MTEB(deu, v1) avg, naive (sum/num)
Dense Embeddings
e5-base-sts-en-de	0.5320	0.7931	0.5194
granite-embedding-107m-multilingual	-	0.7880	0.4992
Static Embeddings
static-retrieval-mrl-en-v1(*)	0.5035	0.6630	0.3716
jina-embedding-v3-m2v-1024	0.3480	0.7260	0.4081
static-similarity-mrl-multilingual-v1	0.4350	0.7281	0.4259
sts-mrl-en-de-base-v1	0.4680	0.7841	0.4566

(*) static-retrieval-mrl-en-v1 is included only for comparison on the mainly English-based NanoBEIR.

MTEB - GermanGovServiceRetrieval Evaluation

As e5-base-sts-en-de scores 0.7931 on the GermanGovServiceRetrieval task, sts-mrl-en-de-base-v1 with 0.7841 achieves 98.865% of that score on the same task while using only ~230MB of RAM and a CPU.

That puts it only 0.4949% behind granite-embedding-107m-multilingual.
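The two percentages for this task follow directly from the scores in the benchmark table. A minimal sketch of the arithmetic, with `relative_score` being a helper name chosen here for illustration:

```python
def relative_score(score, reference):
    """Return `score` as a percentage of `reference`."""
    return 100.0 * score / reference

# GermanGovServiceRetrieval scores from the benchmark table.
scores = {
    "e5-base-sts-en-de": 0.7931,
    "granite-embedding-107m-multilingual": 0.7880,
    "sts-mrl-en-de-base-v1": 0.7841,
}

ours = scores["sts-mrl-en-de-base-v1"]
vs_e5 = relative_score(ours, scores["e5-base-sts-en-de"])
gap_to_granite = 100.0 - relative_score(ours, scores["granite-embedding-107m-multilingual"])

print(f"{vs_e5:.3f}% of e5-base-sts-en-de")
print(f"{gap_to_granite:.4f}% behind granite-embedding-107m-multilingual")
```

The same helper can be applied to the other columns of the table, for example the MTEB(deu, v1) averages.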

MTEB(deu, v1) – avg

For the German version of the MTEB benchmark, MTEB(deu, v1), the results are not as strong as on the GermanGovServiceRetrieval task - but at 87.909% of the quality of e5-base-sts-en-de you can use sts-mrl-en-de-base-v1, for example, to mine hard negatives in a really short time instead of burning money on a whole bunch of GPUs.

Even though the well speed-optimised granite-embedding-107m-multilingual is almost as fast as the static embeddings, it still needs a GPU.

Matryoshka Evaluation

(Has to be double-checked - it looks like almost every model has a glitch in the results … the results are better after a first reduction from 2048 down to 1024 dimensions? That's the second thing for the weekend.)

Training Datasets

Sadly, all details of the datasets had to be saved in a separate file, details_datasets.md, as this README.md has a size limit.

Training Hyperparameters

Non-Default Hyperparameters

eval_strategy: steps
per_device_train_batch_size: 4096
per_device_eval_batch_size: 4096
learning_rate: 0.2
num_train_epochs: 1
lr_scheduler_type: cosine
warmup_ratio: 0.1
fp16: True
batch_sampler: no_duplicates
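To give a feel for what `learning_rate: 0.2`, `lr_scheduler_type: cosine` and `warmup_ratio: 0.1` mean together, here is a small sketch of a linear-warmup-then-cosine-decay curve. It is a re-implementation for illustration, not the trainer's own code, and `TOTAL_STEPS = 4130` is an estimate derived from the training log (step 4000 at epoch 0.9685), not a logged value.

```python
import math

PEAK_LR = 0.2        # learning_rate
WARMUP_RATIO = 0.1   # warmup_ratio
TOTAL_STEPS = 4130   # assumption: estimated from the training log, not an exact figure

def lr_at(step, peak=PEAK_LR, total=TOTAL_STEPS, warmup_ratio=WARMUP_RATIO):
    """Linear warmup to `peak`, then half-cosine decay down to zero."""
    warmup_steps = math.ceil(total * warmup_ratio)
    if step < warmup_steps:
        return peak * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total - warmup_steps)
    return peak * 0.5 * (1.0 + math.cos(math.pi * progress))

# Print the learning rate at a few points of the single training epoch.
for step in (0, 200, 413, 2000, 4130):
    print(step, round(lr_at(step), 4))
```

The curve starts at zero, climbs to the peak of 0.2 during roughly the first tenth of training, and then decays smoothly back to zero by the last step.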

All Hyperparameters

overwrite_output_dir: False
do_predict: False
eval_strategy: steps
prediction_loss_only: True
per_device_train_batch_size: 4096
per_device_eval_batch_size: 4096
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 0.2
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 1
max_steps: -1
lr_scheduler_type: cosine
lr_scheduler_kwargs: {}
warmup_ratio: 0.1
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: False
fp16: True
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: False
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
tp_size: 0
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: None
hub_always_push: False
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
include_for_metrics: []
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
eval_use_gather_object: False
average_tokens_across_devices: False
prompts: None
batch_sampler: no_duplicates
multi_dataset_batch_sampler: proportional
router_mapping: {}
learning_rate_mapping: {}

Training Logs

Epoch	Step	Training Loss	mmarco 3hn loss	mmarco 2hn loss	mmarco 1hn loss	mmarco 0hn loss	wp-22-12-de loss	swim ir de loss	swim ir de 3hn loss	swim ir de title 3hn loss	swim ir de title loss	avemio triples loss	avemio pairs 3hn loss	avemio pairs 0hn loss	nq german en de a 3hn loss	nq german en de 3hn loss	nq german 3hn loss	nq german 1hn loss	german oasst1 hn loss	germanrag short loss	slimorca dedup 3hn loss	slimorca dedup 2hn loss	slimorca dedup 1hn loss	slimorca dedup 0hn loss	german gpt4 3hn loss	german orca dpo loss	alpaca gpt4 3hn loss	alpaca gpt4 0hn loss	dolly context de 3hn loss	dolly context ende 3hn loss	dolly instructions de 3hn loss	dolly instructions de 0hn loss	dolly instructions ende 3hn loss	dolly responses de 3hn loss	dolly responses de 0hn loss	dolly responses ende 3hn loss	saf legal de loss	gls 3hn loss	gls 2hn loss	gls 1hn loss	gls 0hn loss	europarl 3hn loss	europarl 0hn loss	tatoeba 3hn loss	tatoeba 0hn loss	wikimatrix 3hn loss	wikipedia abstract 3hn loss	wikipedia abstract 0hn loss	wiktionary gdg de 3hn loss	wiktionary gdg de short loss	wmt24pp loss	synthia de loss	gbp 3hn loss	gbp ende 3hn loss	stbs de 3hn loss	stbs en 3hn loss	pawsx de loss	pawsx en loss	nli anli entail 3hn loss	nli fever entail 3hn loss	nli ling entail 3hn loss	nli mnli entail 3hn loss	nli wanli entail 3hn loss	nli anli transl 3hn loss	nli fever transl 3hn loss	nli ling transl 3hn loss	nli mnli transl 3hn loss	nli wanli transl 3hn loss	jina ai 3en loss	jina ai ende loss	jina ai dede loss	polyglot de loss	polyglot en loss	tilde EESC loss	miracl de 3hn loss	miracl de 0hn loss
0.0002	1	32.2328	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.1211	500	17.4935	5.9441	18.6286	15.4380	21.7452	11.5899	15.7739	2.0470	6.4545	28.7021	3.4327	3.0953	21.4473	0.6579	2.3081	8.3028	17.1118	8.6341	0.7353	2.4550	10.1110	20.2165	15.1944	11.2822	14.0772	9.3205	12.3671	3.6399	0.2185	2.5653	27.0853	2.1334	4.3423	2.3262	0.5350	20.1312	6.6543	11.5668	11.2751	15.0010	2.5165	46.8575	6.4837	17.8191	0.9617	7.7542	3.3035	17.7944	4.9850	0.5039	6.9794	0.4971	1.7211	7.5595	5.8076	2.1527	0.4983	9.9586	7.6724	4.5647	4.4193	4.3135	0.8089	2.2057	0.8494	1.5787	2.4122	9.0588	1.6716	5.7378	17.4829	17.4252	2.7128	2.3019	4.9855
0.2421	1000	9.8434	5.9548	16.1939	13.6828	19.8400	10.3624	13.5662	1.7398	4.7552	26.9780	2.7763	2.6297	19.2160	0.6367	2.2657	8.1566	15.8885	7.0793	0.7799	1.6238	9.2113	18.4966	14.8541	11.6090	17.8812	7.3860	7.7746	3.2721	0.1734	2.2635	27.3627	1.7248	4.0169	3.3867	0.4930	19.1067	7.0229	13.2283	13.9238	17.1221	2.0835	47.1417	6.2599	15.3082	0.7972	6.9853	2.7917	15.1196	4.4008	0.1748	6.5392	0.4433	1.3500	7.5248	5.8447	2.1663	0.4949	9.3473	6.2105	4.2394	4.1746	4.2383	0.6806	2.1903	0.6338	1.3037	2.0331	5.0726	1.0650	5.8712	17.3595	16.0869	1.9498	2.1635	4.1986
0.3632	1500	9.4195	6.0462	15.3733	13.4579	19.1822	10.1358	13.7938	2.0818	4.3716	26.0843	2.7380	2.6063	18.9278	0.6317	2.1179	8.5954	15.0949	6.2069	0.8866	1.5936	9.0869	18.6605	14.5752	12.3640	15.1111	7.5786	8.6830	2.9134	0.1539	2.3901	24.0635	1.5851	3.0859	2.8681	0.4823	20.1934	6.9440	11.9040	11.6429	13.5179	1.9956	46.0385	6.0581	15.7130	0.7430	6.2928	2.9993	14.2742	4.1868	0.1639	5.8340	0.4744	1.3372	7.7122	5.6745	2.1703	0.4930	9.6020	6.0473	3.5016	3.7158	4.2441	0.5784	2.1883	0.5912	1.2164	1.9767	7.0197	1.0216	4.4556	14.8992	15.8563	1.8581	2.1515	4.4043
0.4843	2000	8.2114	5.8039	14.9131	12.9781	18.3934	9.9055	13.5402	2.0944	4.4961	26.2583	2.6002	2.5542	18.3124	0.5504	1.7278	8.4266	12.8837	5.5970	0.7967	1.5002	8.8843	18.2636	15.5366	12.1376	13.7508	6.1530	6.6779	2.2906	0.1435	1.8996	21.9520	1.5331	2.7177	3.0663	0.4214	19.7372	6.1346	10.9578	10.5089	13.6577	1.8838	46.2217	4.1247	12.9807	0.6397	6.3777	2.5970	13.7871	4.1784	0.1893	4.4490	0.4018	1.1374	7.1980	5.6566	2.1517	0.4921	9.2049	6.0599	3.4091	3.6662	4.0776	0.4841	2.0716	0.4860	0.9970	1.7709	7.5693	0.6321	4.9397	14.5334	15.4385	1.7821	1.9614	4.2582
0.6053	2500	8.038	5.5500	14.8000	12.8634	18.2342	9.7964	13.2195	1.9088	4.2172	25.7571	2.4768	2.4510	17.9053	0.4689	1.8237	8.1981	12.5957	6.0768	0.6939	1.5240	9.6936	18.5641	16.5833	12.5368	13.6839	6.6175	7.2916	2.3097	0.1377	1.9064	22.0331	1.5278	2.5185	4.8549	0.3997	20.1505	6.0001	10.3536	9.9127	12.7608	1.7728	46.1264	3.4876	13.2839	0.6246	6.0571	2.5264	13.6899	4.1796	0.1133	5.5862	0.3973	1.1315	7.0625	5.7281	2.1597	0.4939	9.3306	5.8505	3.0920	3.6364	4.2557	0.4513	1.9419	0.4341	0.7909	1.6440	7.5517	0.6997	4.9564	14.5145	15.7047	1.6838	1.9027	4.2791
0.7264	3000	8.4735	5.4690	14.0184	12.4418	17.2256	9.5584	12.8587	1.8026	4.2292	25.0699	2.4180	2.3386	17.5121	0.4924	1.7512	8.6264	12.9932	5.7242	0.7519	1.4209	8.7996	17.9024	15.0738	10.3888	12.8886	6.9268	7.5737	2.4082	0.1446	1.9202	22.0949	1.4499	2.7943	3.8219	0.4096	20.1391	5.9977	10.2577	9.9893	12.8969	1.8217	45.9583	3.6835	14.0661	0.6401	5.8992	2.4225	13.6148	4.0275	0.1058	4.2324	0.4046	1.1448	7.2012	5.7275	2.1669	0.4947	8.9883	5.8919	3.4086	3.5578	3.8109	0.4713	2.0382	0.4806	0.9071	1.7479	7.4633	0.6957	5.1938	14.2104	15.6664	1.7301	1.9228	4.1841
0.8475	3500	7.7352	5.3754	14.0426	12.5198	17.3227	9.4857	12.9446	1.8784	4.2447	25.1068	2.3991	2.3495	17.5300	0.4642	1.6235	8.4671	12.8252	5.3035	0.7126	1.4499	8.4552	16.9827	14.6279	10.8074	12.8392	6.5745	7.2679	2.4318	0.1319	1.8556	22.2088	1.3227	2.6365	4.3796	0.3783	20.1810	5.9464	10.2856	9.9382	12.6812	1.6933	46.2977	3.6286	13.8749	0.5844	5.8990	2.4661	13.3314	4.0382	0.1148	4.3655	0.4017	1.0360	7.1329	5.7121	2.1640	0.4945	8.9242	5.6470	3.2758	3.5739	4.0207	0.4303	1.9566	0.4515	0.8112	1.6914	7.4063	0.6659	5.2429	13.9946	15.6856	1.5650	1.8613	4.3350
0.9685	4000	7.4739	5.3820	13.9713	12.4551	17.2949	9.4687	12.9339	1.9303	4.2006	25.0763	2.3880	2.3362	17.4705	0.4638	1.6235	8.3594	12.6393	5.3609	0.7168	1.4452	8.3913	16.8145	14.9649	10.7862	12.5774	6.6076	7.1481	2.3770	0.1320	1.8618	22.2842	1.3191	2.6045	4.6015	0.3718	14.6598	5.9303	10.1947	9.8502	12.5003	1.6814	46.1385	3.6696	13.8947	0.5799	5.8546	2.4445	13.3022	4.0359	0.1090	4.4493	0.3932	1.0395	7.1369	5.6920	2.1641	0.4943	8.9089	5.6356	3.2438	3.5664	4.0016	0.4297	1.9810	0.4511	0.8123	1.6705	7.4795	0.6834	5.2668	13.9481	15.6508	1.5442	1.8556	4.3036

Framework Versions

Python: 3.10.15
Sentence Transformers: 5.0.0
Transformers: 4.51.3
PyTorch: 2.1.0+cu121
Accelerate: 1.3.0
Datasets: 2.21.0
Tokenizers: 0.21.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

GermanGovServiceRetrieval

@software{lhm-dienstleistungen-qa,
  author = {Schröder, Leon Marius and Gutknecht, Clemens and Alkiddeh, Oubada and Weiß, Susanne and Lukas, Leon},
  month = nov,
  publisher = {it@M},
  title = {LHM-Dienstleistungen-QA - german public domain question-answering dataset},
  url = {https://huggingface.co/datasets/it-at-m/LHM-Dienstleistungen-QA},
  year = {2022},
}

MMTEB

@article{enevoldsen2025mmtebmassivemultilingualtext,
  title={MMTEB: Massive Multilingual Text Embedding Benchmark},
  author={Kenneth Enevoldsen and Isaac Chung and Imene Kerboua and Márton Kardos and Ashwin Mathur and David Stap and Jay Gala and Wissam Siblini and Dominik Krzemiński and Genta Indra Winata and Saba Sturua and Saiteja Utpala and Mathieu Ciancone and Marion Schaeffer and Gabriel Sequeira and Diganta Misra and Shreeya Dhakal and Jonathan Rystrøm and Roman Solomatin and Ömer Çağatan and Akash Kundu and Martin Bernstorff and Shitao Xiao and Akshita Sukhlecha and Bhavish Pahwa and Rafał Poświata and Kranthi Kiran GV and Shawon Ashraf and Daniel Auras and Björn Plüster and Jan Philipp Harries and Loïc Magne and Isabelle Mohr and Mariya Hendriksen and Dawei Zhu and Hippolyte Gisserot-Boukhlef and Tom Aarsen and Jan Kostkan and Konrad Wojtasik and Taemin Lee and Marek Šuppa and Crystina Zhang and Roberta Rocca and Mohammed Hamdy and Andrianos Michail and John Yang and Manuel Faysse and Aleksei Vatolin and Nandan Thakur and Manan Dey and Dipam Vasani and Pranjal Chitale and Simone Tedeschi and Nguyen Tai and Artem Snegirev and Michael Günther and Mengzhou Xia and Weijia Shi and Xing Han Lù and Jordan Clive and Gayatri Krishnakumar and Anna Maksimova and Silvan Wehrli and Maria Tikhonova and Henil Panchal and Aleksandr Abramov and Malte Ostendorff and Zheng Liu and Simon Clematide and Lester James Miranda and Alena Fenogenova and Guangyu Song and Ruqiya Bin Safi and Wen-Ding Li and Alessia Borghini and Federico Cassano and Hongjin Su and Jimmy Lin and Howard Yen and Lasse Hansen and Sara Hooker and Chenghao Xiao and Vaibhav Adlakha and Orion Weller and Siva Reddy and Niklas Muennighoff},
  publisher = {arXiv},
  journal={arXiv preprint arXiv:2502.13595},
  year={2025},
  url={https://arxiv.org/abs/2502.13595},
  doi = {10.48550/arXiv.2502.13595},
}

MTEB

@article{muennighoff2022mteb,
  author = {Muennighoff, Niklas and Tazi, Nouamane and Magne, Lo{\"\i}c and Reimers, Nils},
  title = {MTEB: Massive Text Embedding Benchmark},
  publisher = {arXiv},
  journal={arXiv preprint arXiv:2210.07316},
  year = {2022},
  url = {https://arxiv.org/abs/2210.07316},
  doi = {10.48550/ARXIV.2210.07316},
}
