Redis fine-tuned CrossEncoder model for semantic caching on LangCache

This is a Cross Encoder model fine-tuned from Alibaba-NLP/gte-reranker-modernbert-base on the LangCache Sentence Pairs (all) dataset using the sentence-transformers library. It computes a score for each pair of texts. These scores can be used for sentence pair classification, such as deciding whether two queries are paraphrases of each other.

Model Details

Model Description

Model Type: Cross Encoder
Base model: Alibaba-NLP/gte-reranker-modernbert-base
Maximum Sequence Length: 8192 tokens
Number of Output Labels: 1 label
Training Dataset:
- LangCache Sentence Pairs (all)
Language: en
License: apache-2.0

Model Sources

Documentation: Sentence Transformers Documentation
Documentation: Cross Encoder Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Cross Encoders on Hugging Face

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder("aditeyabaral-redis/langcache-reranker-v1-wdwr")
# Get scores for pairs of texts
pairs = [
    ["He said the foodservice pie business doesn 't fit the company 's long-term growth strategy .", '" The foodservice pie business does not fit our long-term growth strategy .'],
    ['Magnarelli said Racicot hated the Iraqi regime and looked forward to using his long years of training in the war .', 'His wife said he was " 100 percent behind George Bush " and looked forward to using his years of training in the war .'],
    ['The dollar was at 116.92 yen against the yen , flat on the session , and at 1.2891 against the Swiss franc , also flat .', 'The dollar was at 116.78 yen JPY = , virtually flat on the session , and at 1.2871 against the Swiss franc CHF = , down 0.1 percent .'],
    ['The AFL-CIO is waiting until October to decide if it will endorse a candidate .', 'The AFL-CIO announced Wednesday that it will decide in October whether to endorse a candidate before the primaries .'],
    ['No dates have been set for the civil or the criminal trial .', 'No dates have been set for the criminal or civil cases , but Shanley has pleaded not guilty .'],
]
scores = model.predict(pairs)
print(scores.shape)
# (5,)

# Or rank different texts based on similarity to a single text
ranks = model.rank(
    "He said the foodservice pie business doesn 't fit the company 's long-term growth strategy .",
    [
        '" The foodservice pie business does not fit our long-term growth strategy .',
        'His wife said he was " 100 percent behind George Bush " and looked forward to using his years of training in the war .',
        'The dollar was at 116.78 yen JPY = , virtually flat on the session , and at 1.2871 against the Swiss franc CHF = , down 0.1 percent .',
        'The AFL-CIO announced Wednesday that it will decide in October whether to endorse a candidate before the primaries .',
        'No dates have been set for the criminal or civil cases , but Shanley has pleaded not guilty .',
    ]
)
print(ranks)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
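In a semantic cache, the pair score can decide whether a new query is close enough to a cached query to reuse its answer. The sketch below is a hypothetical helper, not part of sentence-transformers. `cache_lookup` scores the incoming query against every cached key with an injected `score_pairs` function and returns the cached response only if the best score reaches a threshold. With this model, `score_pairs` could be `model.predict`, assuming its scores are on the same scale as those used in the Evaluation section. The threshold is an assumption to tune on your own data: the best accuracy threshold was 0.7637 on val but 0.9352 on test.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# A scorer maps (query, candidate) pairs to one score per pair.
# CrossEncoder.predict is assumed to fit this shape.
Scorer = Callable[[List[Tuple[str, str]]], Sequence[float]]


def cache_lookup(
    query: str,
    cache: Dict[str, str],
    score_pairs: Scorer,
    threshold: float,
) -> Optional[str]:
    """Return the cached response for the best-matching cached query,
    or None (a cache miss) if no score reaches `threshold`."""
    if not cache:
        return None
    keys = list(cache)
    # Score the query against every cached key in one batch.
    scores = score_pairs([(query, key) for key in keys])
    best = max(range(len(keys)), key=lambda i: scores[i])
    if scores[best] >= threshold:
        return cache[keys[best]]
    return None
```

For example: `cache_lookup(user_query, cache, model.predict, threshold=0.7637)`. Scoring every key is linear in the cache size. A larger cache would typically shortlist candidates first, for example with an embedding index, and use the cross encoder only to rerank that shortlist.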

Evaluation

Metrics

Cross Encoder Classification

Datasets: val and test
Evaluated with CrossEncoderClassificationEvaluator

Metric	val	test
accuracy	0.7731	0.723
accuracy_threshold	0.7637	0.9352
f1	0.6951	0.7144
f1_threshold	0.0464	0.9143
precision	0.6455	0.6303
recall	0.7529	0.8245
average_precision	0.7833	0.6907
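Each threshold row is the cut-off on the predicted score that maximizes the metric above it. This is why accuracy and F1 have different thresholds. The precision and recall rows are consistent with being measured at f1_threshold. For example, 2 × 0.6455 × 0.7529 / (0.6455 + 0.7529) ≈ 0.6951, which is the val F1.

The plain-Python sketch below illustrates this kind of threshold search. It is not the evaluator's code. It tries every observed score as a cut-off, whereas the library may place thresholds differently (for example, between adjacent scores). The `average_precision` function assumes there are no tied scores.

```python
from typing import Optional, Sequence, Tuple


def metrics_at_threshold(
    scores: Sequence[float], labels: Sequence[int], threshold: float
) -> Tuple[float, float, float, float]:
    """Accuracy, precision, recall and F1 when predicting 1 for score >= threshold."""
    tp = fp = fn = tn = 0
    for score, label in zip(scores, labels):
        predicted = 1 if score >= threshold else 0
        if predicted and label:
            tp += 1
        elif predicted:
            fp += 1
        elif label:
            fn += 1
        else:
            tn += 1
    accuracy = (tp + tn) / len(labels)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1


def best_threshold(
    scores: Sequence[float], labels: Sequence[int], metric: str = "f1"
) -> Tuple[Optional[float], float]:
    """Try each observed score as the cut-off; return (threshold, best metric value)."""
    index = {"accuracy": 0, "f1": 3}[metric]
    best_t, best_value = None, -1.0
    for t in sorted(set(scores)):
        value = metrics_at_threshold(scores, labels, t)[index]
        if value > best_value:
            best_t, best_value = t, value
    return best_t, best_value


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Mean of precision@k over the ranks k of the positive pairs (assumes no ties)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for k, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / k
    return total / hits if hits else 0.0
```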

Training Details

Training Dataset

LangCache Sentence Pairs (all)

Dataset: LangCache Sentence Pairs (all)
Size: 8,405 training samples
Columns: sentence1, sentence2, and label

Approximate statistics based on the first 1000 samples:

	sentence1	sentence2	label
type	string	string	int
min length	28 characters	15 characters	-
mean length	116.35 characters	113.13 characters	-
max length	227 characters	243 characters	-
label distribution	-	-	0: ~45.80%, 1: ~54.20%
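Statistics of this kind are straightforward to reproduce. This is a minimal sketch that assumes the rows are available as a list of dicts with the `sentence1`, `sentence2`, and `label` columns listed above. The function name `column_stats` is hypothetical.

The sketch takes the first 1000 rows and reports two things:
- the minimum, mean, and maximum character length of each text column
- the percentage of each label

```python
from collections import Counter
from statistics import mean
from typing import Dict, List


def column_stats(rows: List[dict], limit: int = 1000) -> Dict[str, dict]:
    """Character-length min/mean/max per text column and label percentages
    over the first `limit` rows."""
    sample = rows[:limit]
    stats: Dict[str, dict] = {}
    for column in ("sentence1", "sentence2"):
        lengths = [len(row[column]) for row in sample]
        stats[column] = {
            "min": min(lengths),
            "mean": round(mean(lengths), 2),
            "max": max(lengths),
        }
    # Percentage of each label value, sorted by label.
    counts = Counter(row["label"] for row in sample)
    stats["label"] = {
        label: round(100 * n / len(sample), 2) for label, n in sorted(counts.items())
    }
    return stats
```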

Samples:

sentence1	sentence2	label
`He said the foodservice pie business doesn 't fit the company 's long-term growth strategy .`	`" The foodservice pie business does not fit our long-term growth strategy .`	`1`
`Magnarelli said Racicot hated the Iraqi regime and looked forward to using his long years of training in the war .`	`His wife said he was " 100 percent behind George Bush " and looked forward to using his years of training in the war .`	`0`
`The dollar was at 116.92 yen against the yen , flat on the session , and at 1.2891 against the Swiss franc , also flat .`	`The dollar was at 116.78 yen JPY = , virtually flat on the session , and at 1.2871 against the Swiss franc CHF = , down 0.1 percent .`	`0`

Loss: BinaryCrossEntropyLoss with these parameters:

{
    "activation_fn": "torch.nn.modules.linear.Identity",
    "pos_weight": null
}
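With activation_fn set to Identity, the loss receives the model's raw logit. Setting pos_weight to null means positive and negative pairs are weighted equally. The per-pair loss is then standard binary cross-entropy on a logit.

The plain-Python sketch below implements that formula; it is not the library's implementation. It uses the numerically stable form, which never computes the sigmoid explicitly.

```python
import math
from typing import Sequence


def bce_with_logits(logit: float, label: int) -> float:
    """Binary cross-entropy on a raw logit (Identity activation, no pos_weight).

    Equals -[y*log(sigmoid(x)) + (1 - y)*log(1 - sigmoid(x))], rewritten as
    max(x, 0) - x*y + log(1 + exp(-|x|)) so large |x| does not overflow.
    """
    x, y = logit, float(label)
    return max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))


def mean_bce(logits: Sequence[float], labels: Sequence[int]) -> float:
    """Average loss over a batch, with every pair weighted equally."""
    return sum(bce_with_logits(x, y) for x, y in zip(logits, labels)) / len(logits)
```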

Evaluation Dataset

LangCache Sentence Pairs (all)

Dataset: LangCache Sentence Pairs (all)
Size: 8,405 evaluation samples
Columns: sentence1, sentence2, and label

Approximate statistics based on the first 1000 samples:

	sentence1	sentence2	label
type	string	string	int
min length	28 characters	15 characters	-
mean length	116.35 characters	113.13 characters	-
max length	227 characters	243 characters	-
label distribution	-	-	0: ~45.80%, 1: ~54.20%

Samples:

sentence1	sentence2	label
`He said the foodservice pie business doesn 't fit the company 's long-term growth strategy .`	`" The foodservice pie business does not fit our long-term growth strategy .`	`1`
`Magnarelli said Racicot hated the Iraqi regime and looked forward to using his long years of training in the war .`	`His wife said he was " 100 percent behind George Bush " and looked forward to using his years of training in the war .`	`0`
`The dollar was at 116.92 yen against the yen , flat on the session , and at 1.2891 against the Swiss franc , also flat .`	`The dollar was at 116.78 yen JPY = , virtually flat on the session , and at 1.2871 against the Swiss franc CHF = , down 0.1 percent .`	`0`

Loss: BinaryCrossEntropyLoss with these parameters:

{
    "activation_fn": "torch.nn.modules.linear.Identity",
    "pos_weight": null
}

Training Hyperparameters

Non-Default Hyperparameters

eval_strategy: steps
per_device_train_batch_size: 48
per_device_eval_batch_size: 48
learning_rate: 0.0002
weight_decay: 0.01
num_train_epochs: 20
warmup_ratio: 0.1
load_best_model_at_end: True
optim: adamw_torch
push_to_hub: True
hub_model_id: aditeyabaral-redis/langcache-reranker-v1-wdwr
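The scheduler is lr_scheduler_type linear (listed under All Hyperparameters) with warmup_ratio 0.1. Under this schedule, the learning rate rises linearly from 0 to 0.0002 over the first 10% of optimizer steps, then decays linearly to 0.

This minimal sketch expresses the schedule as a plain function (the name `linear_warmup_lr` is hypothetical). It assumes the warmup length is ceil(total_steps times warmup_ratio), which is how the warmup is derived when warmup_steps is 0.

The total step count is not stated in this card. The training logs end near step 109,000 at almost 20 epochs, so a total of about 109,120 steps is a plausible estimate.

```python
import math


def linear_warmup_lr(
    step: int, total_steps: int, base_lr: float = 2e-4, warmup_ratio: float = 0.1
) -> float:
    """Learning rate at `step`: linear warmup to base_lr, then linear decay to 0."""
    warmup = math.ceil(total_steps * warmup_ratio)
    if step < warmup:
        # Warmup phase: scale up from 0 toward base_lr.
        return base_lr * step / max(1, warmup)
    # Decay phase: scale down from base_lr to 0 at total_steps.
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup))
```

With the estimated 109,120 total steps, warmup lasts 10,912 steps. `linear_warmup_lr(10_912, 109_120)` returns the peak rate of 0.0002.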

All Hyperparameters

overwrite_output_dir: False
do_predict: False
eval_strategy: steps
prediction_loss_only: True
per_device_train_batch_size: 48
per_device_eval_batch_size: 48
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 0.0002
weight_decay: 0.01
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 20
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: {}
warmup_ratio: 0.1
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: False
fp16: False
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: True
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: True
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: True
resume_from_checkpoint: None
hub_model_id: aditeyabaral-redis/langcache-reranker-v1-wdwr
hub_strategy: every_save
hub_private_repo: None
hub_always_push: False
hub_revision: None
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
include_for_metrics: []
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
liger_kernel_config: None
eval_use_gather_object: False
average_tokens_across_devices: True
prompts: None
batch_sampler: batch_sampler
multi_dataset_batch_sampler: proportional
router_mapping: {}
learning_rate_mapping: {}

Training Logs

Epoch	Step	Training Loss	Validation Loss	val_average_precision	test_average_precision
-1	-1	-	-	0.7676	0.6907
0.1833	1000	0.3563	0.4805	0.7831	-
0.3666	2000	0.2065	0.5394	0.8221	-
0.5499	3000	0.1983	0.5019	0.8178	-
0.7331	4000	0.1923	0.5109	0.7960	-
0.9164	5000	0.1886	0.4726	0.8058	-
1.0997	6000	0.183	0.5062	0.8032	-
1.2830	7000	0.1838	0.5152	0.8021	-
1.4663	8000	0.1858	0.5105	0.7926	-
1.6496	9000	0.1905	0.5052	0.7859	-
1.8328	10000	0.1926	0.5316	0.7895	-
2.0161	11000	0.1951	0.5340	0.7681	-
2.1994	12000	0.1853	0.5573	0.7577	-
2.3827	13000	0.1848	0.5530	0.7946	-
2.5660	14000	0.1813	0.5754	0.7655	-
2.7493	15000	0.1793	0.5316	0.7514	-
2.9326	16000	0.1778	0.5230	0.7868	-
3.1158	17000	0.1681	0.5246	0.7816	-
3.2991	18000	0.1662	0.4946	0.7732	-
3.4824	19000	0.1648	0.5262	0.7853	-
3.6657	20000	0.1649	0.5007	0.7871	-
3.8490	21000	0.1633	0.5368	0.7807	-
4.0323	22000	0.1602	0.5559	0.7769	-
4.2155	23000	0.149	0.5796	0.7697	-
4.3988	24000	0.1486	0.5322	0.7608	-
4.5821	25000	0.1495	0.5142	0.7713	-
4.7654	26000	0.1493	0.5203	0.7866	-
4.9487	27000	0.1498	0.5433	0.7738	-
5.1320	28000	0.1391	0.5589	0.7803	-
5.3152	29000	0.1346	0.5267	0.7713	-
5.4985	30000	0.1367	0.5657	0.7803	-
5.6818	31000	0.1358	0.5631	0.7646	-
5.8651	32000	0.136	0.5444	0.7753	-
6.0484	33000	0.1346	0.5605	0.7703	-
6.2317	34000	0.1222	0.5399	0.7776	-
6.4150	35000	0.1241	0.5272	0.7899	-
6.5982	36000	0.1243	0.6096	0.7723	-
6.7815	37000	0.1266	0.5661	0.7609	-
6.9648	38000	0.1246	0.5341	0.7889	-
7.1481	39000	0.1128	0.6223	0.7884	-
7.3314	40000	0.1124	0.5485	0.7743	-
7.5147	41000	0.1127	0.5375	0.7842	-
7.6979	42000	0.1122	0.5231	0.7939	-
7.8812	43000	0.1141	0.5608	0.7705	-
8.0645	44000	0.1088	0.6511	0.7813	-
8.2478	45000	0.0998	0.6217	0.7648	-
8.4311	46000	0.1017	0.6000	0.7822	-
8.6144	47000	0.1031	0.5469	0.7866	-
8.7977	48000	0.1012	0.5862	0.7790	-
8.9809	49000	0.1031	0.5527	0.7876	-
9.1642	50000	0.0921	0.5460	0.7788	-
9.3475	51000	0.0909	0.5820	0.7815	-
9.5308	52000	0.0919	0.5589	0.7841	-
9.7141	53000	0.0939	0.5521	0.7821	-
9.8974	54000	0.0925	0.6942	0.7797	-
10.0806	55000	0.0863	0.6208	0.7729	-
10.2639	56000	0.0803	0.6632	0.7911	-
10.4472	57000	0.0797	0.6583	0.7833	-
10.6305	58000	0.0824	0.6194	0.7862	-
10.8138	59000	0.0829	0.6136	0.7783	-
10.9971	60000	0.0819	0.5833	0.7727	-
11.1804	61000	0.0693	0.6491	0.7881	-
11.3636	62000	0.0709	0.6449	0.7784	-
11.5469	63000	0.0721	0.6158	0.7838	-
11.7302	64000	0.0721	0.6649	0.7841	-
11.9135	65000	0.0732	0.6403	0.7702	-
12.0968	66000	0.0679	0.6079	0.7817	-
12.2801	67000	0.0615	0.6862	0.7787	-
12.4633	68000	0.0629	0.7239	0.7824	-
12.6466	69000	0.0643	0.6419	0.7897	-
12.8299	70000	0.0635	0.6743	0.7762	-
13.0132	71000	0.064	0.7135	0.7741	-
13.1965	72000	0.0545	0.6643	0.7723	-
13.3798	73000	0.0548	0.6508	0.7758	-
13.5630	74000	0.0547	0.7003	0.7785	-
13.7463	75000	0.0548	0.7170	0.7846	-
13.9296	76000	0.0553	0.6917	0.7722	-
14.1129	77000	0.0508	0.7000	0.7767	-
14.2962	78000	0.0474	0.7336	0.7730	-
14.4795	79000	0.0465	0.7122	0.7795	-
14.6628	80000	0.0478	0.7321	0.7779	-
14.8460	81000	0.0468	0.7112	0.7796	-
15.0293	82000	0.0465	0.7534	0.7788	-
15.2126	83000	0.0395	0.7238	0.7808	-
15.3959	84000	0.0401	0.7686	0.7905	-
15.5792	85000	0.0408	0.7296	0.7900	-
15.7625	86000	0.0414	0.7533	0.7822	-
15.9457	87000	0.0402	0.7748	0.7867	-
16.1290	88000	0.0352	0.8267	0.7844	-
16.3123	89000	0.0354	0.7488	0.7912	-
16.4956	90000	0.0337	0.7850	0.7857	-
16.6789	91000	0.0333	0.7812	0.7815	-
16.8622	92000	0.0341	0.8184	0.7786	-
17.0455	93000	0.0333	0.8166	0.7781	-
17.2287	94000	0.0288	0.7980	0.7803	-
17.4120	95000	0.0282	0.8195	0.7774	-
17.5953	96000	0.0285	0.7864	0.7829	-
17.7786	97000	0.0284	0.8000	0.7838	-
17.9619	98000	0.0279	0.8118	0.7873	-
18.1452	99000	0.0245	0.8727	0.7866	-
18.3284	100000	0.0235	0.8695	0.7836	-
18.5117	101000	0.0236	0.8246	0.7820	-
18.6950	102000	0.0232	0.8543	0.7828	-
18.8783	103000	0.0234	0.8840	0.7793	-
19.0616	104000	0.0219	0.8804	0.7783	-
19.2449	105000	0.0201	0.8885	0.7812	-
19.4282	106000	0.0194	0.8901	0.7821	-
19.6114	107000	0.0197	0.8850	0.7824	-
19.7947	108000	0.0196	0.8835	0.7830	-
19.9780	109000	0.0197	0.8803	0.7833	-

The bold row denotes the saved checkpoint.
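The log records fractional epochs, so it can be used to back out how many optimizer steps make up one epoch. From that, you can estimate the size of the training split. This minimal sketch makes the following assumptions:
- training ran on a single device (this is not stated in the card)
- gradient_accumulation_steps is 1, per_device_train_batch_size is 48, and dataloader_drop_last is True, so every step consumes exactly one full batch

The function names are hypothetical. Under these assumptions, the rows above correspond to about 5,456 steps per epoch, which implies between 261,888 and 261,935 training pairs. Compare this with the size listed under Training Dataset.

```python
from typing import Tuple


def implied_steps_per_epoch(step: int, epoch: float) -> float:
    """Optimizer steps per epoch implied by one (Epoch, Step) row of the log."""
    return step / epoch


def implied_dataset_size(steps_per_epoch: int, batch_size: int = 48) -> Tuple[int, int]:
    """Inclusive range of training-set sizes that yield exactly `steps_per_epoch`
    full batches when the last partial batch is dropped."""
    return steps_per_epoch * batch_size, (steps_per_epoch + 1) * batch_size - 1


# Use the last logged row: step 109000 at epoch 19.9780.
steps = round(implied_steps_per_epoch(109000, 19.9780))
print(steps, implied_dataset_size(steps))  # 5456 (261888, 261935)
```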

Framework Versions

Python: 3.12.3
Sentence Transformers: 5.1.0
Transformers: 4.55.0
PyTorch: 2.8.0+cu128
Accelerate: 1.10.0
Datasets: 4.0.0
Tokenizers: 0.21.4

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}