ModernBERT-base trained on Chemistry
This is a Cross Encoder model finetuned from GaborMadarasz/ModernBERT-base-hungarian using the sentence-transformers library. It computes scores for pairs of texts, which can be used for text reranking and semantic search.
Model Details
Model Description
- Model Type: Cross Encoder
- Base model: GaborMadarasz/ModernBERT-base-hungarian
- Maximum Sequence Length: 8192 tokens
- Number of Output Labels: 1 label
- Language: hu
- License: apache-2.0
Model Sources
- Documentation: Sentence Transformers Documentation
- Documentation: Cross Encoder Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Cross Encoders on Hugging Face
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder("GaborMadarasz/reranker-ModernBERT-base-hungarian")
# Get scores for pairs of texts
pairs = [
    ['Milyen halmazállapotú a klór szobahőmérsékleten?', 'Gáz'],
    ['Milyen halmazállapotú a klór szobahőmérsékleten?', 'Gáz.'],
    ['Mi az izoméria fogalma?', 'Azonos összegképletű, de eltérő szerkezetű és tulajdonságú anyagok.'],
    ['Melyik elektronhéjon található a hidrogénatom egyetlen elektronja?', 'Az első héjon.'],
    ['Milyen felhasználási területei vannak a szilíciumnak?', 'Ötvözőelemként, tranzisztorok, integrált áramkörök, fényelemek előállítására.'],
]
scores = model.predict(pairs)
print(scores.shape)
# (5,)

# Or rank different texts based on similarity to a single text
ranks = model.rank(
    'Milyen halmazállapotú a klór szobahőmérsékleten?',
    [
        'Gáz',
        'Gáz.',
        'Azonos összegképletű, de eltérő szerkezetű és tulajdonságú anyagok.',
        'Az első héjon.',
        'Ötvözőelemként, tranzisztorok, integrált áramkörök, fényelemek előállítására.',
    ]
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
```
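`rank` is a convenience wrapper: the same ordering can be recovered by scoring each (query, candidate) pair with `predict` and sorting by descending score. A minimal sketch, continuing from the snippet above:

```python
# Continuing from the snippet above: score one query against all candidates
# and sort by descending score, which mirrors what model.rank() returns.
query = 'Milyen halmazállapotú a klór szobahőmérsékleten?'
candidates = [
    'Gáz',
    'Gáz.',
    'Azonos összegképletű, de eltérő szerkezetű és tulajdonságú anyagok.',
    'Az első héjon.',
    'Ötvözőelemként, tranzisztorok, integrált áramkörök, fényelemek előállítására.',
]
scores = model.predict([(query, candidate) for candidate in candidates])
for candidate, score in sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True):
    print(f"{score:.4f}\t{candidate}")
```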
Evaluation
Metrics
Cross Encoder Reranking
- Dataset: `chem-dev`
- Evaluated with `CrossEncoderRerankingEvaluator` with these parameters:
  ```json
  {
      "at_k": 10,
      "always_rerank_positives": false
  }
  ```

| Metric | Value |
|:--------|:------|
| map | 0.4646 (+0.0929) |
| mrr@10 | 0.4614 (+0.0966) |
| ndcg@10 | 0.4928 (+0.0910) |
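This evaluation can be rerun with the reranking evaluator. Below is a minimal sketch assuming the `CrossEncoderRerankingEvaluator` API from recent Sentence Transformers releases; the actual chem-dev samples are not published with this card, so the single sample shown is illustrative only:

```python
from sentence_transformers import CrossEncoder
from sentence_transformers.cross_encoder.evaluation import CrossEncoderRerankingEvaluator

model = CrossEncoder("GaborMadarasz/reranker-ModernBERT-base-hungarian")

# Illustrative stand-in for the real chem-dev samples.
samples = [
    {
        "query": "Milyen halmazállapotú a klór szobahőmérsékleten?",
        "positive": ["Gáz"],
        "negative": [
            "Az első héjon.",
            "Azonos összegképletű, de eltérő szerkezetű és tulajdonságú anyagok.",
        ],
    },
]

evaluator = CrossEncoderRerankingEvaluator(
    samples,
    at_k=10,
    always_rerank_positives=False,
    name="chem-dev",
)
results = evaluator(model)
print(results)  # dict with MAP, MRR@10, and NDCG@10 entries
```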
Training Details
Training Dataset
Unnamed Dataset
- Size: 32,113 training samples
- Columns: `query`, `answer`, and `label`
- Approximate statistics based on the first 1000 samples:

| | query | answer | label |
|:---|:---|:---|:---|
| type | string | string | int |
| details | min: 8 characters, mean: 52.3 characters, max: 159 characters | min: 1 character, mean: 83.87 characters, max: 531 characters | 0: ~69.80%, 1: ~30.20% |
- Samples:
query answer label Milyen halmazรกllapotรบ a klรณr szobahลmรฉrsรฉkleten?
Gรกz
1
Milyen halmazรกllapotรบ a klรณr szobahลmรฉrsรฉkleten?
Gรกz.
1
Mi az izomรฉria fogalma?
Azonos รถsszegkรฉpletลฑ, de eltรฉrล szerkezetลฑ รฉs tulajdonsรกgรบ anyagok.
1
- Loss: `BinaryCrossEntropyLoss` with these parameters:
  ```json
  {
      "activation_fn": "torch.nn.modules.linear.Identity",
      "pos_weight": 5
  }
  ```
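As a minimal sketch (assuming the `BinaryCrossEntropyLoss` class from `sentence_transformers.cross_encoder.losses`), the loss above can be instantiated like this; `pos_weight=5` counteracts the roughly 70/30 negative/positive imbalance in the training data:

```python
import torch
from sentence_transformers import CrossEncoder
from sentence_transformers.cross_encoder.losses import BinaryCrossEntropyLoss

model = CrossEncoder("GaborMadarasz/ModernBERT-base-hungarian", num_labels=1)
# pos_weight upweights the ~30% positive pairs against the ~70% negatives;
# the activation defaults to nn.Identity, matching the parameters above.
loss = BinaryCrossEntropyLoss(model, pos_weight=torch.tensor(5.0))
```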
Training Hyperparameters
Non-Default Hyperparameters
- `eval_strategy`: steps
- `per_device_train_batch_size`: 2
- `per_device_eval_batch_size`: 2
- `gradient_accumulation_steps`: 8
- `learning_rate`: 2e-05
- `warmup_ratio`: 0.1
- `seed`: 12
- `dataloader_num_workers`: 2
- `load_best_model_at_end`: True
All Hyperparameters
- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 2
- `per_device_eval_batch_size`: 2
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 8
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 3
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 12
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 2
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `hub_revision`: None
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `liger_kernel_config`: None
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: proportional
- `router_mapping`: {}
- `learning_rate_mapping`: {}
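For reference, a minimal training setup matching the non-default hyperparameters above can be sketched with the cross-encoder trainer API. This assumes the `CrossEncoderTrainer`/`CrossEncoderTrainingArguments` classes from Sentence Transformers v4+; the one-row dataset is a stand-in for the unpublished 32,113-sample training set:

```python
import torch
from datasets import Dataset
from sentence_transformers.cross_encoder import (
    CrossEncoder,
    CrossEncoderTrainer,
    CrossEncoderTrainingArguments,
)
from sentence_transformers.cross_encoder.losses import BinaryCrossEntropyLoss

# Stand-in for the real (query, answer, label) training data.
train_dataset = Dataset.from_dict({
    "query": ["Milyen halmazállapotú a klór szobahőmérsékleten?"],
    "answer": ["Gáz"],
    "label": [1],
})

model = CrossEncoder("GaborMadarasz/ModernBERT-base-hungarian", num_labels=1)
loss = BinaryCrossEntropyLoss(model, pos_weight=torch.tensor(5.0))

args = CrossEncoderTrainingArguments(
    output_dir="reranker-ModernBERT-base-hungarian",
    num_train_epochs=3,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=2e-5,
    warmup_ratio=0.1,
    seed=12,
    dataloader_num_workers=2,
    # eval_strategy="steps" and load_best_model_at_end=True additionally
    # require an eval dataset or evaluator, omitted from this sketch.
)

trainer = CrossEncoderTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()
```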
Training Logs
Epoch | Step | Training Loss | chem-dev_ndcg@10 |
---|---|---|---|
-1 | -1 | - | 0.1188 (-0.2831) |
0.0005 | 1 | 1.9222 | - |
0.0498 | 100 | 1.8084 | - |
0.0996 | 200 | 1.2947 | 0.2862 (-0.1157) |
0.1495 | 300 | 1.1573 | - |
0.1993 | 400 | 1.17 | 0.3567 (-0.0452) |
0.2491 | 500 | 1.0609 | - |
0.2989 | 600 | 1.01 | 0.3747 (-0.0272) |
0.3488 | 700 | 0.9806 | - |
0.3986 | 800 | 0.9208 | 0.3963 (-0.0056) |
0.4484 | 900 | 0.9022 | - |
0.4982 | 1000 | 0.8722 | 0.4106 (+0.0087) |
0.5480 | 1100 | 0.9325 | - |
0.5979 | 1200 | 0.768 | 0.4316 (+0.0298) |
0.6477 | 1300 | 0.8151 | - |
0.6975 | 1400 | 0.7569 | 0.4506 (+0.0487) |
0.7473 | 1500 | 0.7216 | - |
0.7972 | 1600 | 0.7571 | 0.4643 (+0.0625) |
0.8470 | 1700 | 0.6993 | - |
0.8968 | 1800 | 0.6709 | 0.4713 (+0.0694) |
0.9466 | 1900 | 0.7021 | - |
0.9965 | 2000 | 0.7693 | 0.4805 (+0.0787) |
1.0458 | 2100 | 0.5179 | - |
1.0957 | 2200 | 0.4932 | 0.4800 (+0.0781) |
1.1455 | 2300 | 0.5568 | - |
1.1953 | 2400 | 0.4191 | 0.4821 (+0.0803) |
1.2451 | 2500 | 0.4702 | - |
1.2949 | 2600 | 0.4126 | 0.4851 (+0.0833) |
1.3448 | 2700 | 0.4744 | - |
1.3946 | 2800 | 0.4404 | 0.4907 (+0.0888) |
1.4444 | 2900 | 0.4712 | - |
1.4942 | 3000 | 0.4382 | 0.4913 (+0.0894) |
1.5441 | 3100 | 0.5049 | - |
1.5939 | 3200 | 0.4714 | 0.4886 (+0.0868) |
1.6437 | 3300 | 0.3885 | - |
1.6935 | 3400 | 0.4361 | 0.4924 (+0.0906) |
1.7434 | 3500 | 0.4207 | - |
**1.7932** | **3600** | **0.4384** | **0.4928 (+0.0910)** |
1.8430 | 3700 | 0.4187 | - |
1.8928 | 3800 | 0.4271 | 0.4937 (+0.0919) |
1.9426 | 3900 | 0.3581 | - |
1.9925 | 4000 | 0.3751 | 0.4910 (+0.0891) |
2.0419 | 4100 | 0.2494 | - |
2.0917 | 4200 | 0.2045 | 0.4869 (+0.0850) |
2.1415 | 4300 | 0.1532 | - |
2.1913 | 4400 | 0.1268 | 0.4838 (+0.0820) |
2.2411 | 4500 | 0.2108 | - |
2.2910 | 4600 | 0.2292 | 0.4889 (+0.0870) |
2.3408 | 4700 | 0.2154 | - |
2.3906 | 4800 | 0.1574 | 0.4921 (+0.0902) |
2.4404 | 4900 | 0.1677 | - |
2.4903 | 5000 | 0.1596 | 0.4826 (+0.0807) |
2.5401 | 5100 | 0.1456 | - |
2.5899 | 5200 | 0.2177 | 0.4867 (+0.0849) |
2.6397 | 5300 | 0.1227 | - |
2.6895 | 5400 | 0.1638 | 0.4880 (+0.0862) |
2.7394 | 5500 | 0.1192 | - |
2.7892 | 5600 | 0.2003 | 0.4848 (+0.0829) |
2.8390 | 5700 | 0.2717 | - |
2.8888 | 5800 | 0.1546 | 0.4841 (+0.0822) |
2.9387 | 5900 | 0.268 | - |
2.9885 | 6000 | 0.2253 | 0.4858 (+0.0840) |
-1 | -1 | - | 0.4928 (+0.0910) |
- The bold row denotes the saved checkpoint.
Framework Versions
- Python: 3.10.12
- Sentence Transformers: 5.0.0
- Transformers: 4.53.2
- PyTorch: 2.7.0+cpu
- Accelerate: 1.6.0
- Datasets: 3.2.0
- Tokenizers: 0.21.2
Citation
BibTeX
Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```