ModernBERT Embed base LegalTextAI Matryoshka legaldataset

This is a sentence-transformers model fine-tuned from nomic-ai/modernbert-embed-base on the json dataset. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Type: Sentence Transformer
Base model: nomic-ai/modernbert-embed-base
Maximum Sequence Length: 8192 tokens
Output Dimensionality: 768 dimensions
Similarity Function: Cosine Similarity
Training Dataset:
- json
Language: en
License: apache-2.0

Model Sources

Documentation: Sentence Transformers Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Sentence Transformers on Hugging Face

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: ModernBertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
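
The Pooling and Normalize modules listed above can be illustrated with a minimal pure-Python sketch. It assumes mean pooling over non-padding tokens followed by L2 normalization, as the configuration indicates; the function names and the toy 2-dimensional token vectors are hypothetical and are not part of the library.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose attention-mask entry is 1."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    return [value / max(count, 1) for value in total]

def l2_normalize(vector):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm > 0 else list(vector)

# Toy input: three token vectors of width 2; the last token is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)   # average of the first two tokens only
embedding = l2_normalize(pooled)   # unit-length sentence embedding
print(pooled, embedding)
```

Because the final module normalizes the output, cosine similarity between two embeddings reduces to a dot product.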

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("legaltextai/modernbert-embed-base-legaltextai-matryoshka-legaldataset")
# Run inference
sentences = [
    'In the context of supporting factual positions in a legal motion, what are the two primary ways a party can assert that a fact cannot be genuinely disputed according to the procedures outlined in section (c)(1)?',
    '(c) Procedures.\n\n(1) Supporting Factual Positions. A party asserting that a fact cannot be or is\xa0genuinely disputed must support the assertion by:\n\n(A) citing to particular parts of materials in the record, including depositions,\xa0documents, electronically stored information, affidavits or declarations,\xa0stipulations (including those made for purposes of the motion only), admissions,\xa0interrogatory answers, or other materials; or\n\n(B) showing that the materials cited do not establish the absence or presence of a\xa0genuine dispute, or that an adverse party cannot produce admissible evidence to\xa0support the fact.\n\n(2) Objection That a Fact Is Not Supported by Admissible Evidence. A party may\xa0object that the material cited to support or dispute a fact cannot be presented in a\xa0form that would be admissible in evidence.\n\n(3) Materials Not Cited. The court need consider only the cited materials, but it\xa0may consider other materials in the record.\n\n(4) Affidavits or Declarations. An affidavit or declaration used to support or oppose\xa0a motion must be made on personal knowledge, set out facts that would be admissible\xa0in evidence, and show that the affiant or declarant is competent to testify on the\xa0matters stated.\n\n(d) When Facts are Unavailable to the Nonmovant. If a nonmovant shows by\xa0affidavit or declaration that, for specified reasons, it cannot present facts essential to\xa0justify its opposition, the court may:\n\n(1) defer considering the motion or deny it;\n\n(2) allow time to obtain affidavits or declarations or to take discovery; or\n\n(3) issue any other appropriate order.\n\n(e) Failing to Properly Support or Address a Fact. If a party fails to properly support an assertion of fact or fails to properly address another party’s assertion of fact as required by Rule 56(c), the court may:\n\n(1) give an opportunity to properly support or address the fact;\n\n(2) consider the fact undisputed for purposes of the motion;\n\n(3) grant summary judgment if the motion and supporting materials — including\xa0the facts considered undisputed — show that the movant is entitled to it; or\n\n(4) issue any other appropriate order.\n\n(f) Judgment Independent of the Motion. After giving notice and a reasonable time to respond, the court may:\n\n(1) grant summary judgment for a nonmovant;\n\n(2) grant the motion on grounds not raised by a party; or\n\n(3) consider summary judgment on its own after identifying for the parties material\xa0facts that may not be genuinely in dispute.\n\n(g) Failing to Grant All the Requested Relief.\xa0If the court does not grant all the relief requested by the motion, it may enter an order stating any material fact — including an\xa0item of damages or other relief — that is not genuinely in dispute and treating the fact as\xa0established in the case.\n\n(h) Affidavit or Declaration Submitted in Bad Faith.\xa0If satisfied that an affidavit or declaration under this rule is submitted in bad faith or solely for delay, the court — after\xa0notice and a reasonable time to respond — may order the submitting party to pay the\xa0other party the reasonable expenses, including attorney’s fees, it incurred as a result. An\xa0offending party or attorney may also be held in contempt or subjected to other appropriate sanctions.\n\n\xa0\n\n\xa0\n\n\xa0\n\n11.1.3\n\nAdickes v. S.H. Kress & Co.\n\n\xa0\n\nSupreme Court of the United States\n\n398 U.S. 144, 26 L. Ed. 2d 142, 90 S. Ct. 1598, 1970 U.S. LEXIS 31, SCDB 1969-101\n\nNo. 79\n\n1970-06-01\n\n[ … ]\n\nCERTIORARI TO THE UNITED STATES COURT OF APPEALS FOR THE SECOND CIRCUIT.\n\n[ … ]\n\nMR. JUSTICE HARLAN delivered the opinion of the Court.\n\nPetitioner, Sandra Adickes, a white school teacher from New York, brought this suit in the United States District Court for the Southern District of New York against respondent S. H. Kress & Co. ("Kress") to recover damages under 42 U. S. C. § 1983[1] for an alleged violation of her constitutional rights under the Equal Protection Clause of the Fourteenth Amendment.',
    "I will not be requiring you to read these materials. Nor will you be tested on them. After discussions with a number of colleagues, I decided that I will present an optional lecture or two on sexual assault.\n\n\xa0\n\n\xa0\n\n\xa0\n\n\xa0\n\n13.1\n\nIntroduction\n\n\xa0\n\nTo a greater degree than any of the other crimes we study in this class, the very definition of rape has been a subject of dispute and reform in recent years. Perhaps that is because the basic result element that rape law criminalizes—sexual intercourse—is not, unlike death or battery, itself considered bad. When someone intentionally kills another, there is usually little question (except in cases of self-defense) that the result is bad and that a crime may have occurred. Unlike most intentional killing, intentional sex is not inherently wrong. Indeed, in some situations, much of the evidence of rape may rest in the perceptions and interpretations of the involved parties. \n\nThe traditional elements of rape law are: 1) sexual intercourse; 2) with force; 3) and lack of consent. Because the sexual intercourse element of rape can be difficult to distinguish from lawful, intentional behavior, rape law has struggled to create a regime that balances the punishment of wrongdoers with the protection of the rights of the accused. Originally, rape law established strict rules governing punishable behavior that were under-inclusive and strongly protected accused men: for example, a claim of rape had to include the use of physical force by the accused and physical resistance by the victim. Additionally, there was a spousal exception to rape, so that husbands could not be criminally liable for rape of their wives. \n\nAs the cases in this section demonstrate, however, rape law reform in the past several decades has dramatically affected these requirements. Namely, feminist legal reformers have challenged and in many jurisdictions weakened or eliminated the force requirement. That has shifted more legal focus onto the question of whether there was consent. Consider what problems consent itself may have as a central element of rape law. As you read the cases and essays in this section, consider how different formulations of rape law balance several very serious considerations of our criminal system: punishing wrongdoers; differentiating between levels of blameworthiness; and protecting the rights of defendants. What evidentiary or normative roles did the traditional rape requirements play? What are the risks of limiting or removing them? How should our system balance the risks of over-inclusivity and under-inclusivity? What social and intimate relationships between men and women do the various possible rape rules promote and change? And as always, how do these questions implicate the justifications of punishment such as retribution and deterrence?\n\n\xa0\n\n\xa0\n\n\xa0\n\n\xa0\n\n13.1.1\n\nExcerpt from Criminal Law: Cases, Controversies and Problems (West Academic Publishing 2019) by Joseph E. Kennedy (used with permission).\n\n\xa0\n\nhttps://app.box.com/s/ixs8jw1d0oi45q68xvpk3vl69m2p6y71\n\n\xa0\n\n\xa0\n\n\xa0\n\n\xa0\n\n13.2\n\nStatutes\n\n\xa0\n\nConsider some of these questions while you are reviewing these statutes.\n\nHow do the statutes define sex, if at all? \n\nHow do they define force, if at all? \n\nWhat is the mens rea required? \n\nHow do you think they balance the rights of the accused with the harm to be avoided? \n\nAs a defense attorney, which one would you find most defendant-friendly? \n\nAs a prosecutor, which one would you find most prosecution-friendly?\n\n\xa0\n\n\xa0\n\n\xa0\n\n\xa0\n\n13.2.1\n\nForce v. Non-Consent: An Ongoing Struggle to Define Rape\n\n\xa0\n\nAfter reading the passage from Rusk v. State, below, compare and contrast the MPC's section from 1962 with the proposed section governing sexual assault.\n\n\xa0\n\nPassages taken from the Dissent of\xa0Rusk v. State, 43 Md. App. 476, 406 A.2d 624 (1979),\xa0rev'd,\xa0289 Md. 230, 424 A.2d 720 (1981)):\n\nUnfortunately, courts,[ … ] often tend to confuse these two elements force and lack of consent and to think of them as one. They are not. They mean, and require, different things. [ … ]What seems to cause the confusion what, indeed, has become a common denominator of both elements is the notion that the victim must actively resist the attack upon her.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
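
Since the model was trained with MatryoshkaLoss at 768, 512, 256, 128 and 64 dimensions, the leading values of each embedding should remain usable on their own. The following is a minimal sketch of that idea in pure Python, using toy 4-dimensional vectors in place of real model output; the helper names are hypothetical. Sentence Transformers also accepts a truncate_dim argument when loading a model, which is likely the more convenient route in practice.

```python
import math

def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` values and rescale the result to unit length."""
    prefix = list(embedding[:dim])
    norm = math.sqrt(sum(v * v for v in prefix))
    return [v / norm for v in prefix] if norm > 0 else prefix

def cosine_similarity(a, b):
    """Cosine similarity; for unit-length inputs this equals the dot product."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 4-dimensional vectors standing in for 768-dimensional embeddings.
query = [0.5, 0.5, 0.5, 0.5]
passage = [0.5, 0.5, -0.5, 0.5]

full_score = cosine_similarity(query, passage)

# Compare only the first two dimensions, as one would with a smaller Matryoshka size.
short_query = truncate_and_normalize(query, 2)
short_passage = truncate_and_normalize(passage, 2)
short_score = cosine_similarity(short_query, short_passage)
print(full_score, short_score)
```

Scores at a truncated size generally differ from full-size scores; the evaluation table shows how much retrieval quality is given up at each dimension.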

Evaluation

Metrics

Information Retrieval

Datasets: dim_768, dim_512, dim_256, dim_128 and dim_64
Evaluated with InformationRetrievalEvaluator

Metric	dim_768	dim_512	dim_256	dim_128	dim_64
cosine_accuracy@1	0.5637	0.5644	0.5514	0.5158	0.4461
cosine_accuracy@3	0.7532	0.748	0.7351	0.6869	0.6086
cosine_accuracy@5	0.8338	0.8327	0.8229	0.7782	0.6926
cosine_accuracy@10	0.9065	0.9069	0.8994	0.8681	0.7905
cosine_precision@1	0.5637	0.5644	0.5514	0.5158	0.4461
cosine_precision@3	0.4369	0.4347	0.4263	0.3982	0.3508
cosine_precision@5	0.3139	0.313	0.3084	0.2902	0.2581
cosine_precision@10	0.1772	0.1773	0.1754	0.1692	0.1544
cosine_recall@1	0.1728	0.1731	0.1691	0.158	0.1369
cosine_recall@3	0.3954	0.3937	0.3853	0.3604	0.3186
cosine_recall@5	0.4741	0.4728	0.4654	0.4383	0.3895
cosine_recall@10	0.5349	0.5347	0.5289	0.5108	0.4661
cosine_ndcg@10	0.5188	0.5183	0.51	0.4848	0.4332
cosine_mrr@10	0.6564	0.6556	0.6436	0.606	0.5331
cosine_map@100	0.4055	0.4051	0.3979	0.3769	0.3365
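
For readers unfamiliar with the metric names in the table, here is a rough pure-Python sketch of how accuracy@k, precision@k, recall@k and MRR@k are commonly defined for a single query. It is an illustration under standard definitions, not the evaluator's own code; the document ids and function names are made up.

```python
def accuracy_at_k(ranked_ids, relevant_ids, k):
    """1.0 if at least one relevant document appears in the top k, else 0.0."""
    return 1.0 if any(doc in relevant_ids for doc in ranked_ids[:k]) else 0.0

def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top k results that are relevant."""
    hits = sum(1 for doc in ranked_ids[:k] if doc in relevant_ids)
    return hits / k

def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of all relevant documents that appear in the top k."""
    hits = sum(1 for doc in ranked_ids[:k] if doc in relevant_ids)
    return hits / len(relevant_ids)

def mrr_at_k(ranked_ids, relevant_ids, k):
    """Reciprocal rank of the first relevant result within the top k, else 0.0."""
    for rank, doc in enumerate(ranked_ids[:k], start=1):
        if doc in relevant_ids:
            return 1.0 / rank
    return 0.0

# Hypothetical ranking for one query with three relevant passages.
ranked = ["d7", "d2", "d9", "d4", "d1"]
relevant = {"d2", "d4", "d8"}

scores = {
    "accuracy@1": accuracy_at_k(ranked, relevant, 1),
    "accuracy@3": accuracy_at_k(ranked, relevant, 3),
    "precision@3": precision_at_k(ranked, relevant, 3),
    "recall@5": recall_at_k(ranked, relevant, 5),
    "mrr@10": mrr_at_k(ranked, relevant, 10),
}
print(scores)
```

The reported numbers are averages of such per-query values. Under these definitions accuracy@1 and precision@1 coincide, which matches the table, and a recall@1 well below accuracy@1 suggests that queries typically have several relevant passages.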

Training Details

Training Dataset

json

Dataset: json
Size: 41,342 training samples
Columns: anchor and positive
Approximate statistics based on the first 1000 samples:

	anchor	positive
type	string	string
details	min: 23 tokens mean: 43.27 tokens max: 72 tokens	min: 279 tokens mean: 960.03 tokens max: 1076 tokens

Samples:

anchor	positive
`What reasons did the District provide for placing Mr. Kennedy on paid administrative leave after the October 26 game, and how did they justify their concerns regarding his postgame prayers?`	The letter also admitted that, during Mr. Kennedy’s recent October 16 postgame prayer, his students were otherwise engaged and not praying with him, and that his prayer was “fleeting.” Id., at 90, 93. Still, the District explained that a “reasonable observer” could think government endorsement of religion had occurred when a “District employee, on the field only by virtue of his employment with the District, still on duty” engaged in “overtly religious conduct.” Id., at 91, 93. The District thus made clear that the only option it would offer Mr. Kennedy was to allow him to pray after a game in a “private location” behind closed doors and “not observable to students or the public.” Id., at 93–94. After the October 23 game ended, Mr. Kennedy knelt at the 50-yard line, where “no one joined him,” and bowed his head for a “brief, quiet prayer.” 991 F.3d at 1019; App. 173, 236–239. The superintendent informed the District’s board that this prayer “moved closer to what we want,” but never...
`Why is it considered an abuse of discretion for a district court to require the S.E.C. to establish the "truth" of the allegations against a settling party as a condition for approving consent decrees?`	[ … ] We turn, then, to the far thornier question of what deference the district court owes an agency seeking a consent decree. Our Court recognizes a “strong federal policy favoring the approval and enforcement of consent decrees.” [ … ]“To be sure, when the district judge is presented with a proposed consent judgment, he is not merely a ‘rubber stamp.’ ” [ … ][ … ] [ … ] the proper standard for reviewing a proposed consent judgment involving an enforcement agency requires that the district court determine whether the proposed consent decree is fair and reasonable, with the additional requirement that the “public interest would not be disserved,” [ … ] in the event that the consent decree includes in-junctive relief. Absent a substantial basis in the record for concluding that the proposed consent decree does not meet these requirements, the district court is required to enter the order. We omit “adequacy” from the standard. Scrutinizing a proposed consent decree for “adequacy” ap...
`Describe the sequence of events that led to Officer McClendon asking Jamison for consent to search his vehicle. What were the key points of contention between Officer McClendon's and Jamison's accounts of this interaction?`	Officer McClendon pulled behind Jamison and flashed his blue lights. Jamison immediately pulled over to the right shoulder.[27] As Officer McClendon approached the passenger side of Jamison's car, Jamison rolled down the passenger side window. Officer McClendon began to speak with Jamison when he reached the window. According 393*393 to McClendon, he noticed that Jamison had recently purchased his car in Pennsylvania, and Jamison told him that he was traveling from "Vegas or Arizona." Officer McClendon asked Jamison for "his license, insurance, [and] the paperwork on the vehicle because it didn't have a tag." Jamison provided his bill of sale, insurance, and South Carolina driver's license. Officer McClendon returned to his car to conduct a background check using the El Paso Intelligence Center ("EPIC"). The EPIC check came back clear immediately. Officer McLendon then contacted the National Criminal Information Center ("NCIC") and asked the dispatcher to run a criminal history on Ja...

Loss: MatryoshkaLoss with these parameters:

{
    "loss": "MultipleNegativesRankingLoss",
    "matryoshka_dims": [
        768,
        512,
        256,
        128,
        64
    ],
    "matryoshka_weights": [
        1,
        1,
        1,
        1,
        1
    ],
    "n_dims_per_step": -1
}
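
The configuration above can be read as: compute the inner ranking loss once per Matryoshka dimension on truncated embeddings, then add the results using the given weights. Below is a simplified pure-Python sketch of that idea. It assumes an in-batch-negatives cross-entropy over scaled cosine similarities with a scale of 20.0 (an assumption for illustration), uses toy 4-dimensional vectors, and is not the library implementation.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def ranking_loss(anchors, positives, scale=20.0):
    """In-batch negatives: anchor i should score its own positive i highest."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, positive) for positive in positives]
        peak = max(logits)
        log_sum_exp = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_sum_exp - logits[i]   # cross-entropy with target index i
    return total / len(anchors)

def matryoshka_loss(anchors, positives, dims, weights):
    """Weighted sum of the ranking loss over truncated embedding prefixes."""
    total = 0.0
    for dim, weight in zip(dims, weights):
        short_anchors = [a[:dim] for a in anchors]
        short_positives = [p[:dim] for p in positives]
        total += weight * ranking_loss(short_anchors, short_positives)
    return total

# Toy batch of two (anchor, positive) pairs with 4-dimensional embeddings.
anchors = [[1.0, 0.0, 0.5, 0.0], [0.0, 1.0, 0.0, 0.5]]
positives = [[0.9, 0.1, 0.4, 0.0], [0.1, 0.9, 0.0, 0.6]]

full_only = matryoshka_loss(anchors, positives, dims=[4], weights=[1])
nested = matryoshka_loss(anchors, positives, dims=[4, 2], weights=[1, 1])
print(full_only, nested)
```

With equal weights of 1 for all five dimensions, as configured here, the logged training loss would be a sum of five such terms, which is one reason its absolute value is larger than a single ranking loss.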

Training Hyperparameters

Non-Default Hyperparameters

eval_strategy: epoch
per_device_train_batch_size: 16
per_device_eval_batch_size: 16
gradient_accumulation_steps: 32
learning_rate: 2e-05
num_train_epochs: 4
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: True
tf32: True
load_best_model_at_end: True
optim: adamw_torch_fused
batch_sampler: no_duplicates
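
As a worked example of how these settings interact, the short calculation below derives the effective batch size and the approximate number of optimizer updates per epoch. It assumes a single training device, which the card does not state, and it ignores any samples the no_duplicates sampler might skip.

```python
import math

train_samples = 41_342      # size reported in the Training Dataset section
per_device_batch = 16       # per_device_train_batch_size
grad_accum_steps = 32       # gradient_accumulation_steps
num_devices = 1             # assumption: the device count is not given in the card

# Sequences contributing to each optimizer update.
effective_batch = per_device_batch * grad_accum_steps * num_devices

# Forward/backward passes per epoch, then optimizer updates per epoch.
micro_batches_per_epoch = math.ceil(train_samples / (per_device_batch * num_devices))
updates_per_epoch = micro_batches_per_epoch / grad_accum_steps

print(effective_batch, micro_batches_per_epoch, updates_per_epoch)
```

Under these assumptions each update sees 512 pairs and an epoch takes about 80.75 updates, which is consistent with the training log reaching epoch 1.0 at step 81.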

All Hyperparameters

overwrite_output_dir: False
do_predict: False
eval_strategy: epoch
prediction_loss_only: True
per_device_train_batch_size: 16
per_device_eval_batch_size: 16
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 32
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 2e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 4
max_steps: -1
lr_scheduler_type: cosine
lr_scheduler_kwargs: {}
warmup_ratio: 0.1
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: True
fp16: False
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: True
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: True
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch_fused
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: None
hub_always_push: False
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
include_for_metrics: []
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
dispatch_batches: None
split_batches: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
eval_use_gather_object: False
average_tokens_across_devices: False
prompts: None
batch_sampler: no_duplicates
multi_dataset_batch_sampler: proportional

Training Logs

Epoch	Step	Training Loss	dim_768_cosine_ndcg@10	dim_512_cosine_ndcg@10	dim_256_cosine_ndcg@10	dim_128_cosine_ndcg@10	dim_64_cosine_ndcg@10
0.1238	10	59.6933	-	-	-	-	-
0.2477	20	20.2066	-	-	-	-	-
0.3715	30	10.2468	-	-	-	-	-
0.4954	40	7.7729	-	-	-	-	-
0.6192	50	6.5815	-	-	-	-	-
0.7430	60	5.8646	-	-	-	-	-
0.8669	70	5.0228	-	-	-	-	-
0.9907	80	4.8557	-	-	-	-	-
1.0	81	-	0.5013	0.4986	0.4888	0.4586	0.3932
1.1115	90	3.0385	-	-	-	-	-
1.2353	100	2.9601	-	-	-	-	-
1.3591	110	2.8391	-	-	-	-	-
1.4830	120	2.9631	-	-	-	-	-
1.6068	130	2.6344	-	-	-	-	-
1.7307	140	2.4715	-	-	-	-	-
1.8545	150	2.7462	-	-	-	-	-
1.9783	160	2.5805	-	-	-	-	-
2.0	162	-	0.5162	0.5142	0.5040	0.4778	0.4242
2.0991	170	2.0474	-	-	-	-	-
2.2229	180	1.9431	-	-	-	-	-
2.3467	190	2.0218	-	-	-	-	-
2.4706	200	1.8881	-	-	-	-	-
2.5944	210	1.6105	-	-	-	-	-
2.7183	220	1.9675	-	-	-	-	-
2.8421	230	1.6917	-	-	-	-	-
2.9659	240	1.8939	-	-	-	-	-
3.0	243	-	0.5188	0.5175	0.5097	0.4840	0.4303
3.0867	250	1.8625	-	-	-	-	-
3.2105	260	1.7864	-	-	-	-	-
3.3344	270	1.6404	-	-	-	-	-
3.4582	280	1.6378	-	-	-	-	-
3.5820	290	1.8484	-	-	-	-	-
3.7059	300	1.7864	-	-	-	-	-
3.8297	310	1.5436	-	-	-	-	-
3.9536	320	1.3438	0.5188	0.5183	0.51	0.4848	0.4332

The final row (step 320) denotes the saved checkpoint; its scores match the Information Retrieval metrics reported above.

Framework Versions

Python: 3.11.11
Sentence Transformers: 3.4.1
Transformers: 4.49.0
PyTorch: 2.6.0+cu124
Accelerate: 1.3.0
Datasets: 3.3.1
Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
