ModernBERT Embed base LegalTextAI Matryoshka legaldataset

This is a sentence-transformers model finetuned from nomic-ai/modernbert-embed-base on the json dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: nomic-ai/modernbert-embed-base
  • Maximum Sequence Length: 8192 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset:
    • json
  • Language: en
  • License: apache-2.0

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: ModernBertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
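
The three modules above can be read as a pipeline: the transformer produces one vector per token, the pooling layer averages the token vectors (`pooling_mode_mean_tokens: True`), and `Normalize()` rescales the result to unit length. The following is a minimal pure-Python sketch of the last two steps on toy data. The function names `mean_pool` and `l2_normalize` are illustrative and are not part of the Sentence Transformers API, and the use of an attention mask to skip padding is an assumption about how mean pooling is typically applied.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, skipping positions where the mask is 0 (padding)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                total[i] += value
    # Guard against an all-padding input to avoid division by zero.
    return [t / max(count, 1) for t in total]

def l2_normalize(vec):
    """Scale a vector to unit length so that a dot product equals cosine similarity."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)

# Toy input: three token vectors of dimension 2; the last position is padding.
tokens = [[1.0, 2.0], [3.0, 6.0], [100.0, 100.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)       # average of the first two vectors only
sentence_vec = l2_normalize(pooled)    # unit-length sentence embedding
print(pooled, sentence_vec)
```

Because the final vectors have unit length, cosine similarity between two embeddings reduces to a plain dot product.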

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("legaltextai/modernbert-embed-base-legaltextai-matryoshka-legaldataset")
# Run inference
sentences = [
    'In the context of supporting factual positions in a legal motion, what are the two primary ways a party can assert that a fact cannot be genuinely disputed according to the procedures outlined in section (c)(1)?',
    '(c) Procedures.\n\n(1) Supporting Factual Positions. A party asserting that a fact cannot be or is\xa0genuinely disputed must support the assertion by:\n\n(A) citing to particular parts of materials in the record, including depositions,\xa0documents, electronically stored information, affidavits or declarations,\xa0stipulations (including those made for purposes of the motion only), admissions,\xa0interrogatory answers, or other materials; or\n\n(B) showing that the materials cited do not establish the absence or presence of a\xa0genuine dispute, or that an adverse party cannot produce admissible evidence to\xa0support the fact.\n\n(2) Objection That a Fact Is Not Supported by Admissible Evidence. A party may\xa0object that the material cited to support or dispute a fact cannot be presented in a\xa0form that would be admissible in evidence.\n\n(3) Materials Not Cited. The court need consider only the cited materials, but it\xa0may consider other materials in the record.\n\n(4) Affidavits or Declarations. An affidavit or declaration used to support or oppose\xa0a motion must be made on personal knowledge, set out facts that would be admissible\xa0in evidence, and show that the affiant or declarant is competent to testify on the\xa0matters stated.\n\n(d) When Facts are Unavailable to the Nonmovant. If a nonmovant shows by\xa0affidavit or declaration that, for specified reasons, it cannot present facts essential to\xa0justify its opposition, the court may:\n\n(1) defer considering the motion or deny it;\n\n(2) allow time to obtain affidavits or declarations or to take discovery; or\n\n(3) issue any other appropriate order.\n\n(e) Failing to Properly Support or Address a Fact. 
If a party fails to properly support an assertion of fact or fails to properly address another party’s assertion of fact as required by Rule 56(c), the court may:\n\n(1) give an opportunity to properly support or address the fact;\n\n(2) consider the fact undisputed for purposes of the motion;\n\n(3) grant summary judgment if the motion and supporting materials — including\xa0the facts considered undisputed — show that the movant is entitled to it; or\n\n(4) issue any other appropriate order.\n\n(f) Judgment Independent of the Motion. After giving notice and a reasonable time to respond, the court may:\n\n(1) grant summary judgment for a nonmovant;\n\n(2) grant the motion on grounds not raised by a party; or\n\n(3) consider summary judgment on its own after identifying for the parties material\xa0facts that may not be genuinely in dispute.\n\n(g) Failing to Grant All the Requested Relief.\xa0If the court does not grant all the relief requested by the motion, it may enter an order stating any material fact — including an\xa0item of damages or other relief — that is not genuinely in dispute and treating the fact as\xa0established in the case.\n\n(h) Affidavit or Declaration Submitted in Bad Faith.\xa0If satisfied that an affidavit or declaration under this rule is submitted in bad faith or solely for delay, the court — after\xa0notice and a reasonable time to respond — may order the submitting party to pay the\xa0other party the reasonable expenses, including attorney’s fees, it incurred as a result. An\xa0offending party or attorney may also be held in contempt or subjected to other appropriate sanctions.\n\n\xa0\n\n\xa0\n\n\xa0\n\n11.1.3\n\nAdickes v. S.H. Kress & Co.\n\n\xa0\n\nSupreme Court of the United States\n\n398 U.S. 144, 26 L. Ed. 2d 142, 90 S. Ct. 1598, 1970 U.S. LEXIS 31, SCDB 1969-101\n\nNo. 79\n\n1970-06-01\n\n[ … ]\n\nCERTIORARI TO THE UNITED STATES COURT OF APPEALS FOR THE SECOND CIRCUIT.\n\n[ … ]\n\nMR. 
JUSTICE HARLAN delivered the opinion of the Court.\n\nPetitioner, Sandra Adickes, a white school teacher from New York, brought this suit in the United States District Court for the Southern District of New York against respondent S. H. Kress & Co. ("Kress") to recover damages under 42 U. S. C. § 1983[1] for an alleged violation of her constitutional rights under the Equal Protection Clause of the Fourteenth Amendment.',
    "I will not be requiring you to read these materials. Nor will you be tested on them. After discussions with a number of colleagues, I decided that I will present an optional lecture or two on sexual assault.\n\n\xa0\n\n\xa0\n\n\xa0\n\n\xa0\n\n13.1\n\nIntroduction\n\n\xa0\n\nTo a greater degree than any of the other crimes we study in this class, the very definition of rape has been a subject of dispute and reform in recent years. Perhaps that is because the basic result element that rape law criminalizes—sexual intercourse—is not, unlike death or battery, itself considered bad. When someone intentionally kills another, there is usually little question (except in cases of self-defense) that the result is bad and that a crime may have occurred. Unlike most intentional killing, intentional sex is not inherently wrong. Indeed, in some situations, much of the evidence of rape may rest in the perceptions and interpretations of the involved parties. \n\nThe traditional elements of rape law are: 1) sexual intercourse; 2) with force; 3) and lack of consent. Because the sexual intercourse element of rape can be difficult to distinguish from lawful, intentional behavior, rape law has struggled to create a regime that balances the punishment of wrongdoers with the protection of the rights of the accused. Originally, rape law established strict rules governing punishable behavior that were under-inclusive and strongly protected accused men: for example, a claim of rape had to include the use of physical force by the accused and physical resistance by the victim. Additionally, there was a spousal exception to rape, so that husbands could not be criminally liable for rape of their wives. \n\nAs the cases in this section demonstrate, however, rape law reform in the past several decades has dramatically affected these requirements. Namely, feminist legal reformers have challenged and in many jurisdictions weakened or eliminated the force requirement. 
That has shifted more legal focus onto the question of whether there was consent. Consider what problems consent itself may have as a central element of rape law. As you read the cases and essays in this section, consider how different formulations of rape law balance several very serious considerations of our criminal system: punishing wrongdoers; differentiating between levels of blameworthiness; and protecting the rights of defendants. What evidentiary or normative roles did the traditional rape requirements play? What are the risks of limiting or removing them? How should our system balance the risks of over-inclusivity and under-inclusivity? What social and intimate relationships between men and women do the various possible rape rules promote and change? And as always, how do these questions implicate the justifications of punishment such as retribution and deterrence?\n\n\xa0\n\n\xa0\n\n\xa0\n\n\xa0\n\n13.1.1\n\nExcerpt from Criminal Law: Cases, Controversies and Problems (West Academic Publishing 2019) by Joseph E. Kennedy (used with permission).\n\n\xa0\n\nhttps://app.box.com/s/ixs8jw1d0oi45q68xvpk3vl69m2p6y71\n\n\xa0\n\n\xa0\n\n\xa0\n\n\xa0\n\n13.2\n\nStatutes\n\n\xa0\n\nConsider some of these questions while you are reviewing these statutes.\n\nHow do the statutes define sex, if at all? \n\nHow do they define force, if at all? \n\nWhat is the mens rea required? \n\nHow do you think they balance the rights of the accused with the harm to be avoided? \n\nAs a defense attorney, which one would you find most defendant-friendly? \n\nAs a prosecutor, which one would you find most prosecution-friendly?\n\n\xa0\n\n\xa0\n\n\xa0\n\n\xa0\n\n13.2.1\n\nForce v. Non-Consent: An Ongoing Struggle to Define Rape\n\n\xa0\n\nAfter reading the passage from Rusk v. State, below, compare and contrast the MPC's section from 1962 with the proposed section governing sexual assault.\n\n\xa0\n\nPassages taken from the Dissent of\xa0Rusk v. State, 43 Md. App. 
476, 406 A.2d 624 (1979),\xa0rev'd,\xa0289 Md. 230, 424 A.2d 720 (1981)):\n\nUnfortunately, courts,[ … ] often tend to confuse these two elements force and lack of consent and to think of them as one. They are not. They mean, and require, different things. [ … ]What seems to cause the confusion what, indeed, has become a common denominator of both elements is the notion that the victim must actively resist the attack upon her.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
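
Because the model was trained with MatryoshkaLoss, the leading dimensions of each embedding are intended to remain useful on their own, which is why the evaluation reports results at 768, 512, 256, 128, and 64 dimensions. The sketch below shows the idea with toy 8-dimensional vectors standing in for real embeddings: keep the first `dim` values, re-normalize, and compare. The vectors and helper names are made up for illustration. Sentence Transformers also accepts a `truncate_dim` argument when loading a model, which should have the same effect; check the documentation for your installed version.

```python
import math

def truncate_and_normalize(vec, dim):
    """Keep the first `dim` values and rescale to unit length."""
    head = vec[:dim]
    norm = math.sqrt(sum(v * v for v in head))
    return [v / norm for v in head] if norm > 0 else head

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Toy 8-dimensional vectors (hypothetical values, not real model output).
query = [0.9, 0.1, 0.3, 0.0, 0.2, -0.1, 0.05, 0.0]
doc_a = [0.8, 0.2, 0.2, 0.1, 0.1, -0.2, 0.0, 0.1]
doc_b = [-0.1, 0.9, 0.0, 0.4, -0.3, 0.2, 0.1, 0.0]

scores = {}
for dim in (8, 4, 2):
    q = truncate_and_normalize(query, dim)
    a = truncate_and_normalize(doc_a, dim)
    b = truncate_and_normalize(doc_b, dim)
    # After normalization the dot product is the cosine similarity.
    scores[dim] = (dot(q, a), dot(q, b))
    print(dim, scores[dim])
```

In this toy case the ranking of `doc_a` above `doc_b` is preserved at every truncation level; on real data, smaller dimensions trade some retrieval quality for lower storage and faster search.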

Evaluation

Metrics

Information Retrieval

Metric               dim_768  dim_512  dim_256  dim_128  dim_64
cosine_accuracy@1    0.5637   0.5644   0.5514   0.5158   0.4461
cosine_accuracy@3    0.7532   0.748    0.7351   0.6869   0.6086
cosine_accuracy@5    0.8338   0.8327   0.8229   0.7782   0.6926
cosine_accuracy@10   0.9065   0.9069   0.8994   0.8681   0.7905
cosine_precision@1   0.5637   0.5644   0.5514   0.5158   0.4461
cosine_precision@3   0.4369   0.4347   0.4263   0.3982   0.3508
cosine_precision@5   0.3139   0.313    0.3084   0.2902   0.2581
cosine_precision@10  0.1772   0.1773   0.1754   0.1692   0.1544
cosine_recall@1      0.1728   0.1731   0.1691   0.158    0.1369
cosine_recall@3      0.3954   0.3937   0.3853   0.3604   0.3186
cosine_recall@5      0.4741   0.4728   0.4654   0.4383   0.3895
cosine_recall@10     0.5349   0.5347   0.5289   0.5108   0.4661
cosine_ndcg@10       0.5188   0.5183   0.51     0.4848   0.4332
cosine_mrr@10        0.6564   0.6556   0.6436   0.606    0.5331
cosine_map@100       0.4055   0.4051   0.3979   0.3769   0.3365
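
To make the metric names concrete, here is a small sketch of how accuracy@k, precision@k, recall@k, and the reciprocal rank are commonly computed for a single query. The table reports averages over all evaluation queries. The document IDs and the function name `ir_metrics_at_k` are hypothetical, and the definitions are the usual information-retrieval ones, assumed to match the evaluator used for this card.

```python
def ir_metrics_at_k(ranked_ids, relevant_ids, k):
    """Per-query retrieval metrics over the top-k results."""
    top_k = ranked_ids[:k]
    hits = [doc_id for doc_id in top_k if doc_id in relevant_ids]
    # Rank (1-based) of the first relevant document within the top k, if any.
    first_hit_rank = next(
        (rank for rank, doc_id in enumerate(top_k, start=1) if doc_id in relevant_ids),
        None,
    )
    return {
        "accuracy": 1.0 if hits else 0.0,          # at least one relevant result
        "precision": len(hits) / k,                # share of top-k that is relevant
        "recall": len(hits) / len(relevant_ids),   # share of relevant docs retrieved
        "reciprocal_rank": 1.0 / first_hit_rank if first_hit_rank else 0.0,
    }

# Toy query: results ranked by cosine similarity, with three relevant documents.
ranked = ["d7", "d2", "d9", "d4", "d1"]
relevant = {"d2", "d4", "d8"}

at1 = ir_metrics_at_k(ranked, relevant, 1)
at3 = ir_metrics_at_k(ranked, relevant, 3)
at5 = ir_metrics_at_k(ranked, relevant, 5)
print(at1, at3, at5, sep="\n")
```

Under these definitions accuracy@1 and precision@1 are always equal, which matches the table. Recall@1 being much lower than accuracy@1 is what one would expect if many queries have more than one relevant passage.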

Training Details

Training Dataset

json

  • Dataset: json
  • Size: 41,342 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
               anchor          positive
    type       string          string
    min        23 tokens       279 tokens
    mean       43.27 tokens    960.03 tokens
    max        72 tokens       1076 tokens
  • Samples:
    anchor: What reasons did the District provide for placing Mr. Kennedy on paid administrative leave after the October 26 game, and how did they justify their concerns regarding his postgame prayers?
    positive: The letter also admitted that, during Mr. Kennedy’s recent October 16 postgame prayer, his students were otherwise engaged and not praying with him, and that his prayer was “fleeting.” Id., at 90, 93. Still, the District explained that a “reasonable observer” could think government endorsement of religion had occurred when a “District employee, on the field only by virtue of his employment with the District, still on duty” engaged in “overtly religious conduct.” Id., at 91, 93. The District thus made clear that the only option it would offer Mr. Kennedy was to allow him to pray after a game in a “private location” behind closed doors and “not observable to students or the public.” Id., at 93–94.

    After the October 23 game ended, Mr. Kennedy knelt at the 50-yard line, where “no one joined him,” and bowed his head for a “brief, quiet prayer.” 991 F.3d at 1019; App. 173, 236–239. The superintendent informed the District’s board that this prayer “moved closer to what we want,” but never...

    anchor: Why is it considered an abuse of discretion for a district court to require the S.E.C. to establish the "truth" of the allegations against a settling party as a condition for approving consent decrees?
    positive: [ … ]

    We turn, then, to the far thornier question of what deference the district court owes an agency seeking a consent decree. Our Court recognizes a “strong federal policy favoring the approval and enforcement of consent decrees.” [ … ]“To be sure, when the district judge is presented with a proposed consent judgment, he is not merely a ‘rubber stamp.’ ” [ … ][ … ]

    [ … ]

    the proper standard for reviewing a proposed consent judgment involving an enforcement agency requires that the district court determine whether the proposed consent decree is fair and reasonable, with the additional requirement that the “public interest would not be disserved,” [ … ] in the event that the consent decree includes in-junctive relief. Absent a substantial basis in the record for concluding that the proposed consent decree does not meet these requirements, the district court is required to enter the order.

    We omit “adequacy” from the standard. Scrutinizing a proposed consent decree for “adequacy” ap...

    anchor: Describe the sequence of events that led to Officer McClendon asking Jamison for consent to search his vehicle. What were the key points of contention between Officer McClendon's and Jamison's accounts of this interaction?
    positive: Officer McClendon pulled behind Jamison and flashed his blue lights. Jamison immediately pulled over to the right shoulder.[27]

    As Officer McClendon approached the passenger side of Jamison's car, Jamison rolled down the passenger side window. Officer McClendon began to speak with Jamison when he reached the window. According 393*393 to McClendon, he noticed that Jamison had recently purchased his car in Pennsylvania, and Jamison told him that he was traveling from "Vegas or Arizona."

    Officer McClendon asked Jamison for "his license, insurance, [and] the paperwork on the vehicle because it didn't have a tag." Jamison provided his bill of sale, insurance, and South Carolina driver's license. Officer McClendon returned to his car to conduct a background check using the El Paso Intelligence Center ("EPIC"). The EPIC check came back clear immediately. Officer McLendon then contacted the National Criminal Information Center ("NCIC") and asked the dispatcher to run a criminal history on Ja...
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
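
The configuration above combines two ideas: MultipleNegativesRankingLoss treats every other positive in the batch as a negative for a given anchor, and MatryoshkaLoss applies that inner loss to the embeddings truncated to each listed dimension, then adds up the weighted results. The following pure-Python sketch illustrates the combination on toy vectors. It is not the library implementation; the similarity scale of 20.0 and all function names here are assumptions chosen for illustration.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def mnr_loss(anchors, positives, scale=20.0):
    """In-batch negatives: anchor i should score positive i above all other positives."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, p) for p in positives]
        # Numerically stable cross-entropy with the matching positive as the target.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)

def matryoshka_loss(anchors, positives, dims, weights):
    """Weighted sum of the inner loss over embeddings truncated to each dimension."""
    total = 0.0
    for dim, weight in zip(dims, weights):
        a_trunc = [a[:dim] for a in anchors]
        p_trunc = [p[:dim] for p in positives]
        total += weight * mnr_loss(a_trunc, p_trunc)
    return total

# Toy batch of two (anchor, positive) pairs with 4-dimensional embeddings.
anchors = [[1.0, 0.0, 0.2, 0.1], [0.0, 1.0, 0.1, 0.3]]
positives = [[0.9, 0.1, 0.3, 0.0], [0.1, 0.8, 0.0, 0.4]]
dims, weights = [4, 2], [1, 1]

aligned = matryoshka_loss(anchors, positives, dims, weights)
shuffled = matryoshka_loss(anchors, positives[::-1], dims, weights)
print(aligned, shuffled)
```

With equal weights of 1, as in this card, every truncation level contributes equally, so the model is pushed to keep the first 64 dimensions as discriminative as it can rather than relying only on the full 768.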
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • gradient_accumulation_steps: 32
  • learning_rate: 2e-05
  • num_train_epochs: 4
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.1
  • bf16: True
  • tf32: True
  • load_best_model_at_end: True
  • optim: adamw_torch_fused
  • batch_sampler: no_duplicates
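
A few of these settings interact. With a per-device batch of 16 and 32 gradient-accumulation steps, each optimizer step sees 512 samples on a single device, so 41,342 training samples give roughly 81 optimizer steps per epoch, which is consistent with the epoch boundaries in the Training Logs section. The sketch below works through that arithmetic and a cosine learning-rate schedule with linear warmup. The single-device setup, the total of 320 steps (taken from the last logged step), and the exact schedule formula are assumptions for illustration, not values confirmed by the card.

```python
import math

train_samples = 41_342
per_device_batch = 16
grad_accum = 32
num_devices = 1  # assumption: the card does not state the device count

# Samples consumed per optimizer step.
effective_batch = per_device_batch * grad_accum * num_devices
# Approximate optimizer steps per epoch (a fractional final step is possible).
steps_per_epoch = train_samples / effective_batch

def lr_at(step, total_steps, base_lr=2e-5, warmup_ratio=0.1):
    """Linear warmup followed by a half-cosine decay to zero (a common formulation)."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

total_steps = 320  # assumption: last step shown in the training log
print(effective_batch, round(steps_per_epoch, 2))
for step in (0, 16, 32, 176, 320):
    print(step, lr_at(step, total_steps))
```

Under these assumptions the learning rate climbs to 2e-05 over about the first 32 steps and then decays smoothly toward zero by the end of training.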

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 32
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 4
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: True
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss dim_768_cosine_ndcg@10 dim_512_cosine_ndcg@10 dim_256_cosine_ndcg@10 dim_128_cosine_ndcg@10 dim_64_cosine_ndcg@10
0.1238 10 59.6933 - - - - -
0.2477 20 20.2066 - - - - -
0.3715 30 10.2468 - - - - -
0.4954 40 7.7729 - - - - -
0.6192 50 6.5815 - - - - -
0.7430 60 5.8646 - - - - -
0.8669 70 5.0228 - - - - -
0.9907 80 4.8557 - - - - -
1.0 81 - 0.5013 0.4986 0.4888 0.4586 0.3932
1.1115 90 3.0385 - - - - -
1.2353 100 2.9601 - - - - -
1.3591 110 2.8391 - - - - -
1.4830 120 2.9631 - - - - -
1.6068 130 2.6344 - - - - -
1.7307 140 2.4715 - - - - -
1.8545 150 2.7462 - - - - -
1.9783 160 2.5805 - - - - -
2.0 162 - 0.5162 0.5142 0.5040 0.4778 0.4242
2.0991 170 2.0474 - - - - -
2.2229 180 1.9431 - - - - -
2.3467 190 2.0218 - - - - -
2.4706 200 1.8881 - - - - -
2.5944 210 1.6105 - - - - -
2.7183 220 1.9675 - - - - -
2.8421 230 1.6917 - - - - -
2.9659 240 1.8939 - - - - -
3.0 243 - 0.5188 0.5175 0.5097 0.4840 0.4303
3.0867 250 1.8625 - - - - -
3.2105 260 1.7864 - - - - -
3.3344 270 1.6404 - - - - -
3.4582 280 1.6378 - - - - -
3.5820 290 1.8484 - - - - -
3.7059 300 1.7864 - - - - -
3.8297 310 1.5436 - - - - -
3.9536 320 1.3438 0.5188 0.5183 0.51 0.4848 0.4332
  • The bold row denotes the saved checkpoint.

Framework Versions

  • Python: 3.11.11
  • Sentence Transformers: 3.4.1
  • Transformers: 4.49.0
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.3.0
  • Datasets: 3.3.1
  • Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
Model size: 149M params (Safetensors, tensor type F32)