RoBERTa Joint NER+RE Model for Legal Text Analysis

Model Description

This XLM-RoBERTa-based model performs joint Named Entity Recognition (NER) and Relation Extraction (RE) and is fine-tuned specifically for legal text analysis and human rights documentation. It is designed to identify legal entities and the relationships between them in multilingual legal documents.

Developed by: Lemkin AI
Model type: XLM-RoBERTa Large for Token Classification
Base model: Davlan/xlm-roberta-large-ner-hrl
Language(s): English, French, Spanish, Arabic
License: Apache 2.0

Model Details

Architecture

  • Base Model: XLM-RoBERTa Large (multilingual)
  • Parameters: 560M total parameters
  • Model Size: 2.1GB
  • Task Heads: Joint NER + RE classifier
  • Input Length: 512 tokens maximum
  • Layers: 24 transformer layers
  • Hidden Size: 1024
  • Attention Heads: 16
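The parameter and size figures above can be cross-checked by hand. The sketch below counts the weights of an XLM-RoBERTa Large encoder from its hyperparameters. The vocabulary size (250,002), feed-forward width (4,096) and position-table size (514) are assumptions taken from the public `xlm-roberta-large` configuration, not from this card. The task-specific classifier head is left out because its label count is not documented here.

```python
# Rough parameter count for an XLM-RoBERTa Large encoder (no pooler, no task head).
# ASSUMPTION: vocab_size, intermediate and max_positions follow the public
# xlm-roberta-large config; this card only states layers, hidden size and heads.

def linear_params(d_in: int, d_out: int) -> int:
    """Weights plus bias of a dense layer."""
    return d_in * d_out + d_out


def layer_norm_params(d: int) -> int:
    """Scale and shift vectors of a LayerNorm."""
    return 2 * d


def xlmr_encoder_params(vocab_size: int = 250_002, hidden: int = 1024, layers: int = 24,
                        intermediate: int = 4096, max_positions: int = 514,
                        type_vocab: int = 1) -> int:
    # Word, position and token-type embedding tables share one LayerNorm.
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm_params(hidden)
    per_layer = (
        3 * linear_params(hidden, hidden)        # query, key, value projections
        + linear_params(hidden, hidden)          # attention output projection
        + layer_norm_params(hidden)              # post-attention LayerNorm
        + linear_params(hidden, intermediate)    # feed-forward up-projection
        + linear_params(intermediate, hidden)    # feed-forward down-projection
        + layer_norm_params(hidden)              # post-FFN LayerNorm
    )
    return embeddings + layers * per_layer


n_params = xlmr_encoder_params()
size_gib = n_params * 4 / 2**30  # fp32 stores 4 bytes per parameter
print(f"{n_params:,} parameters, about {size_gib:.2f} GiB in fp32")
```

Under these assumptions the encoder has roughly 559M parameters and about 2.08 GiB of fp32 weights. That is in line with the 560M / 2.1GB figures above once a small classification head is added. The number of attention heads does not change the count: the 16 heads split the 1024-dimensional projections into 64-dimensional slices rather than adding weights.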

Usage

from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("LemkinAI/roberta-joint-ner-re")
model = AutoModelForTokenClassification.from_pretrained("LemkinAI/roberta-joint-ner-re")

# Example text
text = "The International Criminal Court issued a warrant for the general's arrest."

# Tokenize and predict
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=512)
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.argmax(outputs.logits, dim=-1)

# Process results
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
predicted_labels = [model.config.id2label[pred.item()] for pred in predictions[0]]

for token, label in zip(tokens, predicted_labels):
    if label != "O":
        print(f"{token}: {label}")
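The loop above prints one label per subword piece, so a multi-word name such as "International Criminal Court" comes out as several lines. The minimal post-processing sketch below merges those pieces into entity spans. It assumes the label set follows the BIO convention (`B-TYPE` / `I-TYPE` / `O`) and that tokens carry the SentencePiece word-start marker `▁` used by XLM-RoBERTa tokenizers. The card does not document the label scheme, and the label names in the example (`COURT`, `LEGAL_ACT`) are hypothetical. With the script above you would call `merge_bio_spans(tokens, predicted_labels)`. The `transformers` token-classification pipeline offers comparable grouping through its `aggregation_strategy` argument; this standalone version only makes the logic explicit.

```python
from typing import List, Optional, Tuple

WORD_START = "\u2581"  # "▁": SentencePiece marks the first piece of each word with it
SPECIAL_TOKENS = {"<s>", "</s>", "<pad>"}


def merge_bio_spans(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Group subword tokens with BIO labels into (entity_text, entity_type) pairs.

    ASSUMPTION: labels follow the B-TYPE / I-TYPE / O scheme; the model card
    does not specify the label format.
    """
    spans: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    current_text = ""

    def flush() -> None:
        # Close the open span (if any) and reset the accumulator.
        nonlocal current_type, current_text
        if current_type is not None:
            spans.append((current_text.strip(), current_type))
        current_type, current_text = None, ""

    for token, label in zip(tokens, labels):
        if token in SPECIAL_TOKENS:
            flush()
            continue
        # A leading "▁" means this piece starts a new word, so prepend a space.
        piece = " " + token[1:] if token.startswith(WORD_START) else token
        if label.startswith("B-"):
            flush()
            current_type, current_text = label[2:], piece
        elif label.startswith("I-") and current_type == label[2:]:
            current_text += piece
        elif label.startswith("I-"):
            # Tolerate an I- tag with no matching B- tag by starting a new span.
            flush()
            current_type, current_text = label[2:], piece
        else:  # "O" closes any open span
            flush()
    flush()
    return spans


# Hypothetical tokens and labels; real label names come from model.config.id2label.
example_tokens = ["<s>", "▁The", "▁International", "▁Criminal", "▁Court",
                  "▁issued", "▁a", "▁war", "rant", "</s>"]
example_labels = ["O", "O", "B-COURT", "I-COURT", "I-COURT",
                  "O", "O", "B-LEGAL_ACT", "I-LEGAL_ACT", "O"]
print(merge_bio_spans(example_tokens, example_labels))
# → [('International Criminal Court', 'COURT'), ('warrant', 'LEGAL_ACT')]
```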

Model Performance

  • Named Entity Recognition F1: 0.92
  • Relation Extraction F1: 0.87
  • Supported Languages: English, French, Spanish, Arabic
  • Entity Types: 71 specialized legal entity types
  • Relation Types: 21 legal relation types
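F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). It is not an accuracy figure. This card does not say whether its scores are computed at the entity (span) level or the token level, or whether they are micro- or macro-averaged. The sketch below assumes exact-match, micro-averaged span F1, which is a common convention for NER. The spans are made up purely for illustration.

```python
from typing import Set, Tuple

Span = Tuple[int, int, str]  # (start_char, end_char_exclusive, entity_type)


def span_f1(gold: Set[Span], predicted: Set[Span]) -> float:
    """Micro-averaged exact-match F1 over entity spans."""
    true_pos = len(gold & predicted)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(predicted)
    recall = true_pos / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical character spans over the usage example sentence:
# 3 gold entities, 3 predictions, 2 exact matches. The third prediction
# covers "the general" instead of "general", a boundary error.
gold = {(4, 32, "COURT"), (42, 49, "LEGAL_ACT"), (58, 65, "PERSON")}
pred = {(4, 32, "COURT"), (42, 49, "LEGAL_ACT"), (54, 65, "PERSON")}
print(round(span_f1(gold, pred), 3))  # → 0.667
```

Under exact matching, a boundary error counts as both a false positive and a false negative. Span-level F1 can therefore sit well below token-level accuracy on the same predictions.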

Training Data

Trained on 85,000 annotated legal documents dated 1990-2024, including:

  • International court decisions (ICC, ICJ, ECHR)
  • Human rights reports and investigations
  • Legal case documents and treaties

Use Cases

  • Legal document analysis and research
  • Human rights violation documentation
  • Evidence organization and structuring
  • Academic legal NLP research
  • Investigative journalism

Citation

@misc{lemkin-roberta-ner-re-2025,
  title={RoBERTa Joint NER+RE Model for Legal Text Analysis},
  author={Lemkin AI Team},
  year={2025},
  url={https://huggingface.co/LemkinAI/roberta-joint-ner-re}
}