RoBERTa Joint NER+RE Model for Legal Text Analysis

Model Description

This XLM-RoBERTa-based model performs joint Named Entity Recognition (NER) and Relation Extraction (RE). It is fine-tuned specifically for legal text analysis and human rights documentation, and is designed to identify legal entities and the relationships between them in multilingual legal documents.

Developed by: Lemkin AI
Model type: XLM-RoBERTa Large for Token Classification
Base model: Davlan/xlm-roberta-large-ner-hrl
Language(s): English, French, Spanish, Arabic
License: Apache 2.0

Model Details

Architecture

Base Model: XLM-RoBERTa Large (multilingual)
Parameters: 560M total parameters
Model Size: 2.1GB
Task Heads: Joint NER + RE classifier
Input Length: 512 tokens maximum
Layers: 24 transformer layers
Hidden Size: 1024
Attention Heads: 16
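Because inputs are capped at 512 tokens, full judgments and reports will usually not fit in a single forward pass. Below is a minimal sketch of splitting a long token sequence into overlapping windows. The function name `sliding_windows`, the 128-token overlap and the assumption of two special tokens per sequence (`<s>` and `</s>`) are illustrative choices, not part of this model's documented interface.

```python
from typing import List, Tuple


def sliding_windows(token_ids: List[int], max_len: int = 512, stride: int = 128,
                    num_special: int = 2) -> List[Tuple[int, List[int]]]:
    """Split a long sequence of token ids into overlapping windows.

    Hypothetical helper. Each window holds at most `max_len - num_special`
    content tokens, leaving room for special tokens (assumed here to be
    <s> and </s>). Consecutive windows overlap by `stride` tokens.
    Returns (start_offset, window_ids) pairs.
    """
    body = max_len - num_special
    if stride >= body:
        raise ValueError("stride must be smaller than the usable window length")

    windows = []
    start = 0
    while True:
        windows.append((start, token_ids[start:start + body]))
        # Stop once this window reaches the end of the sequence.
        if start + body >= len(token_ids):
            break
        # Advance so that the next window re-reads the last `stride` tokens.
        start += body - stride
    return windows
```

With a fast Hugging Face tokenizer, a similar split is available by passing `truncation=True`, `max_length=512`, `stride=128` and `return_overflowing_tokens=True` when tokenizing. Either way, predictions in the overlapping regions still have to be reconciled, for example by keeping the label from the window in which the token sits farthest from an edge.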

Usage

from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("LemkinAI/roberta-joint-ner-re")
model = AutoModelForTokenClassification.from_pretrained("LemkinAI/roberta-joint-ner-re")

# Example text
text = "The International Criminal Court issued a warrant for the general's arrest."

# Tokenize and predict
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=512)
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.argmax(outputs.logits, dim=-1)

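# Map each subword token to its predicted label
# (this snippet covers the token-level NER output only)
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
predicted_labels = [model.config.id2label[pred.item()] for pred in predictions[0]]

# Print non-"O" labels, skipping special tokens such as <s> and </s>
for token, label in zip(tokens, predicted_labels):
    if label != "O" and token not in tokenizer.all_special_tokens:
        print(f"{token}: {label}")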
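The loop in the usage example prints raw subword tokens, so a multi-word name such as "International Criminal Court" appears as several separate lines. Below is a minimal sketch of merging those tokens into whole entity spans. It assumes the labels follow a BIO scheme (`B-TYPE`, `I-TYPE`, `O`), which this card does not confirm; check `model.config.id2label` before relying on it. The helper name `group_entities` and the example types (`ORG`, `LOC`) are hypothetical.

```python
from typing import List, Tuple

# SentencePiece (used by XLM-RoBERTa) marks the start of a word with "\u2581".
WORD_START = "\u2581"


def group_entities(tokens: List[str], labels: List[str],
                   special_tokens: Tuple[str, ...] = ("<s>", "</s>", "<pad>")
                   ) -> List[Tuple[str, str]]:
    """Merge subword tokens with BIO labels into (entity_text, entity_type) pairs.

    Hypothetical helper; assumes "B-TYPE" / "I-TYPE" / "O" label names.
    """
    entities: List[Tuple[str, str]] = []
    current_type = None
    current_text = ""

    def flush() -> None:
        # Close the open span, if any, and record it.
        nonlocal current_type, current_text
        if current_type is not None:
            entities.append((current_text.strip(), current_type))
        current_type, current_text = None, ""

    for token, label in zip(tokens, labels):
        if token in special_tokens:
            flush()
            continue
        # Turn the word-start marker back into a space.
        piece = token.replace(WORD_START, " ")
        if label == "O":
            flush()
        elif label.startswith("B-") or label[2:] != current_type:
            # A B- tag, or an I- tag of a different type, starts a new entity.
            flush()
            current_type, current_text = label[2:], piece
        else:
            # An I- tag of the same type extends the open entity.
            current_text += piece
    flush()
    return entities
```

Called as `group_entities(tokens, predicted_labels)` with the variables from the usage example, it returns a list of `(text, type)` pairs. The transformers token-classification pipeline also offers built-in grouping through its `aggregation_strategy` argument.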

Model Performance

Named Entity Recognition F1: 0.92
Relation Extraction F1: 0.87
Entity Types: 71 specialized legal entity types
Relation Types: 21 legal relation types
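F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The card does not say how the scores above were computed (entity-level or token-level, micro- or macro-averaged, exact or partial match), so the following is only a sketch of exact-match micro F1, one common convention. The function name `micro_prf`, the example spans and the relation label `issued_warrant_for` are hypothetical.

```python
from typing import Hashable, Iterable, Tuple


def micro_prf(gold: Iterable[Hashable], pred: Iterable[Hashable]) -> Tuple[float, float, float]:
    """Exact-match micro precision, recall and F1 over sets of items.

    Items can be entity spans such as (start, end, type) or relation
    triples such as (head_span, relation, tail_span); an item counts as
    correct only if it matches a gold item exactly.
    """
    gold_set, pred_set = set(gold), set(pred)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    # Harmonic mean of precision and recall; defined as 0 when both are 0.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

The same function scores relation extraction when each item is a (head, relation, tail) triple, so a relation counts as correct only when both arguments and the relation type all match.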

Training Data

Trained on 85,000 annotated legal documents from 1990 to 2024, including:

International court decisions (ICC, ICJ, ECHR)
Human rights reports and investigations
Legal case documents and treaties

Use Cases

Legal document analysis and research
Human rights violation documentation
Evidence organization and structuring
Academic legal NLP research
Investigative journalism

Citation

@misc{lemkin-roberta-ner-re-2025,
  title={RoBERTa Joint NER+RE Model for Legal Text Analysis},
  author={Lemkin AI Team},
  year={2025},
  url={https://huggingface.co/LemkinAI/roberta-joint-ner-re}
}