RoBERTa Joint NER+RE Model for Legal Text Analysis
Model Description
This XLM-RoBERTa-based model performs joint Named Entity Recognition (NER) and Relation Extraction (RE), and is fine-tuned for legal text analysis and human rights documentation. It is designed to identify legal entities and the relationships between them in multilingual legal documents.
Developed by: Lemkin AI
Model type: XLM-RoBERTa Large for Token Classification
Base model: Davlan/xlm-roberta-large-ner-hrl
Language(s): English, French, Spanish, Arabic
License: Apache 2.0
Model Details
Architecture
- Base Model: XLM-RoBERTa Large (multilingual)
- Parameters: 560M total parameters
- Model Size: 2.1GB
- Task Heads: Joint NER + RE classifier
- Input Length: 512 tokens maximum
- Layers: 24 transformer layers
- Hidden Size: 1024
- Attention Heads: 16
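Because the input length is capped at 512 tokens, longer legal documents have to be split before inference. The following is a minimal sketch of one common approach, overlapping sliding windows over token ids. The function name `sliding_windows`, the default `stride` of 128, and the reservation of two positions for special tokens are illustrative assumptions, not settings documented for this model.

```python
from typing import List


def sliding_windows(
    token_ids: List[int],
    max_len: int = 512,    # model limit stated in this card
    stride: int = 128,     # overlap between windows (assumed value)
    num_special: int = 2,  # positions reserved for <s> and </s> (assumption)
) -> List[List[int]]:
    """Split token ids into overlapping windows that fit the model limit."""
    body = max_len - num_special
    if body <= 0:
        raise ValueError("max_len must exceed num_special")
    if not 0 <= stride < body:
        raise ValueError("stride must be in [0, max_len - num_special)")

    step = body - stride
    windows = []
    start = 0
    while True:
        windows.append(token_ids[start:start + body])
        # Stop once the current window reaches the end of the document.
        if start + body >= len(token_ids):
            break
        start += step
    return windows


if __name__ == "__main__":
    chunks = sliding_windows(list(range(1200)))
    print(len(chunks), [len(c) for c in chunks])
```

The overlap lets an entity that straddles a window boundary appear whole in at least one window. Hugging Face fast tokenizers offer comparable behaviour through `return_overflowing_tokens=True` together with `stride`, which may be preferable in practice.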
Usage
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch
# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("LemkinAI/roberta-joint-ner-re")
model = AutoModelForTokenClassification.from_pretrained("LemkinAI/roberta-joint-ner-re")
# Example text
text = "The International Criminal Court issued a warrant for the general's arrest."
# Tokenize and predict
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=512)
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.argmax(outputs.logits, dim=-1)
# Process results
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
predicted_labels = [model.config.id2label[pred.item()] for pred in predictions[0]]
for token, label in zip(tokens, predicted_labels):
    if label != "O":
        print(f"{token}: {label}")
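The loop above prints one label per sub-word piece, so a multi-word entity is spread over several lines. The sketch below shows one way to merge pieces into whole entity strings. It assumes BIO-style labels such as `B-ORG` and `I-ORG` and the SentencePiece word-start marker `▁` used by XLM-RoBERTa tokenizers. The label names are hypothetical, since this card does not list the model's 71 entity types, and `group_entities` is an illustrative helper rather than part of the model's API.

```python
from typing import List, Optional, Tuple

SPECIAL_TOKENS = {"<s>", "</s>", "<pad>"}
WORD_START = "\u2581"  # SentencePiece marker for the start of a word


def group_entities(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Merge sub-word tokens with BIO labels into (text, type) pairs."""
    entities: List[Tuple[str, str]] = []
    text = ""
    ent_type: Optional[str] = None

    for token, label in zip(tokens, labels):
        if token in SPECIAL_TOKENS or label == "O":
            # Outside any entity: close the one in progress, if any.
            if ent_type is not None:
                entities.append((text, ent_type))
                text, ent_type = "", None
            continue

        prefix, sep, label_type = label.partition("-")
        if not sep:  # label without a B-/I- prefix: treat it as a continuation
            prefix, label_type = "I", label

        starts_word = token.startswith(WORD_START)
        piece = token.lstrip(WORD_START)

        new_entity = (
            ent_type is None
            or label_type != ent_type
            or (prefix == "B" and starts_word)
        )
        if new_entity:
            if ent_type is not None:
                entities.append((text, ent_type))
            text, ent_type = piece, label_type
        else:
            # Same entity: add a space only when a new word begins.
            text += (" " if starts_word else "") + piece

    if ent_type is not None:
        entities.append((text, ent_type))
    return entities
```

With `tokens` and `predicted_labels` from the snippet above, `group_entities(tokens, predicted_labels)` would return a list of entity strings with their types, provided the model's labels follow the assumed BIO scheme.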
Model Performance
- Named Entity Recognition F1: 0.92
- Relation Extraction F1: 0.87
- Supported Languages: English, French, Spanish, Arabic
- Entity Types: 71 specialized legal entity types
- Relation Types: 21 legal relation types
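F1 is the harmonic mean of precision and recall, which is a different quantity from accuracy. The card does not state the evaluation protocol, so the sketch below assumes the common exact-match, entity-level definition, where a predicted span counts only if its boundaries and type both match a gold span. The `Span` tuple layout and the function name are illustrative.

```python
from typing import Set, Tuple

# (start_token, end_token, entity_type); the layout is an assumption for illustration
Span = Tuple[int, int, str]


def precision_recall_f1(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 over sets of entity spans."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    gold = {(0, 3, "ORG"), (5, 6, "PER"), (8, 9, "LOC")}
    predicted = {(0, 3, "ORG"), (5, 6, "LOC")}
    print(precision_recall_f1(gold, predicted))
```

In this toy case one of two predictions is correct (precision 0.5) and one of three gold entities is found (recall 1/3). The same function applies to relation extraction if each item is a (head, relation, tail) triple instead of a span.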
Training Data
Trained on 85,000 annotated legal documents including:
- International court decisions (ICC, ICJ, ECHR)
- Human rights reports and investigations
- Legal case documents and treaties
- Time period: 1990-2024
Use Cases
- Legal document analysis and research
- Human rights violation documentation
- Evidence organization and structuring
- Academic legal NLP research
- Investigative journalism
Citation
@misc{lemkin-roberta-ner-re-2025,
title={RoBERTa Joint NER+RE Model for Legal Text Analysis},
author={Lemkin AI Team},
year={2025},
url={https://huggingface.co/LemkinAI/roberta-joint-ner-re}
}