LexiBrief: Legal Document Summarizer
Model Description
This model is a fine-tuned version of google/flan-t5-base optimized for legal document summarization. It was trained on the BillSum dataset of US Congressional bills and is intended for summarizing legal documents such as:
- Legislative bills
- Legal contracts
- Court documents
- Legal agreements
- Regulatory documents
The model is fine-tuned with LoRA (Low-Rank Adaptation), which trains a small set of added low-rank weights while the base model's own weights stay frozen. This approach allows the model to:
- Maintain the general language understanding from FLAN-T5
- Develop specialized legal domain expertise
- Achieve high-quality summarization with minimal training resources
Key Features and Benefits
Legal Domain Specialization:
- Trained specifically on legal documents
- Understands legal terminology and context
- Maintains formal language appropriate for legal documents
Performance Advantages:
- Generates concise yet comprehensive summaries
- Preserves critical legal details
- Handles complex legal terminology effectively
- Maintains document structure awareness
Technical Improvements:
- Sequence lengths set to 384 input tokens and 128 output tokens
- Enhanced attention to legal terms and clauses
- Efficient processing of documents that fit the 384-token input limit
- Memory-efficient fine-tuning thanks to LoRA adaptation
Intended Uses & Limitations
Intended Uses
- Summarizing legislative bills and legal documents
- Creating executive summaries of legal agreements
- Quick document review and analysis
- Legal research assistance
- Contract analysis and summary generation
Limitations
- The model is primarily trained on US legislative bills and legal documents
- Input documents must be in English; the model is not suitable for non-English legal documents
- Maximum input length is 384 tokens; longer inputs are truncated
- Generated summaries are limited to 128 tokens
- May not capture extremely technical legal nuances
- Should not be used as a replacement for legal professionals
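Because inputs longer than 384 tokens are truncated, a longer document has to be split before it is summarized. The sketch below shows one possible way to do that and is not part of the model's own pipeline: it splits a sequence of token IDs into overlapping windows that fit the limit. The function name `split_into_windows`, the 32-token overlap, and the 8-token headroom reserved for the prompt prefix are illustrative assumptions.

```python
from typing import List, Sequence

MAX_INPUT_TOKENS = 384  # input limit stated in this model card


def split_into_windows(
    token_ids: Sequence[int],
    window: int = MAX_INPUT_TOKENS - 8,  # assumed headroom for the prompt prefix
    overlap: int = 32,                   # assumed overlap between neighbouring windows
) -> List[List[int]]:
    """Split token IDs into overlapping windows of at most `window` tokens."""
    if window <= 0:
        raise ValueError("window must be positive")
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")
    if len(token_ids) == 0:
        return []

    step = window - overlap
    windows: List[List[int]] = []
    for start in range(0, len(token_ids), step):
        windows.append(list(token_ids[start:start + window]))
        if start + window >= len(token_ids):
            break  # this window already reaches the end of the document
    return windows


if __name__ == "__main__":
    fake_ids = list(range(1000))  # stand-in for real tokenizer output
    print([len(chunk) for chunk in split_into_windows(fake_ids)])
```

Each window can then be decoded back to text and summarized on its own; how the partial summaries are combined is left open here.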
Training and Evaluation Data
Training Data
The model was trained on:
BillSum Dataset:
- Contains US Congressional bills
- Provides high-quality summaries
- Focuses on legislative language
LexGLUE Components:
- Legal document corpus
- Various legal document types
- Professional-grade annotations
Training Configuration
LoRA Parameters:
- Rank (r): 32
- Alpha: 32
- Target Modules: q, k, v, o attention layers
- Task Type: SEQ_2_SEQ_LM
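For orientation, the parameters above map onto the `LoraConfig` class of the Hugging Face PEFT library roughly as follows. This is a sketch under stated assumptions, not the author's training script: the helper names are hypothetical, and the parameter estimate assumes the published flan-t5-base shape (12 encoder blocks, 12 decoder blocks, 768x768 attention projections), which this card does not state.

```python
# LoRA settings copied from the list above.
LORA_KWARGS = {
    "r": 32,
    "lora_alpha": 32,
    "target_modules": ["q", "k", "v", "o"],
}


def build_lora_config():
    """Build a PEFT LoraConfig from LORA_KWARGS (requires `pip install peft`)."""
    from peft import LoraConfig, TaskType  # imported lazily: optional dependency

    return LoraConfig(task_type=TaskType.SEQ_2_SEQ_LM, **LORA_KWARGS)


def lora_scaling(r: int, alpha: int) -> float:
    """Standard LoRA scales the low-rank update by alpha / r."""
    return alpha / r


def lora_trainable_params(
    r: int,
    d_in: int = 768,                 # assumed projection input width
    d_out: int = 768,                # assumed projection output width
    encoder_layers: int = 12,        # assumed
    decoder_layers: int = 12,        # assumed
    modules_per_attention: int = 4,  # q, k, v, o
) -> int:
    """Estimate added weights: each adapted matrix gains r * (d_in + d_out)."""
    # Encoder blocks have self-attention only; decoder blocks also have
    # cross-attention, so they contribute two attention blocks each.
    attention_blocks = encoder_layers + 2 * decoder_layers
    adapted_matrices = attention_blocks * modules_per_attention
    return adapted_matrices * r * (d_in + d_out)


if __name__ == "__main__":
    print(lora_scaling(LORA_KWARGS["r"], LORA_KWARGS["lora_alpha"]))
    print(lora_trainable_params(LORA_KWARGS["r"]))
```

Under these assumptions the adapters add about 7.1 million trainable weights, a small fraction of the base model's roughly 250 million parameters, and the update is applied with a scaling factor of 1.0.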
Training Hyperparameters:
- Batch Size: 12 (train), 24 (eval)
- Learning Rate: 8e-4
- Epochs: 2
- Max Input Length: 384 tokens
- Max Output Length: 128 tokens
- Mixed Precision: bfloat16
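As a rough guide to reproducing this setup, the hyperparameters above could be passed to `Seq2SeqTrainingArguments` from the Transformers library as sketched below. The output directory and helper names are hypothetical. The step count further assumes a single device, no gradient accumulation, and the full BillSum training split of 18,949 bills, none of which this card confirms.

```python
import math

# Hyperparameters copied from the list above.
TRAINING_KWARGS = {
    "per_device_train_batch_size": 12,
    "per_device_eval_batch_size": 24,
    "learning_rate": 8e-4,
    "num_train_epochs": 2,
    "bf16": True,  # bfloat16 mixed precision; needs supporting hardware
}


def build_training_args(output_dir: str = "lexibrief-out"):  # hypothetical path
    """Build Seq2SeqTrainingArguments (requires `pip install transformers`)."""
    from transformers import Seq2SeqTrainingArguments  # imported lazily

    return Seq2SeqTrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)


def optimizer_steps(num_examples: int, batch_size: int = 12, epochs: int = 2) -> int:
    """Optimizer steps on one device without gradient accumulation."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch * epochs


if __name__ == "__main__":
    # 18,949 is the size of the BillSum training split (assumed to be used in full).
    print(optimizer_steps(18_949))
```

With those assumptions the run amounts to 1,580 steps per epoch and 3,160 steps in total.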
Performance and Evaluation
The model demonstrates strong performance in legal document summarization:
- Maintains high factual accuracy
- Preserves critical legal details
- Generates coherent and structured summaries
- Handles complex legal terminology effectively
Metrics:
- Training Loss: 1.5808
- ROUGE Scores:
  - ROUGE-1: ~0.45
  - ROUGE-2: ~0.28
  - ROUGE-L: ~0.42
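To make these metrics concrete, the sketch below computes simplified ROUGE-N and ROUGE-L F1 scores in plain Python. It is an illustration only: it lowercases and splits on whitespace with no stemming, so its numbers will not exactly match a standard ROUGE package, and this card does not say which tool produced the scores above.

```python
from collections import Counter
from typing import List


def _tokens(text: str) -> List[str]:
    """Lowercase whitespace tokenization (a simplification)."""
    return text.lower().split()


def _f1(overlap: int, n_candidate: int, n_reference: int) -> float:
    if overlap == 0 or n_candidate == 0 or n_reference == 0:
        return 0.0
    precision = overlap / n_candidate
    recall = overlap / n_reference
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate: str, reference: str, n: int = 1) -> float:
    """F1 over n-grams shared by the candidate and the reference."""
    cand, ref = _tokens(candidate), _tokens(reference)
    cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    overlap = sum((cand_ngrams & ref_ngrams).values())  # clipped counts
    return _f1(overlap, sum(cand_ngrams.values()), sum(ref_ngrams.values()))


def rouge_l(candidate: str, reference: str) -> float:
    """F1 based on the longest common subsequence of tokens."""
    cand, ref = _tokens(candidate), _tokens(reference)
    prev = [0] * (len(ref) + 1)  # LCS lengths for the previous candidate token
    for c in cand:
        curr = [0]
        for j, r in enumerate(ref, start=1):
            curr.append(prev[j - 1] + 1 if c == r else max(prev[j], curr[j - 1]))
        prev = curr
    return _f1(prev[-1], len(cand), len(ref))


if __name__ == "__main__":
    generated = "the act sets new rules"
    gold = "the act establishes new rules for agencies"
    print(rouge_n(generated, gold, 1), rouge_n(generated, gold, 2), rouge_l(generated, gold))
```

ROUGE-1 and ROUGE-2 reward shared words and word pairs, while ROUGE-L rewards words that appear in the same order, which is why the three scores for one summary usually differ.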
Usage
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load model and tokenizer
model_name = "AryanT11/lexibrief-legal-summarizer"  # Replace with actual model ID
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Prepare input (anything beyond 384 tokens is truncated)
text = "Your legal document here..."
inputs = tokenizer(
    f"summarize legal document: {text}",
    return_tensors="pt",
    max_length=384,
    truncation=True,
)

# Generate summary (sampling is enabled, so output varies between runs)
outputs = model.generate(
    **inputs,
    max_length=128,
    temperature=0.7,
    do_sample=True,
)
summary = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(summary)
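The usage example samples with a temperature, so repeated runs can return different summaries. Where reproducible output matters, beam search is one alternative. The helper below is a sketch: the name `summarize_deterministic` and the default of four beams are illustrative choices, not settings recommended by the author, and it expects a tokenizer and model that have already been loaded.

```python
PROMPT_PREFIX = "summarize legal document: "  # prefix used in the usage example


def summarize_deterministic(text, tokenizer, model, num_beams=4):
    """Summarize `text` with beam search instead of sampling.

    `tokenizer` and `model` are assumed to be an already loaded
    AutoTokenizer and AutoModelForSeq2SeqLM; `num_beams=4` is an
    illustrative default, not a tuned value.
    """
    inputs = tokenizer(
        PROMPT_PREFIX + text,
        return_tensors="pt",
        max_length=384,   # input limit stated in this model card
        truncation=True,
    )
    outputs = model.generate(
        **inputs,
        max_length=128,   # output limit stated in this model card
        num_beams=num_beams,
        do_sample=False,  # disable sampling so repeated runs agree
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```

A call such as `summarize_deterministic(text, tokenizer, model)` then returns the same summary for the same input on a given setup.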
Example Output
Input:
SECTION 1. SHORT TITLE.
This Act may be cited as the "Sample Legal Document Act of 2024".
SECTION 2. PURPOSE.
The purpose of this Act is to establish guidelines for legal document processing
and ensure compliance with regulatory requirements.
Output:
This Act, known as the Sample Legal Document Act of 2024, establishes guidelines
for legal document processing and ensures regulatory compliance.
Citation
If you use this model, please cite:
@misc{lexibrief2025,
  title={LexiBrief: Legal Document Summarizer},
  author={Aryan Tapkire},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/AryanT11/lexibrief-legal-summarizer}
}
Contact
For questions, issues, or feedback about this model, please:
- Contact me at [email protected]
- Open an issue on the model repository
Evaluation results
Self-reported on the BillSum test set:
- Training Loss: 1.581
- ROUGE-1: 0.450
- ROUGE-2: 0.280
- ROUGE-L: 0.420