# Model Card for Qwen2.5-3B-Instruct-SLDS

## Model Summary
This model is Qwen2.5-3B-Instruct fine-tuned on the Swiss Landmark Decisions Summarization (SLDS) dataset.
SLDS is a multilingual dataset of 20,000 Swiss Federal Supreme Court decisions (1954–2024), each paired with headnotes in German, French, and Italian, yielding ~60,000 decision–headnote pairs.
The model is optimized for abstractive legal summarization and produces concise, legally structured headnotes.
It can be used for both monolingual and cross-lingual summarization tasks.
This model was trained 2x faster with Unsloth and Hugging Face's TRL library.
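A minimal inference sketch with the `transformers` library follows. The repo id is a placeholder and the prompt wording is an assumption, since the card does not specify the exact template; adapt both to your setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/Qwen2.5-3B-Instruct-SLDS"  # placeholder repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

decision_text = "..."  # full text of a Swiss Federal Supreme Court decision

# The prompt wording is an assumption; match it to the fine-tuning template.
messages = [{
    "role": "user",
    "content": "Summarize the following decision as a German headnote:\n\n" + decision_text,
}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```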
## Intended Use
- Primary Task: Judicial summarization (decision → headnote generation).
- Languages: German (`de`), French (`fr`), Italian (`it`).
- Scenarios:
- Monolingual summarization: e.g., German decision → German headnote.
- Cross-lingual summarization: e.g., German decision → French headnote (see the prompt sketch after this list).
- Legal research support: assisting in retrieval and navigation of court decisions.
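Below is a sketch of how the two summarization scenarios might be prompted. The instruction wording and the `build_prompt` helper are illustrative assumptions, not the exact prompts used in training.

```python
# Both scenarios differ only in the requested headnote language.
def build_prompt(decision: str, target_lang: str) -> str:
    lang_names = {"de": "German", "fr": "French", "it": "Italian"}
    return (
        f"Summarize the following court decision as a headnote "
        f"in {lang_names[target_lang]}:\n\n{decision}"
    )

german_decision = "..."  # full German decision text

prompt_mono = build_prompt(german_decision, "de")   # German -> German headnote
prompt_cross = build_prompt(german_decision, "fr")  # German -> French headnote
```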
Not intended for:
- Replacing human legal expertise.
- Serving as an authoritative legal source.
- Automated legal advice or decision-making.
## Training Data
- Dataset: Swiss Landmark Decisions Summarization (SLDS).
- Size: ~20K decisions, ~60K decision–headnote pairs.
- Splits: Train (1954–2021), Validation (2022), Test (2023–2024).
- Source: Swiss Federal Supreme Court.
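A sketch of loading SLDS with the `datasets` library and reproducing the temporal splits is shown below. The repo id and the `year` column are assumptions; check the dataset card for the actual identifiers.

```python
from datasets import load_dataset

# Hypothetical repo id; replace with the actual SLDS id on the Hub.
slds = load_dataset("your-org/SLDS")

# Temporal splits as described above; skip this if the Hub version
# already ships train/validation/test splits.
train = slds["train"].filter(lambda ex: ex["year"] <= 2021)
validation = slds["train"].filter(lambda ex: ex["year"] == 2022)
test = slds["train"].filter(lambda ex: ex["year"] >= 2023)
```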
## Training Procedure
Base Models (full experimental suite; this card covers the Qwen2.5 3B variant):
- Qwen2.5 family (0.5B–14B)
- Llama 3.2 (3B)
- Phi-3.5-mini
Fine-tuning Objective: Conditional generation (decision → headnote).
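Since the model was trained with Unsloth and TRL, a minimal TRL `SFTTrainer` sketch of this objective is given below; the column names, toy data, and configuration are illustrative assumptions rather than the exact training recipe.

```python
from datasets import Dataset
from trl import SFTConfig, SFTTrainer

# Toy stand-in for SLDS rows: each pairs a decision prompt with its headnote.
train_data = Dataset.from_dict({
    "prompt": ["Summarize the following decision as a headnote:\n\n<decision text>"],
    "completion": ["<headnote text>"],
})

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-3B-Instruct",  # base checkpoint
    train_dataset=train_data,
    args=SFTConfig(output_dir="qwen2.5-3b-instruct-slds"),
)
trainer.train()
```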
Evaluation Metrics:
- Lexical: ROUGE-1/2/L, BLEU, BERTScore.
- Domain-specific: LLM-as-a-Judge framework (DeepSeek V3) assessing five rubrics: accuracy, completeness, clarity, legal citations, and considerations.
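The lexical metrics can be reproduced with the `evaluate` library, as sketched below with placeholder strings; the LLM-as-a-Judge rubric scoring is not reproduced here.

```python
import evaluate

rouge = evaluate.load("rouge")
sacrebleu = evaluate.load("sacrebleu")
bertscore = evaluate.load("bertscore")

predictions = ["Generated headnote ..."]  # model outputs
references = ["Reference headnote ..."]   # gold headnotes

print(rouge.compute(predictions=predictions, references=references))
print(sacrebleu.compute(predictions=predictions,
                        references=[[r] for r in references]))
print(bertscore.compute(predictions=predictions, references=references,
                        lang="de"))  # match the headnote language
```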
## Model Performance

On the SLDS test set (2023–2024):
| Model | Setting | BERTScore ↑ | BLEU ↑ | ROUGE-1 ↑ | ROUGE-2 ↑ | ROUGE-L ↑ | JUDGE ↑ |
|---|---|---|---|---|---|---|---|
| Phi-3.5-mini | fine-tuned | 11.24 ± 3.82 | 34.84 ± 0.41 | 31.20 ± 2.08 | 14.11 ± 1.27 | 20.96 ± 1.35 | 15.25 ± 2.32 |
| Llama 3.2 3B | fine-tuned | 15.20 ± 4.40 | 21.89 ± 0.42 | 31.89 ± 2.34 | 14.87 ± 1.61 | 22.49 ± 1.60 | 18.47 ± 2.99 |
| Qwen2.5 0.5B | fine-tuned | -1.37 ± 3.85 | 32.20 ± 0.35 | 23.87 ± 1.68 | 9.46 ± 0.94 | 17.37 ± 1.09 | 5.80 ± 1.26 |
| Qwen2.5 1.5B | fine-tuned | 19.81 ± 2.72 | 36.79 ± 0.34 | 33.03 ± 1.73 | 14.14 ± 1.08 | 22.67 ± 1.13 | 15.92 ± 2.27 |
| **Qwen2.5 3B (this model)** | fine-tuned | 23.23 ± 2.80 | 38.42 ± 0.34 | 35.18 ± 1.79 | 15.66 ± 1.23 | 24.10 ± 1.17 | 20.31 ± 2.66 |
| Qwen2.5 7B | fine-tuned | 29.59 ± 1.97 | 41.40 ± 0.34 | 39.24 ± 1.59 | 18.26 ± 1.25 | 26.44 ± 1.15 | 28.37 ± 3.07 |
| Qwen2.5 14B | fine-tuned | 32.48 ± 1.98 | 41.80 ± 0.37 | 40.04 ± 1.74 | 19.99 ± 1.41 | 28.00 ± 1.28 | 31.38 ± 3.19 |
| GPT-4o | one-shot | 30.44 ± 1.74 | 31.89 ± 0.25 | 42.12 ± 1.79 | 18.92 ± 1.22 | 25.92 ± 1.05 | 39.70 ± 2.66 |
| Claude 3.5 Sonnet | one-shot | 5.53 ± 2.00 | 21.88 ± 0.25 | 41.86 ± 1.64 | 19.23 ± 1.19 | 27.67 ± 1.20 | 41.25 ± 2.90 |
| DeepSeek-R1 | one-shot | 20.28 ± 1.45 | 22.37 ± 0.18 | 38.30 ± 1.82 | 15.97 ± 0.85 | 21.03 ± 0.84 | 42.28 ± 2.21 |
| o3-mini | one-shot | 14.18 ± 1.31 | 20.55 ± 0.17 | 34.77 ± 1.43 | 11.92 ± 0.69 | 18.21 ± 0.67 | 34.82 ± 2.41 |
- Lexical metrics: fine-tuned models achieve the highest overlap-based scores (BLEU, ROUGE, BERTScore).
- LLM-judge scores: larger proprietary and reasoning models score higher on legal precision.
## Limitations
- Language imbalance: German decisions dominate, while Italian remains underrepresented.
- Biases: Headnotes reflect judicial style and conventions, not neutral summaries.
- Evaluation mismatch: ROUGE and BLEU may not fully capture legal accuracy.
- Overfitting risk: Models may overfit to formulaic headnote structures.
- Cross-lingual difficulty: Some models struggle when the headnote language differs from the decision language.
## Ethical Considerations
- Sensitive information: All data is anonymized by the Swiss Federal Supreme Court before publication.
- Legal risk: Generated headnotes must not be used as official legal advice.
- Fair use: Ensure attribution when reusing outputs.
## How to Cite
If you use this model, please cite the dataset paper:
```bibtex
@article{rolshoven2025slds,
  title         = {Unlocking Legal Knowledge: A Multilingual Dataset for Judicial Summarization in Switzerland},
  author        = {Luca Rolshoven and Vishvaksenan Rasiah and Srinanda Brügger Bose and Sarah Hostettler and Lara Burkhalter and Matthias Stürmer and Joel Niklaus},
  year          = {2025},
  eprint        = {2410.13456},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CL},
  url           = {https://arxiv.org/abs/2410.13456},
}
```