Joel Niklaus
joelniklaus
AI & ML interests
Pretraining, Instruction Tuning, Domain Adaptation, Benchmarks, Legal Datasets
Recent Activity
Updated the collection SwiLTra-Bench (9 days ago)
Collections
SCALE Models
Scaling up the Complexity for Advanced Language Model Evaluation
MultiLegalPile Models
A 689GB Multilingual Legal Corpus
MultiLegalSBD Datasets
A Multilingual Legal Sentence Boundary Detection Dataset
ClassActionPrediction Datasets
A Challenging Benchmark for Legal Judgment Prediction of Class Action Cases in the US
Anonymization
Automatic Anonymization of Swiss Federal Supreme Court Rulings
LegalLens Datasets
Datasets for the paper https://arxiv.org/abs/2402.04335
LegalLMs
XLM-RoBERTa models with continued pretraining on the MultiLegalPile
SCALE Datasets
Scaling up the Complexity for Advanced Language Model Evaluation
MultiLegalPile Datasets
A 689GB Multilingual Legal Corpus
MultiLegalSBD Models
A Multilingual Legal Sentence Boundary Detection Dataset
- rcds/distilbert-SBD-de-judgements-laws: Token Classification • 37
- rcds/distilbert-SBD-en-judgements-laws: Token Classification • 0.1B • 45
- rcds/distilbert-SBD-es-judgements-laws: Token Classification • 30
- rcds/distilbert-SBD-it-judgements-laws: Token Classification • 30
Anonymity at Risk? Datasets
Assessing Re-Identification Capabilities of Large Language Models
Explainability Datasets
Datasets for the paper https://arxiv.org/abs/2402.17013
SwiLTra-Bench
The code for creating the datasets is available at https://github.com/JoelNiklaus/SwissLegalTranslations.