Should We Still Pretrain Encoders with Masked Language Modeling? Paper • 2507.00994 • Published 12 days ago
Training and Finetuning Sparse Embedding Models with Sentence Transformers v5 Article • By tomaarsen and 1 other • 12 days ago
Zeroshot Classifiers Collection • These are my current best zero-shot classifiers. Some of my older models are downloaded more often, but the models in this collection are newer and better. • 12 items • Updated Jan 6
Multi-Label Classification Model From Scratch: Step-by-Step Tutorial Article • By Valerii-Knowledgator • Jan 8, 2024
SaulLM-54B & SaulLM-141B: Scaling Up Domain Adaptation for the Legal Domain Paper • 2407.19584 • Published Jul 28, 2024
Tajik Datasets Collection • Datasets that have a Tajik subset or are entirely in Tajik • 13 items • Updated Feb 20
Open Australian Legal Models Collection • A collection of open-source Australian legal language models • 6 items • Updated Jun 15, 2024
Open Australian Legal Data Collection • A collection of open-source Australian legal datasets • 3 items • Updated Jun 15, 2024