louisbrulenaudet posted an update Jun 27
I am delighted to announce the publication of LegalKit, a labeled French-language dataset built for training legal ML models 🤗

This dataset comprises more than 50,000 query-document pairs curated for training sentence embedding models in the domain of French law.

The labeling process follows a systematic approach to ensure consistency and relevance:
- Initial Query Generation: Three instances of the LLaMA-3-70B model independently generate three different queries based on the same document.
- Selection of Optimal Query: A fourth instance of the LLaMA-3-70B model, using a dedicated selection prompt, evaluates the generated queries and selects the most suitable one.
- Final Label Assignment: The chosen query is used to label the document, aiming to ensure that the label accurately reflects the content and context of the original text.
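
The three steps above can be sketched as a small generate-then-select loop. This is only an illustration under stated assumptions, not the code used to build LegalKit: `label_document`, `make_stub_generator`, and `longest_query_selector` are hypothetical names, and the stub functions stand in for the LLaMA-3-70B calls and the selection prompt, which are not shown in this post.

```python
from typing import Callable, Dict, Sequence

# Hypothetical stand-ins for model calls (assumptions, not the author's code):
# a generator maps a document to one candidate query, and a selector maps
# (document, candidates) to the index of the query it judges most suitable.
QueryGenerator = Callable[[str], str]
QuerySelector = Callable[[str, Sequence[str]], int]


def label_document(
    document: str,
    generators: Sequence[QueryGenerator],
    selector: QuerySelector,
) -> Dict[str, str]:
    """Generate one candidate query per generator, then keep the selector's pick."""
    candidates = [generate(document) for generate in generators]
    best = selector(document, candidates)
    if not 0 <= best < len(candidates):
        raise ValueError(f"selector returned invalid index {best}")
    return {"query": candidates[best], "document": document}


def make_stub_generator(prefix: str) -> QueryGenerator:
    """Toy generator: builds a query from the document's first sentence."""
    def generate(document: str) -> str:
        first_sentence = document.split(".")[0].lower()
        return f"{prefix} {first_sentence}?"
    return generate


def longest_query_selector(document: str, candidates: Sequence[str]) -> int:
    """Toy selector: picks the longest candidate (a real one would prompt an LLM)."""
    return max(range(len(candidates)), key=lambda i: len(candidates[i]))


if __name__ == "__main__":
    generators = [
        make_stub_generator(p) for p in ("What is", "Explain", "Which rule governs")
    ]
    doc = "Article 1240 of the Civil Code. Any act that causes damage obliges repair."
    print(label_document(doc, generators, longest_query_selector))
```

Passing the generators and the selector as plain callables keeps the pipeline independent of any particular inference backend, so the stubs can be swapped for real model calls without changing `label_document`.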

Dataset: louisbrulenaudet/legalkit

Stay tuned for further updates and release information 🔥

@clem, if we can create an "HF for Legal" organization, similar to the one that exists for journalists, I am available!

Note: special thanks to @alvdansen for their illustration models ❤️

Congrats :D

very cool! feel free to create the HF for Legal org and share about it and we can amplify!