
---
title: Les Audits d'Affaires - Leaderboard
emoji: ⚖️
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.34.2
app_file: app.py
pinned: false
license: mit
short_description: French leaderboard for LLMs on business law
---

Les Audits d'Affaires - Leaderboard

Performance dashboard for LLMs on a French business-law benchmark, with Hugging Face Datasets integration.

🚀 Setup Complete!

HuggingFace Datasets

  • Requests: legmlai/laal-requests - tracks evaluation requests
  • Results: legmlai/laal-results - stores evaluation results
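As a sketch of how rows pulled from the results dataset might be turned into a ranking (`rank_models` is a hypothetical helper for illustration, not code from app.py; in the app the rows would come from `datasets.load_dataset("legmlai/laal-results")`):

```python
def rank_models(rows):
    """Rank published result rows by overall_score, highest first."""
    published = [r for r in rows if r.get("is_published")]
    return sorted(published, key=lambda r: r["overall_score"], reverse=True)

# Toy rows with the field names from the results schema below.
rows = [
    {"model_name": "a", "overall_score": 0.2, "is_published": True},
    {"model_name": "b", "overall_score": 0.8, "is_published": True},
    {"model_name": "c", "overall_score": 0.9, "is_published": False},  # unpublished: excluded
]
leaderboard = [r["model_name"] for r in rank_models(rows)]  # ['b', 'a']
```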

Current Models (seeded with placeholder scores of 0)

  1. Qwen/Qwen3-14B (Alibaba)
  2. jpacifico/Chocolatine-2-14B-Instruct-v2.0.3 (jpacifico)
  3. meta-llama/Llama-3.1-8B-Instruct (Meta)

🏃‍♂️ Quick Start

Prerequisites

export HF_TOKEN=your_huggingface_token

Run Leaderboard

cd les-audites-affaires-leadboard
source venv/bin/activate
python app.py

The leaderboard will be available at: http://127.0.0.1:7860

📁 Project Structure

Core Files

  • app.py - Main leaderboard application with HuggingFace integration
  • dataset_manager.py - HuggingFace datasets management
  • requirements.txt - Python dependencies

Setup Scripts

  • create_datasets.py - Initialize HuggingFace datasets
  • setup_initial_models.py - Add initial models with 0 scores

✨ Features

Live Leaderboard

  • Real-time data from HuggingFace datasets
  • Automatic ranking and scoring
  • Category-wise performance breakdown
  • Interactive comparison charts
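The category-wise breakdown can be sketched as per-category averages over the result rows (a minimal illustration assuming row dicts with the category columns from the results schema below; `category_averages` is a hypothetical helper):

```python
# Category score columns, as listed in the results-dataset schema.
CATEGORIES = [
    "score_action_requise",
    "score_delai_legal",
    "score_documents_obligatoires",
    "score_impact_financier",
    "score_consequences_non_conformite",
]

def category_averages(rows):
    """Mean score per category across all result rows."""
    if not rows:
        return {}
    return {c: sum(r[c] for r in rows) / len(rows) for c in CATEGORIES}
```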

Model Submissions

  • Submit models directly through the UI
  • Automatic request tracking
  • Email notifications for updates

Pipeline Status Tracking

  • 📊 Real-time status: Pending, Processing, Completed, Failed
  • 📋 Recent evaluation requests table
  • 🔄 Pipeline progress monitoring
  • ⏳ Request status updates with emojis
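The emoji-decorated status updates could look like the following (a hypothetical mapping for illustration; the actual emoji-to-status pairing in app.py may differ):

```python
# Assumed status-to-emoji mapping; the four statuses come from the README.
STATUS_EMOJI = {
    "pending": "⏳",
    "processing": "🔄",
    "completed": "✅",
    "failed": "❌",
}

def format_status(status):
    """Render a request status with its emoji for the requests table."""
    return f"{STATUS_EMOJI.get(status.lower(), '❓')} {status.capitalize()}"
```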

Data Management

  • All data stored in HuggingFace datasets
  • Refresh button for live updates
  • Persistent across sessions
  • Status update automation ready

Token Management

  • Graceful handling when HF_TOKEN not configured
  • Clear instructions for Hugging Face Spaces deployment
  • Read-only mode when token unavailable
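The graceful fallback might be sketched like this (an assumption about how app.py handles it, not the actual implementation):

```python
import os

def get_write_token():
    """Return the HF token if configured; otherwise signal read-only mode."""
    token = os.environ.get("HF_TOKEN")
    if not token:
        # Read-only mode: leaderboard still renders, submissions are disabled.
        print("HF_TOKEN not set: running in read-only mode. "
              "On Hugging Face Spaces, add HF_TOKEN as a repository secret.")
    return token
```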

🔄 Next Steps

  1. Deploy to Hugging Face Spaces: Upload to HF Spaces with HF_TOKEN secret
  2. Connect Evaluation Harness: Use automation_example.py as integration guide
  3. Real Evaluations: Replace 0 scores with actual evaluation results
  4. Webhooks: Set up automatic updates from evaluation pipeline
  5. Enhanced Analytics: Add more detailed performance breakdowns

📊 Dataset Schema

Requests Dataset

  • request_id, model_name, model_provider, request_type
  • request_status, contact_email, request_timestamp

Results Dataset

  • result_id, request_id, model_name, model_provider
  • overall_score, score_action_requise, score_delai_legal
  • score_documents_obligatoires, score_impact_financier
  • score_consequences_non_conformite, evaluation_timestamp, is_published
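A request row matching the requests-dataset schema above could be built like this (a sketch: `make_request` and the `"evaluation"`/`"pending"` field values are assumptions, only the column names come from the schema):

```python
import uuid
from datetime import datetime, timezone

def make_request(model_name, model_provider, contact_email=""):
    """Build one row for the requests dataset, using the schema's column names."""
    return {
        "request_id": str(uuid.uuid4()),
        "model_name": model_name,
        "model_provider": model_provider,
        "request_type": "evaluation",      # assumed value
        "request_status": "pending",       # initial pipeline status
        "contact_email": contact_email,
        "request_timestamp": datetime.now(timezone.utc).isoformat(),
    }
```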

Created by: Mohamad Alhajar (legml.ai)
License: MIT