---
title: Les Audits d'Affaires - Leaderboard
emoji: ⚖️
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.34.2
app_file: app.py
pinned: false
license: mit
short_description: Leaderboard français pour LLMs sur droit des affaires
---

# Les Audits d'Affaires - Leaderboard

Performance dashboard for LLMs on a French business law benchmark, with HuggingFace Datasets integration.

## 🚀 Setup Complete!

### HuggingFace Datasets

- **Requests**: `legmlai/laal-requests` - tracks evaluation requests
- **Results**: `legmlai/laal-results` - stores evaluation results

### Current Models (with 0 scores)

1. `Qwen/Qwen3-14B` (Alibaba)
2. `jpacifico/Chocolatine-2-14B-Instruct-v2.0.3` (jpacifico)
3. `meta-llama/Llama-3.1-8B-Instruct` (Meta)

## 🏃‍♂️ Quick Start

### Prerequisites

```bash
export HF_TOKEN=your_huggingface_token
```

### Run Leaderboard

```bash
cd les-audites-affaires-leadboard
source venv/bin/activate
python app.py
```

The leaderboard will be available at http://127.0.0.1:7860.

## 📁 Project Structure

### Core Files

- `app.py` - Main leaderboard application with HuggingFace integration
- `dataset_manager.py` - HuggingFace datasets management
- `requirements.txt` - Python dependencies

### Setup Scripts

- `create_datasets.py` - Initialize the HuggingFace datasets
- `setup_initial_models.py` - Add the initial models with 0 scores

## ✨ Features

### Live Leaderboard

- Real-time data from HuggingFace datasets
- Automatic ranking and scoring
- Category-wise performance breakdown
- Interactive comparison charts

### Model Submissions

- Submit models directly through the UI
- Automatic request tracking
- Email notifications for updates

### Pipeline Status Tracking

- 📊 Real-time status: Pending, Processing, Completed, Failed
- 📋 Recent evaluation requests table
- 🔄 Pipeline progress monitoring
- ⏳ Request status updates with emojis

### Data Management

- All data stored in HuggingFace datasets
- Refresh button for live updates
- Persistent across sessions
- Status update automation ready

### Token Management

- Graceful handling when HF_TOKEN is not configured
- Clear instructions for Hugging Face Spaces deployment
- Read-only mode when the token is unavailable

## 🔄 Next Steps

1. **Deploy to Hugging Face Spaces**: Upload to HF Spaces with the HF_TOKEN secret
2. **Connect Evaluation Harness**: Use `automation_example.py` as an integration guide
3. **Real Evaluations**: Replace the 0 scores with actual evaluation results
4. **Webhooks**: Set up automatic updates from the evaluation pipeline
5. **Enhanced Analytics**: Add more detailed performance breakdowns

## 📊 Dataset Schema

### Requests Dataset

- `request_id`, `model_name`, `model_provider`, `request_type`
- `request_status`, `contact_email`, `request_timestamp`

### Results Dataset

- `result_id`, `request_id`, `model_name`, `model_provider`
- `overall_score`, `score_action_requise`, `score_delai_legal`
- `score_documents_obligatoires`, `score_impact_financier`
- `score_consequences_non_conformite`, `evaluation_timestamp`, `is_published`

Hedged integration sketches for writing to and reading from these datasets are included in the appendix at the end of this README.

---

**Created by**: Mohamad Alhajar (legml.ai)
**License**: MIT
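
## 🧪 Appendix: Integration Sketches

The snippets below are hedged sketches, not part of the current codebase: they assume the dataset schemas listed above, use the plain `datasets` library rather than whatever helpers `dataset_manager.py` exposes, and invent details such as ID formats, status strings, and score scales wherever this README does not pin them down.

### Submitting an evaluation request (sketch)

A minimal way to append a request row to `legmlai/laal-requests` from a script, assuming a UUID `request_id`, an `"evaluation"` request type, and a `"pending"` status string (all hypothetical):

```python
import os
import uuid
from datetime import datetime, timezone

import pandas as pd
from datasets import Dataset, load_dataset

REQUESTS_REPO = "legmlai/laal-requests"
HF_TOKEN = os.environ["HF_TOKEN"]  # token with write access to the legmlai datasets


def submit_request(model_name: str, model_provider: str, contact_email: str) -> str:
    """Append one evaluation request row to the requests dataset."""
    row = {
        "request_id": str(uuid.uuid4()),   # assumed ID format
        "model_name": model_name,
        "model_provider": model_provider,
        "request_type": "evaluation",      # assumed value
        "request_status": "pending",       # assumed status string
        "contact_email": contact_email,
        "request_timestamp": datetime.now(timezone.utc).isoformat(),
    }
    # Load the existing rows, append the new one, and push the dataset back.
    existing = load_dataset(REQUESTS_REPO, split="train", token=HF_TOKEN).to_pandas()
    updated = pd.concat([existing, pd.DataFrame([row])], ignore_index=True)
    Dataset.from_pandas(updated, preserve_index=False).push_to_hub(REQUESTS_REPO, token=HF_TOKEN)
    return row["request_id"]


if __name__ == "__main__":
    submit_request("meta-llama/Llama-3.1-8B-Instruct", "Meta", "contact@example.com")
```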
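
### Publishing a result row (sketch)

Once an evaluation finishes, the harness can push a row to `legmlai/laal-results` with the same load-append-push pattern. The UUID `result_id` and the choice to compute `overall_score` as the mean of the five category scores are assumptions; the 0.0 scores below simply mirror the current placeholder entries:

```python
import os
import uuid
from datetime import datetime, timezone

import pandas as pd
from datasets import Dataset, load_dataset

RESULTS_REPO = "legmlai/laal-results"
HF_TOKEN = os.environ["HF_TOKEN"]


def publish_result(request_id: str, model_name: str, model_provider: str,
                   category_scores: dict) -> None:
    """Append one published result row to the results dataset."""
    row = {
        "result_id": str(uuid.uuid4()),  # assumed ID format
        "request_id": request_id,
        "model_name": model_name,
        "model_provider": model_provider,
        # Assumption: the overall score is the mean of the five category scores.
        "overall_score": sum(category_scores.values()) / len(category_scores),
        **category_scores,
        "evaluation_timestamp": datetime.now(timezone.utc).isoformat(),
        "is_published": True,
    }
    existing = load_dataset(RESULTS_REPO, split="train", token=HF_TOKEN).to_pandas()
    updated = pd.concat([existing, pd.DataFrame([row])], ignore_index=True)
    Dataset.from_pandas(updated, preserve_index=False).push_to_hub(RESULTS_REPO, token=HF_TOKEN)


if __name__ == "__main__":
    publish_result(
        request_id="<request-id-from-laal-requests>",
        model_name="Qwen/Qwen3-14B",
        model_provider="Alibaba",
        category_scores={
            "score_action_requise": 0.0,
            "score_delai_legal": 0.0,
            "score_documents_obligatoires": 0.0,
            "score_impact_financier": 0.0,
            "score_consequences_non_conformite": 0.0,
        },
    )
```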
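
### Building the leaderboard table (sketch)

One way the "automatic ranking and scoring" could be derived from the results dataset: keep only published rows, take each model's most recent result, and sort by `overall_score`. Whether the dataset is publicly readable or needs the token is an assumption here:

```python
import os

import pandas as pd
from datasets import load_dataset

RESULTS_REPO = "legmlai/laal-results"


def build_leaderboard() -> pd.DataFrame:
    """Load published results and rank models by overall score."""
    df = load_dataset(RESULTS_REPO, split="train",
                      token=os.environ.get("HF_TOKEN")).to_pandas()
    published = df[df["is_published"]]
    # Keep the most recent result per model, then rank by overall_score.
    latest = (
        published.sort_values("evaluation_timestamp")
        .drop_duplicates("model_name", keep="last")
    )
    board = latest.sort_values("overall_score", ascending=False).reset_index(drop=True)
    board.insert(0, "rank", board.index + 1)
    return board


if __name__ == "__main__":
    print(build_leaderboard()[["rank", "model_name", "model_provider", "overall_score"]])
```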