---
title: Les Audits d'Affaires - Leaderboard
emoji: ⚖️
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.34.2
app_file: app.py
pinned: false
license: mit
short_description: French leaderboard for LLMs on business law
---
# Les Audits d'Affaires - Leaderboard
A performance dashboard for LLMs on a French business law benchmark, with data stored in and read from Hugging Face Datasets.
## 🚀 Setup Complete!
### HuggingFace Datasets
- **Requests**: `legmlai/laal-requests` - tracks evaluation requests
- **Results**: `legmlai/laal-results` - stores evaluation results
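
As a rough illustration of how these two repositories might be read, the sketch below uses `datasets.load_dataset`. The helper name `load_leaderboard_data` and the `train` split are assumptions made for the example; the actual logic lives in `dataset_manager.py` and may differ.

```python
import os

# Dataset repository ids as listed above.
REQUESTS_REPO = "legmlai/laal-requests"
RESULTS_REPO = "legmlai/laal-results"


def load_leaderboard_data(split: str = "train"):
    """Hypothetical helper: load both datasets from the Hugging Face Hub.

    Assumes each repository exposes a split named `split` (an assumption).
    """
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    # If HF_TOKEN is unset, no explicit token is passed to the Hub.
    token = os.environ.get("HF_TOKEN")
    requests = load_dataset(REQUESTS_REPO, split=split, token=token)
    results = load_dataset(RESULTS_REPO, split=split, token=token)
    return requests, results
```

Calling `load_leaderboard_data()` requires network access and the `datasets` package from `requirements.txt`.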
### Current Models (placeholder scores of 0)
1. `Qwen/Qwen3-14B` (Alibaba)
2. `jpacifico/Chocolatine-2-14B-Instruct-v2.0.3` (jpacifico)
3. `meta-llama/Llama-3.1-8B-Instruct` (Meta)
## 🏃‍♂️ Quick Start
### Prerequisites
```bash
export HF_TOKEN=your_huggingface_token
```
### Run Leaderboard
```bash
cd les-audites-affaires-leadboard
source venv/bin/activate
python app.py
```
The leaderboard will be available at: http://127.0.0.1:7860
## 📁 Project Structure
### Core Files
- `app.py` - Main leaderboard application with HuggingFace integration
- `dataset_manager.py` - HuggingFace datasets management
- `requirements.txt` - Python dependencies
### Setup Scripts
- `create_datasets.py` - Initialize HuggingFace datasets
- `setup_initial_models.py` - Add initial models with 0 scores
## ✨ Features
### Live Leaderboard
- Real-time data from HuggingFace datasets
- Automatic ranking and scoring
- Category-wise performance breakdown
- Interactive comparison charts
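
The ranking step can be sketched as a sort on `overall_score`. This is a minimal illustration, not the code in `app.py`: the function name `rank_results` is hypothetical, filtering on `is_published` is an assumption about how that field is used, and the sample rows are made up.

```python
from typing import Dict, List


def rank_results(results: List[Dict]) -> List[Dict]:
    """Return published results sorted by overall_score, with a 1-based rank.

    Ties keep their input order because Python's sort is stable.
    """
    # Assumption: unpublished rows are hidden from the leaderboard.
    published = [r for r in results if r.get("is_published", True)]
    ordered = sorted(published, key=lambda r: r["overall_score"], reverse=True)
    return [{**row, "rank": i} for i, row in enumerate(ordered, start=1)]


# Made-up rows for illustration only; these are not real evaluation results.
sample = [
    {"model_name": "model-a", "overall_score": 41.5, "is_published": True},
    {"model_name": "model-b", "overall_score": 63.0, "is_published": True},
    {"model_name": "model-c", "overall_score": 90.0, "is_published": False},
]
ranked = rank_results(sample)
```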
### Model Submissions
- Submit models directly through the UI
- Automatic request tracking
- Email notifications for updates
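
A submission could be turned into a request record along these lines. The field names come from the Requests Dataset schema in this README; the helper name `build_request`, the UUID identifier, the ISO 8601 UTC timestamp, and the default values `"evaluation"` and `"pending"` are assumptions for the sake of the example.

```python
import uuid
from datetime import datetime, timezone
from typing import Dict


def build_request(
    model_name: str,
    model_provider: str,
    contact_email: str,
    request_type: str = "evaluation",  # assumed default, not confirmed by the app
) -> Dict[str, str]:
    """Hypothetical helper: build one row for the requests dataset."""
    return {
        "request_id": str(uuid.uuid4()),
        "model_name": model_name,
        "model_provider": model_provider,
        "request_type": request_type,
        "request_status": "pending",  # new requests start as pending (assumption)
        "contact_email": contact_email,
        "request_timestamp": datetime.now(timezone.utc).isoformat(),
    }


request = build_request("Qwen/Qwen3-14B", "Alibaba", "user@example.com")
```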
### Pipeline Status Tracking
- 📊 Real-time status: Pending, Processing, Completed, Failed
- 📋 Recent evaluation requests table
- 🔄 Pipeline progress monitoring
- ⏳ Request status updates with emojis
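
One way to render and summarize the four statuses is shown below. The specific emoji per status, the lowercase storage of `request_status`, and the helper names are assumptions; only the status names themselves come from this README.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping

# The four pipeline statuses named above; the emoji choices are illustrative.
STATUS_EMOJI = {
    "pending": "⏳",
    "processing": "🔄",
    "completed": "✅",
    "failed": "❌",
}


def format_status(status: str) -> str:
    """Return the status prefixed with its emoji, or ❓ for unknown values."""
    key = status.strip().lower()
    return f"{STATUS_EMOJI.get(key, '❓')} {key.capitalize()}"


def count_by_status(requests: Iterable[Mapping[str, str]]) -> Dict[str, int]:
    """Count requests per known status, including statuses with zero requests."""
    counts = Counter(r["request_status"].strip().lower() for r in requests)
    return {status: counts.get(status, 0) for status in STATUS_EMOJI}
```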
### Data Management
- All data stored in HuggingFace datasets
- Refresh button for live updates
- Persistent across sessions
- Ready for automated status updates from an evaluation pipeline
### Token Management
- Graceful handling when `HF_TOKEN` is not configured
- Clear instructions for Hugging Face Spaces deployment
- Read-only mode when the token is unavailable
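
The read-only fallback can be sketched as a small check on the environment. The function name `resolve_access_mode` and the mode strings are hypothetical; the app may represent this differently.

```python
import os
from typing import Mapping, Optional


def resolve_access_mode(env: Optional[Mapping[str, str]] = None) -> str:
    """Return "read-write" when HF_TOKEN is set and non-empty, else "read-only"."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    return "read-write" if token else "read-only"
```

Passing a mapping explicitly, for example `resolve_access_mode({"HF_TOKEN": "hf_xxx"})`, makes the check easy to test without touching the real environment.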
## 🔄 Next Steps
1. **Deploy to Hugging Face Spaces**: Upload the app to a Space and set `HF_TOKEN` as a secret
2. **Connect Evaluation Harness**: Use `automation_example.py` as an integration guide
3. **Real Evaluations**: Replace the 0 scores with actual evaluation results
4. **Webhooks**: Set up automatic updates from the evaluation pipeline
5. **Enhanced Analytics**: Add more detailed performance breakdowns
## 📊 Dataset Schema
### Requests Dataset
- `request_id`, `model_name`, `model_provider`, `request_type`
- `request_status`, `contact_email`, `request_timestamp`
### Results Dataset
- `result_id`, `request_id`, `model_name`, `model_provider`
- `overall_score`, `score_action_requise`, `score_delai_legal`
- `score_documents_obligatoires`, `score_impact_financier`
- `score_consequences_non_conformite`, `evaluation_timestamp`, `is_published`
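
For reference, a results row could be modeled as a dataclass like the one below. The field names match the list above; the Python types, the zero defaults (mirroring the placeholder scores), and the `category_scores` helper are assumptions rather than the definitions used by `dataset_manager.py`.

```python
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass
class EvaluationResult:
    """Hypothetical typed view of one row in the results dataset."""

    result_id: str
    request_id: str
    model_name: str
    model_provider: str
    overall_score: float = 0.0
    score_action_requise: float = 0.0
    score_delai_legal: float = 0.0
    score_documents_obligatoires: float = 0.0
    score_impact_financier: float = 0.0
    score_consequences_non_conformite: float = 0.0
    evaluation_timestamp: str = ""  # assumed to be stored as a string
    is_published: bool = False

    def category_scores(self) -> Dict[str, float]:
        """Return only the per-category `score_*` fields."""
        return {k: v for k, v in asdict(self).items() if k.startswith("score_")}


# A placeholder row with all scores left at 0; the ids are made up.
placeholder = EvaluationResult("res-1", "req-1", "Qwen/Qwen3-14B", "Alibaba")
```

`asdict(placeholder)` yields a plain dictionary with the same keys as the schema, which is convenient when appending rows to a dataset.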
---
**Created by**: Mohamad Alhajar (legml.ai)
**License**: MIT