---
title: Les Audits d'Affaires - Leaderboard
emoji: ⚖️
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.34.2
app_file: app.py
pinned: false
license: mit
short_description: French business-law leaderboard for LLMs
---

# Les Audits d'Affaires - Leaderboard

A performance dashboard for LLMs evaluated on a French business law benchmark, backed by Hugging Face Datasets.

## 🚀 Setup Complete!

### Hugging Face Datasets

- **Requests**: `legmlai/laal-requests` - tracks evaluation requests
- **Results**: `legmlai/laal-results` - stores evaluation results

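As a rough sketch of how these two repositories could be read, the snippet below uses `datasets.load_dataset`. The `train` split name and the helper `load_leaderboard_data` are assumptions made for illustration; the project's own logic lives in `dataset_manager.py` and may differ.

```python
import os

# Repository IDs as listed above.
REQUESTS_REPO = "legmlai/laal-requests"
RESULTS_REPO = "legmlai/laal-results"


def load_leaderboard_data(split: str = "train"):
    """Return (requests, results) as Hugging Face Dataset objects.

    The split name is an assumption; adjust it to match the repositories.
    """
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    token = os.environ.get("HF_TOKEN")  # None falls back to anonymous access
    requests = load_dataset(REQUESTS_REPO, split=split, token=token)
    results = load_dataset(RESULTS_REPO, split=split, token=token)
    return requests, results
```

Calling `load_leaderboard_data()` needs network access and, if the repositories are private, a valid `HF_TOKEN`.
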
### Current Models (placeholder scores of 0)

1. `Qwen/Qwen3-14B` (Alibaba)
2. `jpacifico/Chocolatine-2-14B-Instruct-v2.0.3` (jpacifico)
3. `meta-llama/Llama-3.1-8B-Instruct` (Meta)

## 🏃‍♂️ Quick Start

### Prerequisites

```bash
# Replace the placeholder with your Hugging Face access token
export HF_TOKEN=your_huggingface_token
```

### Run Leaderboard

```bash
# Enter the project directory
cd les-audites-affaires-leadboard
# Activate the virtual environment
source venv/bin/activate
# Start the Gradio app
python app.py
```

The leaderboard is then served at http://127.0.0.1:7860, Gradio's default local address.

## 📁 Project Structure

### Core Files

- `app.py` - Main leaderboard application with Hugging Face integration
- `dataset_manager.py` - Hugging Face datasets management
- `requirements.txt` - Python dependencies

### Setup Scripts

- `create_datasets.py` - Initializes the Hugging Face datasets
- `setup_initial_models.py` - Adds the initial models with scores of 0

## ✨ Features

### Live Leaderboard

- Live data read from Hugging Face datasets
- Automatic ranking and scoring
- Category-wise performance breakdown
- Interactive comparison charts

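The automatic ranking can be pictured as a sort on `overall_score`. This is a minimal sketch rather than the code in `app.py`: the helper name `rank_models`, the choice to hide rows where `is_published` is false, and the sample rows (made-up models and scores) are all assumptions.

```python
def rank_models(results: list[dict]) -> list[dict]:
    """Sort published results by overall_score (highest first) and add a rank."""
    # Assumption: unpublished rows are hidden from the public leaderboard.
    published = [row for row in results if row.get("is_published", False)]
    ordered = sorted(published, key=lambda row: row["overall_score"], reverse=True)
    return [{"rank": position, **row} for position, row in enumerate(ordered, start=1)]


# Illustrative rows only; these are not real evaluation results.
sample = [
    {"model_name": "model-a", "overall_score": 61.5, "is_published": True},
    {"model_name": "model-b", "overall_score": 74.0, "is_published": True},
    {"model_name": "model-c", "overall_score": 80.0, "is_published": False},
]
leaderboard = rank_models(sample)
```

Here `leaderboard` lists `model-b` first and `model-a` second, while `model-c` is left out because it is unpublished.
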
### Model Submissions

- Submit models directly through the UI
- Automatic request tracking
- Email notifications for updates

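Before a submission is recorded, the form input would typically be checked. The sketch below shows one way to do that. The function `validate_submission` and both regular expressions are hypothetical, and the patterns are deliberately simpler than the Hub's actual naming rules or full email validation.

```python
import re

# Simplified "owner/name" pattern for Hub model IDs (an approximation).
MODEL_ID_PATTERN = re.compile(r"[\w.-]+/[\w.-]+")
# Very loose email check: something@something.something
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def validate_submission(model_name: str, contact_email: str) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    if not MODEL_ID_PATTERN.fullmatch(model_name.strip()):
        problems.append("Model name must look like 'owner/model'.")
    if not EMAIL_PATTERN.fullmatch(contact_email.strip()):
        problems.append("Contact email does not look valid.")
    return problems
```

For example, `validate_submission("Qwen/Qwen3-14B", "user@example.com")` returns an empty list, while a model name without an owner prefix is reported as a problem.
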
### Pipeline Status Tracking

- 📊 Real-time status: Pending, Processing, Completed, Failed
- 📋 Table of recent evaluation requests
- 🔄 Pipeline progress monitoring
- ⏳ Emoji indicators for each request status

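The four statuses above could be rendered with a small lookup table. The emoji assigned to each status and the helper names below are assumptions for illustration; the app may use different ones.

```python
from collections import Counter

# Emoji choices are illustrative; the app may use different ones.
STATUS_EMOJI = {
    "pending": "⏳",
    "processing": "🔄",
    "completed": "✅",
    "failed": "❌",
}


def format_status(status: str) -> str:
    """Return a display label such as '⏳ Pending' for a raw status value."""
    key = status.strip().lower()
    return f"{STATUS_EMOJI.get(key, '❓')} {key.capitalize()}"


def count_by_status(requests: list[dict]) -> Counter:
    """Count requests per normalized status, e.g. for a pipeline summary."""
    return Counter(row["request_status"].strip().lower() for row in requests)
```

Unknown statuses fall back to `❓`, so an unexpected value in the dataset is shown with a neutral marker and does not raise an error.
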
### Data Management

- All data is stored in Hugging Face datasets
- Refresh button for live updates
- Data persists across sessions
- Ready for automated status updates

### Token Management

- Graceful handling when `HF_TOKEN` is not configured
- Clear instructions for Hugging Face Spaces deployment
- Read-only mode when the token is unavailable

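The fallback to read-only mode might look like the sketch below. The function `resolve_access_mode` and its return shape are hypothetical; only the `HF_TOKEN` variable name comes from this README.

```python
import os


def resolve_access_mode(env=None):
    """Return (token, read_only) based on the HF_TOKEN environment variable.

    `env` defaults to os.environ; passing a dict makes the function easy to test.
    """
    env = os.environ if env is None else env
    token = (env.get("HF_TOKEN") or "").strip()
    if token:
        return token, False
    # No usable token: keep the app running, but disable writes.
    return None, True


token, read_only = resolve_access_mode()
if read_only:
    print("HF_TOKEN is not set: running in read-only mode.")
```

On Hugging Face Spaces, secrets are exposed to the app as environment variables, so the same check applies both locally and after deployment.
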
## 🔄 Next Steps

1. **Deploy to Hugging Face Spaces**: Upload the app to a Space and add `HF_TOKEN` as a secret
2. **Connect Evaluation Harness**: Use `automation_example.py` as an integration guide
3. **Real Evaluations**: Replace the placeholder 0 scores with actual evaluation results
4. **Webhooks**: Set up automatic updates from the evaluation pipeline
5. **Enhanced Analytics**: Add more detailed performance breakdowns

## 📊 Dataset Schema

### Requests Dataset

- `request_id`, `model_name`, `model_provider`, `request_type`
- `request_status`, `contact_email`, `request_timestamp`

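To make the request columns concrete, here is a hedged sketch of one row as a Python dataclass. The field names come from the schema above. The types, the defaults (`"evaluation"`, `"pending"`), the UUID identifier, and the ISO 8601 timestamp format are assumptions.

```python
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


def _now_iso() -> str:
    """Current UTC time as an ISO 8601 string (the format is an assumption)."""
    return datetime.now(timezone.utc).isoformat()


@dataclass
class EvaluationRequest:
    model_name: str
    model_provider: str
    contact_email: str
    request_type: str = "evaluation"   # assumed default value
    request_status: str = "pending"    # assumed initial status
    request_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    request_timestamp: str = field(default_factory=_now_iso)


request = EvaluationRequest("Qwen/Qwen3-14B", "Alibaba", "user@example.com")
row = asdict(request)  # plain dict, ready to append to a dataset
```
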
### Results Dataset

- `result_id`, `request_id`, `model_name`, `model_provider`
- `overall_score`, `score_action_requise`, `score_delai_legal`
- `score_documents_obligatoires`, `score_impact_financier`
- `score_consequences_non_conformite`, `evaluation_timestamp`, `is_published`

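The five `score_*` columns are what a category-wise breakdown would read. A minimal sketch, assuming each results row is available as a plain dict; `category_breakdown` is a hypothetical helper, and the sample row uses the current placeholder scores of 0.

```python
# Category score columns, as named in the results schema.
CATEGORY_COLUMNS = [
    "score_action_requise",
    "score_delai_legal",
    "score_documents_obligatoires",
    "score_impact_financier",
    "score_consequences_non_conformite",
]


def category_breakdown(row: dict) -> dict:
    """Map each category (column name minus the 'score_' prefix) to its score."""
    return {column.removeprefix("score_"): row[column] for column in CATEGORY_COLUMNS}


# Placeholder row: every score is 0, matching the current state of the leaderboard.
sample_row = {"model_name": "Qwen/Qwen3-14B", "overall_score": 0.0}
sample_row.update({column: 0.0 for column in CATEGORY_COLUMNS})
breakdown = category_breakdown(sample_row)
```
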

---

**Created by**: Mohamad Alhajar (legml.ai)

**License**: MIT