Quick Start Guide
Installation
- Install dependencies with uv
uv sync
- Install Playwright browsers
uv run playwright install chromium
- Set up environment variables
cp .env.example .env
Edit .env and configure at minimum:
- STUDENT_SECRET - Your secret key
- STUDENT_EMAIL - Your email
- GITHUB_TOKEN - GitHub personal access token
- GITHUB_USERNAME - Your GitHub username
- ANTHROPIC_API_KEY or OPENAI_API_KEY - LLM API key
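Before starting either API, it can help to confirm that the minimum settings are actually present. The snippet below is a minimal sketch, not part of the project: `missing_settings` is a hypothetical helper, and it assumes the variables from .env have already been exported into the process environment (the project may load .env in its own way).

```python
import os

# Names taken from the minimum configuration listed above.
REQUIRED = ["STUDENT_SECRET", "STUDENT_EMAIL", "GITHUB_TOKEN", "GITHUB_USERNAME"]
# Either LLM provider key satisfies the last requirement.
LLM_KEYS = ["ANTHROPIC_API_KEY", "OPENAI_API_KEY"]


def missing_settings(env):
    """Return the names of settings that are absent or empty in `env`."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if not any(env.get(name) for name in LLM_KEYS):
        missing.append(" or ".join(LLM_KEYS))
    return missing


# Hypothetical usage: an empty list means the minimum configuration is present.
print(missing_settings(dict(os.environ)))
```

The helper takes a plain mapping rather than reading `os.environ` directly, so the same check can be run against values parsed from a file.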
Running the System
Option 1: Using main.py CLI
Start Student API:
uv run python main.py student-api
Start Instructor API:
uv run python main.py instructor-api
Initialize Database:
uv run python main.py init-db
Run Round 1:
uv run python main.py round1
Run Evaluation:
uv run python main.py evaluate
Run Round 2:
uv run python main.py round2
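If you run the batch steps often, they can be chained from a small script. This is a rough sketch under two assumptions: both API servers are already running in other terminals, and the steps are meant to run back to back in the order listed above. `step_command` and `run_pipeline` are hypothetical names, not part of the project.

```python
import subprocess

# Batch steps in the order listed above; the API servers are started separately.
STEPS = ["init-db", "round1", "evaluate", "round2"]


def step_command(step):
    """Build the argument list for one main.py subcommand."""
    return ["uv", "run", "python", "main.py", step]


def run_pipeline(steps=STEPS, runner=subprocess.run):
    """Run each step in order; check=True stops at the first failing step."""
    for step in steps:
        runner(step_command(step), check=True)


# Hypothetical usage, from the project root:
# run_pipeline()
```

Passing `runner` as a parameter keeps the sketch easy to dry-run, for example with a function that only prints each command.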
Option 2: Direct module execution
Start Student API:
uv run python -m student.api
Start Instructor API:
uv run python -m instructor.api
Testing the System
1. Test Student API
Start the student API:
uv run python main.py student-api
In another terminal, send a test request:
curl -X POST http://localhost:8000/api/build \
-H "Content-Type: application/json" \
-d '{
"email": "[email protected]",
"secret": "your-secret",
"task": "test-task-abc",
"round": 1,
"nonce": "unique-nonce-123",
"brief": "Create a simple Hello World page with Bootstrap",
"checks": ["Page displays Hello World", "Bootstrap is loaded"],
"evaluation_url": "http://localhost:8001/api/evaluate",
"attachments": []
}'
Check status:
curl http://localhost:8000/api/status/test-task-abc
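The same test request can be sent from Python using only the standard library. This is a sketch that mirrors the curl call above; `make_build_request` is a hypothetical helper, and the email value is a placeholder address rather than a value from this guide.

```python
import json
import urllib.request


def make_build_request(payload, base_url="http://localhost:8000"):
    """Build (but do not send) a POST request for the /api/build endpoint."""
    return urllib.request.Request(
        url=f"{base_url}/api/build",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


payload = {
    "email": "student@example.com",  # placeholder address (assumption)
    "secret": "your-secret",
    "task": "test-task-abc",
    "round": 1,
    "nonce": "unique-nonce-123",
    "brief": "Create a simple Hello World page with Bootstrap",
    "checks": ["Page displays Hello World", "Bootstrap is loaded"],
    "evaluation_url": "http://localhost:8001/api/evaluate",
    "attachments": [],
}

request = make_build_request(payload)

# To actually send it (requires the student API to be running):
# with urllib.request.urlopen(request) as response:
#     print(response.status, response.read().decode("utf-8"))
```

The send step is left commented out so the snippet can be run without a server.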
2. Test Instructor Workflow
Start Instructor API:
uv run python main.py instructor-api
Initialize Database:
uv run python main.py init-db
Create submissions.csv:
timestamp,email,endpoint,secret
2025-01-15T10:00:00,[email protected],http://localhost:8000/api/build,your-secret
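A quick check of submissions.csv before running Round 1 can catch a wrong header or a stray column. The sketch below assumes the four-column layout shown above. `read_submissions` is a hypothetical helper, not the project's own loader, and the email in the sample row is a placeholder.

```python
import csv
import io

# Header row as shown in the example above.
EXPECTED_COLUMNS = ["timestamp", "email", "endpoint", "secret"]


def read_submissions(text):
    """Parse submissions CSV text into a list of dicts, checking the header."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    return list(reader)


sample = (
    "timestamp,email,endpoint,secret\n"
    "2025-01-15T10:00:00,student@example.com,"  # placeholder address
    "http://localhost:8000/api/build,your-secret\n"
)
rows = read_submissions(sample)
print(rows[0]["endpoint"])
```

For a real file, the same function can be given the result of `open("submissions.csv", newline="").read()`.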
Run Round 1:
uv run python main.py round1
Run Evaluation:
uv run python main.py evaluate
Check Results:
curl http://localhost:8001/api/results/[email protected]
Project Structure Overview
student/                        # Student side (receives tasks, generates code)
├── api.py                      # API endpoint
├── code_generator.py           # LLM code generation
├── github_manager.py           # GitHub operations
└── notification_client.py      # Notify evaluation
instructor/                     # Instructor side (generates tasks, evaluates)
├── api.py                      # Evaluation endpoint
├── database.py                 # Database operations
├── task_templates.py           # Template management
├── round1.py                   # Generate round 1 tasks
├── round2.py                   # Generate round 2 tasks
├── evaluate.py                 # Run evaluations
└── checks/                     # Evaluation checks
    ├── static_checks.py
    ├── dynamic_checks.py
    └── llm_checks.py
shared/                         # Shared utilities
├── config.py                   # Configuration
├── models.py                   # Data models
├── logger.py                   # Logging
└── utils.py                    # Utilities
templates/ # Task templates (YAML)
Next Steps
- Configure your .env file with actual credentials
- Set up a PostgreSQL database (if using instructor features)
- Review the task templates in templates/
- Test the student API with a simple request
- Set up the instructor system for evaluation
Common Issues
Import errors: Make sure you run commands with the uv run prefix
GitHub auth errors: Verify GITHUB_TOKEN in .env has proper permissions
Database errors: Make sure PostgreSQL is running and DATABASE_URL is correct
LLM errors: Check your API key and quota
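For database errors in particular, a small check can separate "the URL is malformed" from "nothing is listening". This is a sketch under the assumption that DATABASE_URL is a standard URL such as postgresql://user:password@host:port/dbname. Both helpers are hypothetical, and a successful TCP connection only shows that the port is open, not that the credentials are valid.

```python
import socket
from urllib.parse import urlsplit


def database_host_port(database_url, default_port=5432):
    """Extract (host, port) from a database URL; 5432 is PostgreSQL's default."""
    parts = urlsplit(database_url)
    if not parts.hostname:
        raise ValueError("DATABASE_URL has no host")
    return parts.hostname, parts.port or default_port


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Hypothetical usage:
# host, port = database_host_port(os.environ["DATABASE_URL"])
# print("reachable" if can_connect(host, port) else "not reachable")
```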
For more details, see README.md