Fix /web README source and speed up Docker rebuilds
- Dockerfile +6 -2
- WEB_README.md +48 -85
- requirements.txt +5 -0
- server/app.py +3 -3
Dockerfile
CHANGED
@@ -4,6 +4,10 @@ FROM python:3.10-slim
 # Set working directory
 WORKDIR /app
 
+# Install dependencies first to maximize Docker cache reuse across code edits
+COPY requirements.txt .
+RUN pip install -r requirements.txt
+
 # Copy project files
 COPY pyproject.toml .
 COPY README.md .
@@ -15,8 +19,8 @@ COPY __init__.py .
 COPY client.py .
 COPY server ./server
 
-# Install
-RUN pip install --no-
+# Install local package metadata/entrypoints without reinstalling dependencies
+RUN pip install --no-deps .
 
 # Expose the standard OpenEnv port
 EXPOSE 8000
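The Dockerfile reordering relies on Docker's layer cache: once one layer is invalidated, every later layer is rebuilt as well. The sketch below illustrates that rule under a deliberately simplified model. The `rebuilt_layers` helper, the layer tuples, and the file list (only the files visible in this diff) are hypothetical, and the old install step is represented by position only, since its flags are truncated in the diff.

```python
from typing import Iterable, List, Sequence, Tuple

# Hypothetical simplified model: each layer is (instruction, files it copies in).
Layer = Tuple[str, Sequence[str]]

# Assumption: only the project files visible in the diff are listed here.
PROJECT_FILES = ["pyproject.toml", "README.md", "client.py", "server"]

OLD_ORDER: List[Layer] = [
    ("COPY project files", PROJECT_FILES),
    ("RUN pip install (old single install step)", []),
]

NEW_ORDER: List[Layer] = [
    ("COPY requirements.txt", ["requirements.txt"]),
    ("RUN pip install -r requirements.txt", []),
    ("COPY project files", PROJECT_FILES),
    ("RUN pip install --no-deps .", []),
]


def rebuilt_layers(layers: Iterable[Layer], changed: Iterable[str]) -> List[str]:
    """Return the layers rebuilt after the files in `changed` are edited.

    A layer is rebuilt if one of its inputs changed or if any earlier
    layer was rebuilt, because cache invalidation cascades downward.
    """
    changed_set = set(changed)
    invalidated = False
    rebuilt = []
    for instruction, inputs in layers:
        if invalidated or changed_set.intersection(inputs):
            invalidated = True
            rebuilt.append(instruction)
    return rebuilt


if __name__ == "__main__":
    # Editing server code: compare what each ordering has to redo.
    print(rebuilt_layers(OLD_ORDER, ["server"]))
    print(rebuilt_layers(NEW_ORDER, ["server"]))
```

In this model, editing `server` under the new ordering rebuilds only the project copy and the `--no-deps` install, while the dependency install stays cached. Real Docker additionally keys the cache on instruction text and file checksums, which this sketch does not model.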
WEB_README.md
CHANGED
@@ -1,107 +1,70 @@
 # Cloud DevOps RLEnv
 
-
+This environment trains and tests agents on cloud incident response.
 
-##
+## What You Need To Do
 
-
+Solve incidents by following the same workflow a real SRE would use:
+1. Inspect resources.
+2. Read logs.
+3. Apply a safe fix.
+4. Submit the solution.
 
-
-- Benchmark planning and tool-use behavior for cloud operations agents.
-- Reward correct diagnosis over blind action execution.
-- Provide repeatable task outcomes for fair grading and comparison.
+## Available Actions
 
-
+- `list_resources`: See all resources.
+- `describe_resource`: View one resource.
+- `view_logs`: Read logs for one resource.
+- `update_security_group`: Add/modify security rules.
+- `restart_service`: Restart an instance.
+- `submit_solution`: Submit your final answer.
 
-
+## What You Receive Each Step
 
-
--
--
--
+- `output`: Main command result.
+- `error`: Error text if a command fails.
+- `system_health_status`: `CRITICAL`, `DEGRADED`, or `HEALTHY`.
+- `reward`: Step reward.
+- `done`: Whether the episode has ended.
 
-
-- update_security_group expects parameters.port and usually parameters.action.
-- restart_service targets a single instance by resource_id.
+## Difficulty Levels
 
-
+- `easy`: Open port `80` on `sg-web`.
+- `medium`: Find DB timeout in logs, then open port `5432` on `sg-db`.
+- `hard`: Trace timeout through load balancer to `i-web2`, then restart the correct service.
 
-
+## Quick Start
 
-
-- output: command result payload.
-- error: command error, when present.
-- system_health_status: CRITICAL, DEGRADED, or HEALTHY.
-- done: terminal flag.
-- reward: scalar step reward.
-- metadata: includes task name, resolution status, step count, and other diagnostics.
+Run from repo root:
 
-
-
-
--
-
+```bash
+..\\.venv\\Scripts\\openenv validate
+bash scripts/pre_submit_validate.sh --skip-inference
+docker build -t cloud-devops-env:phase1 -f Dockerfile .
+```
 
-
-
-- easy:
-Open port 80 on sg-web so web traffic can flow.
-Expected difficulty: low.
-- medium:
-Inspect API logs to identify DB connectivity failure, then open port 5432 on sg-db.
-Expected difficulty: medium (requires diagnosis before remediation).
-- hard:
-Trace load balancer timeout to i-web2, inspect the target, then restart the correct service.
-Expected difficulty: high (multi-hop diagnosis and anti-shortcut checks).
-
-## Setup And Usage
-
-From repository root:
-
-- Validate OpenEnv package structure and manifest:
-..\\.venv\\Scripts\\openenv validate
-- Run pre-submission validator (skip live inference):
-bash scripts/pre_submit_validate.sh --skip-inference
-- Build local submission image:
-docker build -t cloud-devops-env:phase1 -f Dockerfile .
-
-Optional local server run:
+Run server locally:
 
+```bash
 uvicorn server.app:app --host 0.0.0.0 --port 8000
+```
 
-## Inference
-
-inference.py uses the OpenAI client and reads:
-- API_BASE_URL
-- MODEL_NAME
-- HF_TOKEN
-
-It emits strict structured logs:
-- [START] { ... } per task
-- [STEP] { ... } per environment action
-- [END] { ... } per task summary
-
-## Baseline Scores
+## Inference Requirements
 
-
+`inference.py` reads:
+- `API_BASE_URL`
+- `MODEL_NAME`
+- `HF_TOKEN`
 
-
--
--
+It logs strict markers:
+- `[START]`
+- `[STEP]`
+- `[END]`
 
-
-- Aggregate scores are clamped to [0.0, 1.0].
-- SUCCESS_SCORE_THRESHOLD for inference summaries is 0.8.
+## Baseline Score Targets
 
-
+- easy: `1.0`
+- medium: `0.8` to `1.0`
+- hard: `1.0`
 
-
-2. Keep README.md front matter for Space metadata.
-3. Set Space secrets/variables:
-- HF_TOKEN (secret)
-- API_BASE_URL (for example https://router.huggingface.co/v1)
-- MODEL_NAME (chosen model slug)
-4. Wait for Space build to complete.
-5. Verify endpoints:
-- GET /health returns 200
-- POST /reset returns 200
+Scores are clamped to `[0.0, 1.0]`.
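The per-step fields and the score clamp described in WEB_README.md can be illustrated with a small Python sketch. The `StepObservation` class and `clamp_score` function are hypothetical names chosen for illustration and are not taken from the repository. Only the field names, the three health states, and the `[0.0, 1.0]` range come from the README text.

```python
from dataclasses import dataclass
from typing import Any, Optional

# The three health states listed in WEB_README.md.
VALID_HEALTH = {"CRITICAL", "DEGRADED", "HEALTHY"}


@dataclass
class StepObservation:
    """Hypothetical container mirroring the per-step fields in WEB_README.md."""

    output: Any                  # main command result
    error: Optional[str]         # error text if a command fails, else None
    system_health_status: str    # one of VALID_HEALTH
    reward: float                # step reward
    done: bool                   # whether the episode has ended

    def __post_init__(self) -> None:
        # Reject statuses outside the documented set.
        if self.system_health_status not in VALID_HEALTH:
            raise ValueError(
                f"unexpected health status: {self.system_health_status!r}"
            )


def clamp_score(raw: float) -> float:
    """Clamp a score into the documented [0.0, 1.0] range."""
    return max(0.0, min(1.0, raw))


if __name__ == "__main__":
    obs = StepObservation(
        output="port 80 opened on sg-web",  # illustrative payload only
        error=None,
        system_health_status="HEALTHY",
        reward=1.0,
        done=True,
    )
    print(obs.system_health_status, clamp_score(obs.reward + 0.3))
```

Validating the status in `__post_init__` is a design choice of this sketch. The actual environment may represent the status differently, for example as an enum.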
requirements.txt
ADDED
@@ -0,0 +1,5 @@
+openenv-core[core]>=0.2.2
+pydantic>=2.0.0
+openai>=1.0.0
+fastapi>=0.115.0
+uvicorn>=0.24.0
server/app.py
CHANGED
@@ -36,9 +36,9 @@ from pathlib import Path
 os.environ.setdefault("ENABLE_WEB_INTERFACE", "true")
 _BASE_DIR = Path(__file__).resolve().parent.parent
 _WEB_README = _BASE_DIR / "WEB_README.md"
-
-
-
+# Force a clean renderer source file for /web; fallback only if missing.
+os.environ["ENV_README_PATH"] = str(
+    _WEB_README if _WEB_README.exists() else (_BASE_DIR / "README.md")
 )
 
 try:
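The fallback in `server/app.py` can be tried in isolation with the sketch below. The `resolve_readme_path` helper is a hypothetical extraction of the conditional expression in the diff and is not a function in the repository. That the web renderer reads `ENV_README_PATH` is implied by the diff comment rather than verified here.

```python
import tempfile
from pathlib import Path


def resolve_readme_path(base_dir: Path) -> Path:
    """Prefer WEB_README.md; fall back to README.md only if it is missing.

    Like the expression in the diff, the fallback path is returned without
    checking that README.md itself exists.
    """
    web_readme = base_dir / "WEB_README.md"
    return web_readme if web_readme.exists() else base_dir / "README.md"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        (base / "README.md").write_text("# full readme\n")
        print(resolve_readme_path(base).name)  # prints README.md

        (base / "WEB_README.md").write_text("# web readme\n")
        print(resolve_readme_path(base).name)  # prints WEB_README.md
```

Note that the diff assigns `os.environ["ENV_README_PATH"]` directly rather than using `setdefault`, so the resolved path overrides any value already present in the environment.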