SidhaGarg committed 222ebaa · 1 parent: 9075e6a

Fix /web README source and speed up Docker rebuilds

Files changed (4)
  1. Dockerfile +6 -2
  2. WEB_README.md +48 -85
  3. requirements.txt +5 -0
  4. server/app.py +3 -3
Dockerfile CHANGED
@@ -4,6 +4,10 @@ FROM python:3.10-slim
  # Set working directory
  WORKDIR /app

+ # Install dependencies first to maximize Docker cache reuse across code edits
+ COPY requirements.txt .
+ RUN pip install -r requirements.txt
+
  # Copy project files
  COPY pyproject.toml .
  COPY README.md .
@@ -15,8 +19,8 @@ COPY __init__.py .
  COPY client.py .
  COPY server ./server

- # Install dependencies (no-cache to save space)
- RUN pip install --no-cache-dir .
+ # Install local package metadata/entrypoints without reinstalling dependencies
+ RUN pip install --no-deps .

  # Expose the standard OpenEnv port
  EXPOSE 8000
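The reordering helps because Docker's build cache is sequential. An instruction reuses its cached layer only if every earlier layer was also reused, and a `COPY` layer is invalidated when the copied files change. The Python sketch below is a deliberately simplified model of that rule: it chains a hash per instruction and is not Docker's actual cache implementation. The instruction lists are trimmed versions of the two Dockerfile layouts above, and the file contents are placeholders.

```python
import hashlib


def layer_keys(instructions, files):
    """Return one cache key per instruction, each chained on the previous key.

    Simplified model: a COPY key also hashes the contents of its source, so
    editing that source changes its key and every key after it.
    """
    keys = []
    parent = ""
    for instr in instructions:
        digest = hashlib.sha256()
        digest.update(parent.encode())
        digest.update(instr.encode())
        if instr.startswith("COPY "):
            source = instr.split()[1]
            digest.update(files.get(source, "").encode())
        parent = digest.hexdigest()
        keys.append(parent)
    return keys


def rebuilt_steps(instructions, before, after):
    """Instructions whose cached layer would be rebuilt after a file edit."""
    old = layer_keys(instructions, before)
    new = layer_keys(instructions, after)
    return [instr for instr, o, n in zip(instructions, old, new) if o != n]


# Trimmed copies of the old and new Dockerfile orderings (placeholders).
OLD_LAYOUT = [
    "COPY client.py .",
    "COPY server ./server",
    "RUN pip install --no-cache-dir .",
]
NEW_LAYOUT = [
    "COPY requirements.txt .",
    "RUN pip install -r requirements.txt",
    "COPY client.py .",
    "COPY server ./server",
    "RUN pip install --no-deps .",
]

if __name__ == "__main__":
    before = {"requirements.txt": "fastapi>=0.115.0", "client.py": "v1", "server": "s1"}
    after = {**before, "client.py": "v2"}  # a code-only edit
    print("old:", rebuilt_steps(OLD_LAYOUT, before, after))
    print("new:", rebuilt_steps(NEW_LAYOUT, before, after))
```

With a code-only edit, the old layout reruns the full `pip install`. The new layout keeps the `pip install -r requirements.txt` layer and reruns only the `--no-deps` install of the local package. Editing `requirements.txt` still rebuilds every layer.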
WEB_README.md CHANGED
@@ -1,107 +1,70 @@
  # Cloud DevOps RLEnv

- Cloud DevOps RLEnv is an OpenEnv-compatible environment for training and evaluating agents on realistic cloud SRE and DevOps incident-response tasks.
+ This environment trains and tests agents on cloud incident response.

- ## Environment Description And Motivation
+ ## What You Need To Do

- Production incidents are often multi-step: triage, inspect resources, check logs, apply a safe remediation, and then verify the fix. This environment simulates that loop with deterministic scenarios and shaped rewards.
+ Solve incidents by following the same workflow a real SRE would use:
+ 1. Inspect resources.
+ 2. Read logs.
+ 3. Apply a safe fix.
+ 4. Submit the solution.

- Goals:
- - Benchmark planning and tool-use behavior for cloud operations agents.
- - Reward correct diagnosis over blind action execution.
- - Provide repeatable task outcomes for fair grading and comparison.
+ ## Available Actions

- ## Action Space
+ - `list_resources`: See all resources.
+ - `describe_resource`: View one resource.
+ - `view_logs`: Read logs for one resource.
+ - `update_security_group`: Add/modify security rules.
+ - `restart_service`: Restart an instance.
+ - `submit_solution`: Submit your final answer.

- Action model: CloudAction
+ ## What You Receive Each Step

- Fields:
- - command (required): one of list_resources, describe_resource, view_logs, update_security_group, restart_service, submit_solution.
- - resource_id (optional): target resource identifier (required for most non-list actions).
- - parameters (optional): structured key/value arguments used by mutating actions.
+ - `output`: Main command result.
+ - `error`: Error text if a command fails.
+ - `system_health_status`: `CRITICAL`, `DEGRADED`, or `HEALTHY`.
+ - `reward`: Step reward.
+ - `done`: Whether the episode has ended.

- Notes:
- - update_security_group expects parameters.port and usually parameters.action.
- - restart_service targets a single instance by resource_id.
+ ## Difficulty Levels

- ## Observation And State Space
+ - `easy`: Open port `80` on `sg-web`.
+ - `medium`: Find DB timeout in logs, then open port `5432` on `sg-db`.
+ - `hard`: Trace timeout through load balancer to `i-web2`, then restart the correct service.

- Observation model: CloudObservation
+ ## Quick Start

- Primary observation fields:
- - output: command result payload.
- - error: command error, when present.
- - system_health_status: CRITICAL, DEGRADED, or HEALTHY.
- - done: terminal flag.
- - reward: scalar step reward.
- - metadata: includes task name, resolution status, step count, and other diagnostics.
+ Run from repo root:

- Hidden state model: CloudState
- - task_difficulty: easy, medium, or hard.
- - resources: underlying resource graph and logs.
- - step_count: total actions issued.
- - is_resolved: whether incident root cause is remediated.
+ ```bash
+ ..\\.venv\\Scripts\\openenv validate
+ bash scripts/pre_submit_validate.sh --skip-inference
+ docker build -t cloud-devops-env:phase1 -f Dockerfile .
+ ```

- ## Task Definitions And Expected Difficulty
-
- - easy:
- Open port 80 on sg-web so web traffic can flow.
- Expected difficulty: low.
- - medium:
- Inspect API logs to identify DB connectivity failure, then open port 5432 on sg-db.
- Expected difficulty: medium (requires diagnosis before remediation).
- - hard:
- Trace load balancer timeout to i-web2, inspect the target, then restart the correct service.
- Expected difficulty: high (multi-hop diagnosis and anti-shortcut checks).
-
- ## Setup And Usage
-
- From repository root:
-
- - Validate OpenEnv package structure and manifest:
- ..\\.venv\\Scripts\\openenv validate
- - Run pre-submission validator (skip live inference):
- bash scripts/pre_submit_validate.sh --skip-inference
- - Build local submission image:
- docker build -t cloud-devops-env:phase1 -f Dockerfile .
-
- Optional local server run:
+ Run server locally:

+ ```bash
  uvicorn server.app:app --host 0.0.0.0 --port 8000
+ ```

- ## Inference Contract
-
- inference.py uses the OpenAI client and reads:
- - API_BASE_URL
- - MODEL_NAME
- - HF_TOKEN
-
- It emits strict structured logs:
- - [START] { ... } per task
- - [STEP] { ... } per environment action
- - [END] { ... } per task summary
-
- ## Baseline Scores
+ ## Inference Requirements

- Representative deterministic scripted-policy targets:
+ `inference.py` reads:
+ - `API_BASE_URL`
+ - `MODEL_NAME`
+ - `HF_TOKEN`

- - easy: 1.0
- - medium: 0.8-1.0
- - hard: 1.0
+ It logs strict markers:
+ - `[START]`
+ - `[STEP]`
+ - `[END]`

- Validation expectation:
- - Aggregate scores are clamped to [0.0, 1.0].
- - SUCCESS_SCORE_THRESHOLD for inference summaries is 0.8.
+ ## Baseline Score Targets

- ## Hugging Face Space Deployment
+ - easy: `1.0`
+ - medium: `0.8` to `1.0`
+ - hard: `1.0`

- 1. Push this repository to your Space (Docker SDK).
- 2. Keep README.md front matter for Space metadata.
- 3. Set Space secrets/variables:
- - HF_TOKEN (secret)
- - API_BASE_URL (for example https://router.huggingface.co/v1)
- - MODEL_NAME (chosen model slug)
- 4. Wait for Space build to complete.
- 5. Verify endpoints:
- - GET /health returns 200
- - POST /reset returns 200
+ Scores are clamped to `[0.0, 1.0]`.
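The README says that `inference.py` logs `[START]`, `[STEP]` and `[END]` markers and that scores are clamped to `[0.0, 1.0]`. The removed text adds two details: each marker is followed by a `{ ... }` payload, and `SUCCESS_SCORE_THRESHOLD` is 0.8. The sketch below shows how a consumer of those logs might read them. It assumes each payload is a single-line JSON object and that a score equal to the threshold counts as success. Because the payload fields are not documented here, none are assumed; the key in the demo is hypothetical.

```python
import json
import re

# Assumption: a marker line looks like '[STEP] {...}', i.e. a bracketed
# marker followed by a single-line JSON object.
MARKER_RE = re.compile(r"^\[(START|STEP|END)\]\s*(\{.*\})$")

# Value quoted from the earlier README; treating ">=" as success is assumed.
SUCCESS_SCORE_THRESHOLD = 0.8


def parse_marker(line):
    """Return (marker, payload_dict) for a structured log line, else None."""
    match = MARKER_RE.match(line.strip())
    if match is None:
        return None
    return match.group(1), json.loads(match.group(2))


def clamp_score(score):
    """Clamp a raw score into [0.0, 1.0]."""
    return max(0.0, min(1.0, float(score)))


def is_success(score):
    """True if the clamped score reaches the success threshold."""
    return clamp_score(score) >= SUCCESS_SCORE_THRESHOLD


if __name__ == "__main__":
    # "task" is a hypothetical payload key used only for this demo.
    for raw in ['[START] {"task": "easy"}', "unstructured debug output"]:
        print(parse_marker(raw))
    print(clamp_score(1.3), is_success(0.8))
```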
requirements.txt ADDED
@@ -0,0 +1,5 @@
+ openenv-core[core]>=0.2.2
+ pydantic>=2.0.0
+ openai>=1.0.0
+ fastapi>=0.115.0
+ uvicorn>=0.24.0
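The Dockerfile now installs the project with `pip install --no-deps .`. Any dependency declared only in `pyproject.toml` and missing from `requirements.txt` would therefore never be installed in the image. Below is a small stdlib-only check sketched under one assumption: you can supply both dependency lists as requirement strings. Project names are normalized as described in PEP 503.

```python
import re

_NAME_RE = re.compile(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def requirement_name(spec):
    """Return the normalized project name from a requirement string.

    'openenv-core[core]>=0.2.2' -> 'openenv-core'. Normalization follows
    PEP 503: lowercase, with runs of '-', '_' and '.' collapsed to '-'.
    """
    match = _NAME_RE.match(spec)
    if match is None:
        raise ValueError(f"cannot parse requirement: {spec!r}")
    return re.sub(r"[-_.]+", "-", match.group(1)).lower()


def missing_from_requirements(declared, requirement_lines):
    """Sorted names in `declared` that `requirement_lines` does not cover."""
    covered = {
        requirement_name(line)
        for line in requirement_lines
        # Skip blanks, comments and pip options such as "-r other.txt".
        if line.strip() and not line.lstrip().startswith(("#", "-"))
    }
    return sorted({requirement_name(dep) for dep in declared} - covered)


# The requirements.txt added in this commit.
REQUIREMENTS = [
    "openenv-core[core]>=0.2.2",
    "pydantic>=2.0.0",
    "openai>=1.0.0",
    "fastapi>=0.115.0",
    "uvicorn>=0.24.0",
]
```

On Python 3.11+, the `declared` list could come from `tomllib.load(f)["project"]["dependencies"]` with `pyproject.toml` opened in binary mode. That assumes the project declares its dependencies in a PEP 621 `[project]` table.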
server/app.py CHANGED
@@ -36,9 +36,9 @@ from pathlib import Path
  os.environ.setdefault("ENABLE_WEB_INTERFACE", "true")
  _BASE_DIR = Path(__file__).resolve().parent.parent
  _WEB_README = _BASE_DIR / "WEB_README.md"
- os.environ.setdefault(
-     "ENV_README_PATH",
-     str(_WEB_README if _WEB_README.exists() else (_BASE_DIR / "README.md")),
+ # Force a clean renderer source file for /web; fallback only if missing.
+ os.environ["ENV_README_PATH"] = str(
+     _WEB_README if _WEB_README.exists() else (_BASE_DIR / "README.md")
  )

  try:
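The `server/app.py` change swaps `os.environ.setdefault` for direct assignment. The two differ only when `ENV_README_PATH` is already set before this module runs. In that case `setdefault` keeps the existing value, while assignment always points `/web` at `WEB_README.md`, or at `README.md` if that file is missing. The sketch below shows both behaviours, using a plain dict in place of `os.environ` (both support `setdefault` and item assignment). The helper names are hypothetical.

```python
import tempfile
from pathlib import Path


def pick_readme(base_dir):
    """Prefer WEB_README.md and fall back to README.md, as the new code does."""
    web_readme = base_dir / "WEB_README.md"
    return str(web_readme if web_readme.exists() else base_dir / "README.md")


def configure_old(env, base_dir):
    # Old behaviour: only fills ENV_README_PATH if nothing set it earlier.
    env.setdefault("ENV_README_PATH", pick_readme(base_dir))


def configure_new(env, base_dir):
    # New behaviour: always overwrite, so the chosen file wins.
    env["ENV_README_PATH"] = pick_readme(base_dir)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        (base / "WEB_README.md").write_text("# Cloud DevOps RLEnv\n")
        env = {"ENV_README_PATH": "/preset/README.md"}  # hypothetical earlier value
        configure_old(env, base)
        print("setdefault:", env["ENV_README_PATH"])
        configure_new(env, base)
        print("assignment:", env["ENV_README_PATH"])
```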