Space: SidhaGarg/Cloud-DevOps-RLEnv
SidhaGarg committed on Apr 7

Commit 222ebaa (1 parent: 9075e6a)

Fix /web README source and speed up Docker rebuilds

Files changed (4)

Dockerfile +6 -2
WEB_README.md +48 -85
requirements.txt +5 -0
server/app.py +3 -3

Dockerfile CHANGED

@@ -4,6 +4,10 @@ FROM python:3.10-slim
 # Set working directory
 WORKDIR /app
 # Copy project files
 COPY pyproject.toml .
 COPY README.md .
@@ -15,8 +19,8 @@ COPY __init__.py .
 COPY client.py .
 COPY server ./server
-# Install dependencies (no-cache to save space)
-RUN pip install --no-cache-dir .
 # Expose the standard OpenEnv port
 EXPOSE 8000

 # Set working directory
 WORKDIR /app
+# Install dependencies first to maximize Docker cache reuse across code edits
+COPY requirements.txt .
+RUN pip install -r requirements.txt
 # Copy project files
 COPY pyproject.toml .
 COPY README.md .
 COPY client.py .
 COPY server ./server
+# Install local package metadata/entrypoints without reinstalling dependencies
+RUN pip install --no-deps .
 # Expose the standard OpenEnv port
 EXPOSE 8000
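
Note on the Dockerfile change: the speed-up depends on Docker reusing a layer only while that instruction and everything before it are unchanged. The sketch below is a simplified Python model of that rule, not Docker's actual implementation. The `layer_keys` and `reused` helpers and the file contents are hypothetical, the `server` directory is modeled as a single string, and only a subset of the COPY steps is included.

```python
import hashlib


def layer_keys(steps, files):
    """Return one cache key per step; each key depends on all earlier steps.

    steps: list of (instruction, [input file names]) tuples.
    files: dict mapping a file name to its content.
    """
    keys, parent = [], ""
    for instruction, inputs in steps:
        h = hashlib.sha256(parent.encode())
        h.update(instruction.encode())
        for name in inputs:
            h.update(files[name].encode())
        parent = h.hexdigest()
        keys.append(parent)
    return keys


# Old ordering: dependencies are installed after the source is copied.
OLD = [
    ("COPY pyproject.toml .", ["pyproject.toml"]),
    ("COPY server ./server", ["server"]),
    ("RUN pip install --no-cache-dir .", []),
]
# New ordering: dependencies are installed before the source is copied.
NEW = [
    ("COPY requirements.txt .", ["requirements.txt"]),
    ("RUN pip install -r requirements.txt", []),
    ("COPY pyproject.toml .", ["pyproject.toml"]),
    ("COPY server ./server", ["server"]),
    ("RUN pip install --no-deps .", []),
]

before = {"pyproject.toml": "v1", "requirements.txt": "deps", "server": "code v1"}
after = dict(before, server="code v2")  # only application code changed


def reused(steps):
    """Instructions whose cache key is unchanged after the code edit."""
    pairs = zip(steps, layer_keys(steps, before), layer_keys(steps, after))
    return [instr for (instr, _), a, b in pairs if a == b]


old_reused = reused(OLD)
new_reused = reused(NEW)
print("old ordering reuses:", old_reused)
print("new ordering reuses:", new_reused)
```

In this model, an edit under `server` invalidates the dependency install in the old ordering but leaves the `pip install -r requirements.txt` layer cached in the new one. The final `pip install --no-deps .` still reruns, but it no longer installs dependencies.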

WEB_README.md CHANGED

@@ -1,107 +1,70 @@
 # Cloud DevOps RLEnv
-Cloud DevOps RLEnv is an OpenEnv-compatible environment for training and evaluating agents on realistic cloud SRE and DevOps incident-response tasks.
-## Environment Description And Motivation
-Production incidents are often multi-step: triage, inspect resources, check logs, apply a safe remediation, and then verify the fix. This environment simulates that loop with deterministic scenarios and shaped rewards.
-Goals:
-- Benchmark planning and tool-use behavior for cloud operations agents.
-- Reward correct diagnosis over blind action execution.
-- Provide repeatable task outcomes for fair grading and comparison.
-## Action Space
-Action model: CloudAction
-Fields:
-- command (required): one of list_resources, describe_resource, view_logs, update_security_group, restart_service, submit_solution.
-- resource_id (optional): target resource identifier (required for most non-list actions).
-- parameters (optional): structured key/value arguments used by mutating actions.
-Notes:
-- update_security_group expects parameters.port and usually parameters.action.
-- restart_service targets a single instance by resource_id.
-## Observation And State Space
-Observation model: CloudObservation
-Primary observation fields:
-- output: command result payload.
-- error: command error, when present.
-- system_health_status: CRITICAL, DEGRADED, or HEALTHY.
-- done: terminal flag.
-- reward: scalar step reward.
-- metadata: includes task name, resolution status, step count, and other diagnostics.
-Hidden state model: CloudState
-- task_difficulty: easy, medium, or hard.
-- resources: underlying resource graph and logs.
-- step_count: total actions issued.
-- is_resolved: whether incident root cause is remediated.
-## Task Definitions And Expected Difficulty
-- easy:
-  Open port 80 on sg-web so web traffic can flow.
-  Expected difficulty: low.
-- medium:
-  Inspect API logs to identify DB connectivity failure, then open port 5432 on sg-db.
-  Expected difficulty: medium (requires diagnosis before remediation).
-- hard:
-  Trace load balancer timeout to i-web2, inspect the target, then restart the correct service.
-  Expected difficulty: high (multi-hop diagnosis and anti-shortcut checks).
-## Setup And Usage
-From repository root:
-- Validate OpenEnv package structure and manifest:
-  ..\\.venv\\Scripts\\openenv validate
-- Run pre-submission validator (skip live inference):
-  bash scripts/pre_submit_validate.sh --skip-inference
-- Build local submission image:
-  docker build -t cloud-devops-env:phase1 -f Dockerfile .
-Optional local server run:
 uvicorn server.app:app --host 0.0.0.0 --port 8000
-## Inference Contract
-inference.py uses the OpenAI client and reads:
-- API_BASE_URL
-- MODEL_NAME
-- HF_TOKEN
-It emits strict structured logs:
-- [START] { ... } per task
-- [STEP] { ... } per environment action
-- [END] { ... } per task summary
-## Baseline Scores
-Representative deterministic scripted-policy targets:
-- easy: 1.0
-- medium: 0.8-1.0
-- hard: 1.0
-Validation expectation:
-- Aggregate scores are clamped to [0.0, 1.0].
-- SUCCESS_SCORE_THRESHOLD for inference summaries is 0.8.
-## Hugging Face Space Deployment
-1. Push this repository to your Space (Docker SDK).
-2. Keep README.md front matter for Space metadata.
-3. Set Space secrets/variables:
-   - HF_TOKEN (secret)
-   - API_BASE_URL (for example https://router.huggingface.co/v1)
-   - MODEL_NAME (chosen model slug)
-4. Wait for Space build to complete.
-5. Verify endpoints:
-   - GET /health returns 200
-   - POST /reset returns 200

 # Cloud DevOps RLEnv
+This environment trains and tests agents on cloud incident response.
+## What You Need To Do
+Solve incidents by following the same workflow a real SRE would use:
+1. Inspect resources.
+2. Read logs.
+3. Apply a safe fix.
+4. Submit the solution.
+## Available Actions
+- `list_resources`: See all resources.
+- `describe_resource`: View one resource.
+- `view_logs`: Read logs for one resource.
+- `update_security_group`: Add/modify security rules.
+- `restart_service`: Restart an instance.
+- `submit_solution`: Submit your final answer.
+## What You Receive Each Step
+- `output`: Main command result.
+- `error`: Error text if a command fails.
+- `system_health_status`: `CRITICAL`, `DEGRADED`, or `HEALTHY`.
+- `reward`: Step reward.
+- `done`: Whether the episode has ended.
+## Difficulty Levels
+- `easy`: Open port `80` on `sg-web`.
+- `medium`: Find DB timeout in logs, then open port `5432` on `sg-db`.
+- `hard`: Trace timeout through load balancer to `i-web2`, then restart the correct service.
+## Quick Start
+Run from repo root:
+```bash
+..\\.venv\\Scripts\\openenv validate
+bash scripts/pre_submit_validate.sh --skip-inference
+docker build -t cloud-devops-env:phase1 -f Dockerfile .
+```
+Run server locally:
+```bash
 uvicorn server.app:app --host 0.0.0.0 --port 8000
+```
+## Inference Requirements
+`inference.py` reads:
+- `API_BASE_URL`
+- `MODEL_NAME`
+- `HF_TOKEN`
+It logs strict markers:
+- `[START]`
+- `[STEP]`
+- `[END]`
+## Baseline Score Targets
+- easy: `1.0`
+- medium: `0.8` to `1.0`
+- hard: `1.0`
+Scores are clamped to `[0.0, 1.0]`.
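
Note on the README rewrite: the new text lists the action names but drops the field layout that the removed text described (`command`, `resource_id`, `parameters`). The sketch below shows how an action sequence for the `easy` task might be assembled from those fields. `ActionSketch` is a hypothetical stand-in for `CloudAction`, not the project's model, and the `"allow"` value for `parameters.action` is a placeholder, since neither README version lists the accepted values.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

# Command names as listed in the README.
COMMANDS = {
    "list_resources", "describe_resource", "view_logs",
    "update_security_group", "restart_service", "submit_solution",
}


@dataclass
class ActionSketch:
    """Hypothetical stand-in for CloudAction; field names follow the removed README text."""
    command: str
    resource_id: Optional[str] = None
    parameters: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self):
        # Reject anything outside the documented command set.
        if self.command not in COMMANDS:
            raise ValueError(f"unknown command: {self.command}")


# One plausible sequence for the `easy` task: inspect, fix, submit.
easy_plan = [
    ActionSketch("list_resources"),
    ActionSketch("describe_resource", resource_id="sg-web"),
    # "action": "allow" is a placeholder value (assumption, see note above).
    ActionSketch("update_security_group", resource_id="sg-web",
                 parameters={"port": 80, "action": "allow"}),
    ActionSketch("submit_solution"),
]

for step in easy_plan:
    print(step.command, step.resource_id, step.parameters)
```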

requirements.txt ADDED

@@ -0,0 +1,5 @@

+openenv-core[core]>=0.2.2
+pydantic>=2.0.0
+openai>=1.0.0
+fastapi>=0.115.0
+uvicorn>=0.24.0

server/app.py CHANGED

@@ -36,9 +36,9 @@ from pathlib import Path
 os.environ.setdefault("ENABLE_WEB_INTERFACE", "true")
 _BASE_DIR = Path(__file__).resolve().parent.parent
 _WEB_README = _BASE_DIR / "WEB_README.md"
-os.environ.setdefault(
-    "ENV_README_PATH",
-    str(_WEB_README if _WEB_README.exists() else (_BASE_DIR / "README.md")),
 )
 try:

 os.environ.setdefault("ENABLE_WEB_INTERFACE", "true")
 _BASE_DIR = Path(__file__).resolve().parent.parent
 _WEB_README = _BASE_DIR / "WEB_README.md"
+# Force a clean renderer source file for /web; fallback only if missing.
+os.environ["ENV_README_PATH"] = str(
+    _WEB_README if _WEB_README.exists() else (_BASE_DIR / "README.md")
 )
 try:
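
Note on the `server/app.py` change: `os.environ.setdefault` leaves an existing value untouched, whereas plain assignment always overwrites it. The minimal sketch below assumes `ENV_README_PATH` was already set before `server/app.py` ran. The commit does not say what set it, and the paths are illustrative only.

```python
import os

KEY = "ENV_README_PATH"

# Simulate a value already present in the process environment
# (assumption: something upstream set it before the server module ran).
os.environ[KEY] = "/app/README.md"

# Old behavior: setdefault is a no-op when the key already exists.
os.environ.setdefault(KEY, "/app/WEB_README.md")
after_setdefault = os.environ[KEY]

# New behavior: plain assignment always replaces the existing value.
os.environ[KEY] = "/app/WEB_README.md"
after_assignment = os.environ[KEY]

print("setdefault kept:", after_setdefault)
print("assignment set:", after_assignment)
```

With the old code, a preset `ENV_README_PATH` would have kept `/web` rendering from the other file. The forced assignment removes that dependency on the ambient environment, at the cost of ignoring any deliberate external override.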