# Sandbox Deployment Platform (SDP)

## Status (Slice 2: sandboxes, routes, real auth, all MVP features)

The build is green: `./scripts/build.sh` produces three Linux/amd64 binaries and a static dashboard. The full MVP flow works end to end:

- Real Bitbucket auth via `git ls-remote` against the api-gateway.
- Real repo and branch listing via agent WS frames.
- Sandbox / template / environment CRUD with persisted metadata in SQLite.
- Route overrides per sandbox, with live read-back of the `_url` map from the gateway's `config.php` after every branch switch. The agent patches the file and gracefully reloads apache.
- Per-deploy port binding: the user picks the host port per service (e.g. eredar at `172.18.136.92:9001`), the container's exposed port is published to that port.
- Erangel deploy: `git reset --hard → fetch → checkout → pull → composer install → start container → re-apply route overrides`. Per-branch OCP-default snapshot persisted to `/.sdp/ocp-defaults.json`.

See [Status checklist](#status-checklist) at the bottom of this document for a per-feature status.

## Tech Stack (Decided)

- **Dashboard:** NextJS + React + TypeScript + Tailwind. Plain `useState` + single WebSocket hook. No Redux/Zustand. Built as static output, served by nginx with `try_files`.
- **Control Plane:** Go. **SQLite** for both metadata and ephemeral state (deployment progress snapshots, log lines). Append-only `.log` files for log persistence. The infra VM (172.18.136.93) is reserved for a future PostgreSQL/Redis/etc. cutover; the MVP runs on SQLite alone.
- **Agents:** Go. Use the official Docker SDK (`github.com/moby/moby/client` v0.5.0) for container orchestration. Build Go binaries **directly on the host** (`go build -o {name}`), with no Dockerfile-based build step. The PHP gateway agent runs `composer install --no-dev` on the host as a best-effort step, then `docker run php:8.3-apache`.
- **Realtime transport:** WebSocket end-to-end (Agent → Control Plane → Frontend).
- **Auth:** Bitbucket username/password. Validated by a real `git ls-remote`/`fetch` via the Agent. **Credentials are passed on every operation from Control Plane to Agent. Never logged, never persisted on the Agent longer than the operation.**
- **Infra in the spec** = the existing microservice infrastructure (172.18.* VMs, AppGolang, SDP repo), not infrastructure for SDP itself.

## Overview

Sandbox Deployment Platform (SDP) is an internal deployment platform that allows Backend and QA teams to deploy isolated feature branches without requiring deployment to the shared OpenShift (OCP) environment.

The platform is designed specifically for the company's existing architecture:

* Golang microservices
* PHP API Gateway
* Internal VM infrastructure
* Bitbucket repositories
* No internet access on deployment VMs
* Developers only have read access to OCP

The platform is NOT intended to be a generic Kubernetes, OpenShift, or PaaS solution.

---

# Problem Statement

Current workflow:

1. Developer creates feature branch.
2. Deployment to shared environment requires PR approval and merge.
3. CI/CD deploys to shared OCP.
4. Testing affects other teams.
5. Negative-path testing can disrupt shared development.

Required workflow:

1. Developer deploys feature branch directly.
2. Deployment occurs in isolated sandbox infrastructure.
3. API Gateway selectively routes traffic to sandbox services.
4. Remaining services continue using OCP.
5. QA can test independently.

---

# Infrastructure

## Microservices VM

```text
IP Address: 172.18.136.92
Repository Root: ~/AppGolang
```

Example:

```text
~/AppGolang
├── account
├── payment
├── user
├── notification
└── ...
```

All Golang microservices reside here.

---

## Infrastructure VM

```text
IP Address: 172.18.136.93
```

Reserved for future use:

* PostgreSQL
* Redis
* RabbitMQ
* Kafka

Not required for MVP.


---

## API Gateway VM

```text
IP Address: 172.18.139.186
Repository Root: /var/www/html/erangel-ocean
```

Contains:

```text
/var/www/html/erangel-ocean
```

The API Gateway repository (erangel). The container `php:8.3-apache` bind-mounts this path at the same path inside the container and serves the gateway at `/erangel/`, mirroring the production URL space.

---

# High-Level Architecture

```text
      +--------------------------+
      |        Dashboard         |
      |     NextJS Frontend      |
      +------------+-------------+
                   |
                   v
      +--------------------------+
      |      Control Plane       |
      |  Go (HTTP + WebSocket)   |
      +------+------------+------+
             |            |
   WebSocket |            | WebSocket
             |            |
             v            v
+----------------+  +----------------+
|  Micro Agent   |  | Gateway Agent  |
| 172.18.136.92  |  | 172.18.139.186 |
+----------------+  +----------------+
```

---

# Architectural Principles

## Control Plane

The Control Plane:

* Never SSHs into servers
* Never executes build commands
* Never accesses repositories directly

The Control Plane only:

* Stores metadata
* Manages deployments
* Sends commands to agents via WebSocket (`/ws/agent`)
* Receives deployment events (also via the agent's WebSocket)
* Streams logs to the dashboard over WebSocket (`/ws/deployments/{id}`)

---

## Agents

Agents execute all operations locally.

Examples:

```text
git fetch
git checkout
go build
docker run
```

Agents have direct filesystem access.

---

# Authentication

## Login

Users authenticate using:

```text
Bitbucket Username
Bitbucket Password
```

---

## Validation

Authentication is validated by attempting a Git operation against a known repository.

Example:

```bash
git ls-remote
```

or

```bash
git fetch
```

If Git authentication succeeds:

```text
LOGIN SUCCESS
```

Otherwise:

```text
LOGIN FAILED
```

---

## Git Operations

All Git operations must use the currently authenticated user's credentials.

Examples:

```bash
git fetch
git pull
git checkout
```

Credentials are passed from Control Plane to Agent during deployment execution.

Credentials must never be logged.


---

# Repository Configuration

Repositories are configured manually on each Agent.

No automatic discovery.

Example:

```yaml
repositories:
  - name: account
    path: /home/user/AppGolang/account

  - name: payment
    path: /home/user/AppGolang/payment

  - name: user
    path: /home/user/AppGolang/user
```

Gateway:

```yaml
repositories:
  - name: api-gateway
    path: /home/user/SDP
```

---

# Core Concepts

## Node

Represents a VM.

Fields:

```text
id
name
ipAddress
type
```

Types:

```text
MICRO
GATEWAY
INFRA
```

---

## Repository

Fields:

```text
id
name
path
nodeId
```

---

## Environment

Equivalent to:

```text
ConfigMap
Secret
```

Contains:

```text
Variables
Secrets
Files
```

Example:

```env
DB_HOST=
DB_PORT=
REDIS_URL=
JWT_SECRET=
```

---

## Deployment

Represents a deployment execution.

Fields:

```text
id
repository
branch
user
status
logs
startedAt
completedAt
```

---

## Sandbox

Represents an isolated testing environment.

Example:

```yaml
sandbox: QA-LOGIN-ERROR

services:
  account:
    branch: feature/login-error

  payment:
    use_ocp: true

  user:
    use_ocp: true
```

---

## Sandbox Template

A reusable sandbox configuration.

Purpose:

Reduce repetitive setup.


Example:

```yaml
template: QA-DEFAULT

gateway:
  branch: develop

services:
  account:
    use_ocp: true

  payment:
    use_ocp: true

  user:
    use_ocp: true
```

Another example:

```yaml
template: ACCOUNT-TESTING

gateway:
  branch: develop

services:
  account:
    branch: feature/account

  payment:
    use_ocp: true

  user:
    use_ocp: true
```

Users can:

* Create template
* Update template
* Clone template into sandbox

---

# Micro Agent Requirements

Runs on:

```text
172.18.136.92
```

Responsibilities:

```text
List repositories
List branches
Fetch repository updates
Checkout branch
Pull latest changes
Build Go binary
Run container (the runtime image is pre-loaded; no per-deploy build)
Restart container
Stop container
Stream logs
```

---

# Microservice Deployment Process

Given:

```text
Repository: account
Branch: feature/login-error
```

Agent executes:

```bash
git fetch
git checkout feature/login-error
git pull
```

Then on the host:

```bash
go build -o app-account ./...
```

Then runs a container from the pre-loaded base image, with the host repo bind-mounted at `/src` and the freshly-built binary as the command:

```bash
docker run -d \
  -v /home/user/AppGolang/account:/src \
  alpine:3.20 \
  /src/app-account
```

No `docker build` is run. The `alpine:3.20` image is loaded on the host once via `docker load -i alpine-3.20.tar` (see [Docker Image Distribution](#docker-image-distribution)).

---

# Gateway Agent Requirements

Runs on:

```text
172.18.139.186
```

Responsibilities:

```text
List branches
Fetch repository updates
Checkout branch
Pull latest changes
Run container (best-effort `composer install --no-dev` on the host; repo is bind-mounted; no per-deploy build)
Deploy container
Restart container
Manage routing
Stream logs
```

---

# API Gateway Deployment

The API Gateway must run inside Docker, so that routing for the gateway itself does not depend on the VM's nginx.


Deployment process:

```bash
git fetch
git checkout
git pull
```

Best-effort (skipped silently if `composer` is missing or no `composer.json` is present):

```bash
composer install --no-dev --no-interaction --no-progress
```

Then runs a container from the pre-loaded PHP image, with the host repo bind-mounted at `/app` and Apache as the entrypoint:

```bash
docker run -d \
  -v /home/user/SDP:/app \
  -p 80:80 \
  php:8.3-apache
```

No `docker build` is run. The `php:8.3-apache` image is loaded on the host once via `docker load -i php-8.3-apache.tar` (see [Docker Image Distribution](#docker-image-distribution)).

---

# Offline VM Requirements

Deployment VMs have no internet access.

The following cannot be relied upon:

```bash
docker pull
```

---

# Docker Image Distribution

Images must be imported manually.

Example:

On machine with internet:

```bash
docker pull nginx:latest
docker save nginx:latest -o nginx.tar
```

Transfer:

```bash
scp nginx.tar user@172.18.139.186:/tmp
```

Load:

```bash
docker load -i nginx.tar
```

---

# Environment Management

Users must be able to:

```text
Create Environment
Update Environment
Delete Environment
Manage Secrets
Manage Variables
```

Example:

```env
DB_HOST=...
DB_USER=...
DB_PASSWORD=...
```

Environment values are injected during deployment.

---

# Route Override System

This is the most important feature.

Each route can target either:

```text
Sandbox Deployment
OCP Deployment
```

Example:

```yaml
account:
  target: http://172.18.136.92:9001

payment:
  target: https://payment-dev.company.com

user:
  target: https://user-dev.company.com
```

Result:

```text
account -> sandbox
payment -> OCP
user -> OCP
```

---

# Mobile App Integration

Current mobile app:

```text
https://project-dev-url.domain.com
```

Target:

```text
http://172.18.139.186:{PORT}
```

Example:

```text
http://172.18.139.186:8080
```

QA can point the mobile application directly to the API Gateway sandbox.

No DNS changes required.

---

# Port Management

Gateway Ports:

```text
8080
8081
8082
...
```

Microservice Ports:

```text
9001
9002
9003
...
```

Control Plane must:

* Allocate ports
* Track ports
* Prevent conflicts

---

# Deployment States

The `protocol.Event.State` field carries the lifecycle state of a deployment. Supported values:

```text
QUEUED    // set by the control plane when a deploy is created
RUNNING   // all stages completed successfully, container is up
FAILED    // a stage errored; the deploy is dead
STOPPED   // user-initiated stop
```

In addition, the `Stage` field of a `progress` event carries the per-stage human label. The exact stages emitted by an agent depend on the build flavour:

```text
// Micro agent (Go)
git fetch
git checkout
git pull
go build
start container

// Gateway agent (PHP)
git fetch
git checkout
git pull
composer install   // best-effort; skipped silently if not available
start container
```

> The high-level state is small (QUEUED / RUNNING / FAILED /
> STOPPED) and per-step progress lives in the `Stage` field. There
> is no per-deploy image build, so no image-related state is
> needed.

---

# Real-Time Progress

Frontend must receive deployment progress in real time.

Example:

```text
✓ Fetch
✓ Checkout
✓ Build
✓ Start Container
✓ Running
```

No page refresh.

---

# Real-Time Logs

Frontend must receive logs while deployment is running.

Example:

```text
[FETCH] Fetching origin...
[FETCH] Success

[BUILD] Running go build
[BUILD] Success

[DEPLOY] Container started
```

---

# Event Streaming

Agents emit events.


Examples:

```text
FETCH_STARTED
FETCH_COMPLETED

CHECKOUT_STARTED
CHECKOUT_COMPLETED

BUILD_STARTED
BUILD_COMPLETED

DEPLOY_STARTED
DEPLOY_COMPLETED

DEPLOY_FAILED
```

Architecture:

```text
Agent
  -> WebSocket
Control Plane
  -> WebSocket
Frontend
```

---

# Dashboard Features

## Authentication

```text
Login
Logout
```

---

## Repository Management

```text
List Repositories
List Branches
```

---

## Deployments

```text
Deploy Branch
Restart Deployment
Stop Deployment
Delete Deployment
```

---

## Deployment Monitoring

```text
View Progress
View Logs
View Status
View History
```

---

## Environment Management

```text
Create Environment
Update Environment
Delete Environment
```

---

## Sandbox Management

```text
Create Sandbox
Update Sandbox
Delete Sandbox
Clone Sandbox
```

---

## Template Management

```text
Create Template
Update Template
Delete Template
Create Sandbox From Template
```

---

## Route Management

```text
Route To Sandbox
Route To OCP
```

---

# Audit Trail

Store:

```text
User
Repository
Branch
Environment
Sandbox
Timestamp
Status
```

Example:

```text
User: Achmad
Repository: account
Branch: feature/login-error
Sandbox: QA-LOGIN-ERROR
Status: SUCCESS
```

---

# Technology Stack

## Dashboard

```text
NextJS
React
TypeScript
Tailwind
```

## Control Plane

```text
Go
SQLite (modernc.org/sqlite, pure Go, no cgo)
WebSocket (gorilla/websocket)
```

## Agents

```text
Go
Docker SDK (github.com/moby/moby/client)
WebSocket (gorilla/websocket)
```

---

# Non-Goals

Not intended to replace:

```text
Kubernetes
OpenShift
Rancher
ArgoCD
Coolify
```

Not intended to support:

```text
Multi-Tenant SaaS
Public Cloud
Generic Container Hosting
```

Purpose:

Provide isolated deployment environments for Backend and QA teams.

---

# MVP Success Criteria

A developer can:

1. Login using Bitbucket username and password.
2. Select a repository.
3. Select a branch.
4. Configure environment variables.
5. Deploy API Gateway.
6. Deploy microservices.
7. Watch deployment progress in real time.
8. Watch deployment logs in real time.
9. Create sandboxes.
10. Create sandbox templates.
11. Route selected services to sandbox deployments.
12. Route remaining services to OCP.
13. Point mobile application to:

    ```text
    http://172.18.139.186:{PORT}
    ```

14. Allow QA to test isolated feature branches without impacting shared OCP environments.

# Future Enhancements

## Sandbox Isolation Strategy

### Goal

Allow multiple developers and QA engineers to run independent sandboxes simultaneously without conflicts.

Example:

```text
Achmad Sandbox
├── account
├── payment
└── gateway

QA Sandbox
├── account
├── payment
└── gateway
```

Both sandboxes must coexist on the same infrastructure.

---

## Container Naming Convention

Containers should follow a predictable naming pattern.

Format:

```text
sandbox-{sandbox-name}-{service-name}
```

Examples:

```text
sandbox-achmad-account
sandbox-achmad-payment
sandbox-achmad-user
sandbox-achmad-gateway
```

```text
sandbox-qa-login-account
sandbox-qa-login-gateway
```

Benefits:

* Easier troubleshooting
* Easier cleanup
* Easier log inspection
* Easier monitoring

---

## Docker Network Per Sandbox

Each sandbox should have its own Docker network.

Format:

```text
sandbox-{sandbox-name}
```

Examples:

```text
sandbox-achmad
sandbox-qa-login
sandbox-regression
```

Container example:

```text
Network:
sandbox-achmad

Containers:
sandbox-achmad-gateway
sandbox-achmad-account
sandbox-achmad-payment
```

Benefits:

* Network isolation
* Service discovery
* No cross-sandbox traffic
* Simpler routing

---

## Internal Service Communication

Services within a sandbox should communicate through Docker DNS.

Example:

Instead of:

```text
http://172.18.136.92:9001
```

Use:

```text
http://sandbox-achmad-account:8080
```

Benefits:

* No dependency on host ports
* Cleaner configuration
* Easier sandbox replication

---

## Sandbox Port Allocation

Gateway containers should expose a unique external port.


Examples:

```text
sandbox-achmad-gateway
→ 172.18.139.186:8080

sandbox-qa-login-gateway
→ 172.18.139.186:8081

sandbox-regression-gateway
→ 172.18.139.186:8082
```

Mobile applications connect only to gateway ports.

---

## Automatic Port Management

Control Plane should automatically:

* Allocate available ports
* Reserve ports
* Release ports when sandbox is deleted

Example database table:

```text
PortAllocation
├── sandboxId
├── serviceName
├── port
└── allocatedAt
```

---

## Sandbox Lifecycle

Future support:

### Suspend Sandbox

Stops all containers while preserving configuration.

Example:

```text
ACTIVE
↓
SUSPENDED
```

Resources freed:

* CPU
* Memory

Configuration preserved.

---

### Resume Sandbox

Restarts previously suspended sandbox.

Example:

```text
SUSPENDED
↓
ACTIVE
```

---

### Sandbox Expiration

Automatic cleanup after inactivity.

Example:

```text
No activity for 14 days
↓
Mark Expired
↓
Stop Containers
↓
Delete After Retention Period
```

Configurable.

---

## Sandbox Cloning

Clone an existing sandbox.

Example:

```text
Source:
Achmad Sandbox

Destination:
QA Sandbox
```

Result:

```text
Same repositories
Same branches
Same environment variables
Same route overrides
```

New ports are allocated automatically.

---

## Sandbox Snapshots

Capture sandbox state.

Stored information:

* Repository versions
* Branches
* Environment variables
* Route overrides

Example:

```text
Snapshot:
QA-Before-Release
```

Allows rollback and recreation later.

---

## Resource Limits

Per sandbox resource controls.

Example:

```text
CPU: 1 Core
Memory: 1 GB
```

Per container:

```text
CPU: 500m
Memory: 512MB
```

Implemented using Docker resource limits.

---

## Health Monitoring

Track sandbox health.


Metrics:

* Container status
* CPU usage
* Memory usage
* Restart count
* Health endpoint status

Dashboard should display:

```text
Healthy
Degraded
Unhealthy
```

---

## Future Infrastructure Agent

Node:

```text
172.18.136.93
```

Responsibilities:

* PostgreSQL restore
* Database cloning
* RabbitMQ management
* Redis management
* Kafka management

Potential use case:

```text
Clone QA Database
↓
Attach To Sandbox
↓
Run Integration Testing
```

---

## Future RBAC

Current MVP:

```text
All authenticated users
```

Future roles:

```text
ADMIN
BACKEND
QA
VIEWER
```

Permissions:

```text
Deploy
Delete Sandbox
Manage Templates
Manage Routes
Manage Environments
```

---

## Future Notifications

Deployment notifications:

```text
Deployment Started
Deployment Succeeded
Deployment Failed
Sandbox Expired
```

Channels:

* Email
* Slack
* Microsoft Teams

---

# Status checklist

Per-feature status. `done` = implemented as of Slice 2. `next` = scheduled but not yet implemented. `later` = out of scope for MVP.

## Build / deploy

- `done` `./scripts/build.sh` produces the three Go binaries and the Next.js dashboard.
- `done` `./scripts/deploy.sh` SSHes the binaries to 92 and 186.
- `done` `docker compose up -d` brings up the three services on `alpine:latest` for local dev.

## Core deploy flow

- `done` Agent connects to the control plane over WebSocket and stays connected across reconnects.
- `done` Control plane dispatches a `deploy` frame to the agent with the per-operation Bitbucket creds.
- `done` Micro agent runs `git fetch → checkout → pull → go build → docker run` and streams progress and logs back.
- `done` Gateway agent runs `git reset --hard → fetch → checkout → pull → composer install (best-effort) → docker run → re-apply route overrides → apache graceful reload` and streams progress and logs back.
- `done` Dashboard subscribes to a deployment by id over WebSocket and renders stages + live log tail.
- `done` SQLite persistence for deployment rows, stage transitions, and append-only log files.
- `done` Real `validateViaAgent` via the agent's `git ls-remote` frame.
- `done` Real `list_repos` / `list_branches` via agent frames; the hardcoded fixtures are gone.
- `done` `list_routes` RPC exposes the live `_url` map from the gateway's `config.php` after every branch switch.
- `done` `GET /api/deployments` reads deployment history from SQLite (filterable by sandbox).

## Sandbox & routing

- `done` Sandbox CRUD (data model + REST endpoints + dashboard pages).
- `done` Sandbox template CRUD and "clone template into sandbox".
- `done` Route management (sandbox vs OCP per service) with live read-back from the gateway's `config.php`.
- `done` Environment CRUD (persisted named envs, not just inline).
- `done` Actual route push to the API Gateway: the gateway agent rewrites `application/config/production/config.php` and gracefully reloads apache. A per-branch OCP-default snapshot is captured automatically and persisted to `/.sdp/ocp-defaults.json`.
- `done` Per-deploy port binding: the user specifies the host port; the agent publishes the container's exposed port to it. Concurrency is "one live container per repo" (the stable name is `sdp-`).

## Auth

- `done` Real auth via agent-mediated `git ls-remote` against the api-gateway. Login fails fast if no gateway agent is connected.
- `done` Session cookie + in-memory session store, 12-hour TTL, logout invalidates the token.
- `later` RBAC roles (admin / backend / qa / viewer).

## Out of scope for MVP (per the "Future Enhancements" section)

- `later` Per-sandbox Docker networks and the `sandbox-{name}-{service}` container naming.
- `later` Internal service communication via Docker DNS.
- `later` Suspend / resume / expire sandboxes.
- `later` Sandbox cloning and snapshots.
- `later` Per-sandbox resource limits.
- `later` Health monitoring.
- `later` The infra agent (172.18.136.93) for PostgreSQL/Redis/etc.
- `later` Notifications (email / Slack / Teams).