Achmad 4cab047432 Slice 2: port 3452, nginx sandbox mount, AGENTS.md, docs, deploy script cleanup
- control-plane default listen addr is now :3452 (was :8080). An
  unusual port to avoid collisions on the VM.
- agent-micro and agent-gateway default SDP_CP_URL points at
  ws://localhost:3452/ws/agent. docker-compose.yml updates the
  control plane command, host port mapping, and agent -cp URLs.
- nginx/nginx.conf (the legacy root-mount reference) uses
  127.0.0.1:3452 for the upstream. nginx/sandbox.conf is the new
  deployment config: four location blocks for the /sandbox/credit-card
  mount — _next/static serves cached chunks, /api/ and /ws/ proxy
  to 127.0.0.1:3452, /sandbox/credit-card serves the static
  dashboard with try_files for SPA routing.
- scripts/patch-nginx.sh: deleted. The user configures nginx on 186
  by hand. scripts/deploy.sh no longer calls it.
- AGENTS.md: new file. Documents the build/lint/test commands
  (with the golang:1.24-alpine container — local Go can't fetch
  the toolchain), the wire protocol, the Slice-2 conventions
  (sdp-<repo> container naming, snapshot persistence,
  PreGitReset/AfterStart hooks), the repo-path gotcha, and the
  build-artifacts-in-git rationale.
- dashboard/out: now tracked in git, alongside bin/. The dashboard
  static export is scp'd to 186 on deploy; the VMs have no
  internet so they can't regenerate it. .gitignore comment
  explains this and warns against re-ignoring.
- README.md / REQUIREMENTS.md: status updated to 'Slice 2 done',
  per-feature checklist marked. Erangel repo path corrected to
  /var/www/html/erangel-ocean (was wrongly ~/SDP in earlier docs).
2026-06-24 04:00:49 +00:00


# Sandbox Deployment Platform (SDP)
## Status (Slice 2 — sandboxes, routes, real auth, all MVP features)
The build is green: `./scripts/build.sh` produces three Linux/amd64
binaries and a static dashboard. The full MVP flow works end to end:
- Real Bitbucket auth via `git ls-remote` against the api-gateway.
- Real repo and branch listing via agent WS frames.
- Sandbox / template / environment CRUD with persisted metadata in
SQLite.
- Route overrides per sandbox, with live read-back of the
`<service>_url` map from the gateway's `config.php` after every
branch switch. The agent patches the file and gracefully reloads
apache.
- Per-deploy port binding: the user picks the host port per service
(e.g. eredar at `172.18.136.92:9001`), and the container's exposed
port is published to it.
- Erangel deploy: `git reset --hard → fetch → checkout → pull →
composer install → start container → re-apply route overrides`.
Per-branch OCP-default snapshot persisted to
`<repo>/.sdp/ocp-defaults.json`.
See [Status checklist](#status-checklist) at the bottom of this
document for a per-feature status.
## Tech Stack (Decided)
- **Dashboard:** NextJS + React + TypeScript + Tailwind. Plain `useState` + single WebSocket hook. No Redux/Zustand. Built as static output, served by nginx with `try_files`.
- **Control Plane:** Go. **SQLite** for both metadata and ephemeral state (deployment progress snapshots, log lines). Append-only `.log` files for log persistence. The infra VM (172.18.136.93) is reserved for a future PostgreSQL/Redis/etc. cutover; the MVP runs on SQLite alone.
- **Agents:** Go. Use the official Docker SDK (`github.com/moby/moby/client` v0.5.0) for container orchestration. Build Go binaries **directly on the host** (`go build -o {name}`) — no Dockerfile-based build step. The PHP gateway agent runs `composer install --no-dev` on the host as a best-effort step, then `docker run php:8.3-apache`.
- **Realtime transport:** WebSocket end-to-end (Agent → Control Plane → Frontend).
- **Auth:** Bitbucket username/password. Validated by a real `git ls-remote`/`fetch` via the Agent. **Credentials are passed on every operation from Control Plane to Agent. Never logged, never persisted on the Agent longer than the operation.**
- **Infra in the spec** = the existing microservice infrastructure (172.18.* VMs, AppGolang, SDP repo), not infrastructure for SDP itself.
## Overview
Sandbox Deployment Platform (SDP) is an internal deployment platform that allows Backend and QA teams to deploy isolated feature branches without requiring deployment to the shared OpenShift (OCP) environment.
The platform is designed specifically for the company's existing architecture:
* Golang microservices
* PHP API Gateway
* Internal VM infrastructure
* Bitbucket repositories
* No internet access on deployment VMs
* Developers only have read access to OCP
The platform is NOT intended to be a generic Kubernetes, OpenShift, or PaaS solution.
---
# Problem Statement
Current workflow:
1. Developer creates feature branch.
2. Deployment to shared environment requires PR approval and merge.
3. CI/CD deploys to shared OCP.
4. Testing affects other teams.
5. Negative-path testing can disrupt shared development.
Required workflow:
1. Developer deploys feature branch directly.
2. Deployment occurs in isolated sandbox infrastructure.
3. API Gateway selectively routes traffic to sandbox services.
4. Remaining services continue using OCP.
5. QA can test independently.
---
# Infrastructure
## Microservices VM
```text
IP Address:
172.18.136.92
Repository Root:
~/AppGolang
```
Example:
```text
~/AppGolang
├── account
├── payment
├── user
├── notification
└── ...
```
All Golang microservices reside here.
---
## Infrastructure VM
```text
IP Address:
172.18.136.93
```
Reserved for future use:
* PostgreSQL
* Redis
* RabbitMQ
* Kafka
Not required for MVP.
---
## API Gateway VM
```text
IP Address:
172.18.139.186
Repository Root:
/var/www/html/erangel-ocean
```
Contains:
```text
/var/www/html/erangel-ocean
```
The API Gateway repository (erangel). The container
`php:8.3-apache` bind-mounts this path at the same path inside the
container and serves the gateway at `/erangel/`, mirroring the
production URL space.
---
# High-Level Architecture
```text
    +--------------------------+
    |        Dashboard         |
    |     NextJS Frontend      |
    +------------+-------------+
                 |
                 v
    +--------------------------+
    |      Control Plane       |
    |  Go (HTTP + WebSocket)   |
    +------+------------+------+
           |            |
 WebSocket |            | WebSocket
           |            |
           v            v
+----------------+ +----------------+
|  Micro Agent   | | Gateway Agent  |
| 172.18.136.92  | | 172.18.139.186 |
+----------------+ +----------------+
```
---
# Architectural Principles
## Control Plane
The Control Plane:
* Never SSHs into servers
* Never executes build commands
* Never accesses repositories directly
The Control Plane only:
* Stores metadata
* Manages deployments
* Sends commands to agents via WebSocket (`/ws/agent`)
* Receives deployment events (also via the agent's WebSocket)
* Streams logs to the dashboard over WebSocket (`/ws/deployments/{id}`)
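
A minimal sketch of how a command could be framed before it is written to the agent WebSocket. The `Frame` type, its field names, and the JSON shape are assumptions made for illustration; the actual wire protocol is the one documented in AGENTS.md and may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Frame is a hypothetical envelope for messages sent over /ws/agent.
// The field names here are assumptions, not the real protocol.
type Frame struct {
	Type         string `json:"type"`          // e.g. "deploy", "list_branches"
	DeploymentID string `json:"deployment_id"` // correlates events with a deploy
	Repo         string `json:"repo"`
	Branch       string `json:"branch,omitempty"`
}

// encodeFrame serialises a frame to the JSON text that would be
// written to the WebSocket connection.
func encodeFrame(f Frame) (string, error) {
	b, err := json.Marshal(f)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	s, err := encodeFrame(Frame{
		Type:         "deploy",
		DeploymentID: "d-1",
		Repo:         "account",
		Branch:       "feature/login-error",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```
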
---
## Agents
Agents execute all operations locally.
Examples:
```text
git fetch
git checkout
go build
docker build
docker run
```
Agents have direct filesystem access.
---
# Authentication
## Login
Users authenticate using:
```text
Bitbucket Username
Bitbucket Password
```
---
## Validation
Authentication is validated by attempting a Git operation against a known repository.
Example:
```bash
git ls-remote
```
or
```bash
git fetch
```
If Git authentication succeeds:
```text
LOGIN SUCCESS
```
Otherwise:
```text
LOGIN FAILED
```
---
## Git Operations
All Git operations must use the currently authenticated user's credentials.
Examples:
```bash
git fetch
git pull
git checkout
```
Credentials are passed from Control Plane to Agent during deployment execution.
Credentials must never be logged.
---
# Repository Configuration
Repositories are configured manually on each Agent.
No automatic discovery.
Example:
```yaml
repositories:
  - name: account
    path: /home/user/AppGolang/account
  - name: payment
    path: /home/user/AppGolang/payment
  - name: user
    path: /home/user/AppGolang/user
```
Gateway:
```yaml
repositories:
  - name: api-gateway
    path: /var/www/html/erangel-ocean
```
---
# Core Concepts
## Node
Represents a VM.
Fields:
```text
id
name
ipAddress
type
```
Types:
```text
MICRO
GATEWAY
INFRA
```
---
## Repository
Fields:
```text
id
name
path
nodeId
```
---
## Environment
Equivalent to:
```text
ConfigMap
Secret
```
Contains:
```text
Variables
Secrets
Files
```
Example:
```env
DB_HOST=
DB_PORT=
REDIS_URL=
JWT_SECRET=
```
---
## Deployment
Represents a deployment execution.
Fields:
```text
id
repository
branch
user
status
logs
startedAt
completedAt
```
---
## Sandbox
Represents an isolated testing environment.
Example:
```yaml
sandbox: QA-LOGIN-ERROR
services:
  account:
    branch: feature/login-error
  payment:
    use_ocp: true
  user:
    use_ocp: true
```
---
## Sandbox Template
A reusable sandbox configuration.
Purpose:
Reduce repetitive setup.
Example:
```yaml
template: QA-DEFAULT
gateway:
  branch: develop
services:
  account:
    use_ocp: true
  payment:
    use_ocp: true
  user:
    use_ocp: true
```
Another example:
```yaml
template: ACCOUNT-TESTING
gateway:
  branch: develop
services:
  account:
    branch: feature/account
  payment:
    use_ocp: true
  user:
    use_ocp: true
```
Users can:
* Create template
* Update template
* Clone template into sandbox
---
# Micro Agent Requirements
Runs on:
```text
172.18.136.92
```
Responsibilities:
```text
List repositories
List branches
Fetch repository updates
Checkout branch
Pull latest changes
Build Go binary
Run container (the runtime image is pre-loaded; no per-deploy build)
Restart container
Stop container
Stream logs
```
---
# Microservice Deployment Process
Given:
```text
Repository: account
Branch: feature/login-error
```
Agent executes:
```bash
git fetch
git checkout feature/login-error
git pull
```
Then on the host:
```bash
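# Build the main package at the repo root. With -o pointing at a file,
# ./... fails as soon as it matches more than one package.
go build -o app-account .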
```
Then runs a container from the pre-loaded base image, with the host
repo bind-mounted at `/src` and the freshly-built binary as the
command:
```bash
docker run -d \
-v /home/user/AppGolang/account:/src \
alpine:3.20 \
/src/app-account
```
No `docker build` is run. The `alpine:3.20` image is loaded on the
host once via `docker load -i alpine-3.20.tar` (see
[Docker Image Distribution](#docker-image-distribution)).
---
# Gateway Agent Requirements
Runs on:
```text
172.18.139.186
```
Responsibilities:
```text
List branches
Fetch repository updates
Checkout branch
Pull latest changes
Run container (best-effort `composer install --no-dev` on the host;
repo is bind-mounted; no per-deploy build)
Deploy container
Restart container
Manage routing (route overrides written to the gateway's `config.php`)
Stream logs
```
---
# API Gateway Deployment
The API Gateway must run inside Docker (so we don't depend on the
VM's nginx for routing the gateway itself).
Deployment process:
```bash
git fetch
git checkout
git pull
```
Best-effort (skipped silently if `composer` is missing or no
`composer.json` is present):
```bash
composer install --no-dev --no-interaction --no-progress
```
Then runs a container from the pre-loaded PHP image, with the host
repo bind-mounted at the same path inside the container and Apache as
the entrypoint:
```bash
docker run -d \
-v /var/www/html/erangel-ocean:/var/www/html/erangel-ocean \
-p 80:80 \
php:8.3-apache
```
No `docker build` is run. The `php:8.3-apache` image is loaded on
the host once via `docker load -i php-8.3-apache.tar` (see
[Docker Image Distribution](#docker-image-distribution)).
---
# Offline VM Requirements
Deployment VMs have no internet access.
The following cannot be relied upon:
```bash
docker pull
```
---
# Docker Image Distribution
Images must be imported manually.
Example, on a machine with internet access:
```bash
docker pull nginx:latest
docker save nginx:latest -o nginx.tar
```
Transfer:
```bash
scp nginx.tar user@172.18.139.186:/tmp
```
Load:
```bash
docker load -i nginx.tar
```
---
# Environment Management
Users must be able to:
```text
Create Environment
Update Environment
Delete Environment
Manage Secrets
Manage Variables
```
Example:
```env
DB_HOST=...
DB_USER=...
DB_PASSWORD=...
```
Environment values are injected during deployment.
---
# Route Override System
This is the most important feature of the platform.
Each route can target either:
```text
Sandbox Deployment
OCP Deployment
```
Example:
```yaml
account:
target: http://172.18.136.92:9001
payment:
target: https://payment-dev.company.com
user:
target: https://user-dev.company.com
```
Result:
```text
account -> sandbox
payment -> OCP
user -> OCP
```
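
The override rule can be sketched as a merge of two maps: start from the OCP defaults and replace only the services that have a sandbox target. The function name and map shapes are assumptions for illustration, and the `account` OCP URL in `main` is made up to complete the example.

```go
package main

import (
	"fmt"
	"sort"
)

// resolveRoutes returns the effective target per service: the sandbox
// override when one is set, otherwise the OCP default.
func resolveRoutes(ocpDefaults, overrides map[string]string) map[string]string {
	out := make(map[string]string, len(ocpDefaults))
	for svc, target := range ocpDefaults {
		out[svc] = target
	}
	for svc, target := range overrides {
		if target != "" { // an empty override means "use OCP"
			out[svc] = target
		}
	}
	return out
}

func main() {
	ocp := map[string]string{
		"account": "https://account-dev.company.com", // hypothetical URL
		"payment": "https://payment-dev.company.com",
		"user":    "https://user-dev.company.com",
	}
	overrides := map[string]string{"account": "http://172.18.136.92:9001"}

	routes := resolveRoutes(ocp, overrides)
	names := make([]string, 0, len(routes))
	for svc := range routes {
		names = append(names, svc)
	}
	sort.Strings(names) // stable output order
	for _, svc := range names {
		fmt.Printf("%s -> %s\n", svc, routes[svc])
	}
}
```
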
---
# Mobile App Integration
Current mobile app:
```text
https://project-dev-url.domain.com
```
Target:
```text
http://172.18.139.186:{PORT}
```
Example:
```text
http://172.18.139.186:8080
```
QA can point the mobile application directly to the API Gateway sandbox.
No DNS changes required.
---
# Port Management
Gateway Ports:
```text
8080
8081
8082
...
```
Microservice Ports:
```text
9001
9002
9003
...
```
Control Plane must:
* Allocate ports
* Track ports
* Prevent conflicts
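
A minimal in-memory sketch of the three duties above. `PortAllocator` and its methods are hypothetical names; a real control plane would presumably persist allocations (for example in SQLite) rather than hold them in a map.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var (
	ErrNoFreePort = errors.New("no free port in range")
	ErrPortTaken  = errors.New("port already allocated")
)

// PortAllocator tracks which host ports in [lo, hi] are in use.
type PortAllocator struct {
	mu     sync.Mutex
	lo, hi int
	owner  map[int]string // port -> owner key, e.g. "sandbox/service"
}

func NewPortAllocator(lo, hi int) *PortAllocator {
	return &PortAllocator{lo: lo, hi: hi, owner: make(map[int]string)}
}

// Allocate reserves the lowest free port for owner.
func (a *PortAllocator) Allocate(owner string) (int, error) {
	a.mu.Lock()
	defer a.mu.Unlock()
	for p := a.lo; p <= a.hi; p++ {
		if _, taken := a.owner[p]; !taken {
			a.owner[p] = owner
			return p, nil
		}
	}
	return 0, ErrNoFreePort
}

// Reserve claims a specific port, such as one picked by the user.
func (a *PortAllocator) Reserve(port int, owner string) error {
	a.mu.Lock()
	defer a.mu.Unlock()
	if port < a.lo || port > a.hi {
		return fmt.Errorf("port %d outside range %d-%d", port, a.lo, a.hi)
	}
	if _, taken := a.owner[port]; taken {
		return ErrPortTaken
	}
	a.owner[port] = owner
	return nil
}

// Release frees a port so it can be handed out again.
func (a *PortAllocator) Release(port int) {
	a.mu.Lock()
	defer a.mu.Unlock()
	delete(a.owner, port)
}

func main() {
	ports := NewPortAllocator(9001, 9003)
	p, _ := ports.Allocate("qa-login/account")
	fmt.Println("allocated", p)
	fmt.Println("second claim:", ports.Reserve(p, "achmad/account"))
}
```
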
---
# Deployment States
The `protocol.Event.State` field carries the lifecycle state of a
deployment. Supported values:
```text
QUEUED // set by the control plane when a deploy is created
RUNNING // all stages completed successfully, container is up
FAILED // a stage errored; the deploy is dead
STOPPED // user-initiated stop
```
In addition, the `Stage` field of a `progress` event carries the
per-stage human label. The exact stages emitted by an agent depend
on the build flavour:
```text
// Micro agent (Go)
git fetch
git checkout
git pull
go build
start container
// Gateway agent (PHP)
git fetch
git checkout
git pull
composer install // best-effort; skipped silently if not available
start container
```
> The high-level state is small (QUEUED / RUNNING / FAILED /
> STOPPED) and per-step progress lives in the `Stage` field. There
> is no per-deploy image build, so no image-related state is
> needed.
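
As an illustration, the four states can be modelled as a typed string with a validating parser. The `State` type and `ParseState` function are assumed names; the source only specifies the `State` and `Stage` fields of `protocol.Event`.

```go
package main

import "fmt"

// State is the high-level lifecycle state of a deployment.
type State string

const (
	StateQueued  State = "QUEUED"
	StateRunning State = "RUNNING"
	StateFailed  State = "FAILED"
	StateStopped State = "STOPPED"
)

// ParseState validates a raw value received on the wire and rejects
// anything outside the four supported states.
func ParseState(raw string) (State, error) {
	switch s := State(raw); s {
	case StateQueued, StateRunning, StateFailed, StateStopped:
		return s, nil
	default:
		return "", fmt.Errorf("unknown deployment state %q", raw)
	}
}

func main() {
	for _, raw := range []string{"QUEUED", "RUNNING", "BUILDING"} {
		s, err := ParseState(raw)
		if err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("accepted:", s)
	}
}
```
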
---
# Real-Time Progress
Frontend must receive deployment progress in real time.
Example:
```text
✓ Fetch
✓ Checkout
✓ Build
✓ Start Container
✓ Running
```
No page refresh.
---
# Real-Time Logs
Frontend must receive logs while deployment is running.
Example:
```text
[FETCH]
Fetching origin...
[FETCH]
Success
[BUILD]
Running go build
[BUILD]
Success
[DEPLOY]
Container started
```
---
# Event Streaming
Agents emit events.
Examples:
```text
FETCH_STARTED
FETCH_COMPLETED
CHECKOUT_STARTED
CHECKOUT_COMPLETED
BUILD_STARTED
BUILD_COMPLETED
DEPLOY_STARTED
DEPLOY_COMPLETED
DEPLOY_FAILED
```
Architecture:
```text
Agent
-> WebSocket
Control Plane
-> WebSocket
Frontend
```
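
The relay step in the middle of this chain can be sketched as a small fan-out hub keyed by deployment id. `Hub`, `Subscribe`, and `Publish` are hypothetical names, and events are plain strings here for brevity; the WebSocket reads and writes on either side are omitted.

```go
package main

import (
	"fmt"
	"sync"
)

// Hub forwards events for a deployment to every subscriber of that id.
type Hub struct {
	mu   sync.Mutex
	subs map[string][]chan string
}

func NewHub() *Hub { return &Hub{subs: make(map[string][]chan string)} }

// Subscribe returns a buffered channel that receives events for
// deploymentID.
func (h *Hub) Subscribe(deploymentID string) <-chan string {
	ch := make(chan string, 16)
	h.mu.Lock()
	h.subs[deploymentID] = append(h.subs[deploymentID], ch)
	h.mu.Unlock()
	return ch
}

// Publish delivers ev to every subscriber of deploymentID. A subscriber
// whose buffer is full misses the event, so a slow dashboard client
// cannot block the agent connection.
func (h *Hub) Publish(deploymentID, ev string) {
	h.mu.Lock()
	defer h.mu.Unlock()
	for _, ch := range h.subs[deploymentID] {
		select {
		case ch <- ev:
		default:
		}
	}
}

func main() {
	hub := NewHub()
	events := hub.Subscribe("d-1")
	hub.Publish("d-1", "FETCH_STARTED")
	hub.Publish("d-1", "FETCH_COMPLETED")
	fmt.Println(<-events)
	fmt.Println(<-events)
}
```
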
---
# Dashboard Features
## Authentication
```text
Login
Logout
```
---
## Repository Management
```text
List Repositories
List Branches
```
---
## Deployments
```text
Deploy Branch
Restart Deployment
Stop Deployment
Delete Deployment
```
---
## Deployment Monitoring
```text
View Progress
View Logs
View Status
View History
```
---
## Environment Management
```text
Create Environment
Update Environment
Delete Environment
```
---
## Sandbox Management
```text
Create Sandbox
Update Sandbox
Delete Sandbox
Clone Sandbox
```
---
## Template Management
```text
Create Template
Update Template
Delete Template
Create Sandbox From Template
```
---
## Route Management
```text
Route To Sandbox
Route To OCP
```
---
# Audit Trail
Store:
```text
User
Repository
Branch
Environment
Sandbox
Timestamp
Status
```
Example:
```text
User:
Achmad
Repository:
account
Branch:
feature/login-error
Sandbox:
QA-LOGIN-ERROR
Status:
SUCCESS
```
---
# Technology Stack
## Dashboard
```text
NextJS
React
TypeScript
Tailwind
```
## Control Plane
```text
Go
SQLite (modernc.org/sqlite, pure Go, no cgo)
WebSocket (gorilla/websocket)
```
## Agents
```text
Go
Docker SDK (github.com/moby/moby/client)
WebSocket (gorilla/websocket)
```
---
# Non-Goals
Not intended to replace:
```text
Kubernetes
OpenShift
Rancher
ArgoCD
Coolify
```
Not intended to support:
```text
Multi-Tenant SaaS
Public Cloud
Generic Container Hosting
```
Purpose:
Provide isolated deployment environments for Backend and QA teams.
---
# MVP Success Criteria
A developer can:
1. Login using Bitbucket username and password.
2. Select a repository.
3. Select a branch.
4. Configure environment variables.
5. Deploy API Gateway.
6. Deploy microservices.
7. Watch deployment progress in real time.
8. Watch deployment logs in real time.
9. Create sandboxes.
10. Create sandbox templates.
11. Route selected services to sandbox deployments.
12. Route remaining services to OCP.
13. Point mobile application to:
```text
http://172.18.139.186:{PORT}
```
14. Allow QA to test isolated feature branches without impacting shared OCP environments.
# Future Enhancements
## Sandbox Isolation Strategy
### Goal
Allow multiple developers and QA engineers to run independent sandboxes simultaneously without conflicts.
Example:
```text
Achmad Sandbox
├── account
├── payment
└── gateway
QA Sandbox
├── account
├── payment
└── gateway
```
Both sandboxes must coexist on the same infrastructure.
---
## Container Naming Convention
Containers should follow a predictable naming pattern.
Format:
```text
sandbox-{sandbox-name}-{service-name}
```
Examples:
```text
sandbox-achmad-account
sandbox-achmad-payment
sandbox-achmad-user
sandbox-achmad-gateway
```
```text
sandbox-qa-login-account
sandbox-qa-login-gateway
```
Benefits:
* Easier troubleshooting
* Easier cleanup
* Easier log inspection
* Easier monitoring
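
A small sketch of the naming pattern. The normalisation rule (lowercase, with anything outside `a-z0-9` collapsed to a single hyphen) is an assumption chosen to reproduce the examples above; `slug` and `containerName` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"strings"
)

// slug lowercases s and collapses every run of characters outside
// [a-z0-9] into one hyphen, with no leading or trailing hyphen.
func slug(s string) string {
	var b strings.Builder
	lastDash := false
	for _, r := range strings.ToLower(s) {
		if (r >= 'a' && r <= 'z') || (r >= '0' && r <= '9') {
			b.WriteRune(r)
			lastDash = false
		} else if !lastDash && b.Len() > 0 {
			b.WriteByte('-')
			lastDash = true
		}
	}
	return strings.TrimSuffix(b.String(), "-")
}

// containerName applies the sandbox-{sandbox-name}-{service-name} format.
func containerName(sandbox, service string) string {
	return "sandbox-" + slug(sandbox) + "-" + slug(service)
}

func main() {
	fmt.Println(containerName("achmad", "account"))
	fmt.Println(containerName("QA Login", "gateway"))
}
```
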
---
## Docker Network Per Sandbox
Each sandbox should have its own Docker network.
Format:
```text
sandbox-{sandbox-name}
```
Examples:
```text
sandbox-achmad
sandbox-qa-login
sandbox-regression
```
Container example:
```text
Network:
sandbox-achmad
Containers:
sandbox-achmad-gateway
sandbox-achmad-account
sandbox-achmad-payment
```
Benefits:
* Network isolation
* Service discovery
* No cross-sandbox traffic
* Simpler routing
---
## Internal Service Communication
Services within a sandbox should communicate through Docker DNS.
Example:
Instead of:
```text
http://172.18.136.92:9001
```
Use:
```text
http://sandbox-achmad-account:8080
```
Benefits:
* No dependency on host ports
* Cleaner configuration
* Easier sandbox replication
---
## Sandbox Port Allocation
Gateway containers should expose a unique external port.
Examples:
```text
sandbox-achmad-gateway
→ 172.18.139.186:8080
sandbox-qa-login-gateway
→ 172.18.139.186:8081
sandbox-regression-gateway
→ 172.18.139.186:8082
```
Mobile applications connect only to gateway ports.
---
## Automatic Port Management
Control Plane should automatically:
* Allocate available ports
* Reserve ports
* Release ports when sandbox is deleted
Example database table:
```text
PortAllocation
├── sandboxId
├── serviceName
├── port
└── allocatedAt
```
---
## Sandbox Lifecycle
Future support:
### Suspend Sandbox
Stops all containers while preserving configuration.
Example:
```text
ACTIVE
SUSPENDED
```
Resources freed:
* CPU
* Memory
Configuration preserved.
---
### Resume Sandbox
Restarts previously suspended sandbox.
Example:
```text
SUSPENDED
ACTIVE
```
---
### Sandbox Expiration
Automatic cleanup after inactivity.
Example:
```text
No activity for 14 days
Mark Expired
Stop Containers
Delete After Retention Period
```
Configurable.
---
## Sandbox Cloning
Clone an existing sandbox.
Example:
```text
Source:
Achmad Sandbox
Destination:
QA Sandbox
```
Result:
```text
Same repositories
Same branches
Same environment variables
Same route overrides
```
New ports are allocated automatically.
---
## Sandbox Snapshots
Capture sandbox state.
Stored information:
* Repository versions
* Branches
* Environment variables
* Route overrides
Example:
```text
Snapshot:
QA-Before-Release
```
Allows rollback and recreation later.
---
## Resource Limits
Per sandbox resource controls.
Example:
```text
CPU: 1 Core
Memory: 1 GB
```
Per container:
```text
CPU: 500m
Memory: 512MB
```
Implemented using Docker resource limits.
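
Docker expresses CPU limits in units of 1e-9 CPUs and memory limits in bytes, so the values above need converting. A sketch under two assumptions: `cpuToNano` and `memToBytes` are hypothetical helpers, and `MB`/`GB` are read as binary units (as `docker run --memory 512m` does).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// cpuToNano converts "500m" (millicores) or "1" (cores) to nano-CPUs.
func cpuToNano(s string) (int64, error) {
	if strings.HasSuffix(s, "m") {
		milli, err := strconv.ParseInt(strings.TrimSuffix(s, "m"), 10, 64)
		if err != nil {
			return 0, err
		}
		return milli * 1_000_000, nil
	}
	cores, err := strconv.ParseFloat(s, 64)
	if err != nil {
		return 0, err
	}
	return int64(cores * 1e9), nil
}

// memToBytes converts "512MB" or "1 GB" to bytes using binary units.
func memToBytes(s string) (int64, error) {
	s = strings.ToUpper(strings.ReplaceAll(s, " ", ""))
	mult := int64(1)
	switch {
	case strings.HasSuffix(s, "GB"):
		mult, s = 1<<30, strings.TrimSuffix(s, "GB")
	case strings.HasSuffix(s, "MB"):
		mult, s = 1<<20, strings.TrimSuffix(s, "MB")
	}
	n, err := strconv.ParseInt(s, 10, 64)
	if err != nil {
		return 0, err
	}
	return n * mult, nil
}

func main() {
	cpu, _ := cpuToNano("500m")
	mem, _ := memToBytes("512MB")
	fmt.Println(cpu, mem)
}
```
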
---
## Health Monitoring
Track sandbox health.
Metrics:
* Container status
* CPU usage
* Memory usage
* Restart count
* Health endpoint status
Dashboard should display:
```text
Healthy
Degraded
Unhealthy
```
---
## Future Infrastructure Agent
Node:
```text
172.18.136.93
```
Responsibilities:
* PostgreSQL restore
* Database cloning
* RabbitMQ management
* Redis management
* Kafka management
Potential use case:
```text
Clone QA Database
Attach To Sandbox
Run Integration Testing
```
---
## Future RBAC
Current MVP:
```text
All authenticated users
```
Future roles:
```text
ADMIN
BACKEND
QA
VIEWER
```
Permissions:
```text
Deploy
Delete Sandbox
Manage Templates
Manage Routes
Manage Environments
```
---
## Future Notifications
Deployment notifications:
```text
Deployment Started
Deployment Succeeded
Deployment Failed
Sandbox Expired
```
Channels:
* Email
* Slack
* Microsoft Teams
---
# Status checklist
Per-feature status. `done` = implemented as of Slice 2. `later` =
out of scope for MVP.
## Build / deploy
- `done` `./scripts/build.sh` produces the three Go binaries and the
Next.js dashboard.
- `done` `./scripts/deploy.sh` copies the binaries over SSH to the 92 and 186 VMs.
- `done` `docker compose up -d` brings up the three services on
`alpine:latest` for local dev.
## Core deploy flow
- `done` Agent connects to the control plane over WebSocket and
reconnects automatically if the connection drops.
- `done` Control plane dispatches a `deploy` frame to the agent with
the per-operation Bitbucket creds.
- `done` Micro agent runs `git fetch → checkout → pull → go build →
docker run` and streams progress and logs back.
- `done` Gateway agent runs `git reset --hard → fetch → checkout →
pull → composer install (best-effort) → docker run → re-apply route
overrides → apache graceful reload` and streams progress and logs
back.
- `done` Dashboard subscribes to a deployment by id over WebSocket
and renders stages + live log tail.
- `done` SQLite persistence for deployment rows, stage transitions,
and append-only log files.
- `done` Real `validateViaAgent` via the agent's `git ls-remote`
frame.
- `done` Real `list_repos` / `list_branches` via agent frames; the
hardcoded fixtures are gone.
- `done` `list_routes` RPC exposes the live `<key>_url` map from
the gateway's `config.php` after every branch switch.
- `done` `GET /api/deployments` reads deployment history from
SQLite (filterable by sandbox).
## Sandbox & routing
- `done` Sandbox CRUD (data model + REST endpoints + dashboard
pages).
- `done` Sandbox template CRUD and "clone template into sandbox".
- `done` Route management (sandbox vs OCP per service) with live
read-back from the gateway's `config.php`.
- `done` Environment CRUD (persisted named envs, not just inline).
- `done` Actual route push to the API Gateway: the gateway agent
rewrites `application/config/production/config.php` and gracefully
reloads apache. A per-branch OCP-default snapshot is captured
automatically and persisted to `<repo>/.sdp/ocp-defaults.json`.
- `done` Per-deploy port binding: the user specifies the host port;
the agent publishes the container's exposed port to it. Concurrency
is "one live container per repo" (the stable name is `sdp-<repo>`).
## Auth
- `done` Real auth via agent-mediated `git ls-remote` against the
api-gateway. Login fails fast if no gateway agent is connected.
- `done` Session cookie + in-memory session store, 12-hour TTL,
logout invalidates the token.
- `later` RBAC roles (admin / backend / qa / viewer).
## Out of scope for MVP (per the "Future Enhancements" section)
- `later` Per-sandbox Docker networks and the
`sandbox-{name}-{service}` container naming.
- `later` Internal service communication via Docker DNS.
- `later` Suspend / resume / expire sandboxes.
- `later` Sandbox cloning and snapshots.
- `later` Per-sandbox resource limits.
- `later` Health monitoring.
- `later` The infra agent (172.18.136.93) for PostgreSQL/Redis/etc.
- `later` Notifications (email / Slack / Teams).