Achmad 574e6d207b Slice 2: agents and control plane run under systemd
- systemd/sdp-control-plane.service: plain host process on 186,
  listens on :3452, data dir ~/SDP/data. MemoryMax=512M,
  Restart=always, ReadWritePaths scoped to the data dir.
- systemd/sdp-agent-micro.service: plain host process on 92,
  default SDP_CP_URL=ws://172.18.139.186:3452/ws/agent. Operator
  can drop /etc/default/sdp-agent-micro to override. Depends on
  docker.service so the dockerd is up before the agent starts.
- systemd/sdp-agent-gateway.service: plain host process on 186,
  default SDP_CP_URL=ws://127.0.0.1:3452/ws/agent (loopback since
  both live on the same VM). Same env-file override pattern.
- All three use Type=simple, Restart=always, RestartSec=2s. The
  agents already reconnect on transient network drops, so
  restart-on-crash is the right policy.
- The agents talk to the host dockerd via /var/run/docker.sock to
  spawn the actual service containers (sdp-<repo>). Service
  containers are managed by docker, not systemd — only the
  long-running agents and the control plane are under systemd.
- scripts/deploy.sh: now a one-shot — scp's binaries, dashboard,
  and unit files; systemctl daemon-reload + enable --now + restart
  each service in the right order (control plane first on 186 so
  the gateway agent has something to dial). Prints status + last
  10 journal lines per service so the user can see it came up.
- AGENTS.md, README.md: layout tree updated, deploy section
  rewritten, the systemd units documented alongside the agents
  and control plane.
2026-06-24 04:54:28 +00:00


# Sandbox Deployment Platform (SDP)
Internal deployment platform for Backend/QA. Lets a developer deploy a feature
branch into an isolated sandbox, with the API Gateway routing selected
services to the sandbox and the rest to OCP. See
[REQUIREMENTS.md](REQUIREMENTS.md) for the full spec.
## Status (Slice 2 — sandboxes, routes, real auth, all MVP features)
`./scripts/build.sh` produces three Linux/amd64 binaries and a static
dashboard. The full MVP flow works end to end:
- Real Bitbucket auth via `git ls-remote` against the api-gateway.
- Real repo and branch listing via agent WS frames.
- Sandbox / template / environment CRUD with persisted metadata in
SQLite.
- Route overrides per sandbox, with live read-back of the
`<service>_url` map from the gateway's `config.php` after every
branch switch. The agent patches the file and gracefully reloads
apache.
- Per-deploy port binding: the user picks the host port per service
(e.g. eredar at `172.18.136.92:9001`), and the container's exposed port
is published to that host port.
- Erangel deploy: `git reset --hard → fetch → checkout → pull →
composer install → start container → re-apply route overrides`.
A per-branch OCP-default snapshot is persisted to
`<repo>/.sdp/ocp-defaults.json`.
See [REQUIREMENTS.md](REQUIREMENTS.md#status) for the per-feature
checklist.
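
The route-override behaviour in the list above can be pictured as layering a sandbox's overrides on top of the OCP defaults. The following is a minimal sketch, not the agent's actual code: `effectiveRoutes`, the example hostnames, and the rule that unknown keys are ignored are all assumptions made for illustration. The real agent patches `config.php` rather than returning a map.

```go
package main

import (
	"fmt"
	"sort"
)

// effectiveRoutes returns the `<service>_url` map a sandbox would see:
// every OCP default, with the sandbox's overrides layered on top.
// Hypothetical helper; names and the "ignore unknown keys" rule are assumptions.
func effectiveRoutes(ocpDefaults, overrides map[string]string) map[string]string {
	out := make(map[string]string, len(ocpDefaults))
	for k, v := range ocpDefaults {
		out[k] = v
	}
	for k, v := range overrides {
		// Only override services the gateway config already lists.
		if _, known := ocpDefaults[k]; known {
			out[k] = v
		}
	}
	return out
}

func main() {
	defaults := map[string]string{
		"eredar_url": "http://eredar.ocp.example", // placeholder OCP URL
		"other_url":  "http://other.ocp.example",  // placeholder OCP URL
	}
	overrides := map[string]string{"eredar_url": "http://172.18.136.92:9001"}

	routes := effectiveRoutes(defaults, overrides)
	keys := make([]string, 0, len(routes))
	for k := range routes {
		keys = append(keys, k)
	}
	sort.Strings(keys) // stable output order
	for _, k := range keys {
		fmt.Printf("%s => %s\n", k, routes[k])
	}
}
```

Keeping the defaults untouched and computing the effective map on each call mirrors the idea of a persisted OCP-default snapshot: removing an override simply falls back to the default.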
## Layout
```
.
├── protocol/ # shared wire types (Event, DeployRequest, RouteOverride, ...)
├── agentlib/ # Go. Shared agent library: gitutil + deployer (Go/PHP flavours)
├── control-plane/ # Go. HTTP API + WS hub + SQLite/.log persistence
├── agent-micro/ # Go. Runs on 172.18.136.92, deploys Go microservices
├── agent-gateway/ # Go. Runs on 172.18.139.186, deploys the PHP API Gateway
├── dashboard/ # Next.js static export, served by nginx
├── nginx/ # reference nginx config (manually applied on 186)
├── scripts/ # build, deploy, ssh wrappers
├── docker-compose.yml # all three services on alpine:latest
├── systemd/ # unit files for the three long-running services
├── go.work # Go workspace — one build, five modules
└── bin/ # built binaries (tracked, see .gitignore comment)
```
`agentlib/` is a shared library used by both agents. It owns the git
helpers and the per-deployment state machine, which has two constructors
for two build flavours:
- **`NewGo`** — for microservices. Runs `go build` on the host, then
`docker run alpine:3.20` with the host repo bind-mounted at `/src` and
the binary as the container command. `alpine:3.20` must be pre-loaded
on the host (see [Offline VMs](#offline-vms)).
- **`NewPHP`** — for the API Gateway (erangel). Runs
`git reset --hard → fetch → checkout → pull → composer install
(best-effort) → docker run php:8.3-apache`, with the repo
bind-mounted at `/var/www/html/erangel-ocean` and
`APACHE_DOCUMENT_ROOT=/var/www/html/erangel-ocean` so the gateway is
served at `/erangel/`, mirroring production. After the container is
up, the agent's `AfterStart` callback re-applies the active route
overrides and reloads apache. `php:8.3-apache` must be pre-loaded on
the host. The agent is written in Go; the thing it deploys is a
PHP project.
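
To make the `NewGo` flavour concrete, here is a rough sketch of the container invocation it describes, written as a `docker run` argument list. This is illustration only: the agents use the Moby Go SDK rather than the CLI, and `goRunArgs`, the host repo path, and the container port `8080` are hypothetical. Only the `sdp-<repo>` name, the `/src` bind mount, and the `alpine:3.20` image come from this README.

```go
package main

import (
	"fmt"
	"strings"
)

// goRunArgs builds an illustrative `docker run` argv for a Go microservice:
// host repo bind-mounted at /src, the built binary as the container command,
// and the chosen host port published to the container's port.
func goRunArgs(repo, hostRepoDir, binary string, hostPort, containerPort int) []string {
	return []string{
		"run", "-d",
		"--name", "sdp-" + repo,
		"-p", fmt.Sprintf("%d:%d", hostPort, containerPort),
		"-v", hostRepoDir + ":/src",
		"alpine:3.20",
		"/src/" + binary,
	}
}

func main() {
	// Path and container port are placeholders, not values from the real deployment.
	args := goRunArgs("eredar", "/home/administrator/repos/eredar", "eredar", 9001, 8080)
	fmt.Println("docker " + strings.Join(args, " "))
}
```

Because the binary is built with `CGO_ENABLED=0`, it has no libc dependency and can run directly on a bare `alpine:3.20` image.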
## Prerequisites
- Docker (for the build container)
- Node 18+ (for the dashboard)
- `sshpass` (for the deploy scripts: `brew install sshpass`)
No Go install needed locally — `scripts/build.sh` cross-compiles inside
`golang:1.24-alpine`.
## Build
```bash
./scripts/build.sh
```
Outputs:
- `bin/control-plane`, `bin/agent-micro`, `bin/agent-gateway` (Linux/amd64
ELF, statically linked)
- `dashboard/out/` (Next.js static export)
The build script:
1. Starts a `golang:1.24-alpine` container with the repo bind-mounted.
2. `apk add git` (the base image has none).
3. Configures `safe.directory /src` so the container's root user can
read the bind-mounted host tree.
4. Cross-compiles all three binaries with `GOOS=linux GOARCH=amd64
CGO_ENABLED=0`, `-trimpath` (reproducible builds) and
`-ldflags="-s -w"` (strip debug info).
5. `chmod +x` the binaries inside the container (the host user can't
chmod files written by the container's root).
6. Builds the Next.js dashboard with `npm install && npm run build`.
The script verifies each binary with `file` to catch a missing
`GOOS`/`GOARCH`.
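
The build script performs its check with `file`. As a sketch of what such a check amounts to, the same properties can be tested with Go's standard `debug/elf` package. The function name `checkLinuxAMD64` is hypothetical, and treating "no `PT_INTERP` segment" as "statically linked" is a simplifying assumption.

```go
package main

import (
	"bytes"
	"debug/elf"
	"fmt"
	"os"
)

// checkLinuxAMD64 returns an error unless data is a 64-bit x86-64 ELF image
// with no PT_INTERP segment (no dynamic loader requested).
func checkLinuxAMD64(data []byte) error {
	f, err := elf.NewFile(bytes.NewReader(data))
	if err != nil {
		return fmt.Errorf("not an ELF file: %w", err)
	}
	defer f.Close()

	if f.Class != elf.ELFCLASS64 || f.Machine != elf.EM_X86_64 {
		return fmt.Errorf("wrong target: class=%v machine=%v", f.Class, f.Machine)
	}
	for _, p := range f.Progs {
		if p.Type == elf.PT_INTERP {
			return fmt.Errorf("dynamically linked (has PT_INTERP)")
		}
	}
	return nil
}

func main() {
	// Binaries to check are passed on the command line, e.g. bin/control-plane.
	for _, path := range os.Args[1:] {
		data, err := os.ReadFile(path)
		if err == nil {
			err = checkLinuxAMD64(data)
		}
		if err != nil {
			fmt.Printf("%s: FAIL: %v\n", path, err)
			continue
		}
		fmt.Printf("%s: ok\n", path)
	}
}
```

A binary accidentally built for the host platform (for example a macOS Mach-O file) fails at the first step, which is the failure mode a missing `GOOS`/`GOARCH` produces.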
## Deploy
```bash
./scripts/deploy.sh
```
This script:
1. SSHs to **172.18.136.92** (`administrator`) and pushes `bin/agent-micro`
plus `systemd/sdp-agent-micro.service` to the VM, then runs
`systemctl enable --now sdp-agent-micro`.
2. SSHs to **172.18.139.186** (`administrator`) and pushes
`bin/control-plane`, `bin/agent-gateway`, `dashboard/out/`, and the
matching `systemd/*.service` files, then runs
`systemctl enable --now` for both. The control plane is restarted
first so the gateway agent's `-cp` URL has something to dial.
All three long-running services (control plane + both agents) are
plain host processes managed by systemd. The unit files live in
[systemd/](systemd/). Service containers spawned by the agents
(`sdp-<repo>`) are managed by docker, not systemd — the agents talk
to the host's dockerd via `/var/run/docker.sock` to create and
replace them.
Nginx on 186 is configured by hand; the dashboard ends up at
`/home/administrator/SDP/dashboard/`. The required location blocks
are in [nginx/sandbox.conf](nginx/sandbox.conf) (the actual deployment
on 186) and [nginx/nginx.conf](nginx/nginx.conf) (a legacy
root-mount reference).
Override the SSH passwords the script uses via the `SDP_92_PASS` / `SDP_186_PASS` env vars.
## Local dev (docker compose)
For dev on a single host (e.g. a laptop with Docker):
```bash
./scripts/build.sh
docker compose up -d
```
Three services come up on `alpine:latest`:
- `control-plane` → `:3452` (an unusual port to avoid collisions)
- `agent-micro` (connects to control plane, has docker socket + repos mounted)
- `agent-gateway` (same shape)
## Architecture notes
- **Pass-through creds.** Bitbucket credentials travel with each deploy
request from control plane to agent, are used once for `git fetch`/`checkout`/
`pull`, and are never logged or persisted on the agent.
- **No Dockerfile build on the agent.** Each agent does the language
build on the host (Go or composer), then `docker run <base-image>`
with the host repo bind-mounted and the binary / apache as the
container command. The base image must be pre-loaded.
- **Offline VMs.** `alpine:3.20` and `php:8.3-apache` are pre-loaded
via `docker load`. The dashboard is a static export, no runtime
fetches.
- **Persistence.** Deployment progress goes to SQLite
(`<data>/sdp.db`). Log lines go to append-only
`<data>/logs/<deploymentId>.log`. SQLite uses `modernc.org/sqlite`
(pure Go, no cgo) so the control plane binary stays statically
linkable. The driver name is `sqlite` (not `sqlite3`).
- **Docker SDK.** The agents use the official Moby Go SDK at
`github.com/moby/moby/client` v0.5.0.
- **Realtime transport.** WebSocket end-to-end. Agents connect to
`/ws/agent` on the control plane; the dashboard subscribes to
`/ws/deployments/{id}`.
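
The pass-through credentials note above implies two views of the same remote URL: one carrying the credentials for a single git invocation, and one that is safe to write to a log. The sketch below shows one way to get both with the standard `net/url` package. It is an assumption about approach, not the agent's code: `authedRemote` and the example host are hypothetical, and the README does not say how credentials are actually handed to git.

```go
package main

import (
	"fmt"
	"net/url"
)

// authedRemote returns the remote URL with credentials embedded (for a
// one-off git call) and a redacted form that is safe to log.
func authedRemote(remote, user, pass string) (withCreds, safeToLog string, err error) {
	u, err := url.Parse(remote)
	if err != nil {
		return "", "", err
	}
	u.User = url.UserPassword(user, pass)
	// Redacted replaces the password with "xxxxx".
	return u.String(), u.Redacted(), nil
}

func main() {
	// Host, path, and credentials are placeholders.
	full, safe, err := authedRemote("https://bitbucket.example/scm/proj/eredar.git", "alice", "s3cret")
	if err != nil {
		fmt.Println("bad remote:", err)
		return
	}
	_ = full // would be handed to git for this one request, never logged or stored
	fmt.Println("fetching", safe)
}
```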
## MVP stubs (intentional, deferred)
These are marked with `ponytail:` comments in the code and are
scheduled for later slices.
- `CheckOrigin` in the WS upgrader accepts any origin (no cross-origin
check), intentional for an internal tool.
- "Drop on backpressure" policy for slow WS subscribers — replace with
flow control or persistent event log if the dashboard ever needs
catch-up replay.
- O(n) log tail scan in `store.TailLogs` — fine for tail use; swap to
a ring buffer if logs get huge.
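
For readers unfamiliar with the "drop on backpressure" policy named above, this is the usual Go shape of it: a non-blocking channel send that gives up when the subscriber's buffer is full. It is a minimal sketch under assumed names (`publish`, a string event type, a buffer of 2), not the hub's actual implementation.

```go
package main

import "fmt"

// publish tries to hand ev to a subscriber without blocking.
// It returns false when the subscriber's buffer is full and the event is dropped.
func publish(sub chan<- string, ev string) bool {
	select {
	case sub <- ev:
		return true
	default:
		return false
	}
}

func main() {
	sub := make(chan string, 2) // a slow subscriber with a small buffer
	for i := 1; i <= 3; i++ {
		ok := publish(sub, fmt.Sprintf("line %d", i))
		fmt.Printf("event %d delivered=%v\n", i, ok)
	}
}
```

The trade-off is that the publisher never stalls on one slow dashboard tab, at the cost of that tab missing events, which is why catch-up replay would need flow control or a persistent event log.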
## Slice 2 dashboard
The dashboard has these pages:
- `/` — login (real git-ls-remote via the gateway agent).
- `/dashboard` — quick deploy (ad-hoc single-service deploy).
- `/dashboard/sandboxes` — list, create, clone-from-template.
- `/dashboard/sandboxes/{id}` — sandbox detail. Live routes from the
gateway's `config.php`, per-route toggle (OCP / sandbox override),
microservice deploys with per-service host port and env.
- `/dashboard/templates` — template CRUD.
- `/dashboard/environments` — env CRUD.
- `/dashboard/history` — deployment history (filterable by sandbox).
## See also
- [REQUIREMENTS.md](REQUIREMENTS.md) — full spec, infra, MVP success criteria,
per-feature status checklist
- [nginx/nginx.conf](nginx/nginx.conf) — reference nginx config
- [docker-compose.yml](docker-compose.yml) — three-service dev stack