# AGENTS.md — Sandbox Deployment Platform

## Build, lint, test

The build script is the only way to compile; local Go can't fetch the
1.24 toolchain. Run:

```
./scripts/build.sh # cross-compiles 3 Go binaries + builds the Next.js dashboard
./scripts/deploy.sh # SSHes artifacts + systemd units to 92 and 186, then enables+starts them; needs sshpass
```

The build script uses a `golang:1.24-alpine` container with a persistent
`sdp-gocache` named volume; set `GO_IMAGE=...` to override the image. Outputs:
`bin/{control-plane,agent-micro,agent-gateway}` (Linux/amd64, static) and
`dashboard/out/`.

Per-module Go work uses the same container:

```
docker run --rm -v "$PWD:/src" -w /src/<module> golang:1.24-alpine sh -c \
"apk add --no-cache git >/dev/null && git config --global --add safe.directory /src && go vet ./..."
```

To test a single package (here, the store):

```
docker run --rm -v "$PWD:/src" -w /src/control-plane golang:1.24-alpine sh -c \
"apk add --no-cache git >/dev/null && git config --global --add safe.directory /src && go test ./internal/store/..."
```

There is one test file today: `control-plane/internal/store/store_test.go`
(round-trips all Slice-2 CRUD).

The dashboard has no separate typecheck or lint script; `npm run build`
runs both. `cd dashboard && npm run build` locally is fine; node_modules
is gitignored.

## Layout

Five Go modules in a workspace (`go.work`), plus the `systemd/` unit files:

- `protocol/`: wire types shared by CP and agents. Keep small.
- `agentlib/`: `gitutil` (askpass-via-stdin credential helper;
  `git ls-remote`, `fetch`, `checkout`, `pull`, `for-each-ref`,
  `reset --hard`) and `deployer` (per-deployment state machine; `NewGo`
  for microservices, `NewPHP` for erangel).
- `control-plane/`: HTTP API + WS hub + SQLite. Routes are split across
  `internal/api/{login,sandboxes,templates,environments,routes,deployments,repos}.go`.
  `internal/ws/hub.go` exposes `CallAgent` for sync RPCs.
- `agent-micro/`: runs on 172.18.136.92.
- `agent-gateway/`: runs on 172.18.139.186; owns erangel at
  `/var/www/html/erangel-ocean` and the `<service>_url` patching.
- `systemd/` (not a Go module): unit files for the three long-running services
  (`sdp-control-plane.service`, `sdp-agent-micro.service`,
  `sdp-agent-gateway.service`). All three are plain host processes
  managed by systemd; the agents talk to the host's dockerd via
  `/var/run/docker.sock` to spawn the actual service containers
  (`sdp-<repo>`) for each deploy. Service containers are NOT
  managed by systemd; that's docker's job.

The dashboard is a separate `next build` static export; its sources live
under `dashboard/src/app/`. Because it is a static export, dynamic routes need
`generateStaticParams` (see the `sandboxes/[id]` page for the pattern).

## Wire protocol

The agent → control-plane channel carries one `protocol.Event` per WS text
message. The control-plane → agent channel uses an ad-hoc envelope,
`{op, id, data}`. `op` values: `deploy`, `stop`, `list_repos`,
`list_branches`, `list_routes`, `probe`, `push_routes`. RPC replies come back
on the agent → control-plane channel as `{op:"reply", id, ok, error?, data}`,
so the two shapes on that channel are disambiguated by `kind` (event) vs `op`
(RPC reply). New ops go in the `op` switch in both agents' `main.go` files
(`agent-micro/` and `agent-gateway/`) and in the control plane's `repos.go` /
`sandboxes.go` / `routes.go` handlers; there is no central registry.

## Conventions

- `ponytail:` comments mark intentional shortcuts and "TODO: real
  impl"-style carve-outs. They survive into main. Don't remove one without
  fixing the underlying limitation.
- Slice-2 stable container name: `sdp-<repo>` (no deployment id). The
  next deploy force-removes the existing one, so there is one live
  container per repo at a time.
- The gateway agent persists the per-branch OCP-default snapshot to
  `<repoPath>/.sdp/ocp-defaults.json`. It is re-captured on every deploy so
  branch switches don't break the "Restore OCP" buttons.
- `NewPHP` runs `git reset --hard` before fetch (via
  `Spec.PreGitReset`), and the agent passes an `AfterStart` closure
  that re-applies active route overrides after the container is up.
  This is how route overrides survive `git reset --hard` + checkout.
- `protocol.Event.ContainerID` is set on the deployer side; the
  deployer writes it back via `Store.SetContainerID`. (Currently the
  field on the event is unused; the container ID is recorded in SQLite.)
- Cookie auth: `sdp_session` HttpOnly cookie; the `withAuth` middleware
  skips `/api/login`. WebSocket endpoints are NOT auth-gated by the
  middleware; they rely on the agent being on a private network.
- Credentials travel with each deploy/probe/push_routes frame from
  control plane to agent. They are never logged and never persisted on the agent.

## Gotchas

- Host Go (`/usr/bin/go`) is older than the Go 1.24 that the modules require,
  and the toolchain download is blocked. Use the `golang:1.24-alpine`
  container. Do not edit code expecting `go build` to work locally.
- The micro agent and gateway agent `main.go` files duplicate most
  logic (dial / writer / readLoop / runDeploy). The shared code is in
  `agentlib/`. When adding a new op, both files need a switch case.
- `moby/moby/client` v0.5.0 uses `netip.Addr` for `PortBinding.HostIP`,
  not a string.
- `sdp-<repo>` containers must be in a state where `docker rm -f` works
  (the Slice-2 "one live per repo" rule). Don't manually `docker run`
  a second container with the same name.
- The erangel repo path is `/var/www/html/erangel-ocean` on 186, NOT
  `~/SDP` (README's earlier value is wrong; the spec was fixed in
  Slice 2). `APACHE_DOCUMENT_ROOT` is set to the same path so the
  gateway is served at `/erangel/`.
- `agent-gateway/.../main.go` re-imports the `routesState` type and
  uses `rs` as both a value and a parameter name in some helpers.
  It compiles fine; just be aware when grepping.
- Static-export dynamic routes: `generateStaticParams` must return at
  least one placeholder; the actual id is read at runtime in the
  client component. See `dashboard/src/app/dashboard/sandboxes/[id]/`.

## Verifying changes locally

```
# Typecheck + build everything
./scripts/build.sh

# Run the only Go test
docker run --rm -v "$PWD:/src" -w /src/control-plane golang:1.24-alpine sh -c \
"apk add --no-cache git >/dev/null && git config --global --add safe.directory /src && go test ./..."

# Smoke the control plane
./bin/control-plane -addr :3452 -data /tmp/sdp-data &
sleep 1   # give it a moment to bind :3452
curl -i -X POST http://127.0.0.1:3452/api/login \
  -H 'Content-Type: application/json' \
  -d '{"username":"x","password":"y"}'
# Expects 401 ("login failed — git ls-remote rejected") when no gateway agent is connected.
kill $!   # stop the background control plane
```

## Out of scope

RBAC, suspend/resume, sandbox cloning beyond "clone template into
sandbox", per-sandbox Docker networks, per-sandbox resource limits,
health monitoring, the 172.18.136.93 infra agent, notifications.
These are listed as `later` in REQUIREMENTS.md.

## Do not

- Do not commit or push unless the user explicitly says "commit" or
  "push".
- Do not change the gateway repo path back to `~/SDP` (old docs say
  so; reality is `/var/www/html/erangel-ocean`).
- Do not serve the dashboard with `next start` in production; the static
  export is served by nginx on 186. Configure nginx by hand; the
  reference config is in `nginx/nginx.conf` and uses
  `root /home/administrator/SDP/dashboard;` (i.e. the path
  `deploy.sh` scp's the static export to).
- Do not log or persist Bitbucket creds anywhere.