diff --git a/AGENTS.md b/AGENTS.md
index 9547b8c..08856f8 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -7,7 +7,7 @@ The build script is the only way to compile — local Go can't fetch the
 
 ```
 ./scripts/build.sh # cross-compiles 3 Go binaries + builds the Next.js dashboard
-./scripts/deploy.sh # SSHes the artifacts to 92 and 186; needs sshpass
+./scripts/deploy.sh # SSHes artifacts + systemd units to 92 and 186, then enables+starts them; needs sshpass
 ```
 
 The script uses a `golang:1.24-alpine` container with a persistent
@@ -51,6 +51,13 @@ Five Go modules in a workspace (`go.work`):
 - `agent-micro/` — runs on 172.18.136.92.
 - `agent-gateway/` — runs on 172.18.139.186; owns erangel at
   `/var/www/html/erangel-ocean` and the `_url` patching.
+- `systemd/` — unit files for the three long-running services
+  (`sdp-control-plane.service`, `sdp-agent-micro.service`,
+  `sdp-agent-gateway.service`). All three are plain host processes
+  managed by systemd; the agents talk to the host's dockerd via
+  `/var/run/docker.sock` to spawn the actual service containers
+  (`sdp-`) for each deploy. Service containers are NOT
+  managed by systemd — that's docker's job.
 
 Dashboard is a separate `next build` static export under
 `dashboard/src/app/`. Static export means dynamic routes need
diff --git a/README.md b/README.md
index 69f2f1c..630a8ed 100644
--- a/README.md
+++ b/README.md
@@ -42,8 +42,9 @@ checklist.
 ├── nginx/ # reference nginx config (manually applied on 186)
 ├── scripts/ # build, deploy, ssh wrappers
 ├── docker-compose.yml # all three services on alpine:latest
+├── systemd/ # unit files for the three long-running services
 ├── go.work # Go workspace — one build, five modules
-└── bin/ # build output (gitignored)
+└── bin/ # built binaries (tracked, see .gitignore comment)
 ```
 
 `agentlib/` is a shared library used by both agents. It owns the git
@@ -108,14 +109,26 @@ The script verifies each binary with `file` to catch a missing
 This script:
 
 1. SSHs to **172.18.136.92** (`administrator`) and pushes `bin/agent-micro`
-   to `~/SDP/bin/`
+   plus `systemd/sdp-agent-micro.service` to the VM, then runs
+   `systemctl enable --now sdp-agent-micro`.
 2. SSHs to **172.18.139.186** (`administrator`) and pushes
-   `bin/control-plane`, `bin/agent-gateway`, and `dashboard/out/` to
-   `~/SDP/`
+   `bin/control-plane`, `bin/agent-gateway`, `dashboard/out/`, and the
+   matching `systemd/*.service` files, then runs
+   `systemctl enable --now` for both. The control plane is restarted
+   first so the gateway agent's `-cp` URL has something to dial.
+
+All three long-running services (control plane + both agents) are
+plain host processes managed by systemd. The unit files live in
+[systemd/](systemd/). Service containers spawned by the agents
+(`sdp-`) are managed by docker, not systemd — the agents talk
+to the host's dockerd via `/var/run/docker.sock` to create and
+replace them.
 
 Nginx on 186 is configured by hand; the dashboard ends up at
-`/home/administrator/SDP/dashboard/`. The required location block is
-in [nginx/nginx.conf](nginx/nginx.conf).
+`/home/administrator/SDP/dashboard/`. The required location blocks
+are in [nginx/sandbox.conf](nginx/sandbox.conf) (the actual deployment
+on 186) and [nginx/nginx.conf](nginx/nginx.conf) (a legacy
+root-mount reference).
 
 Override the creds via `SDP_92_PASS` / `SDP_186_PASS` env vars.
 
diff --git a/scripts/deploy.sh b/scripts/deploy.sh
index 11440bc..518e445 100755
--- a/scripts/deploy.sh
+++ b/scripts/deploy.sh
@@ -1,8 +1,10 @@
 #!/usr/bin/env bash
-# Push the built binaries and dashboard to both SDP VMs.
+# Push the built binaries, dashboard, and systemd unit files to both
+# SDP VMs, then enable + start the services.
 #
-# 92 (micro): ~/SDP/agent-micro
-# 186 (gateway): ~/SDP/control-plane, ~/SDP/agent-gateway, ~/SDP/dashboard
+# 92 (micro): ~/SDP/agent-micro, sdp-agent-micro.service
+# 186 (gateway): ~/SDP/{control-plane,agent-gateway,dashboard},
+# sdp-control-plane.service, sdp-agent-gateway.service
 #
 # Nginx is configured by hand on 186 (out of scope for this script).
 # Run scripts/build.sh first.
@@ -28,15 +30,40 @@ SSH_186="sshpass -p $PASS_186 ssh -o StrictHostKeyChecking=no -o UserKnownHostsF
 SCP_186="sshpass -p $PASS_186 scp -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o LogLevel=ERROR"
 
 # ponytail: Wipe-and-replace. The deploys are stateful on the VM only via
-# SQLite + .log files in ~/SDP/data — we keep that. Binaries and the
-# dashboard are replaced cleanly.
+# SQLite + .log files in ~/SDP/data — we keep that. Binaries, dashboard,
+# and unit files in /etc/systemd/system are replaced cleanly.
 REMOTE_RESET='rm -rf ~/SDP/bin ~/SDP/dashboard && mkdir -p ~/SDP/bin ~/SDP/dashboard'
 
+# install_unit
+# stops the old unit (if any), copies the file from /tmp/ (already
+# scp'd there), reloads systemd, and re-enables on next boot.
+install_unit() {
+  local ssh_prefix="$1" # e.g. "sshpass -p ... ssh -o ... administrator@host"
+  local unit="$2"
+  $ssh_prefix "sudo systemctl stop $unit 2>/dev/null || true"
+  $ssh_prefix "sudo install -m 644 -o root -g root /tmp/$unit /etc/systemd/system/$unit"
+  $ssh_prefix "sudo systemctl daemon-reload"
+  $ssh_prefix "sudo systemctl enable $unit"
+}
+
+# status_block
+# prints a short status + last 10 journal lines.
+status_block() {
+  local ssh_prefix="$1"
+  local unit="$2"
+  $ssh_prefix "echo ' status:'; sudo systemctl --no-pager --full status $unit | head -3"
+  $ssh_prefix "echo ' journal (last 10):'; sudo journalctl -u $unit -n 10 --no-pager"
+}
+
 echo "==> 92: $HOST_92"
 $SSH_92 "$HOST_92" "$REMOTE_RESET"
 $SCP_92 "$REPO_ROOT/bin/agent-micro" "$HOST_92:~/SDP/bin/agent-micro"
+$SCP_92 "$REPO_ROOT/systemd/sdp-agent-micro.service" "$HOST_92:/tmp/sdp-agent-micro.service"
 $SSH_92 "$HOST_92" "chmod +x ~/SDP/bin/agent-micro"
-echo " agent-micro copied"
+install_unit "$SSH_92 $HOST_92" sdp-agent-micro.service
+$SSH_92 "$HOST_92" "sudo systemctl restart sdp-agent-micro"
+status_block "$SSH_92 $HOST_92" sdp-agent-micro
+echo " agent-micro installed"
 echo
 
 echo "==> 186: $HOST_186"
@@ -44,8 +71,20 @@ $SSH_186 "$HOST_186" "$REMOTE_RESET"
 $SCP_186 "$REPO_ROOT/bin/control-plane" "$HOST_186:~/SDP/bin/control-plane"
 $SCP_186 "$REPO_ROOT/bin/agent-gateway" "$HOST_186:~/SDP/bin/agent-gateway"
 $SCP_186 -r "$REPO_ROOT/dashboard/out/." "$HOST_186:~/SDP/dashboard/"
+$SCP_186 "$REPO_ROOT/systemd/sdp-control-plane.service" "$HOST_186:/tmp/sdp-control-plane.service"
+$SCP_186 "$REPO_ROOT/systemd/sdp-agent-gateway.service" "$HOST_186:/tmp/sdp-agent-gateway.service"
 $SSH_186 "$HOST_186" "chmod +x ~/SDP/bin/control-plane ~/SDP/bin/agent-gateway"
-echo " control-plane, agent-gateway, dashboard copied"
+
+# Control plane first so the gateway agent's -cp URL has something to dial.
+install_unit "$SSH_186 $HOST_186" sdp-control-plane.service
+$SSH_186 "$HOST_186" "sudo systemctl restart sdp-control-plane"
+status_block "$SSH_186 $HOST_186" sdp-control-plane
+
+install_unit "$SSH_186 $HOST_186" sdp-agent-gateway.service
+$SSH_186 "$HOST_186" "sudo systemctl restart sdp-agent-gateway"
+status_block "$SSH_186 $HOST_186" sdp-agent-gateway
+
+echo " control-plane, agent-gateway, dashboard installed"
 echo
 
 echo "done. (configure nginx by hand on 186; see AGENTS.md for the location block.)"
diff --git a/systemd/sdp-agent-gateway.service b/systemd/sdp-agent-gateway.service
new file mode 100644
index 0000000..e3d4b0e
--- /dev/null
+++ b/systemd/sdp-agent-gateway.service
@@ -0,0 +1,24 @@
+[Unit]
+Description=SDP gateway agent
+Documentation=https://github.com/sdp
+After=network-online.target docker.service
+Wants=network-online.target
+Requires=docker.service
+
+[Service]
+Type=simple
+User=administrator
+WorkingDirectory=/home/administrator
+# SDP_CP_URL points at the control plane. Default is the production
+# topology (both on 186, dialed over loopback). Override by creating
+# /etc/default/sdp-agent-gateway with `SDP_CP_URL=...`.
+Environment=SDP_CP_URL=ws://127.0.0.1:3452/ws/agent
+EnvironmentFile=-/etc/default/sdp-agent-gateway
+ExecStart=/home/administrator/SDP/bin/agent-gateway -node gateway -cp ${SDP_CP_URL}
+Restart=always
+RestartSec=2s
+
+MemoryMax=256M
+
+[Install]
+WantedBy=multi-user.target
diff --git a/systemd/sdp-agent-micro.service b/systemd/sdp-agent-micro.service
new file mode 100644
index 0000000..e31a27b
--- /dev/null
+++ b/systemd/sdp-agent-micro.service
@@ -0,0 +1,24 @@
+[Unit]
+Description=SDP micro agent
+Documentation=https://github.com/sdp
+After=network-online.target docker.service
+Wants=network-online.target
+Requires=docker.service
+
+[Service]
+Type=simple
+User=administrator
+WorkingDirectory=/home/administrator
+# SDP_CP_URL points at the control plane. Default is the production
+# topology (control plane on 186, micro agent on 92). Override by
+# creating /etc/default/sdp-agent-micro with `SDP_CP_URL=...`.
+Environment=SDP_CP_URL=ws://172.18.139.186:3452/ws/agent
+EnvironmentFile=-/etc/default/sdp-agent-micro
+ExecStart=/home/administrator/SDP/bin/agent-micro -node micro -cp ${SDP_CP_URL}
+Restart=always
+RestartSec=2s
+
+MemoryMax=256M
+
+[Install]
+WantedBy=multi-user.target
diff --git a/systemd/sdp-control-plane.service b/systemd/sdp-control-plane.service
new file mode 100644
index 0000000..d490786
--- /dev/null
+++ b/systemd/sdp-control-plane.service
@@ -0,0 +1,23 @@
+[Unit]
+Description=SDP control plane
+Documentation=https://github.com/sdp
+After=network-online.target
+Wants=network-online.target
+
+[Service]
+Type=simple
+User=administrator
+WorkingDirectory=/home/administrator
+ExecStart=/home/administrator/SDP/bin/control-plane -addr :3452 -data /home/administrator/SDP/data
+Restart=always
+RestartSec=2s
+
+# Cap memory at 512M. The control plane only holds SQLite + log files;
+# anything beyond is a leak.
+MemoryMax=512M
+
+# Allow the process to write to its data dir.
+ReadWritePaths=/home/administrator/SDP/data
+
+[Install]
+WantedBy=multi-user.target