diff --git a/DEPLOY.md b/DEPLOY.md
new file mode 100644
index 0000000..b74e73b
--- /dev/null
+++ b/DEPLOY.md
@@ -0,0 +1,296 @@
+# SDP: manual deploy
+
+A copy-pasteable runbook. The principle: anything that runs on a VM is done from inside that VM (just `ssh` in and run it). Anything that pushes files from your laptop to a VM uses `scp` and prompts for the password.
+
+No `deploy.sh` is involved. No `sshpass`. You type your passwords.
+
+## 0. Pull the repo on your laptop
+
+```bash
+cd ~/wherever/bri-sandbox-development-platform
+git pull origin main
+```
+
+Confirm the artifacts are present:
+
+```bash
+ls bin/control-plane bin/agent-micro bin/agent-gateway dashboard/out/index.html systemd/sdp-*.service
+```
+
+## 1. Diagnose sudo on each VM (one time per VM)
+
+SSH into 92 (you'll be prompted for the password):
+
+```bash
+ssh administrator@172.18.136.92
+```
+
+On 92, type:
+
+```bash
+sudo -n true 2>/dev/null && echo "NOPASSWD sudo" || echo "needs password"
+sudo echo hi
+```
+
+- Works without a password prompt → NOPASSWD, skip step 2.
+- Prompts and accepts the password you type → SSH password == sudo password, skip step 2.
+- Prompts and rejects your password → passwords differ. Note your actual sudo password for step 2.
+
+Type `exit` to leave 92. Repeat for 186 (`ssh administrator@172.18.139.186`).
+
+## 2. Set up sudo on each VM (only if step 1 said passwords differ)
+
+On 92:
+
+```bash
+ssh administrator@172.18.136.92
+```
+
+(You'll be prompted for the SSH password. Once in, `sudo tee` will prompt for the sudo password, which is the one you just confirmed.)
+
+```bash
+echo 'administrator ALL=(ALL) NOPASSWD: /bin/systemctl, /usr/bin/install, /usr/bin/journalctl' | sudo tee /etc/sudoers.d/sdp-deploy
+sudo chmod 440 /etc/sudoers.d/sdp-deploy
+exit
+```
+
+Repeat for 186 (substitute the 186 IP). After this, `administrator` can run `systemctl`, `install`, and `journalctl` with `sudo` without typing a password. No other commands are added to the NOPASSWD list.
+
+## 3. Kill old SDP processes on each VM (skip on a fresh VM)
+
+On 92:
+
+```bash
+ssh administrator@172.18.136.92
+pkill -f 'bin/agent-micro' 2>/dev/null; echo done
+exit
+```
+
+On 186:
+
+```bash
+ssh administrator@172.18.139.186
+pkill -f 'bin/control-plane' 2>/dev/null
+pkill -f 'bin/agent-gateway' 2>/dev/null
+echo done
+exit
+```
+
+## 4. Sanity-check nginx and docker on 186
+
+```bash
+ssh administrator@172.18.139.186
+sudo nginx -t
+sudo systemctl is-active docker
+ls -la ~/SDP/dashboard/index.html 2>/dev/null || echo 'dashboard will be created in step 6'
+exit
+```
+
+- `nginx -t` says `syntax is ok` → good.
+- `docker` is `active` → good.
+- Dashboard missing is fine; step 6 pushes it.
+
+## 5. Configure nginx on 186 (only on first deploy, or after editing `nginx/sandbox.conf`)
+
+Splice the four `location` blocks from `nginx/sandbox.conf` into `/etc/nginx/sites-available/default` inside the existing `server { }`. Read the file from your laptop first:
+
+```bash
+cat nginx/sandbox.conf
+```
+
+On 186:
+
+```bash
+ssh administrator@172.18.139.186
+sudo vim /etc/nginx/sites-available/default
+# paste the four blocks somewhere inside the server { }
+sudo nginx -t
+sudo systemctl reload nginx
+exit
+```
+
+## 6. Push the binaries and dashboard to the VMs
+
+From your laptop. `scp` will prompt for the password.
+
+**To 92 (micro):**
+
+```bash
+scp bin/agent-micro administrator@172.18.136.92:~/SDP/bin/agent-micro
+```
+
+**To 186 (gateway):**
+
+```bash
+scp bin/control-plane bin/agent-gateway administrator@172.18.139.186:~/SDP/bin/
+scp -r dashboard/out/. administrator@172.18.139.186:~/SDP/dashboard/
+```
+
+**Make binaries executable** (run from your laptop, one command per VM):
+
+```bash
+ssh administrator@172.18.136.92 "chmod +x ~/SDP/bin/agent-micro"
+ssh administrator@172.18.139.186 "chmod +x ~/SDP/bin/control-plane ~/SDP/bin/agent-gateway"
+```
+
+## 7. Push the systemd unit files
+
+From your laptop. `scp` will prompt for the password.
+
+```bash
+scp systemd/sdp-agent-micro.service administrator@172.18.136.92:/tmp/sdp-agent-micro.service
+scp systemd/sdp-control-plane.service systemd/sdp-agent-gateway.service administrator@172.18.139.186:/tmp/
+```
+
+## 8. Install the unit files and start the services
+
+### 8a. 92 (micro agent only)
+
+```bash
+ssh administrator@172.18.136.92
+sudo install -m 644 -o root -g root /tmp/sdp-agent-micro.service /etc/systemd/system/sdp-agent-micro.service
+sudo systemctl daemon-reload
+sudo systemctl enable sdp-agent-micro.service
+sudo systemctl restart sdp-agent-micro.service
+sudo systemctl --no-pager status sdp-agent-micro.service | head -10
+sudo journalctl -u sdp-agent-micro.service -n 10 --no-pager
+exit
+```
+
+Status should be `active (running)`. Journal should show a clean startup, then either a `dial: ws://...` reconnect loop (waiting for the control plane) or `agent-micro connected as micro`.
+
+### 8b. 186 (control plane FIRST, then gateway agent)
+
+```bash
+ssh administrator@172.18.139.186
+sudo install -m 644 -o root -g root /tmp/sdp-control-plane.service /etc/systemd/system/sdp-control-plane.service
+sudo systemctl daemon-reload
+sudo systemctl enable sdp-control-plane.service
+sudo systemctl restart sdp-control-plane.service
+sudo systemctl --no-pager status sdp-control-plane.service | head -10
+sudo journalctl -u sdp-control-plane.service -n 10 --no-pager
+```
+
+The control plane should be up before the gateway agent starts; otherwise the agent just retries until it can connect. Wait for `active (running)`, then continue:
+
+```bash
+sudo install -m 644 -o root -g root /tmp/sdp-agent-gateway.service /etc/systemd/system/sdp-agent-gateway.service
+sudo systemctl daemon-reload
+sudo systemctl enable sdp-agent-gateway.service
+sudo systemctl restart sdp-agent-gateway.service
+sudo systemctl --no-pager status sdp-agent-gateway.service | head -10
+sudo journalctl -u sdp-agent-gateway.service -n 10 --no-pager
+exit
+```
+
+The journal should show `agent-gateway connected as gateway` after a beat.
+
+## 9. Browser smoke test (from your laptop)
+
+Visit: `http://172.18.139.186/sandbox/credit-card/`
+
+- HTML renders (CSS + JS load) → nginx `try_files` is right.
+- Login form submits → `/sandbox/credit-card/api/login` proxies to `:3452`.
+- Login with any valid Bitbucket creds returns 200 → the gateway agent ran `git ls-remote` successfully.
+- After login, dashboard renders. Click **Sandboxes** → empty list (SQLite is fresh).
+
+## 10. Following logs in real time
+
+On 92 (micro agent):
+
+```bash
+ssh administrator@172.18.136.92
+sudo journalctl -u sdp-agent-micro.service -f
+# Ctrl-C to exit
+exit
+```
+
+On 186 (control plane + gateway agent):
+
+```bash
+ssh administrator@172.18.139.186
+sudo journalctl -u sdp-control-plane.service -u sdp-agent-gateway.service -f
+# Ctrl-C to exit
+exit
+```
+
+## Common one-time fixes (apply, then re-run from step 8)
+
+### `${SDP_CP_URL}` doesn't expand in the unit's ExecStart
+
+Symptom: agent logs `flag: invalid value "${SDP_CP_URL}" for -cp`.
+
+Fix: hardcode the URL in the unit. On your laptop, edit `systemd/sdp-agent-micro.service`:
+
+```ini
+ExecStart=/home/administrator/SDP/bin/agent-micro -node micro -cp ws://172.18.139.186:3452/ws/agent
+```
+
+(Remove the `Environment=` / `EnvironmentFile=` / `${SDP_CP_URL}` lines.) Do the same for `systemd/sdp-agent-gateway.service` (URL is `ws://127.0.0.1:3452/ws/agent`). Re-do steps 7 and 8.
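
Before re-pushing the edited units, it can help to confirm on the laptop that each `ExecStart` now carries a literal URL. The sketch below is one way to do that; `check_unit` is a hypothetical helper, not part of the repo, and it assumes each unit has a single-line `ExecStart=` that passes the URL via `-cp`, as in the example above.

```shell
# check_unit FILE: succeed only if ExecStart has a literal -cp ws:// URL
# and no ${...} reference. Hypothetical helper, not shipped with SDP.
check_unit() {
  unit="$1"
  # Pull out the ExecStart line; fail if the unit has none.
  execstart=$(grep '^ExecStart=' "$unit") || { echo "no ExecStart in $unit"; return 1; }
  case "$execstart" in
    # An unexpanded variable is exactly what produced the "invalid value" log.
    *'${'*) echo "unexpanded variable in $unit: $execstart"; return 1 ;;
    *'-cp ws://'*) echo "ok: $unit" ;;
    *) echo "no literal -cp ws:// URL in $unit"; return 1 ;;
  esac
}
```

Usage, from the repo root: `check_unit systemd/sdp-agent-micro.service && check_unit systemd/sdp-agent-gateway.service`.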
+
+### Micro agent on 92 can't reach the control plane on 186:3452
+
+Symptom: `sdp-agent-micro.service` journal shows `dial: ... connection refused` or `i/o timeout` to `172.18.139.186:3452`.
+
+Fix: add a `/ws/agent` proxy block to 186's nginx (alongside the four from `nginx/sandbox.conf`):
+
+```nginx
+location /ws/agent {
+    proxy_pass http://127.0.0.1:3452;
+    proxy_http_version 1.1;
+    proxy_set_header Upgrade $http_upgrade;
+    proxy_set_header Connection "upgrade";
+    proxy_set_header Host $host;
+    proxy_read_timeout 3600s;
+}
+```
+
+On your laptop, edit `systemd/sdp-agent-micro.service` to dial through nginx on 80:
+
+```ini
+Environment=SDP_CP_URL=ws://172.18.139.186/ws/agent
+```
+
+(Port 80, no `:3452`.) Then on 186, reload nginx and re-do steps 7 and 8a.
+
+### Login returns "git ls-remote rejected"
+
+Possible causes:
+- The gateway agent isn't connected (re-run step 8b and check the journal).
+- Your Bitbucket creds are wrong.
+- The api-gateway repo path on 186 is wrong. The agent looks at `/var/www/html/erangel-ocean` by default. On 186:
+
+  ```bash
+  ls -d /var/www/html/erangel-ocean
+  ```
+
+  If the repo is at a different path, edit `agent-gateway/cmd/agent-gateway/main.go`:
+
+  ```go
+  var repos = map[string]string{
+      "api-gateway": "/your/actual/path",
+  }
+  ```
+
+  Then `./scripts/build.sh`, re-do steps 6 and 8b.
+
+### Service containers can't be created (alpine:3.20 or php:8.3-apache not loaded)
+
+Symptom: a deploy event stream shows `DEPLOY FAILED` with `image not found`.
+
+The runtime images must be pre-loaded on the host (the VMs have no internet). On 92:
+
+```bash
+ssh administrator@172.18.136.92
+docker load -i /path/to/alpine-3.20.tar
+exit
+```
+
+On 186:
+
+```bash
+ssh administrator@172.18.139.186
+docker load -i /path/to/php-8.3-apache.tar
+docker load -i /path/to/alpine-3.20.tar
+exit
+```
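
To see which images still need a `docker load`, a small check run on each VM may be enough. This is a minimal sketch: `missing_images` is a hypothetical helper, not part of the repo, and it assumes the logged-in user can talk to the Docker daemon (prefix the `docker` call with `sudo` if not).

```shell
# missing_images IMAGE...: print each image that is not in the local image store.
# Hypothetical helper, not shipped with SDP.
missing_images() {
  for image in "$@"; do
    # `docker image inspect` exits non-zero when the image is absent.
    if ! docker image inspect "$image" >/dev/null 2>&1; then
      echo "$image"
    fi
  done
}
```

Usage: on 92, `missing_images alpine:3.20`; on 186, `missing_images alpine:3.20 php:8.3-apache`. Empty output means every listed image is already loaded.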