# SDP — manual deploy

A copy-pasteable runbook. The principle: anything that runs on a VM is done from inside that VM (just `ssh` in and run it). Anything that pushes files from your laptop to a VM uses `scp` and prompts for the password.

No `deploy.sh` is involved. No `sshpass`. You type your passwords.

## 0. Pull the repo on your laptop

```bash
cd ~/wherever/bri-sandbox-development-platform
git pull origin main
```

Confirm the artifacts are present:

```bash
ls bin/control-plane bin/agent-micro bin/agent-gateway dashboard/out/index.html systemd/sdp-*.service
```

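If a bare `ls` error is easy to miss, the same check can be wrapped in a small helper that names each missing file. This is only a sketch: `missing_artifacts` is a hypothetical function, not something in the repo.

```bash
# Print every path from the argument list that is not a regular file.
# Returns 0 when nothing is missing, 1 otherwise.
# NOTE: missing_artifacts is a hypothetical helper, not part of the repo.
missing_artifacts() {
  local missing=0 path
  for path in "$@"; do
    if [ ! -f "$path" ]; then
      echo "MISSING: $path"
      missing=1
    fi
  done
  return "$missing"
}

# Usage from the repo root (paths taken from the ls command above):
# missing_artifacts bin/control-plane bin/agent-micro bin/agent-gateway \
#   dashboard/out/index.html systemd/sdp-*.service && echo "all artifacts present"
```

An unmatched `systemd/sdp-*.service` glob is passed through literally by bash, so it shows up as a `MISSING:` line rather than being silently skipped.
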
## 1. Kill old SDP processes on each VM (skip on a fresh VM)

On 92:

```bash
ssh administrator@172.18.136.92
pkill -f 'bin/agent-micro' 2>/dev/null; echo done
exit
```

On 186:

```bash
ssh administrator@172.18.139.186
pkill -f 'bin/control-plane' 2>/dev/null
pkill -f 'bin/agent-gateway' 2>/dev/null
echo done
exit
```

## 2. Sanity-check nginx and docker on 186

```bash
ssh administrator@172.18.139.186
sudo nginx -t
sudo systemctl is-active docker
ls -la ~/SDP/dashboard/index.html 2>/dev/null || echo 'dashboard will be created in step 4'
exit
```

- `nginx -t` says `syntax is ok` → good.
- `docker` is `active` → good.
- Dashboard missing is fine; step 4 pushes it.

## 3. Configure nginx on 186 (only on first deploy, or after editing)

Splice the four `location` blocks from `nginx/sandbox.conf` into `/etc/nginx/sites-available/default` inside the existing `server { }`. Read the file from your laptop first:

```bash
cat nginx/sandbox.conf
```

On 186:

```bash
ssh administrator@172.18.139.186
sudo vim /etc/nginx/sites-available/default
# paste the four blocks somewhere inside the server { }
sudo nginx -t
sudo systemctl reload nginx
exit
```

## 4. Push the binaries and dashboard to the VMs

From your laptop. `scp` will prompt for the password.

**To 92 (micro):**

```bash
scp bin/agent-micro administrator@172.18.136.92:~/SDP/bin/agent-micro
```

**To 186 (gateway):**

```bash
scp bin/control-plane bin/agent-gateway administrator@172.18.139.186:~/SDP/bin/
scp -r dashboard/out/. administrator@172.18.139.186:~/SDP/dashboard/
```

**Make binaries executable** (one `ssh` command per VM, run from your laptop):

```bash
ssh administrator@172.18.136.92 "chmod +x ~/SDP/bin/agent-micro"
ssh administrator@172.18.139.186 "chmod +x ~/SDP/bin/control-plane ~/SDP/bin/agent-gateway"
```

**Pre-create the control plane's data dir on 186** (SQLite + log files live here):

```bash
ssh administrator@172.18.139.186 "mkdir -p ~/SDP/data && ls -ld ~/SDP/data"
```

This should print `drwxr-xr-x ... administrator administrator ... /home/administrator/SDP/data`. The control plane binary creates the directory on first run too, but creating it before systemd starts the service gives the unit's `ReadWritePaths` setting an existing path to point at, and makes diagnosis faster if anything else is wrong.

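To confirm that a pushed binary arrived intact, you can compare SHA-256 digests on both ends. A minimal sketch, assuming `sha256sum` (GNU coreutils) is available on both the laptop and the VM; `sha256_of` and `digests_match` are hypothetical helpers, not part of the repo.

```bash
# Print only the SHA-256 hex digest of a file.
# ASSUMPTION: sha256sum (GNU coreutils) is installed where this runs.
sha256_of() {
  sha256sum "$1" | awk '{print $1}'
}

# Succeed only when both digests are non-empty and identical.
digests_match() {
  [ -n "$1" ] && [ "$1" = "$2" ]
}

# Usage from the laptop, one binary at a time (the ssh call prompts for the password):
# local_sum=$(sha256_of bin/agent-micro)
# remote_sum=$(ssh administrator@172.18.136.92 "sha256sum ~/SDP/bin/agent-micro" | awk '{print $1}')
# digests_match "$local_sum" "$remote_sum" && echo "agent-micro OK" || echo "agent-micro MISMATCH"
```

The empty-digest guard matters because a failed `ssh` yields an empty string, which should count as a mismatch rather than a match.
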
## 5. Push the systemd unit files

From your laptop. `scp` will prompt for the password.

```bash
scp systemd/sdp-agent-micro.service administrator@172.18.136.92:/tmp/sdp-agent-micro.service
scp systemd/sdp-control-plane.service systemd/sdp-agent-gateway.service administrator@172.18.139.186:/tmp/
```

## 6. Install the unit files and start the services

### 6a. 92 (micro agent only)

```bash
ssh administrator@172.18.136.92
sudo install -m 644 -o root -g root /tmp/sdp-agent-micro.service /etc/systemd/system/sdp-agent-micro.service
sudo systemctl daemon-reload
sudo systemctl enable sdp-agent-micro.service
sudo systemctl restart sdp-agent-micro.service
sudo systemctl --no-pager status sdp-agent-micro.service | head -10
sudo journalctl -u sdp-agent-micro.service -n 10 --no-pager
exit
```

Status should be `active (running)`. The journal should show a clean startup, then either a `dial: ws://...` reconnect loop (waiting for the control plane) or `agent-micro connected as micro`.

### 6b. 186 (control plane FIRST, then gateway agent)

```bash
ssh administrator@172.18.139.186
sudo install -m 644 -o root -g root /tmp/sdp-control-plane.service /etc/systemd/system/sdp-control-plane.service
sudo mkdir -p /home/administrator/SDP/data
sudo chown administrator:administrator /home/administrator/SDP/data
sudo systemctl daemon-reload
sudo systemctl enable sdp-control-plane.service
sudo systemctl restart sdp-control-plane.service
sudo systemctl --no-pager status sdp-control-plane.service | head -10
sudo journalctl -u sdp-control-plane.service -n 10 --no-pager
```

Start the control plane before the gateway agent; otherwise the agent just keeps retrying until it is up. Wait for `active (running)`, then continue in the same session:

```bash
sudo install -m 644 -o root -g root /tmp/sdp-agent-gateway.service /etc/systemd/system/sdp-agent-gateway.service
sudo systemctl daemon-reload
sudo systemctl enable sdp-agent-gateway.service
sudo systemctl restart sdp-agent-gateway.service
sudo systemctl --no-pager status sdp-agent-gateway.service | head -10
sudo journalctl -u sdp-agent-gateway.service -n 10 --no-pager
exit
```

The journal should show `agent-gateway connected as gateway` after a beat.

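The "wait for `active (running)`" step in 6b can be polled instead of eyeballed. A minimal sketch: `wait_until` is a hypothetical helper, and the attempt count and delay are arbitrary choices, not values from the repo.

```bash
# Retry a command until it succeeds or the attempt budget runs out.
# $1 = max attempts, $2 = seconds between attempts, rest = command to run.
# NOTE: wait_until is a hypothetical helper, not part of the repo.
wait_until() {
  local attempts=$1 delay=$2 i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    sleep "$delay"
  done
  return 1
}

# Usage on 186 (30 attempts, 1 second apart):
# wait_until 30 1 systemctl is-active --quiet sdp-control-plane.service \
#   && echo "control plane is active" \
#   || echo "control plane did not come up; check the journal"
```

`systemctl is-active --quiet` exits 0 only when the unit is active, so it works as the polled command without parsing any output.
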
## 7. Browser smoke test (from your laptop)

Visit: `http://172.18.139.186/sandbox/credit-card/`

- HTML renders (CSS + JS load) → nginx `try_files` is right.
- Login form submits → `/sandbox/credit-card/api/login` proxies to `:3452`.
- Login with any valid Bitbucket creds returns 200 → the gateway agent ran `git ls-remote` successfully.
- After login, the dashboard renders. Click **Sandboxes** → empty list (SQLite is fresh).

## 8. Following logs in real time

On 92 (micro agent):

```bash
ssh administrator@172.18.136.92
sudo journalctl -u sdp-agent-micro.service -f
# Ctrl-C to exit
exit
```

On 186 (control plane + gateway agent):

```bash
ssh administrator@172.18.139.186
sudo journalctl -u sdp-control-plane.service -u sdp-agent-gateway.service -f
# Ctrl-C to exit
exit
```

## Common one-time fixes (apply, then re-run from step 4)

### `${SDP_CP_URL}` doesn't expand in the unit's ExecStart

Symptom: agent logs `flag: invalid value "${SDP_CP_URL}" for -cp`.

Fix: hardcode the URL in the unit. On your laptop, edit `systemd/sdp-agent-micro.service`:

```ini
ExecStart=/home/administrator/SDP/bin/agent-micro -node micro -cp ws://172.18.139.186:3452/ws/agent
```

(Remove the `Environment=` / `EnvironmentFile=` / `${SDP_CP_URL}` lines.) Do the same for `systemd/sdp-agent-gateway.service` (URL is `ws://127.0.0.1:3452/ws/agent`). Re-do steps 5 and 6.

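Before re-pushing the units, a quick grep can confirm that no `${...}` reference is left in any `ExecStart=` line. A sketch only: `unexpanded_execstart` is a hypothetical helper, and it assumes each `ExecStart=` sits on a single line.

```bash
# List ExecStart lines that still contain a ${VAR}-style reference.
# Exit status is grep's: 0 if any were found, 1 if none.
# NOTE: unexpanded_execstart is a hypothetical helper, not part of the repo.
unexpanded_execstart() {
  grep -nE '^ExecStart=.*\$\{[A-Za-z_][A-Za-z0-9_]*\}' "$@"
}

# Usage from the repo root:
# unexpanded_execstart systemd/sdp-agent-micro.service systemd/sdp-agent-gateway.service \
#   || echo "no \${...} references left in ExecStart"
```
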
### Micro agent on 92 can't reach the control plane on 186:3452

Symptom: `sdp-agent-micro.service` journal shows `dial: ... connection refused` or `i/o timeout` to `172.18.139.186:3452`.

Fix: add a `/ws/agent` proxy block to 186's nginx (alongside the four from `nginx/sandbox.conf`):

```nginx
location /ws/agent {
    proxy_pass http://127.0.0.1:3452;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_set_header Host $host;
    proxy_read_timeout 3600s;
}
```

On your laptop, edit `systemd/sdp-agent-micro.service` to dial through nginx on port 80:

```ini
Environment=SDP_CP_URL=ws://172.18.139.186/ws/agent
```

(Port 80, no `:3452`.) Then on 186, reload nginx and re-do steps 5 and 6a.

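To tell a blocked port from an agent misconfiguration, you can probe the TCP ports directly from 92. A minimal sketch, assuming bash with `/dev/tcp` support and coreutils `timeout` on the VM; `can_connect` is a hypothetical helper and the 3-second limit is an arbitrary choice.

```bash
# Succeed if a TCP connection to host:port opens within 3 seconds.
# ASSUMPTION: bash was built with /dev/tcp support and `timeout` is installed.
# NOTE: can_connect is a hypothetical helper, not part of the repo.
can_connect() {
  local host=$1 port=$2
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Usage on 92:
# can_connect 172.18.139.186 3452 && echo "3452 reachable" || echo "3452 blocked or closed"
# can_connect 172.18.139.186 80   && echo "80 reachable"   || echo "80 blocked or closed"
```

If 80 is reachable and 3452 is not, routing the agent through nginx as described above is the likely way forward.
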
### Login returns "git ls-remote rejected"

Either:
- The gateway agent isn't connected (re-run step 6b and check the journal).
- Your Bitbucket creds are wrong.
- The api-gateway repo path on 186 is wrong. The agent looks at `/var/www/html/erangel-ocean` by default. On 186:

```bash
ls -d /var/www/html/erangel-ocean
```

If the repo is at a different path, edit `agent-gateway/cmd/agent-gateway/main.go`:

```go
var repos = map[string]string{
	"api-gateway": "/your/actual/path",
}
```

Then run `./scripts/build.sh` and re-do steps 4 and 6b.

### Service containers can't be created (alpine:3.20 or php:8.3-apache not loaded)

Symptom: a deploy event stream shows `DEPLOY FAILED` with `image not found`.

The runtime images must be pre-loaded on the host (the VMs have no internet). On 92:

```bash
ssh administrator@172.18.136.92
docker load -i /path/to/alpine-3.20.tar
exit
```

On 186:

```bash
ssh administrator@172.18.139.186
docker load -i /path/to/php-8.3-apache.tar
docker load -i /path/to/alpine-3.20.tar
exit
```

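After loading, it is worth confirming that the expected tags are actually present on each host. A sketch under the assumption that the tarballs load as `alpine:3.20` and `php:8.3-apache`; `missing_images` is a hypothetical helper, not part of the repo.

```bash
# Read available "repository:tag" names on stdin, one per line, and print
# each required image (given as arguments) that is not among them.
# NOTE: missing_images is a hypothetical helper, not part of the repo.
missing_images() {
  local available image
  available=$(cat)
  for image in "$@"; do
    if ! grep -Fxq -- "$image" <<<"$available"; then
      echo "$image"
    fi
  done
}

# Usage on 186 (on 92 only alpine:3.20 is loaded, per the steps above):
# docker image ls --format '{{.Repository}}:{{.Tag}}' | missing_images alpine:3.20 php:8.3-apache
```

Empty output means every required image is loaded; any line printed is a tag that still needs a `docker load`.
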
### Service fails with `status=226/NAMESPACE` and `Failed to set up mount namespacing: No such file or directory`

This error usually means the binary is fine and systemd's service-execution environment is broken. Confirm by running the binary manually as `administrator`:

```bash
ssh administrator@172.18.139.186
./SDP/bin/control-plane -addr :3452 -data ./SDP/data
# Should print "control-plane listening on :3452 (data=./SDP/data)"
# Ctrl-C to exit
exit
```

If that works, the binary is fine and systemd's namespace setup is failing. First confirm that the data dir from step 4 exists, since a missing `ReadWritePaths` target can produce this same error. Otherwise, a common cause on this Ubuntu is that `/run/systemd` is missing. Force it to be recreated:

```bash
ssh administrator@172.18.139.186
sudo systemctl daemon-reexec
sudo systemctl restart sdp-control-plane.service
sudo systemctl --no-pager status sdp-control-plane.service | head -10
exit
```

If it is still failing, the systemd manager itself is in a bad state. Reboot the VM (last resort; this interrupts any other work on it).