Achmad 92354252e5 DEPLOY.md: troubleshooting for status=226/NAMESPACE failure
The 226/NAMESPACE with 'Failed to set up mount namespacing' error
is misleading — the binary is fine, but systemd can't build the
mount namespace for the service. The binary runs fine when
launched directly as administrator; the bug is in the systemd
manager's runtime state.

Document the diagnostic (run the binary manually) and the two
fixes: systemctl daemon-reexec (recreate /run/systemd) as the
first attempt, reboot as the last resort.
2026-06-24 05:27:59 +00:00


SDP — manual deploy

A copy-pasteable runbook. The principle: anything that runs on a VM is done from inside that VM (just ssh in and run it). Anything that pushes files from your laptop to a VM uses scp and prompts for the password.

No deploy.sh is involved. No sshpass. You type your passwords.

0. Pull the repo on your laptop

cd ~/wherever/bri-sandbox-development-platform
git pull origin main

Confirm the artifacts are present:

ls bin/control-plane bin/agent-micro bin/agent-gateway dashboard/out/index.html systemd/sdp-*.service
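If any artifact is missing, `ls` prints an error for it among the listing of the rest. A minimal sketch of a stricter pre-flight check, assuming it is run from the repo root; the function name `check_artifacts` is illustrative and not part of the repo:

```shell
# check_artifacts [ROOT]: report every deploy artifact missing under ROOT
# (default: current directory). Paths mirror the ls command in step 0.
# Returns 0 if everything is present, 1 otherwise.
check_artifacts() {
    root="${1:-.}"
    missing=0
    for f in bin/control-plane bin/agent-micro bin/agent-gateway dashboard/out/index.html; do
        if [ ! -f "$root/$f" ]; then
            echo "missing: $f"
            missing=1
        fi
    done
    # At least one unit file must match the sdp-*.service glob.
    # If nothing matches, $1 stays the literal pattern and the -e test fails.
    set -- "$root"/systemd/sdp-*.service
    if [ ! -e "$1" ]; then
        echo "missing: systemd/sdp-*.service"
        missing=1
    fi
    return "$missing"
}
```

Usage: `check_artifacts && echo 'artifacts ok'`. A non-zero exit means something needs rebuilding before continuing.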

1. Kill old SDP processes on each VM (skip on a fresh VM)

On 92:

ssh administrator@172.18.136.92
pkill -f 'bin/agent-micro' 2>/dev/null; echo done
exit

On 186:

ssh administrator@172.18.139.186
pkill -f 'bin/control-plane' 2>/dev/null
pkill -f 'bin/agent-gateway' 2>/dev/null
echo done
exit

2. Sanity-check nginx and docker on 186

ssh administrator@172.18.139.186
sudo nginx -t
sudo systemctl is-active docker
ls -la ~/SDP/dashboard/index.html 2>/dev/null || echo 'dashboard will be created in step 4'
exit
  • nginx -t says syntax is ok → good.
  • docker is active → good.
  • Dashboard missing is fine; step 4 pushes it.

3. Configure nginx on 186 (only on first deploy, or after editing)

Splice the four location blocks from nginx/sandbox.conf into /etc/nginx/sites-available/default inside the existing server { }. Read the file from your laptop first:

cat nginx/sandbox.conf

On 186:

ssh administrator@172.18.139.186
sudo vim /etc/nginx/sites-available/default
# paste the four blocks somewhere inside the server { }
sudo nginx -t
sudo systemctl reload nginx
exit

4. Push the binaries and dashboard to the VMs

From your laptop. scp will prompt for the password.

To 92 (micro):

scp bin/agent-micro administrator@172.18.136.92:~/SDP/bin/agent-micro

To 186 (gateway):

scp bin/control-plane bin/agent-gateway administrator@172.18.139.186:~/SDP/bin/
scp -r dashboard/out/. administrator@172.18.139.186:~/SDP/dashboard/

Make binaries executable (on each VM):

ssh administrator@172.18.136.92 "chmod +x ~/SDP/bin/agent-micro"
ssh administrator@172.18.139.186 "chmod +x ~/SDP/bin/control-plane ~/SDP/bin/agent-gateway"
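On a fresh VM, the scp commands in this step can fail if ~/SDP/bin or ~/SDP/dashboard does not exist yet. A minimal sketch of a helper that creates them, to be run on each VM before pushing; the name `ensure_sdp_dirs` is hypothetical, and the directory list is taken from the scp targets above:

```shell
# ensure_sdp_dirs BASE SUBDIR...: create BASE/SUBDIR for every SUBDIR given.
# Safe to re-run: mkdir -p does nothing if the directory already exists.
ensure_sdp_dirs() {
    base="$1"
    shift
    for sub in "$@"; do
        mkdir -p "$base/$sub" || return 1
    done
}

# Example (on 186, not executed here):
#   ensure_sdp_dirs "$HOME/SDP" bin dashboard
# Example (on 92, not executed here):
#   ensure_sdp_dirs "$HOME/SDP" bin
```

The same effect can be had without the helper by running `mkdir -p ~/SDP/bin ~/SDP/dashboard` inside an ssh session on the VM.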

5. Push the systemd unit files

From your laptop. scp will prompt for the password.

scp systemd/sdp-agent-micro.service administrator@172.18.136.92:/tmp/sdp-agent-micro.service
scp systemd/sdp-control-plane.service systemd/sdp-agent-gateway.service administrator@172.18.139.186:/tmp/

6. Install the unit files and start the services

6a. 92 (micro agent only)

ssh administrator@172.18.136.92
sudo install -m 644 -o root -g root /tmp/sdp-agent-micro.service /etc/systemd/system/sdp-agent-micro.service
sudo systemctl daemon-reload
sudo systemctl enable sdp-agent-micro.service
sudo systemctl restart sdp-agent-micro.service
sudo systemctl --no-pager status sdp-agent-micro.service | head -10
sudo journalctl -u sdp-agent-micro.service -n 10 --no-pager
exit

Status should be active (running). Journal should show a clean startup, then either a dial: ws://... reconnect loop (waiting for the control plane) or agent-micro connected as micro.

6b. 186 (control plane FIRST, then gateway agent)

ssh administrator@172.18.139.186
sudo install -m 644 -o root -g root /tmp/sdp-control-plane.service /etc/systemd/system/sdp-control-plane.service
sudo systemctl daemon-reload
sudo systemctl enable sdp-control-plane.service
sudo systemctl restart sdp-control-plane.service
sudo systemctl --no-pager status sdp-control-plane.service | head -10
sudo journalctl -u sdp-control-plane.service -n 10 --no-pager

The control plane must be up before the gateway agent starts (or the agent just retries). Wait for active (running), then continue:

sudo install -m 644 -o root -g root /tmp/sdp-agent-gateway.service /etc/systemd/system/sdp-agent-gateway.service
sudo systemctl daemon-reload
sudo systemctl enable sdp-agent-gateway.service
sudo systemctl restart sdp-agent-gateway.service
sudo systemctl --no-pager status sdp-agent-gateway.service | head -10
sudo journalctl -u sdp-agent-gateway.service -n 10 --no-pager
exit

The journal should show agent-gateway connected as gateway after a beat.
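The "wait for active (running)" check between the control plane and the gateway agent can be scripted rather than eyeballed. A minimal sketch of a generic retry loop, assuming it is run on 186 in the same session; `wait_until` is an illustrative name, while `systemctl is-active --quiet` is the standard systemd check:

```shell
# wait_until TRIES DELAY CMD...: run CMD until it succeeds, at most TRIES
# times, sleeping DELAY seconds after each failed attempt.
# Returns 0 as soon as CMD succeeds, 1 if every attempt failed.
wait_until() {
    tries="$1"
    delay="$2"
    shift 2
    i=0
    while [ "$i" -lt "$tries" ]; do
        if "$@"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Example (on 186, not executed here): poll for up to ~30 seconds.
#   wait_until 15 2 systemctl is-active --quiet sdp-control-plane.service \
#       || echo 'control plane did not become active'
```

If the loop times out, check the control plane journal before installing the gateway agent unit.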

7. Browser smoke test (from your laptop)

Visit: http://172.18.139.186/sandbox/credit-card/

  • HTML renders (CSS + JS load) → nginx try_files is right.
  • Login form submits → /sandbox/credit-card/api/login proxies to :3452.
  • Login with any Bitbucket creds returns 200 → the gateway agent ran git ls-remote successfully.
  • After login, dashboard renders. Click Sandboxes → empty list (SQLite is fresh).
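The first bullet (the page is served at all) can also be checked from the laptop's terminal before opening a browser. A minimal sketch using curl, assuming the page answers a plain GET with 200; the helper names are illustrative, and the login and dashboard checks still need the browser:

```shell
# http_status URL: print the HTTP status code for a GET on URL.
# curl prints "000" when the connection itself fails.
http_status() {
    curl -s -o /dev/null -m 10 -w '%{http_code}' "$1"
}

# expect_status WANT URL: succeed only if URL answers with status WANT.
expect_status() {
    got=$(http_status "$2")
    if [ "$got" = "$1" ]; then
        echo "ok   $2 ($got)"
    else
        echo "FAIL $2 (got $got, want $1)"
        return 1
    fi
}

# Example (from the laptop, not executed here):
#   expect_status 200 http://172.18.139.186/sandbox/credit-card/
```

A `000` result means nginx on 186 was not reachable at all, which points at the network or nginx rather than at SDP.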

8. Following logs in real time

On 92 (micro agent):

ssh administrator@172.18.136.92
sudo journalctl -u sdp-agent-micro.service -f
# Ctrl-C to exit
exit

On 186 (control plane + gateway agent):

ssh administrator@172.18.139.186
sudo journalctl -u sdp-control-plane.service -u sdp-agent-gateway.service -f
# Ctrl-C to exit
exit

Common one-time fixes (apply, then re-run from step 6)

${SDP_CP_URL} doesn't expand in the unit's ExecStart

Symptom: agent logs flag: invalid value "${SDP_CP_URL}" for -cp.

Fix: hardcode the URL in the unit. On your laptop, edit systemd/sdp-agent-micro.service:

ExecStart=/home/administrator/SDP/bin/agent-micro -node micro -cp ws://172.18.139.186:3452/ws/agent

(Remove the Environment= / EnvironmentFile= / ${SDP_CP_URL} lines.) Do the same for systemd/sdp-agent-gateway.service (URL is ws://127.0.0.1:3452/ws/agent). Re-do steps 5 and 6.
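To confirm that no unit file still passes the literal variable before pushing it again, a grep over the unit files is enough. A minimal sketch, assuming it is run from the repo root on the laptop; `find_unexpanded` is a hypothetical helper name:

```shell
# find_unexpanded FILE...: print ExecStart lines that still reference
# ${SDP_CP_URL}. Returns 0 if any such line is found, 1 if the files are clean.
find_unexpanded() {
    grep -n 'ExecStart=.*\${SDP_CP_URL}' "$@"
}

# Example (from the repo root, not executed here):
#   find_unexpanded systemd/sdp-agent-micro.service systemd/sdp-agent-gateway.service \
#       && echo 'still referencing ${SDP_CP_URL}'
```

No output and a non-zero exit status mean both units carry a hardcoded URL.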

Micro agent on 92 can't reach the control plane on 186:3452

Symptom: sdp-agent-micro.service journal shows dial: ... connection refused or i/o timeout to 172.18.139.186:3452.

Fix: add a /ws/agent proxy block to 186's nginx (alongside the four from nginx/sandbox.conf):

location /ws/agent {
    proxy_pass http://127.0.0.1:3452;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_set_header Host $host;
    proxy_read_timeout 3600s;
}

On your laptop, edit systemd/sdp-agent-micro.service to dial through nginx on 80:

Environment=SDP_CP_URL=ws://172.18.139.186/ws/agent

(Port 80, no :3452.) Then on 186, reload nginx and re-do steps 5 and 6a.
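Before or after applying this fix, it can help to test from 92 which ports on 186 are actually reachable. A minimal sketch, assuming bash and coreutils' timeout are available on 92; `port_open` is an illustrative name:

```shell
# port_open HOST PORT: succeed if a TCP connection to HOST:PORT can be
# opened within 3 seconds. Relies on bash's /dev/tcp pseudo-device.
port_open() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Example (on 92, not executed here):
#   port_open 172.18.139.186 3452 || echo '3452 not reachable from 92'
#   port_open 172.18.139.186 80   && echo '80 reachable, nginx route is viable'
```

If neither port answers, the problem is likely outside SDP (routing or firewall between the VMs), and the nginx proxy block will not help on its own.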

Login returns "git ls-remote rejected"

Either:

  • The gateway agent isn't connected (re-run step 6b and check the journal).

  • Your Bitbucket creds are wrong.

  • The api-gateway repo path on 186 is wrong. The agent looks at /var/www/html/erangel-ocean by default. On 186:

    ls -d /var/www/html/erangel-ocean
    

    If the repo is at a different path, edit agent-gateway/cmd/agent-gateway/main.go:

    var repos = map[string]string{
        "api-gateway": "/your/actual/path",
    }
    

    Then run ./scripts/build.sh and re-do steps 4 and 6b.

Service containers can't be created (alpine:3.20 or php:8.3-apache not loaded)

Symptom: a deploy event stream shows DEPLOY FAILED with image not found.

The runtime images must be pre-loaded on the host (the VMs have no internet). On 92:

ssh administrator@172.18.136.92
docker load -i /path/to/alpine-3.20.tar
exit

On 186:

ssh administrator@172.18.139.186
docker load -i /path/to/php-8.3-apache.tar
docker load -i /path/to/alpine-3.20.tar
exit
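After loading, it is worth confirming that the images are present under the expected tags. A minimal sketch using `docker image inspect`, assuming the tarballs carry the tags alpine:3.20 and php:8.3-apache named in the heading above; `missing_images` is a hypothetical helper:

```shell
# missing_images IMAGE...: print each image that is not in the local Docker
# image store. Returns 0 if all are present, 1 if any is missing.
missing_images() {
    rc=0
    for img in "$@"; do
        if ! docker image inspect "$img" > /dev/null 2>&1; then
            echo "not loaded: $img"
            rc=1
        fi
    done
    return "$rc"
}

# Example (on 186, not executed here):
#   missing_images alpine:3.20 php:8.3-apache
# Example (on 92, not executed here):
#   missing_images alpine:3.20
```

If an image is reported missing right after `docker load`, the tarball was probably saved under a different tag; `docker images` shows what was actually loaded.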

Service fails with status=226/NAMESPACE and Failed to set up mount namespacing: No such file or directory

Your binary is fine; systemd's service-execution environment is broken. Diagnose by running the binary manually as administrator:

ssh administrator@172.18.139.186
./SDP/bin/control-plane -addr :3452 -data ./SDP/data
# Should print "control-plane listening on :3452 (data=./SDP/data)"
# Ctrl-C to exit
exit

If that works, systemd's namespace setup is what is failing. A common cause on this Ubuntu is a missing /run/systemd. Force it to be recreated:

ssh administrator@172.18.139.186
sudo systemctl daemon-reexec
sudo systemctl restart sdp-control-plane.service
sudo systemctl --no-pager status sdp-control-plane.service | head -10
exit

If the service still fails, the systemd manager itself is in a bad state. Reboot the VM (last resort; this interrupts any other work on it).
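To see whether the missing-directory cause applies, and whether daemon-reexec fixed it, check for /run/systemd before and after. A minimal sketch; `runtime_dir_ok` is an illustrative name, and the default path is the one named above:

```shell
# runtime_dir_ok [DIR]: succeed if DIR (default /run/systemd) exists and is
# a directory; print which case applies either way.
runtime_dir_ok() {
    dir="${1:-/run/systemd}"
    if [ -d "$dir" ]; then
        echo "present: $dir"
    else
        echo "missing: $dir"
        return 1
    fi
}

# Example (on the affected VM, not executed here):
#   runtime_dir_ok || sudo systemctl daemon-reexec
#   runtime_dir_ok
```

If the directory is present and the service still exits with 226/NAMESPACE, the cause is something else and the reboot remains the fallback.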