bri-sandbox-development-platform

Author	SHA1	Message	Date
Achmad	d5d5e5467d	DEPLOY.md: two more WS-upgrade tests for the curl-works-but-RST case When plain HTTP from 92 reaches the control plane but the WebSocket dial RSTs, test the upgrader on each side: (1) curl from 186 to 127.0.0.1:3452 with WS upgrade headers: 101 → control plane is fine, network is the issue; RST/4xx → control plane is broken. (2) curl from 92 to 186:3452 with WS upgrade headers: 101 → firewall allows WS traffic, agent's client is the issue; RST → some middlebox matches on the Upgrade header; 4xx → control plane rejects the upgrade.	2026-06-24 05:52:04 +00:00
Achmad	f3da975eb7	DEPLOY.md: remove the 'agent on 92 can't reach 186:3452' section The user's micro agent dials ws://172.18.139.186:3452/ws/agent directly, bypassing nginx. The 'add a /ws/agent nginx proxy block' workaround was for a different topology where the agent goes through nginx on 80, which doesn't apply here. That section was confusing — it suggested an nginx change the user doesn't need. Delete it.	2026-06-24 05:48:10 +00:00
Achmad	b8736f4ac3	DEPLOY.md: also check what's bound to port 3452 when curl works If something else is also listening on :3452 (a leftover container, a systemd-managed socket, a proxy from an earlier session), the kernel can route new connects to it and that listener may RST. Curl from outside gets a clean response from the real listener; the agent's WS dial lands on the stale one and gets RST. Add ss -tlnp + lsof + systemctl list-sockets to the diagnostic ladder.	2026-06-24 05:43:53 +00:00
Achmad	fc768dbd85	DEPLOY.md: stale WS connection after daemon-reexec is the usual RST cause When 186's control plane comes up via systemctl daemon-reexec (the 226/NAMESPACE fix), the listening socket is briefly dropped and re-created. Any WebSocket connection that was in flight at that moment gets RST. The agent retries every 2s, but if the dial happened exactly during the reexec window the agent can sit in a tight RST loop until restarted. Document the restart-agent-micro as the first thing to try, and demote the iptables/fail2ban/curl diagnostic steps to 'if the agent still RSTs after restart'.	2026-06-24 05:40:42 +00:00
Achmad	8c598ad69f	DEPLOY.md: troubleshooting for agent-micro WS RST 'connection reset by peer' on the WS dial is at the TCP layer, not the application layer. Almost always a firewall issue on 186: iptables REJECT, fail2ban ban, or stale conntrack state. Document the diagnostic ladder: iptables -L INPUT, fail2ban status, then a plain HTTP curl from 92 to verify the network path, then a WS upgrade curl on 186 itself to verify the control plane's upgrader.	2026-06-24 05:36:18 +00:00
Achmad	0569cede43	DEPLOY.md: pre-create ~/SDP/data on 186; create it in step 6a too The control plane binary will create the data dir on first start, but creating it before systemd starts the service means the ReadWritePaths scope has somewhere to point at, and it makes diagnosis faster if anything else is wrong.	2026-06-24 05:31:51 +00:00
Achmad	92354252e5	DEPLOY.md: troubleshooting for status=226/NAMESPACE failure The 226/NAMESPACE with 'Failed to set up mount namespacing' error is misleading — the binary is fine, but systemd can't build the mount namespace for the service. The binary runs fine when launched directly as administrator; the bug is in the systemd manager's runtime state. Document the diagnostic (run the binary manually) and the two fixes: systemctl daemon-reexec (recreate /run/systemd) as the first attempt, reboot as the last resort.	2026-06-24 05:27:59 +00:00
Achmad	1f1ff2f173	DEPLOY.md: drop sudo discussion entirely The user has made it clear (twice now) that they don't want sudo advice in the runbook — they can type the password themselves and don't want a script or sudoers change. Delete the 'Diagnose sudo' step and the 'Sudo on the company VMs' reminder step. Sudo is just expected behavior; when the user runs 'sudo systemctl ...' and gets a prompt, they type the password. No commentary needed. Renumber the remaining steps so they're sequential 0-8.	2026-06-24 05:19:25 +00:00
Achmad	d11723ee63	DEPLOY.md: drop NOPASSWD advice, document interactive sudo Company VM — no sudoers changes. Replace the 'set up sudoers NOPASSWD' step with a brief note that every sudo call will prompt for the password and the user types it. The 15-minute sudo timestamp means the user only types it once per shell session, but they will see the prompt several times across the deploy as they run multiple sudo commands. Update the step-1 diagnostic outcomes to point at the new no-policy-change reality: NOPASSWD or different passwords both still work, the user just types the right one at each sudo prompt.	2026-06-24 05:16:52 +00:00
Achmad	1eddef9f65	Add DEPLOY.md: copy-pasteable runbook, no deploy.sh A hand-typed manual deploy guide. Every step is a single ssh or scp from the laptop, or a one-shot block of commands inside the VM. No sshpass, no env-var passwords, no sudo -S password piping. The user types their passwords interactively when prompted. The old deploy.sh had grown into a tangle of -tt / sudo -S / PAGER=cat workarounds that hid what was actually happening and was fragile across systemd versions. The runbook trades that off for explicit per-step commands that the user can verify by reading the output. Troubleshooting section at the bottom covers the four most likely first-deploy failures: SDP_CP_URL expansion, micro agent can't reach the control plane, login auth rejection, and missing runtime images.	2026-06-24 05:14:06 +00:00

10 Commits
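
The two WS-upgrade tests described in commit d5d5e5467d amount to a small decision table, which can be sketched in Python as below. This is an illustrative sketch, not the content of DEPLOY.md: the function names, the outcome labels, and the assumption that both tests target the /ws/agent path (taken from commit f3da975eb7) are hypothetical. The curl flags and the upgrade headers are the standard ones for a WebSocket handshake; the exact command in the runbook may differ.

```python
import base64
import os
from typing import List, Optional


def curl_ws_upgrade_args(host_port: str, path: str = "/ws/agent") -> List[str]:
    """Build a curl argv that attempts a WebSocket upgrade over HTTP/1.1.

    The default path is an assumption based on the agent's dial URL.
    """
    # A WebSocket handshake key is 16 random bytes, base64-encoded.
    key = base64.b64encode(os.urandom(16)).decode("ascii")
    return [
        "curl", "-i", "-N", "--http1.1",
        "-H", "Connection: Upgrade",
        "-H", "Upgrade: websocket",
        "-H", "Sec-WebSocket-Version: 13",
        "-H", f"Sec-WebSocket-Key: {key}",
        f"http://{host_port}{path}",
    ]


def outcome_from_status(status: Optional[int]) -> str:
    """Map an HTTP status (or None for a TCP reset) to an outcome label."""
    if status is None:
        return "RST"
    if status == 101:
        return "101"
    if 400 <= status <= 499:
        return "4xx"
    raise ValueError(f"status {status} is not covered by the commit's table")


# Keys are (host the curl runs from, outcome); values quote the commit message.
DIAGNOSIS = {
    ("186", "101"): "control plane is fine, network is the issue",
    ("186", "RST"): "control plane is broken",
    ("186", "4xx"): "control plane is broken",
    ("92", "101"): "firewall allows WS traffic, agent's client is the issue",
    ("92", "RST"): "some middlebox matches on the Upgrade header",
    ("92", "4xx"): "control plane rejects the upgrade",
}


def classify_ws_upgrade(source_host: str, outcome: str) -> str:
    """Look up the diagnosis for a test run from host '186' or '92'."""
    try:
        return DIAGNOSIS[(source_host, outcome)]
    except KeyError:
        raise ValueError(f"no diagnosis for {source_host!r} / {outcome!r}")
```

For example, `classify_ws_upgrade("92", outcome_from_status(None))` returns the middlebox diagnosis, matching the RST branch of the second test. Running the test from 186 against 127.0.0.1 first keeps the network out of the picture, so a failure there points at the control plane itself.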