Commit Graph

7 Commits

Author SHA1 Message Date
Achmad fc768dbd85 DEPLOY.md: stale WS connection after daemon-reexec is the usual RST cause
When 186's control plane comes up via systemctl daemon-reexec
(the 226/NAMESPACE fix), the listening socket is briefly
dropped and re-created. Any WebSocket connection that was
in flight at that moment gets RST. The agent retries every
2s, but if the dial happened exactly during the reexec
window the agent can sit in a tight RST loop until restarted.

Document restarting agent-micro as the first thing to try,
and demote the iptables/fail2ban/curl diagnostic steps to
'if the agent still RSTs after restart'.
2026-06-24 05:40:42 +00:00
Achmad 8c598ad69f DEPLOY.md: troubleshooting for agent-micro WS RST
'connection reset by peer' on the WS dial is at the TCP layer,
not the application layer. The cause is almost always a firewall
issue on 186: an iptables REJECT, a fail2ban ban, or stale
conntrack state.

Document the diagnostic ladder: iptables -L INPUT, fail2ban
status, then a plain HTTP curl from 92 to verify the network
path, then a WS upgrade curl on 186 itself to verify the
control plane's upgrader.
2026-06-24 05:36:18 +00:00
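The diagnostic ladder described in 8c598ad69f could be sketched roughly as follows. This is an illustration, not the text of DEPLOY.md: `CP_HOST`, `CP_PORT`, `WS_PATH` and their default values are assumed placeholders, and the `run` helper only prints each command unless `DRY_RUN=0` is set, so the steps can be read before anything is executed.

```shell
#!/bin/sh
# Sketch of the diagnostic ladder. CP_HOST, CP_PORT and WS_PATH are
# placeholders (assumptions), not values taken from DEPLOY.md.
CP_HOST="${CP_HOST:-cp.example.internal}"
CP_PORT="${CP_PORT:-8080}"
WS_PATH="${WS_PATH:-/ws}"

# Print each command by default; set DRY_RUN=0 to actually execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Steps 1-2, on 186: look for REJECT rules and fail2ban bans.
ladder_on_186() {
    run sudo iptables -L INPUT -n --line-numbers
    run sudo fail2ban-client status
}

# Step 3, on 92: plain HTTP request to verify the network path.
ladder_on_92() {
    run curl -v "http://${CP_HOST}:${CP_PORT}/"
}

# Step 4, on 186: WebSocket upgrade handshake against the local listener.
# The key is the sample nonce from RFC 6455; any base64 16-byte value works.
ladder_ws_upgrade() {
    run curl -i -N \
        -H "Connection: Upgrade" \
        -H "Upgrade: websocket" \
        -H "Sec-WebSocket-Version: 13" \
        -H "Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==" \
        "http://127.0.0.1:${CP_PORT}${WS_PATH}"
}
```

Calling `ladder_on_186` prints the two firewall checks; `DRY_RUN=0 ladder_on_186` runs them. If the upgrader is working, the last step would be expected to answer with an HTTP 101 response.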
Achmad 0569cede43 DEPLOY.md: pre-create ~/SDP/data on 186; create it in step 6a too
The control plane binary will create the data dir on first
start, but creating it before systemd starts the service gives
the ReadWritePaths scope somewhere to point at and makes
diagnosis faster if anything else is wrong.
2026-06-24 05:31:51 +00:00
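A minimal sketch of the pre-create step from 0569cede43, assuming the data directory lives at `~/SDP/data` as the commit subject says. The function name `ensure_data_dir` and the writability check are illustrative additions, not part of the runbook.

```shell
#!/bin/sh
# Create the data directory (and parents) if missing, then confirm it is
# writable by the current user. Defaults to ~/SDP/data; the optional
# argument exists only so the sketch can be tried against another path.
ensure_data_dir() {
    dir="${1:-$HOME/SDP/data}"
    mkdir -p "$dir" && [ -w "$dir" ] && printf 'ok: %s\n' "$dir"
}
```

Running `ensure_data_dir` on 186 before starting the service prints `ok:` followed by the path when the directory exists and is writable, and returns a non-zero status otherwise.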
Achmad 92354252e5 DEPLOY.md: troubleshooting for status=226/NAMESPACE failure
The 226/NAMESPACE status with the 'Failed to set up mount namespacing'
error is misleading: systemd can't build the mount namespace for
the service, while the binary itself runs fine when launched
directly as administrator. The bug is in the systemd manager's
runtime state.

Document the diagnostic (run the binary manually) and the two
fixes: systemctl daemon-reexec (recreate /run/systemd) as the
first attempt, reboot as the last resort.
2026-06-24 05:27:59 +00:00
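The triage order in 92354252e5 might be summarized by a small helper like the one below. It is a sketch under assumptions: `next_step_for_status` is a hypothetical name, and the unit name in the usage comment is a placeholder. The only external fact it relies on is that systemd reports exit status 226 as NAMESPACE.

```shell
#!/bin/sh
# Map a unit's main exit status to the next step described in the commit.
next_step_for_status() {
    case "$1" in
        226) echo "226/NAMESPACE: run the binary by hand, then try 'sudo systemctl daemon-reexec'; reboot only as a last resort" ;;
        0)   echo "status 0: not a namespace failure" ;;
        *)   echo "status $1: not the 226/NAMESPACE case; check the unit's journal" ;;
    esac
}

# Usage on the VM (sdp-control-plane.service is a hypothetical unit name):
#   next_step_for_status "$(systemctl show -p ExecMainStatus --value sdp-control-plane.service)"
```

Keeping the mapping in a pure function means the advice can be checked without touching systemd; only the commented usage line queries the service manager.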
Achmad 1f1ff2f173 DEPLOY.md: drop sudo discussion entirely
The user has made it clear (twice now) that they don't want
sudo advice in the runbook — they can type the password
themselves and don't want a script or sudoers change.

Delete the 'Diagnose sudo' step and the 'Sudo on the company
VMs' reminder step. Sudo is just expected behavior; when
the user runs 'sudo systemctl ...' and gets a prompt, they
type the password. No commentary needed.

Renumber the remaining steps so they're sequential 0-8.
2026-06-24 05:19:25 +00:00
Achmad d11723ee63 DEPLOY.md: drop NOPASSWD advice, document interactive sudo
Company VM — no sudoers changes. Replace the 'set up sudoers
NOPASSWD' step with a brief note that every sudo call will
prompt for the password and the user types it. The 15-minute
sudo timestamp means the user only types it once per shell
session, but they will see the prompt several times across
the deploy as they run multiple sudo commands.

Update the step-1 diagnostic outcomes to point at the new
no-policy-change reality: NOPASSWD or different passwords
both still work, the user just types the right one at each
sudo prompt.
2026-06-24 05:16:52 +00:00
Achmad 1eddef9f65 Add DEPLOY.md: copy-pasteable runbook, no deploy.sh
A hand-typed manual deploy guide. Every step is a single
ssh or scp from the laptop, or a one-shot block of commands
inside the VM. No sshpass, no env-var passwords, no
sudo -S password piping. The user types their passwords
interactively when prompted.

The old deploy.sh had grown into a tangle of -tt / sudo -S
/ PAGER=cat workarounds that hid what was actually happening
and was fragile across systemd versions. The runbook trades
that for explicit per-step commands that the user can
verify by reading the output.

Troubleshooting section at the bottom covers the four most
likely first-deploy failures: SDP_CP_URL expansion, micro
agent can't reach the control plane, login auth rejection,
and missing runtime images.
2026-06-24 05:14:06 +00:00