diff --git a/DEPLOY.md b/DEPLOY.md index 455cf9d..fd4fd73 100644 --- a/DEPLOY.md +++ b/DEPLOY.md @@ -337,3 +337,57 @@ exit ``` You should see only one listener: `control-plane` PID, IPv6 `*:3452` (dual-stack). If you see anything else — another systemd socket, a leftover container, a proxy — kill it. Then `sudo systemctl restart sdp-control-plane.service` on 186 and try the agent again. + +### One more thing to try: IPv4 vs IPv6 + +Go's dual-stack listen (`:3452`) registers an IPv6 socket that also accepts IPv4 via IPv4-mapped addresses. Some networks route IPv4 and IPv6 differently, and a corporate firewall might allow one but not the other. From 92: + +```bash +ssh administrator@172.18.136.92 +getent ahosts 172.18.139.186 +curl -4 -v http://172.18.139.186:3452/ 2>&1 | head -10 +curl -6 -v http://172.18.139.186:3452/ 2>&1 | head -10 +exit +``` + +If `-4` works and `-6` RSTs, or vice versa, the network is treating them differently. Fix by either: + +- Pin the control plane to IPv4 only: edit `sdp-control-plane.service` `ExecStart` to use `-addr 0.0.0.0:3452` instead of `-addr :3452`. The Go `net.Listen` interprets `:3452` as `[::]:3452` (IPv6 dual-stack) and `0.0.0.0:3452` as IPv4-only. +- Or pin the agent-micro to IPv4: `Environment=SDP_CP_URL=ws://172.18.139.186:3452/ws/agent` already uses the IPv4 literal, so this should "just work" — but if the kernel still tries IPv6 first, set `GODEBUG=netdns=go+1` or just use the literal IPv4 address in the URL. + +If both `-4` and `-6` work, the network is fine. Re-run the agent-micro restart and re-check the journal. + +### Curl from 92 works but the agent still RSTs — test the WebSocket upgrade + +The control plane is up, plain HTTP from 92 reaches it, but the WebSocket dial RSTs. The TCP connection succeeds (so it's not a firewall on the port) but the server-side read of the request triggers a RST. Two tests, one from each side: + +**1. WS upgrade on 186 itself (rules out the control plane binary):** + +```bash +ssh administrator@172.18.139.186 +curl -i \ + -H "Connection: Upgrade" -H "Upgrade: websocket" \ + -H "Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==" \ + -H "Sec-WebSocket-Version: 13" \ + "http://127.0.0.1:3452/ws/agent?node=micro" +exit +``` + +- 101 Switching Protocols → control plane is fine. Network between 92 and 186 is the issue. +- RST or 4xx → control plane is broken. Check `journalctl -u sdp-control-plane.service` for errors after the listen line. + +**2. WS upgrade from 92 (rules out a header-aware firewall):** + +```bash +ssh administrator@172.18.136.92 +curl -i \ + -H "Connection: Upgrade" -H "Upgrade: websocket" \ + -H "Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==" \ + -H "Sec-WebSocket-Version: 13" \ + "http://172.18.139.186:3452/ws/agent?node=micro" +exit +``` + +- 101 → firewall allows WS-shaped traffic. The agent's client is the issue (unlikely; same gorilla/websocket). +- RST → some middlebox (iptables, corporate firewall, fail2ban) is matching on `Upgrade: websocket` and RSTing. Find the rule. +- 4xx → control plane reachable but rejecting the upgrade.