diff --git a/DEPLOY.md b/DEPLOY.md index 994b0af..2eca411 100644 --- a/DEPLOY.md +++ b/DEPLOY.md @@ -288,3 +288,61 @@ exit ``` If still failing, the systemd manager itself is in a bad state. Reboot the VM (last resort; will interrupt any other work on it). + +### Agent-micro on 92 gets `connection reset by peer` connecting to 186:3452 + +`connection reset by peer` is at the TCP layer — the SYN reaches the host kernel but something RSTs the connection before the control plane sees it. Common causes: + +1. **iptables on 186 has a REJECT rule for 3452.** Check: + + ```bash + ssh administrator@172.18.139.186 + sudo iptables -L INPUT -n | head -30 + exit + ``` + + If you see a REJECT rule for port 3452, drop or modify it. The control plane is on the same host, so there's no reason to filter loopback or local-subnet traffic to it. + +2. **fail2ban has banned the agent's IP.** Check: + + ```bash + ssh administrator@172.18.139.186 + sudo fail2ban-client status + sudo fail2ban-client status sshd 2>/dev/null + exit + ``` + + If 92's IP is in the banned list, add the SDP subnet to `ignoreip` in `/etc/fail2ban/jail.local` and `sudo fail2ban-client reload`. + +3. **The kernel's connection tracking has stale state.** Restart it (last resort): + + ```bash + ssh administrator@172.18.139.186 + sudo systemctl restart nftables 2>/dev/null + sudo systemctl restart firewalld 2>/dev/null + exit + ``` + +4. **Verify the network path works at all** before debugging firewall rules. From 92, a plain HTTP request to the control plane's port: + + ```bash + ssh administrator@172.18.136.92 + curl -v http://172.18.139.186:3452/ + exit + ``` + + If you get *any* HTTP response (even a Go HTTP 400 for "missing node query") → the path is open and the problem is the WebSocket upgrade. If you get `Connection reset by peer` again → the path is being blocked, look at the iptables/fail2ban angle. + +5. **Verify the WS endpoint works on 186 itself** (rules out the network and confirms the upgrade logic): + + ```bash + ssh administrator@172.18.139.186 + curl -i \ + -H "Connection: Upgrade" -H "Upgrade: websocket" \ + -H "Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==" \ + -H "Sec-WebSocket-Version: 13" \ + "http://127.0.0.1:3452/ws/agent?node=micro" + exit + ``` + + Should return HTTP 101 Switching Protocols. If it does, the network from 92 is the issue. If it doesn't, the control plane binary has a problem.