DEPLOY.md: troubleshooting for agent-micro WS RST
'connection reset by peer' on the WS dial is at the TCP layer, not the application layer. It is almost always a firewall on 186: an iptables REJECT rule, a fail2ban ban, or stale conntrack state. Document the diagnostic ladder: iptables -L INPUT, fail2ban status, then a plain HTTP curl from 92 to verify the network path, then a WS upgrade curl on 186 itself to verify the control plane's upgrader.
@@ -288,3 +288,61 @@ exit
If still failing, the systemd manager itself is in a bad state. Reboot the VM (last resort; will interrupt any other work on it).

### Agent-micro on 92 gets `connection reset by peer` connecting to 186:3452

`connection reset by peer` is a TCP-layer error: the SYN reaches the host kernel, but something sends an RST before the control plane sees the connection. Common causes:

1. **iptables on 186 has a REJECT rule for 3452.** Check:

```bash
ssh administrator@172.18.139.186
# Numeric listing of the first 30 INPUT rules
sudo iptables -L INPUT -n | head -30
exit
```

If you see a REJECT rule for port 3452, delete or modify it. The control plane runs on 186 itself, so there is no reason to filter loopback or local-subnet traffic to it.
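
A listing truncated by `head -30` can hide a rule further down the chain. As a rough sketch, the filter below scans `iptables -S INPUT` output for REJECT or DROP rules on one port. The function name `rules_blocking_port` is made up for this example, and it assumes rules that match on a single `--dport` (not `multiport`).

```bash
# Print INPUT rules that REJECT or DROP traffic to the TCP port given as $1.
# Reads `iptables -S INPUT` style output on stdin; exits non-zero if none match.
rules_blocking_port() {
  local port="$1"
  grep -E -- "--dport ${port}( |\$)" | grep -E -- '-j (REJECT|DROP)( |$)'
}

# Assumed usage on 186:
#   sudo iptables -S INPUT | rules_blocking_port 3452
# A matching rule can be deleted by repeating it with -D in place of -A.
```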

2. **fail2ban has banned the agent's IP.** Check:

```bash
ssh administrator@172.18.139.186
# List the active jails, then the banned IPs in the sshd jail
sudo fail2ban-client status
sudo fail2ban-client status sshd 2>/dev/null
exit
```

If 92's IP is in the banned list, add the SDP subnet to `ignoreip` in `/etc/fail2ban/jail.local`, then run `sudo fail2ban-client reload`.
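
To check one address without reading the list by eye, something like the following could work. It is only a sketch: `ip_is_banned` is a hypothetical helper, and it assumes the `Banned IP list:` line that `fail2ban-client status <jail>` normally prints.

```bash
# Succeeds (exit 0) if the IP given as $1 appears on the "Banned IP list:"
# line of `fail2ban-client status <jail>` output read from stdin.
ip_is_banned() {
  local ip="$1"
  grep 'Banned IP list:' | tr -s ' \t' '\n\n' | grep -Fxq -- "$ip"
}

# Assumed usage on 186, with 92's address and the sshd jail from above:
#   sudo fail2ban-client status sshd | ip_is_banned 172.18.136.92 && echo banned
# An existing ban can be lifted right away with:
#   sudo fail2ban-client set sshd unbanip 172.18.136.92
```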

3. **The kernel's connection tracking has stale state.** Restart whichever firewall service 186 runs (last resort):

```bash
ssh administrator@172.18.139.186
# Only one of these is normally installed; 2>/dev/null hides the error from the other
sudo systemctl restart nftables 2>/dev/null
sudo systemctl restart firewalld 2>/dev/null
exit
```
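
Restarting the firewall is a blunt tool. If the `conntrack` utility from conntrack-tools happens to be installed on 186 (an assumption; the steps above do not mention it), deleting only the entries for port 3452 is a narrower option. The wrapper `drop_conntrack_for_port` and its `DRY_RUN` switch are illustrative names, not part of any existing tooling.

```bash
# Delete conntrack entries for TCP flows to the destination port given as $1.
# With DRY_RUN=1 the command is printed instead of executed.
drop_conntrack_for_port() {
  local port="$1"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "sudo conntrack -D -p tcp --dport ${port}"
  else
    sudo conntrack -D -p tcp --dport "${port}"
  fi
}

# Preview first, then run for real on 186:
#   DRY_RUN=1 drop_conntrack_for_port 3452
#   drop_conntrack_for_port 3452
```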

4. **Verify the network path works at all** before debugging firewall rules. From 92, send a plain HTTP request to the control plane's port:

```bash
ssh administrator@172.18.136.92
curl -v http://172.18.139.186:3452/
exit
```

If you get *any* HTTP response (even a Go HTTP 400 for "missing node query"), the path is open and the problem is the WebSocket upgrade. If you get `Connection reset by peer` again, the path is being blocked; look at the iptables/fail2ban angle.
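
The same decision can be read off curl's exit status instead of its verbose output. The mapping below is a sketch of that idea; `classify_curl_exit` is a hypothetical helper, and the wording of each verdict is an interpretation rather than anything curl prints.

```bash
# Map a curl exit status ($1) to the next diagnostic step.
classify_curl_exit() {
  case "$1" in
    0)        echo "http-response: path is open, look at the WebSocket upgrade" ;;
    7)        echo "connect-failed: nothing accepted the TCP connection" ;;
    28)       echo "timeout: packets are probably being dropped silently" ;;
    52|55|56) echo "reset: connection was cut mid-request, check iptables/fail2ban" ;;
    *)        echo "other: curl exit code $1" ;;
  esac
}

# Assumed usage from 92 (curl exits 0 even on an HTTP 400, since -f is not set):
#   curl -sS -o /dev/null --max-time 5 http://172.18.139.186:3452/
#   classify_curl_exit $?
```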

5. **Verify the WS endpoint works on 186 itself** (rules out the network and confirms the upgrade logic):

```bash
ssh administrator@172.18.139.186
curl -i \
  -H "Connection: Upgrade" -H "Upgrade: websocket" \
  -H "Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==" \
  -H "Sec-WebSocket-Version: 13" \
  "http://127.0.0.1:3452/ws/agent?node=micro"
exit
```

This should return HTTP 101 Switching Protocols. If it does, the upgrader works and the network path from 92 is the issue. If it doesn't, the control plane binary has a problem.
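
For a scripted check, the status code can be pulled from the first line of the response. This is a minimal sketch: `http_status_code` is a hypothetical helper, and the `--max-time 5` in the usage note is an added assumption so that curl does not wait on the upgraded connection indefinitely.

```bash
# Print the HTTP status code from the first line of `curl -i` output on stdin.
http_status_code() {
  head -n 1 | tr -d '\r' | awk '{print $2}'
}

# Assumed usage on 186 (same headers as the curl command above):
#   curl -s -i --max-time 5 \
#     -H "Connection: Upgrade" -H "Upgrade: websocket" \
#     -H "Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==" \
#     -H "Sec-WebSocket-Version: 13" \
#     "http://127.0.0.1:3452/ws/agent?node=micro" | http_status_code
# A healthy upgrader should yield 101.
```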