22 Commits

Author SHA1 Message Date
Achmad d5d5e5467d DEPLOY.md: two more WS-upgrade tests for the curl-works-but-RST case
When plain HTTP from 92 reaches the control plane but the
WebSocket dial RSTs, test the upgrader on each side:

1. curl from 186 to 127.0.0.1:3452 with WS upgrade headers:
   - 101 → control plane is fine, network is the issue.
   - RST/4xx → control plane is broken.

2. curl from 92 to 186:3452 with WS upgrade headers:
   - 101 → firewall allows WS traffic, agent's client is the issue.
   - RST → some middlebox matches on the Upgrade header.
   - 4xx → control plane rejects the upgrade.
2026-06-24 05:52:04 +00:00
Achmad f3da975eb7 DEPLOY.md: remove the 'agent on 92 can't reach 186:3452' section
The user's micro agent dials ws://172.18.139.186:3452/ws/agent
directly, bypassing nginx. The 'add a /ws/agent nginx proxy
block' workaround was for a different topology where the
agent goes through nginx on 80, which doesn't apply here.

That section was confusing — it suggested an nginx change
the user doesn't need. Delete it.
2026-06-24 05:48:10 +00:00
Achmad b8736f4ac3 DEPLOY.md: also check what's bound to port 3452 when curl works
If something else is also listening on :3452 (a leftover
container, a systemd-managed socket, a proxy from an earlier
session), the kernel can route new connections to it and that
listener may RST. Curl from outside gets a clean response
from the real listener; the agent's WS dial lands on the
stale one and gets RST.

Add ss -tlnp + lsof + systemctl list-sockets to the
diagnostic ladder.
2026-06-24 05:43:53 +00:00
Achmad fc768dbd85 DEPLOY.md: stale WS connection after daemon-reexec is the usual RST cause
When 186's control plane comes up via systemctl daemon-reexec
(the 226/NAMESPACE fix), the listening socket is briefly
dropped and re-created. Any WebSocket connection that was
in flight at that moment gets RST. The agent retries every
2s, but if the dial happened exactly during the reexec
window the agent can sit in a tight RST loop until restarted.

Document restarting agent-micro as the first thing to try,
and demote the iptables/fail2ban/curl diagnostic steps to
'if the agent still RSTs after restart'.
2026-06-24 05:40:42 +00:00
Achmad 8c598ad69f DEPLOY.md: troubleshooting for agent-micro WS RST
'connection reset by peer' on the WS dial is at the TCP layer,
not the application layer. Almost always a firewall on 186:
iptables REJECT, fail2ban ban, or stale conntrack state.

Document the diagnostic ladder: iptables -L INPUT, fail2ban
status, then a plain HTTP curl from 92 to verify the network
path, then a WS upgrade curl on 186 itself to verify the
control plane's upgrader.
2026-06-24 05:36:18 +00:00
Achmad 0569cede43 DEPLOY.md: pre-create ~/SDP/data on 186; create it in step 6a too
The control plane binary will create the data dir on first
start, but creating it before systemd starts the service means
the ReadWritePaths scope has somewhere to point at, and it
speeds up diagnosis if anything else is wrong.
2026-06-24 05:31:51 +00:00
Achmad 92354252e5 DEPLOY.md: troubleshooting for status=226/NAMESPACE failure
The 226/NAMESPACE with 'Failed to set up mount namespacing' error
is misleading — the binary is fine, but systemd can't build the
mount namespace for the service. The binary runs fine when
launched directly as administrator; the bug is in the systemd
manager's runtime state.

Document the diagnostic (run the binary manually) and the two
fixes: systemctl daemon-reexec (recreate /run/systemd) as the
first attempt, reboot as the last resort.
2026-06-24 05:27:59 +00:00
Achmad 1f1ff2f173 DEPLOY.md: drop sudo discussion entirely
The user has made it clear (twice now) that they don't want
sudo advice in the runbook — they can type the password
themselves and don't want a script or sudoers change.

Delete the 'Diagnose sudo' step and the 'Sudo on the company
VMs' reminder step. Sudo is just expected behavior; when
the user runs 'sudo systemctl ...' and gets a prompt, they
type the password. No commentary needed.

Renumber the remaining steps so they're sequential 0-8.
2026-06-24 05:19:25 +00:00
Achmad d11723ee63 DEPLOY.md: drop NOPASSWD advice, document interactive sudo
Company VM — no sudoers changes. Replace the 'set up sudoers
NOPASSWD' step with a brief note that every sudo call will
prompt for the password and the user types it. The 15-minute
sudo timestamp means the user only types it once per shell
session, but they will see the prompt several times across
the deploy as they run multiple sudo commands.

Update the step-1 diagnostic outcomes to point at the new
no-policy-change reality: NOPASSWD or different passwords
both still work, the user just types the right one at each
sudo prompt.
2026-06-24 05:16:52 +00:00
Achmad 1eddef9f65 Add DEPLOY.md: copy-pasteable runbook, no deploy.sh
A hand-typed manual deploy guide. Every step is a single
ssh or scp from the laptop, or a one-shot block of commands
inside the VM. No sshpass, no env-var passwords, no
sudo -S password piping. The user types their passwords
interactively when prompted.

The old deploy.sh had grown into a tangle of -tt / sudo -S
/ PAGER=cat workarounds that hid what was actually happening
and was fragile across systemd versions. The runbook trades
that off for explicit per-step commands that the user can
verify by reading the output.

Troubleshooting section at the bottom covers the four most
likely first-deploy failures: SDP_CP_URL expansion, micro
agent can't reach the control plane, login auth rejection,
and missing runtime images.
2026-06-24 05:14:06 +00:00
Achmad 10ea727f53 deploy.sh: force PAGER=cat to defeat pager over ssh -tt TTY
With -tt allocating a remote PTY, systemctl and journalctl would
sometimes open a pager (less/more) even with --no-pager, leaving
the script blocked until the user hits q or Ctrl-C.

Force PAGER=cat and SYSTEMD_PAGER=cat inside every remote sudo
call and inside the status_block journalctl command. Add
--output=cat to journalctl too as belt-and-suspenders.

Status output is also piped through | head -3 / | head -20 to
guarantee a finite output even if the pager or color escape
handling misbehaves.
2026-06-24 05:03:47 +00:00
Achmad d8e8147919 deploy.sh: pipe sudo password via sudo -S (no TTY prompt needed)
Adds password piping so the script works without a sudoers NOPASSWD
rule, on the assumption that the SSH login password is the same as
the sudo password (common on these VMs).

- ssh -tt now forces a TTY allocation; sudo -S requires one and
  was failing with 'sudo: no tty present' over plain non-interactive
  ssh.
- New run_remote_sudo helper pipes the per-VM password to
  'sudo -S -p ""' so each remote call authenticates without a
  prompt. The empty -p suppresses '[sudo] password for ...' from
  appearing in journal tail output.
- install_unit, restart_unit, and the journalctl call in
  status_block all go through run_remote_sudo. systemctl status
  no longer needs sudo (the unit is owned by administrator and
  status doesn't require root for it).
- If your sudo password differs from the login password, the
  script will silently no-op the install/restart steps. Fix by
  setting the right password via SDP_92_PASS / SDP_186_PASS, or
  add a NOPASSWD rule in /etc/sudoers.d/sdp-deploy and revert
  this change.
2026-06-24 05:00:08 +00:00
Achmad 574e6d207b Slice 2: agents and control plane run under systemd
- systemd/sdp-control-plane.service: plain host process on 186,
  listens on :3452, data dir ~/SDP/data. MemoryMax=512M,
  Restart=always, ReadWritePaths scoped to the data dir.
- systemd/sdp-agent-micro.service: plain host process on 92,
  default SDP_CP_URL=ws://172.18.139.186:3452/ws/agent. Operator
  can drop /etc/default/sdp-agent-micro to override. Depends on
  docker.service so the dockerd is up before the agent starts.
- systemd/sdp-agent-gateway.service: plain host process on 186,
  default SDP_CP_URL=ws://127.0.0.1:3452/ws/agent (loopback since
  both live on the same VM). Same env-file override pattern.
- All three use Type=simple, Restart=always, RestartSec=2s. The
  agents already reconnect on transient network drops, so
  restart-on-crash is the right policy.
- The agents talk to the host dockerd via /var/run/docker.sock to
  spawn the actual service containers (sdp-<repo>). Service
  containers are managed by docker, not systemd — only the
  long-running agents and the control plane are under systemd.
- scripts/deploy.sh: now a one-shot — scp's binaries, dashboard,
  and unit files; systemctl daemon-reload + enable --now + restart
  each service in the right order (control plane first on 186 so
  the gateway agent has something to dial). Prints status + last
  10 journal lines per service so the user can see it came up.
- AGENTS.md, README.md: layout tree updated, deploy section
  rewritten, the systemd units documented alongside the agents
  and control plane.
2026-06-24 04:54:28 +00:00
Achmad f12d4f0b12 Slice 2 (follow-up): add Sessions.User / Revoke for /api/logout and audit-trail attribution
The original auth commit shipped the in-memory session store with
just Issue and Valid. The Slice-2 /api/logout handler and the
audit-trail (user column on each deployment) need:
- User(tok): look up the username for a valid session.
- Revoke(tok): drop a session; used by /api/logout.

Tiny follow-up — kept as its own commit because the rest of the
auth work had already shipped in the parent commit by the time the
dashboard's logout button and the deployment-audit-trail surfaced
the need for these methods.
2026-06-24 04:01:53 +00:00
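
For readers unfamiliar with the session store touched by f12d4f0b12, here is a minimal sketch of an in-memory store with the four methods the commit names (Issue, Valid, User, Revoke). The signatures, field names, and token format are assumptions for illustration; the repository's actual implementation may differ.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"sync"
)

// Sessions maps session tokens to usernames, guarded by a mutex.
type Sessions struct {
	mu    sync.RWMutex
	byTok map[string]string
}

func NewSessions() *Sessions {
	return &Sessions{byTok: make(map[string]string)}
}

// Issue creates a random token for user and stores it.
func (s *Sessions) Issue(user string) (string, error) {
	buf := make([]byte, 16)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	tok := hex.EncodeToString(buf)
	s.mu.Lock()
	s.byTok[tok] = user
	s.mu.Unlock()
	return tok, nil
}

// User looks up the username for a session; ok is false if tok is unknown.
func (s *Sessions) User(tok string) (user string, ok bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	user, ok = s.byTok[tok]
	return user, ok
}

// Valid reports whether tok belongs to a live session.
func (s *Sessions) Valid(tok string) bool {
	_, ok := s.User(tok)
	return ok
}

// Revoke drops a session; revoking an unknown token is a no-op.
func (s *Sessions) Revoke(tok string) {
	s.mu.Lock()
	delete(s.byTok, tok)
	s.mu.Unlock()
}

func main() {
	s := NewSessions()
	tok, err := s.Issue("alice") // "alice" is a placeholder username
	if err != nil {
		panic(err)
	}
	user, _ := s.User(tok)
	fmt.Println("user:", user, "valid:", s.Valid(tok))
	s.Revoke(tok) // what a logout handler would call
	fmt.Println("valid after revoke:", s.Valid(tok))
}
```

User is what an audit trail would call to attribute a deployment to a username, and Revoke is the logout path.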
Achmad 4cab047432 Slice 2: port 3452, nginx sandbox mount, AGENTS.md, docs, deploy script cleanup
- control-plane default listen addr is now :3452 (was :8080). An
  unusual port to avoid collisions on the VM.
- agent-micro and agent-gateway default SDP_CP_URL points at
  ws://localhost:3452/ws/agent. docker-compose.yml updates the
  control plane command, host port mapping, and agent -cp URLs.
- nginx/nginx.conf (the legacy root-mount reference) uses
  127.0.0.1:3452 for the upstream. nginx/sandbox.conf is the new
  deployment config: four location blocks for the /sandbox/credit-card
  mount — _next/static serves cached chunks, /api/ and /ws/ proxy
  to 127.0.0.1:3452, /sandbox/credit-card serves the static
  dashboard with try_files for SPA routing.
- scripts/patch-nginx.sh: deleted. The user configures nginx on 186
  by hand. scripts/deploy.sh no longer calls it.
- AGENTS.md: new file. Documents the build/lint/test commands
  (with the golang:1.24-alpine container — local Go can't fetch
  the toolchain), the wire protocol, the Slice-2 conventions
  (sdp-<repo> container naming, snapshot persistence,
  PreGitReset/AfterStart hooks), the repo-path gotcha, and the
  build-artifacts-in-git rationale.
- dashboard/out: now tracked in git, alongside bin/. The dashboard
  static export is scp'd to 186 on deploy; the VMs have no
  internet so they can't regenerate it. .gitignore comment
  explains this and warns against re-ignoring.
- README.md / REQUIREMENTS.md: status updated to 'Slice 2 done',
  per-feature checklist marked. Erangel repo path corrected to
  /var/www/html/erangel-ocean (was wrongly ~/SDP in earlier docs).
2026-06-24 04:00:49 +00:00
Achmad 78872de897 Slice 2: dashboard — nav, sandboxes/templates/environments/history pages, basePath
- New /dashboard layout with a top nav (Quick Deploy / Sandboxes /
  Templates / Environments / History) and a Logout button that
  invalidates the session.
- Quick Deploy: stage list switches per repo (Go vs PHP, so the
  composer-install stage is shown for the gateway), env-var textarea,
  host-port input.
- Sandboxes: list, create, clone-from-template, delete.
- Sandbox detail: live <key>_url map from the gateway's config.php,
  per-route toggle (OCP / sandbox override with a URL input),
  microservice deploys with per-service host port and env, branch
  picker.
- Templates / Environments: list + create + delete.
- History: filterable deployment list with state badges.
- Sandbox detail page is a server component with generateStaticParams
  that delegates to a client component; required for the static export.
- API client: prefix all /api and /ws URLs with NEXT_PUBLIC_BASE_PATH
  (set in next.config.js) so the dashboard works under a non-root
  basePath.
- next.config.js: basePath and assetPrefix set to /sandbox/credit-card
  so asset URLs and internal Link hrefs resolve under the sub-path.
  NEXT_PUBLIC_BASE_PATH env is exposed to the browser bundle for the
  fetch() prefix.
2026-06-24 03:59:13 +00:00
Achmad a7df9ffc6c Slice 2: sandbox, template, environment, route CRUD
- store: add tables and CRUD for sandboxes (with services), templates
  (with services, clone-into-sandbox), environments (named key/value
  sets), and routes (per-sandbox <service>_url overrides).
- api: split into one file per resource. handleSandboxes/handleSandboxByID
  covers CRUD + 'clone from template' + 'deploy one service in a sandbox'
  (which merges the sandbox's env into the request, picks the port,
  and dispatches the deploy frame to the right node).
  handleTemplates/handleTemplateByID, handleEnvironments/handleEnvironmentByID,
  handlePushRoutes cover the rest. The control plane's repo->node
  resolution still lives in resolveNode (api-gateway -> gateway,
  everything else -> micro).
2026-06-24 03:59:02 +00:00
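
The repo-to-node rule stated in a7df9ffc6c (api-gateway goes to the gateway node, everything else to micro) is small enough to sketch directly. The function name comes from the commit message; the signature and the string node names are assumptions, and `billing` is a made-up repo name.

```go
package main

import "fmt"

// resolveNode picks the agent node that should handle a repo:
// api-gateway runs on the gateway node, all other repos on micro.
func resolveNode(repo string) string {
	if repo == "api-gateway" {
		return "gateway"
	}
	return "micro"
}

func main() {
	for _, repo := range []string{"api-gateway", "billing"} {
		fmt.Printf("%s -> %s\n", repo, resolveNode(repo))
	}
}
```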
Achmad 55d7705c63 Slice 2: real auth, agent-mediated repo/branch listing, deployment list from SQLite
- protocol: add RepoInfo, RouteOverride; add HostPort, SandboxID to DeployRequest.
- ws hub: add CallAgent for sync request/response RPCs over the agent WS,
  and DeliverAgentReply to route {op:reply} frames back to the caller.
  UnregisterAgent now also fails any pending RPCs so callers don't hang.
- agent-micro: new op handlers list_repos, list_branches, probe.
  Wire protocol.Event frames use json.RawMessage so each op decodes
  its own data shape.
- agent-gateway: same op handlers (list_repos/list_branches/probe) plus
  push_routes, which the gateway uses to rewrite the api-gateway
  config.php. Detailed in a later commit.
- control-plane login: validateViaAgent now calls CallAgent('probe')
  against the gateway agent (git ls-remote), replacing the
  accept-any-creds stub.
- control-plane repos: handleListRepos and handleListBranches forward
  to the agents via list_repos / list_branches RPCs, replacing the
  hardcoded fixtures.
- control-plane deployments: split into its own file. handleListDeployments
  reads from SQLite (was hardcoded []). handleCreateDeployment now
  supports sandbox-scoped deploys with a host port + env merge.
  handleStopDeployment looks up the node from the deployment row.
- store: split into store.go + deployments.go. The Deployment type
  adds sandboxId, containerId, hostPort. StartDeploymentInSandbox,
  SetContainerID, ListDeployments, GetDeployment, LatestDeploymentBySandboxService
  are new.
- store_test.go: round-trips every Slice-2 path (env, sandbox,
  template, clone, routes, deployment).
- .gitignore: track bin/ — the build runs on a separate Linux box
  with the golang:1.24 toolchain, and the binaries are SCPed from
  there to the company VMs (92 / 186). The VMs have no internet.
- Tracked bin/{control-plane,agent-micro,agent-gateway}.
2026-06-24 03:58:53 +00:00
opencode 2bc3ff73a2 Slice 1: build green, MVP core flow
- New agentlib module (gitutil + deployer with NewGo / NewPHP) replaces
  agent-micro/internal so both agents can share it (Go's internal/ rule
  was blocking agent-gateway from importing agent-micro's packages).
- Migrate agents from legacy github.com/docker/docker/client to the
  current github.com/moby/moby/client v0.5.0 / moby/moby/api v1.55.0.
- Fix compile errors in the original committed code: missing
  gorilla/websocket import in control-plane/internal/ws/handlers.go,
  unaliased dockerclient reference, wrong SQLite driver name
  (sqlite3 -> sqlite), Dialer.Dial 3-return-value mismatch.
- scripts/build.sh: Go 1.23 -> 1.24, apk add git, safe.directory for
  bind-mounted host tree, chmod inside container (host can't chmod
  files owned by container root).
- README and REQUIREMENTS updated to reflect the actual architecture
  (Go + SQLite, no Spring Boot, moby SDK, per-deploy no image build)
  with a per-feature status checklist at the end of REQUIREMENTS.
2026-06-24 01:43:43 +00:00
Achmad Setyabudi Susilo 7c1013e083 Rewrite README to match current state
Build now uses scripts/build.sh (Docker cross-compile, no Go install
needed). Add Prereqs, docker-compose dev section, Architecture notes,
and a list of intentional MVP stubs so reviewers know what's still
scaffolded vs what's real.
2026-06-24 07:41:51 +07:00
Achmad Setyabudi Susilo ba8a3360cc Drop GOFLAGS=-mod=mod; workspace mode forces -mod=readonly
The go.work file enables workspace mode, which only allows -mod=readonly
or -mod=vendor. -mod=mod fails the build with:

  go: -mod may only be set to readonly or vendor when in workspace mode

Drop the GOFLAGS line and let workspace mode pick the default
(readonly). Update go.work.sum to track module checksums.
2026-06-24 07:37:39 +07:00
Achmad Setyabudi Susilo 3d99940658 Initial SDP skeleton
Sandbox Deployment Platform — Go control plane + agents, NextJS dashboard,
nginx reverse proxy. Cross-compile via Docker; deploy via sshpass to
172.18.136.92 (micro) and 172.18.139.186 (gateway).

- control-plane: HTTP API, WS hub, SQLite (modernc.org/sqlite) for
  progress, .log files for log persistence
- agent-micro / agent-gateway: alpine:3.20 + bind-mounted repo,
  binary exec'd in container, no Dockerfile build step
- dashboard: NextJS static export + shadcn/ui components, single
  WebSocket hook
- docker-compose.yml: three services on alpine:latest with docker
  socket bind for agents
- scripts/: build.sh (golang:1.23-alpine cross-compile), deploy.sh,
  patch-nginx.sh (idempotent nginx splice), ssh wrappers

Runtime model: pass-through Bitbucket creds per deploy, never logged or
persisted on the agent. Control plane never touches git or docker
directly — agents do all the work locally.
2026-06-24 07:25:01 +07:00