bri-sandbox-development-platform

Author	SHA1	Message	Date
Achmad	d5d5e5467d	DEPLOY.md: two more WS-upgrade tests for the curl-works-but-RST case When plain HTTP from 92 reaches the control plane but the WebSocket dial RSTs, test the upgrader on each side: 1. curl from 186 to 127.0.0.1:3452 with WS upgrade headers: - 101 → control plane is fine, network is the issue. - RST/4xx → control plane is broken. 2. curl from 92 to 186:3452 with WS upgrade headers: - 101 → firewall allows WS traffic, agent's client is the issue. - RST → some middlebox matches on the Upgrade header. - 4xx → control plane rejects the upgrade.	2026-06-24 05:52:04 +00:00
Achmad	f3da975eb7	DEPLOY.md: remove the 'agent on 92 can't reach 186:3452' section The user's micro agent dials ws://172.18.139.186:3452/ws/agent directly, bypassing nginx. The 'add a /ws/agent nginx proxy block' workaround was for a different topology where the agent goes through nginx on 80, which doesn't apply here. That section was confusing — it suggested an nginx change the user doesn't need. Delete it.	2026-06-24 05:48:10 +00:00
Achmad	b8736f4ac3	DEPLOY.md: also check what's bound to port 3452 when curl works If something else is also listening on :3452 (a leftover container, a systemd-managed socket, a proxy from an earlier session), the kernel can route new connects to it and that listener may RST. Curl from outside gets a clean response from the real listener; the agent's WS dial lands on the stale one and gets RST. Add ss -tlnp + lsof + systemctl list-sockets to the diagnostic ladder.	2026-06-24 05:43:53 +00:00
Achmad	fc768dbd85	DEPLOY.md: stale WS connection after daemon-reexec is the usual RST cause When 186's control plane comes up via systemctl daemon-reexec (the 226/NAMESPACE fix), the listening socket is briefly dropped and re-created. Any WebSocket connection that was in flight at that moment gets RST. The agent retries every 2s, but if the dial happened exactly during the reexec window the agent can sit in a tight RST loop until restarted. Document the restart-agent-micro as the first thing to try, and demote the iptables/fail2ban/curl diagnostic steps to 'if the agent still RSTs after restart'.	2026-06-24 05:40:42 +00:00
Achmad	8c598ad69f	DEPLOY.md: troubleshooting for agent-micro WS RST 'connection reset by peer' on the WS dial is at the TCP layer, not the application layer. It is almost always caused on 186 by an iptables REJECT rule, a fail2ban ban, or stale conntrack state. Document the diagnostic ladder: iptables -L INPUT, fail2ban status, then a plain HTTP curl from 92 to verify the network path, then a WS upgrade curl on 186 itself to verify the control plane's upgrader.	2026-06-24 05:36:18 +00:00
Achmad	0569cede43	DEPLOY.md: pre-create ~/SDP/data on 186; create it in step 6a too The control plane binary will create the data dir on first start, but creating it before systemd starts the service means the ReadWritePaths scope has somewhere to point at, and it makes diagnosis faster if anything else is wrong.	2026-06-24 05:31:51 +00:00
Achmad	92354252e5	DEPLOY.md: troubleshooting for status=226/NAMESPACE failure The 226/NAMESPACE with 'Failed to set up mount namespacing' error is misleading — the binary is fine, but systemd can't build the mount namespace for the service. The binary runs fine when launched directly as administrator; the bug is in the systemd manager's runtime state. Document the diagnostic (run the binary manually) and the two fixes: systemctl daemon-reexec (recreate /run/systemd) as the first attempt, reboot as the last resort.	2026-06-24 05:27:59 +00:00
Achmad	1f1ff2f173	DEPLOY.md: drop sudo discussion entirely The user has made it clear (twice now) that they don't want sudo advice in the runbook — they can type the password themselves and don't want a script or sudoers change. Delete the 'Diagnose sudo' step and the 'Sudo on the company VMs' reminder step. Sudo is just expected behavior; when the user runs 'sudo systemctl ...' and gets a prompt, they type the password. No commentary needed. Renumber the remaining steps so they're sequential 0-8.	2026-06-24 05:19:25 +00:00
Achmad	d11723ee63	DEPLOY.md: drop NOPASSWD advice, document interactive sudo Company VM — no sudoers changes. Replace the 'set up sudoers NOPASSWD' step with a brief note that every sudo call will prompt for the password and the user types it. The 15-minute sudo timestamp means the user only types it once per shell session, but they will see the prompt several times across the deploy as they run multiple sudo commands. Update the step-1 diagnostic outcomes to point at the new no-policy-change reality: NOPASSWD or different passwords both still work, the user just types the right one at each sudo prompt.	2026-06-24 05:16:52 +00:00
Achmad	1eddef9f65	Add DEPLOY.md: copy-pasteable runbook, no deploy.sh A hand-typed manual deploy guide. Every step is a single ssh or scp from the laptop, or a one-shot block of commands inside the VM. No sshpass, no env-var passwords, no sudo -S password piping. The user types their passwords interactively when prompted. The old deploy.sh had grown into a tangle of -tt / sudo -S / PAGER=cat workarounds that hid what was actually happening and was fragile across systemd versions. The runbook trades that off for explicit per-step commands that the user can verify by reading the output. Troubleshooting section at the bottom covers the four most likely first-deploy failures: SDP_CP_URL expansion, micro agent can't reach the control plane, login auth rejection, and missing runtime images.	2026-06-24 05:14:06 +00:00
Achmad	10ea727f53	deploy.sh: force PAGER=cat to defeat pager over ssh -tt TTY With -tt allocating a remote PTY, systemctl and journalctl would sometimes open a pager (less/more) even with --no-pager, leaving the script blocked until the user hits q or Ctrl-C. Force PAGER=cat and SYSTEMD_PAGER=cat inside every remote sudo call and inside the status_block journalctl command. Add --output=cat to journalctl too as belt-and-suspenders. Status output is also piped through | head -3 / | head -20 to guarantee a finite output even if the pager or color escape handling misbehaves.	2026-06-24 05:03:47 +00:00
Achmad	d8e8147919	deploy.sh: pipe sudo password via sudo -S (no TTY prompt needed) Adds password piping so the script works without a sudoers NOPASSWD rule, on the assumption that the SSH login password is the same as the sudo password (common on these VMs). - ssh -tt now forces a TTY allocation; sudo -S requires one and was failing with 'sudo: no tty present' over plain non-interactive ssh. - New run_remote_sudo helper pipes the per-VM password to 'sudo -S -p ""' so each remote call authenticates without a prompt. The empty -p suppresses '[sudo] password for ...' from appearing in journal tail output. - install_unit, restart_unit, and the journalctl call in status_block all go through run_remote_sudo. systemctl status no longer needs sudo (the unit is owned by administrator and status doesn't require root for it). - If your sudo password differs from the login password, the script will silently no-op the install/restart steps. Fix by setting the right password via SDP_92_PASS / SDP_186_PASS, or add a NOPASSWD rule in /etc/sudoers.d/sdp-deploy and revert this change.	2026-06-24 05:00:08 +00:00
Achmad	574e6d207b	Slice 2: agents and control plane run under systemd - systemd/sdp-control-plane.service: plain host process on 186, listens on :3452, data dir ~/SDP/data. MemoryMax=512M, Restart=always, ReadWritePaths scoped to the data dir. - systemd/sdp-agent-micro.service: plain host process on 92, default SDP_CP_URL=ws://172.18.139.186:3452/ws/agent. Operator can drop /etc/default/sdp-agent-micro to override. Depends on docker.service so the dockerd is up before the agent starts. - systemd/sdp-agent-gateway.service: plain host process on 186, default SDP_CP_URL=ws://127.0.0.1:3452/ws/agent (loopback since both live on the same VM). Same env-file override pattern. - All three use Type=simple, Restart=always, RestartSec=2s. The agents already reconnect on transient network drops, so restart-on-crash is the right policy. - The agents talk to the host dockerd via /var/run/docker.sock to spawn the actual service containers (sdp-<repo>). Service containers are managed by docker, not systemd — only the long-running agents and the control plane are under systemd. - scripts/deploy.sh: now a one-shot — scp's binaries, dashboard, and unit files; systemctl daemon-reload + enable --now + restart each service in the right order (control plane first on 186 so the gateway agent has something to dial). Prints status + last 10 journal lines per service so the user can see it came up. - AGENTS.md, README.md: layout tree updated, deploy section rewritten, the systemd units documented alongside the agents and control plane.	2026-06-24 04:54:28 +00:00
Achmad	f12d4f0b12	Slice 2 (follow-up): add Sessions.User / Revoke for /api/logout and audit-trail attribution The original auth commit shipped the in-memory session store with just Issue and Valid. The Slice-2 /api/logout handler and the audit-trail (user column on each deployment) need: - User(tok): look up the username for a valid session. - Revoke(tok): drop a session; used by /api/logout. Tiny follow-up — kept as its own commit because the rest of the auth work had already shipped in the parent commit by the time the dashboard's logout button and the deployment-audit-trail surfaced the need for these methods.	2026-06-24 04:01:53 +00:00
Achmad	4cab047432	Slice 2: port 3452, nginx sandbox mount, AGENTS.md, docs, deploy script cleanup - control-plane default listen addr is now :3452 (was :8080). An unusual port to avoid collisions on the VM. - agent-micro and agent-gateway default SDP_CP_URL points at ws://localhost:3452/ws/agent. docker-compose.yml updates the control plane command, host port mapping, and agent -cp URLs. - nginx/nginx.conf (the legacy root-mount reference) uses 127.0.0.1:3452 for the upstream. nginx/sandbox.conf is the new deployment config: four location blocks for the /sandbox/credit-card mount — _next/static serves cached chunks, /api/ and /ws/ proxy to 127.0.0.1:3452, /sandbox/credit-card serves the static dashboard with try_files for SPA routing. - scripts/patch-nginx.sh: deleted. The user configures nginx on 186 by hand. scripts/deploy.sh no longer calls it. - AGENTS.md: new file. Documents the build/lint/test commands (with the golang:1.24-alpine container — local Go can't fetch the toolchain), the wire protocol, the Slice-2 conventions (sdp-<repo> container naming, snapshot persistence, PreGitReset/AfterStart hooks), the repo-path gotcha, and the build-artifacts-in-git rationale. - dashboard/out: now tracked in git, alongside bin/. The dashboard static export is scp'd to 186 on deploy; the VMs have no internet so they can't regenerate it. .gitignore comment explains this and warns against re-ignoring. - README.md / REQUIREMENTS.md: status updated to 'Slice 2 done', per-feature checklist marked. Erangel repo path corrected to /var/www/html/erangel-ocean (was wrongly ~/SDP in earlier docs).	2026-06-24 04:00:49 +00:00
Achmad	78872de897	Slice 2: dashboard — nav, sandboxes/templates/environments/history pages, basePath - New /dashboard layout with a top nav (Quick Deploy / Sandboxes / Templates / Environments / History) and a Logout button that invalidates the session. - Quick Deploy: stage list switches per repo (Go vs PHP, so the composer-install stage is shown for the gateway), env-var textarea, host-port input. - Sandboxes: list, create, clone-from-template, delete. - Sandbox detail: live <key>_url map from the gateway's config.php, per-route toggle (OCP / sandbox override with a URL input), microservice deploys with per-service host port and env, branch picker. - Templates / Environments: list + create + delete. - History: filterable deployment list with state badges. - Sandbox detail page is a server component with generateStaticParams that delegates to a client component; required for the static export. - API client: prefix all /api and /ws URLs with NEXT_PUBLIC_BASE_PATH (set in next.config.js) so the dashboard works under a non-root basePath. - next.config.js: basePath and assetPrefix set to /sandbox/credit-card so asset URLs and internal Link hrefs resolve under the sub-path. NEXT_PUBLIC_BASE_PATH env is exposed to the browser bundle for the fetch() prefix.	2026-06-24 03:59:13 +00:00
Achmad	a7df9ffc6c	Slice 2: sandbox, template, environment, route CRUD - store: add tables and CRUD for sandboxes (with services), templates (with services, clone-into-sandbox), environments (named key/value sets), and routes (per-sandbox <service>_url overrides). - api: split into one file per resource. handleSandboxes/handleSandboxByID covers CRUD + 'clone from template' + 'deploy one service in a sandbox' (which merges the sandbox's env into the request, picks the port, and dispatches the deploy frame to the right node). handleTemplates/handleTemplateByID, handleEnvironments/handleEnvironmentByID, handlePushRoutes cover the rest. The control plane's repo->node resolution still lives in resolveNode (api-gateway -> gateway, everything else -> micro).	2026-06-24 03:59:02 +00:00
Achmad	55d7705c63	Slice 2: real auth, agent-mediated repo/branch listing, deployment list from SQLite - protocol: add RepoInfo, RouteOverride; add HostPort, SandboxID to DeployRequest. - ws hub: add CallAgent for sync request/response RPCs over the agent WS, and DeliverAgentReply to route {op:reply} frames back to the caller. UnregisterAgent now also fails any pending RPCs so callers don't hang. - agent-micro: new op handlers list_repos, list_branches, probe. Wire protocol.Event frames use json.RawMessage so each op decodes its own data shape. - agent-gateway: same op handlers (list_repos/list_branches/probe) plus push_routes, which the gateway uses to rewrite the api-gateway config.php. Detailed in a later commit. - control-plane login: validateViaAgent now calls CallAgent('probe') against the gateway agent (git ls-remote), replacing the accept-any-creds stub. - control-plane repos: handleListRepos and handleListBranches forward to the agents via list_repos / list_branches RPCs, replacing the hardcoded fixtures. - control-plane deployments: split into its own file. handleListDeployments reads from SQLite (was hardcoded []). handleCreateDeployment now supports sandbox-scoped deploys with a host port + env merge. handleStopDeployment looks up the node from the deployment row. - store: split into store.go + deployments.go. The Deployment type adds sandboxId, containerId, hostPort. StartDeploymentInSandbox, SetContainerID, ListDeployments, GetDeployment, LatestDeploymentBySandboxService are new. - store_test.go: round-trips every Slice-2 path (env, sandbox, template, clone, routes, deployment). - .gitignore: track bin/ — the build runs on a separate Linux box with the golang:1.24 toolchain, and the binaries are SCPed from there to the company VMs (92 / 186). The VMs have no internet. - Tracked bin/{control-plane,agent-micro,agent-gateway}.	2026-06-24 03:58:53 +00:00
opencode	2bc3ff73a2	Slice 1: build green, MVP core flow - New agentlib module (gitutil + deployer with NewGo / NewPHP) replaces agent-micro/internal so both agents can share it (Go's internal/ rule was blocking agent-gateway from importing agent-micro's packages). - Migrate agents from legacy github.com/docker/docker/client to the current github.com/moby/moby/client v0.5.0 / moby/moby/api v1.55.0. - Fix compile errors in the original committed code: missing gorilla/websocket import in control-plane/internal/ws/handlers.go, unaliased dockerclient reference, wrong SQLite driver name (sqlite3 -> sqlite), Dialer.Dial 3-return-value mismatch. - scripts/build.sh: Go 1.23 -> 1.24, apk add git, safe.directory for bind-mounted host tree, chmod inside container (host can't chmod files owned by container root). - README and REQUIREMENTS updated to reflect the actual architecture (Go + SQLite, no Spring Boot, moby SDK, per-deploy no image build) with a per-feature status checklist at the end of REQUIREMENTS.	2026-06-24 01:43:43 +00:00
Achmad Setyabudi Susilo	7c1013e083	Rewrite README to match current state Build now uses scripts/build.sh (Docker cross-compile, no Go install needed). Add Prereqs, docker-compose dev section, Architecture notes, and a list of intentional MVP stubs so reviewers know what's still scaffolded vs what's real.	2026-06-24 07:41:51 +07:00
Achmad Setyabudi Susilo	ba8a3360cc	Drop GOFLAGS=-mod=mod; workspace mode forces -mod=readonly The go.work file enables workspace mode, which only allows -mod=readonly or -mod=vendor. -mod=mod fails the build with: go: -mod may only be set to readonly or vendor when in workspace mode Drop the GOFLAGS line and let workspace mode pick the default (readonly). Update go.work.sum to track module checksums.	2026-06-24 07:37:39 +07:00
Achmad Setyabudi Susilo	3d99940658	Initial SDP skeleton Sandbox Deployment Platform — Go control plane + agents, NextJS dashboard, nginx reverse proxy. Cross-compile via Docker; deploy via sshpass to 172.18.136.92 (micro) and 172.18.139.186 (gateway). - control-plane: HTTP API, WS hub, SQLite (modernc.org/sqlite) for progress, .log files for log persistence - agent-micro / agent-gateway: alpine:3.20 + bind-mounted repo, binary exec'd in container, no Dockerfile build step - dashboard: NextJS static export + shadcn/ui components, single WebSocket hook - docker-compose.yml: three services on alpine:latest with docker socket bind for agents - scripts/: build.sh (golang:1.23-alpine cross-compile), deploy.sh, patch-nginx.sh (idempotent nginx splice), ssh wrappers Runtime model: pass-through Bitbucket creds per deploy, never logged or persisted on the agent. Control plane never touches git or docker directly — agents do all the work locally.	2026-06-24 07:25:01 +07:00

22 Commits
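
Commit f12d4f0b12 describes an in-memory session store with four methods: Issue, Valid, User and Revoke. The sketch below shows one way such a store could look. Only the four method names come from the log; the struct layout, the NewSessions constructor, the token format and every signature are assumptions made for illustration, not the repository's actual code.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"sync"
)

// Sessions is a hypothetical in-memory session store.
// The map layout and method signatures are assumptions.
type Sessions struct {
	mu    sync.RWMutex
	byTok map[string]string // session token -> username
}

// NewSessions is an assumed constructor; the log does not name one.
func NewSessions() *Sessions {
	return &Sessions{byTok: make(map[string]string)}
}

// Issue generates a random 128-bit token for user and stores it.
func (s *Sessions) Issue(user string) (string, error) {
	buf := make([]byte, 16)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	tok := hex.EncodeToString(buf)
	s.mu.Lock()
	s.byTok[tok] = user
	s.mu.Unlock()
	return tok, nil
}

// User looks up the username for a session token.
// The second result is false when the token is unknown or revoked.
func (s *Sessions) User(tok string) (string, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	user, ok := s.byTok[tok]
	return user, ok
}

// Valid reports whether the token belongs to a live session.
func (s *Sessions) Valid(tok string) bool {
	_, ok := s.User(tok)
	return ok
}

// Revoke drops a session, as a logout handler would.
// Revoking an unknown token is a no-op.
func (s *Sessions) Revoke(tok string) {
	s.mu.Lock()
	delete(s.byTok, tok)
	s.mu.Unlock()
}

func main() {
	s := NewSessions()
	tok, err := s.Issue("achmad")
	if err != nil {
		panic(err)
	}
	user, _ := s.User(tok)
	fmt.Println(s.Valid(tok), user) // true achmad
	s.Revoke(tok)
	fmt.Println(s.Valid(tok)) // false
}
```

In this sketch a logout handler would call Revoke with the caller's token, and an audit-trail writer would call User to fill the user column on a deployment row. Session expiry is left out because the log does not describe it.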