# Sandbox Deployment Platform (SDP)

## Tech Stack (Decided)

- **Dashboard:** NextJS + React + TypeScript + Tailwind. Plain `useState` + single WebSocket hook. No Redux/Zustand. Built as static output, served by nginx with `try_files`.
- **Control Plane:** Go. PostgreSQL for metadata (nodes, repos, deployments, sandboxes, templates, routes, envs). **SQLite** for ephemeral state (deployment progress snapshots) and **`.log` files** for log persistence. No Spring Boot. No Redis.
- **Agents:** Go. Use the official Docker SDK (`github.com/docker/docker/client`) for container orchestration. Build Go binaries **directly on the host** (`go build -o {name}`); there is no Dockerfile-based build step.
- **Realtime transport:** WebSocket end-to-end (Agent → Control Plane → Frontend).
- **Auth:** Bitbucket username/password. Validated by a real `git ls-remote`/`fetch` via the Agent. **Credentials are passed on every operation from Control Plane to Agent. Never logged, never persisted on the Agent longer than the operation.**
- **Infra in the spec** = the existing microservice infrastructure (172.18.* VMs, AppGolang, SDP repo), not infrastructure for SDP itself.

## Overview

Sandbox Deployment Platform (SDP) is an internal deployment platform that allows Backend and QA teams to deploy isolated feature branches without requiring deployment to the shared OpenShift (OCP) environment.

The platform is designed specifically for the company's existing architecture:

* Golang microservices
* PHP API Gateway
* Internal VM infrastructure
* Bitbucket repositories
* No internet access on deployment VMs
* Developers only have read access to OCP

The platform is NOT intended to be a generic Kubernetes, OpenShift, or PaaS solution.

---

# Problem Statement

Current workflow:

1. Developer creates feature branch.
2. Deployment to shared environment requires PR approval and merge.
3. CI/CD deploys to shared OCP.
4. Testing affects other teams.
5. Negative-path testing can disrupt shared development.
Required workflow:

1. Developer deploys feature branch directly.
2. Deployment occurs in isolated sandbox infrastructure.
3. API Gateway selectively routes traffic to sandbox services.
4. Remaining services continue using OCP.
5. QA can test independently.

---

# Infrastructure

## Microservices VM

```text
IP Address: 172.18.136.92
Repository Root: ~/AppGolang
```

Example:

```text
~/AppGolang
├── account
├── payment
├── user
├── notification
└── ...
```

All Golang microservices reside here.

---

## Infrastructure VM

```text
IP Address: 172.18.136.93
```

Reserved for future use:

* PostgreSQL
* Redis
* RabbitMQ
* Kafka

Not required for MVP.

---

## API Gateway VM

```text
IP Address: 172.18.139.186
Repository Root: ~/SDP
```

Contains:

```text
~/SDP
```

The API Gateway repository.

---

# High-Level Architecture

```text
+------------------------------------+
|             Dashboard              |
|          NextJS Frontend           |
+-----------------+------------------+
                  |
                  v
+------------------------------------+
|           Control Plane            |
|                 Go                 |
+--------+-------------------+-------+
         |                   |
         | HTTP              | HTTP
         v                   v
+----------------+  +----------------+
|  Micro Agent   |  | Gateway Agent  |
| 172.18.136.92  |  | 172.18.139.186 |
+----------------+  +----------------+
```

---

# Architectural Principles

## Control Plane

The Control Plane:

* Never SSHs into servers
* Never executes build commands
* Never accesses repositories directly

The Control Plane only:

* Stores metadata
* Manages deployments
* Sends commands via HTTP
* Receives deployment events
* Streams logs to frontend

---

## Agents

Agents execute all operations locally.

Examples:

```text
git fetch
git checkout
go build
docker build
docker run
```

Agents have direct filesystem access.

---

# Authentication

## Login

Users authenticate using:

```text
Bitbucket Username
Bitbucket Password
```

---

## Validation

Authentication is validated by attempting a Git operation against a known repository.
Example:

```bash
git ls-remote
```

or

```bash
git fetch
```

If Git authentication succeeds:

```text
LOGIN SUCCESS
```

Otherwise:

```text
LOGIN FAILED
```

---

## Git Operations

All Git operations must use the currently authenticated user's credentials.

Examples:

```bash
git fetch
git pull
git checkout
```

Credentials are passed from Control Plane to Agent during deployment execution.

Credentials must never be logged.

---

# Repository Configuration

Repositories are configured manually on each Agent. No automatic discovery.

Example:

```yaml
repositories:
  - name: account
    path: /home/user/AppGolang/account
  - name: payment
    path: /home/user/AppGolang/payment
  - name: user
    path: /home/user/AppGolang/user
```

Gateway:

```yaml
repositories:
  - name: api-gateway
    path: /home/user/SDP
```

---

# Core Concepts

## Node

Represents a VM.

Fields:

```text
id
name
ipAddress
type
```

Types:

```text
MICRO
GATEWAY
INFRA
```

---

## Repository

Fields:

```text
id
name
path
nodeId
```

---

## Environment

Equivalent to:

```text
ConfigMap
Secret
```

Contains:

```text
Variables
Secrets
Files
```

Example:

```env
DB_HOST=
DB_PORT=
REDIS_URL=
JWT_SECRET=
```

---

## Deployment

Represents a deployment execution.

Fields:

```text
id
repository
branch
user
status
logs
startedAt
completedAt
```

---

## Sandbox

Represents an isolated testing environment.

Example:

```yaml
sandbox: QA-LOGIN-ERROR
services:
  account:
    branch: feature/login-error
  payment:
    use_ocp: true
  user:
    use_ocp: true
```

---

## Sandbox Template

A reusable sandbox configuration.

Purpose: Reduce repetitive setup.
Example:

```yaml
template: QA-DEFAULT
gateway:
  branch: develop
services:
  account:
    use_ocp: true
  payment:
    use_ocp: true
  user:
    use_ocp: true
```

Another example:

```yaml
template: ACCOUNT-TESTING
gateway:
  branch: develop
services:
  account:
    branch: feature/account
  payment:
    use_ocp: true
  user:
    use_ocp: true
```

Users can:

* Create template
* Update template
* Clone template into sandbox

---

# Micro Agent Requirements

Runs on:

```text
172.18.136.92
```

Responsibilities:

```text
List repositories
List branches
Fetch repository updates
Checkout branch
Pull latest changes
Build Go binary
Create Docker image
Run container
Restart container
Stop container
Stream logs
```

---

# Microservice Deployment Process

Given:

```text
Repository: account
Branch: feature/login-error
```

Agent executes:

```bash
git fetch
git checkout feature/login-error
git pull
```

Then:

```bash
go build -o app
```

Then generates runtime image:

```dockerfile
FROM alpine:latest
COPY app /app
CMD ["/app"]
```

Then:

```bash
docker build
docker run
```

---

# Gateway Agent Requirements

Runs on:

```text
172.18.139.186
```

Responsibilities:

```text
List branches
Fetch repository updates
Checkout branch
Pull latest changes
Build container
Deploy container
Restart container
Manage routing
Stream logs
```

---

# API Gateway Deployment

The API Gateway must run inside Docker. It is no longer deployed directly on the host.

Deployment process:

```bash
git fetch
git checkout
git pull
docker build
docker run
```

---

# Offline VM Requirements

Deployment VMs have no internet access.

The following cannot be relied upon:

```bash
docker pull
```

---

# Docker Image Distribution

Images must be imported manually.
Example:

On a machine with internet access:

```bash
docker pull nginx:latest
docker save nginx:latest -o nginx.tar
```

Transfer:

```bash
scp nginx.tar user@172.18.139.186:/tmp
```

Load (on the deployment VM):

```bash
docker load -i /tmp/nginx.tar
```

---

# Environment Management

Users must be able to:

```text
Create Environment
Update Environment
Delete Environment
Manage Secrets
Manage Variables
```

Example:

```env
DB_HOST=...
DB_USER=...
DB_PASSWORD=...
```

Environment values are injected during deployment.

---

# Route Override System

This is the most important feature.

Each route can target either:

```text
Sandbox Deployment
OCP Deployment
```

Example:

```yaml
account:
  target: http://172.18.136.92:9001
payment:
  target: https://payment-dev.company.com
user:
  target: https://user-dev.company.com
```

Result:

```text
account -> sandbox
payment -> OCP
user    -> OCP
```

---

# Mobile App Integration

Current mobile app:

```text
https://project-dev-url.domain.com
```

Target:

```text
http://172.18.139.186:{PORT}
```

Example:

```text
http://172.18.139.186:8080
```

QA can point the mobile application directly to the API Gateway sandbox. No DNS changes required.

---

# Port Management

Gateway Ports:

```text
8080
8081
8082
...
```

Microservice Ports:

```text
9001
9002
9003
...
```

Control Plane must:

* Allocate ports
* Track ports
* Prevent conflicts

---

# Deployment States

Supported states:

```text
QUEUED
FETCHING
CHECKOUT
BUILDING
CREATING_IMAGE
STARTING_CONTAINER
RUNNING
FAILED
STOPPED
```

---

# Real-Time Progress

Frontend must receive deployment progress in real time.

Example:

```text
✓ Fetch
✓ Checkout
✓ Build
✓ Create Image
✓ Start Container
✓ Running
```

No page refresh.

---

# Real-Time Logs

Frontend must receive logs while deployment is running.

Example:

```text
[FETCH] Fetching origin...
[FETCH] Success
[BUILD] Running go build
[BUILD] Success
[DEPLOY] Container started
```

---

# Event Streaming

Agents emit events.
Examples:

```text
FETCH_STARTED
FETCH_COMPLETED
CHECKOUT_STARTED
CHECKOUT_COMPLETED
BUILD_STARTED
BUILD_COMPLETED
DEPLOY_STARTED
DEPLOY_COMPLETED
DEPLOY_FAILED
```

Architecture:

```text
Agent
  -> WebSocket
Control Plane
  -> WebSocket
Frontend
```

---

# Dashboard Features

## Authentication

```text
Login
Logout
```

---

## Repository Management

```text
List Repositories
List Branches
```

---

## Deployments

```text
Deploy Branch
Restart Deployment
Stop Deployment
Delete Deployment
```

---

## Deployment Monitoring

```text
View Progress
View Logs
View Status
View History
```

---

## Environment Management

```text
Create Environment
Update Environment
Delete Environment
```

---

## Sandbox Management

```text
Create Sandbox
Update Sandbox
Delete Sandbox
Clone Sandbox
```

---

## Template Management

```text
Create Template
Update Template
Delete Template
Create Sandbox From Template
```

---

## Route Management

```text
Route To Sandbox
Route To OCP
```

---

# Audit Trail

Store:

```text
User
Repository
Branch
Environment
Sandbox
Timestamp
Status
```

Example:

```text
User: Achmad
Repository: account
Branch: feature/login-error
Sandbox: QA-LOGIN-ERROR
Status: SUCCESS
```

---

# Technology Stack

This summary follows the Tech Stack (Decided) section at the top of the document.

## Dashboard

```text
NextJS
React
TypeScript
Tailwind
```

## Control Plane

```text
Go
PostgreSQL
SQLite
WebSocket
```

## Agents

```text
Go
```

---

# Non-Goals

Not intended to replace:

```text
Kubernetes
OpenShift
Rancher
ArgoCD
Coolify
```

Not intended to support:

```text
Multi-Tenant SaaS
Public Cloud
Generic Container Hosting
```

Purpose: Provide isolated deployment environments for Backend and QA teams.

---

# MVP Success Criteria

A developer can:

1. Login using Bitbucket username and password.
2. Select a repository.
3. Select a branch.
4. Configure environment variables.
5. Deploy API Gateway.
6. Deploy microservices.
7. Watch deployment progress in real time.
8. Watch deployment logs in real time.
9. Create sandboxes.
10. Create sandbox templates.
11. Route selected services to sandbox deployments.
12. Route remaining services to OCP.
13. Point mobile application to:

    ```text
    http://172.18.139.186:{PORT}
    ```

14. Allow QA to test isolated feature branches without impacting shared OCP environments.

# Future Enhancements

## Sandbox Isolation Strategy

### Goal

Allow multiple developers and QA engineers to run independent sandboxes simultaneously without conflicts.

Example:

```text
Achmad Sandbox
├── account
├── payment
└── gateway

QA Sandbox
├── account
├── payment
└── gateway
```

Both sandboxes must coexist on the same infrastructure.

---

## Container Naming Convention

Containers should follow a predictable naming pattern.

Format:

```text
sandbox-{sandbox-name}-{service-name}
```

Examples:

```text
sandbox-achmad-account
sandbox-achmad-payment
sandbox-achmad-user
sandbox-achmad-gateway
```

```text
sandbox-qa-login-account
sandbox-qa-login-gateway
```

Benefits:

* Easier troubleshooting
* Easier cleanup
* Easier log inspection
* Easier monitoring

---

## Docker Network Per Sandbox

Each sandbox should have its own Docker network.

Format:

```text
sandbox-{sandbox-name}
```

Examples:

```text
sandbox-achmad
sandbox-qa-login
sandbox-regression
```

Container example:

```text
Network:
  sandbox-achmad

Containers:
  sandbox-achmad-gateway
  sandbox-achmad-account
  sandbox-achmad-payment
```

Benefits:

* Network isolation
* Service discovery
* No cross-sandbox traffic
* Simpler routing

---

## Internal Service Communication

Services within a sandbox should communicate through Docker DNS.

Example:

Instead of:

```text
http://172.18.136.92:9001
```

Use:

```text
http://sandbox-achmad-account:8080
```

Benefits:

* No dependency on host ports
* Cleaner configuration
* Easier sandbox replication

---

## Sandbox Port Allocation

Gateway containers should expose a unique external port.
Examples:

```text
sandbox-achmad-gateway     → 172.18.139.186:8080
sandbox-qa-login-gateway   → 172.18.139.186:8081
sandbox-regression-gateway → 172.18.139.186:8082
```

Mobile applications connect only to gateway ports.

---

## Automatic Port Management

Control Plane should automatically:

* Allocate available ports
* Reserve ports
* Release ports when sandbox is deleted

Example database table:

```text
PortAllocation
├── sandboxId
├── serviceName
├── port
└── allocatedAt
```

---

## Sandbox Lifecycle

Future support:

### Suspend Sandbox

Stops all containers while preserving configuration.

Example:

```text
ACTIVE
  ↓
SUSPENDED
```

Resources freed:

* CPU
* Memory

Configuration preserved.

---

### Resume Sandbox

Restarts previously suspended sandbox.

Example:

```text
SUSPENDED
  ↓
ACTIVE
```

---

### Sandbox Expiration

Automatic cleanup after inactivity.

Example:

```text
No activity for 14 days
  ↓
Mark Expired
  ↓
Stop Containers
  ↓
Delete After Retention Period
```

Configurable.

---

## Sandbox Cloning

Clone an existing sandbox.

Example:

```text
Source: Achmad Sandbox
Destination: QA Sandbox
```

Result:

```text
Same repositories
Same branches
Same environment variables
Same route overrides
```

New ports are allocated automatically.

---

## Sandbox Snapshots

Capture sandbox state.

Stored information:

* Repository versions
* Branches
* Environment variables
* Route overrides

Example:

```text
Snapshot: QA-Before-Release
```

Allows rollback and recreation later.

---

## Resource Limits

Per sandbox resource controls.

Example:

```text
CPU: 1 Core
Memory: 1 GB
```

Per container:

```text
CPU: 500m
Memory: 512MB
```

Implemented using Docker resource limits.

---

## Health Monitoring

Track sandbox health.
Metrics:

* Container status
* CPU usage
* Memory usage
* Restart count
* Health endpoint status

Dashboard should display:

```text
Healthy
Degraded
Unhealthy
```

---

## Future Infrastructure Agent

Node:

```text
172.18.136.93
```

Responsibilities:

* PostgreSQL restore
* Database cloning
* RabbitMQ management
* Redis management
* Kafka management

Potential use case:

```text
Clone QA Database
  ↓
Attach To Sandbox
  ↓
Run Integration Testing
```

---

## Future RBAC

Current MVP:

```text
All authenticated users
```

Future roles:

```text
ADMIN
BACKEND
QA
VIEWER
```

Permissions:

```text
Deploy
Delete Sandbox
Manage Templates
Manage Routes
Manage Environments
```

---

## Future Notifications

Deployment notifications:

```text
Deployment Started
Deployment Succeeded
Deployment Failed
Sandbox Expired
```

Channels:

* Email
* Slack
* Microsoft Teams

---

The Future Enhancements section is intentionally out of scope for the first implementation, but it provides a roadmap that avoids architectural dead ends later.
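
## Illustrative Sketch: Port Allocation

The port rules described under Port Management and Automatic Port Management (allocate, track, prevent conflicts, release on sandbox deletion) could be sketched in Go roughly as follows. This is only an illustration under stated assumptions: the names `PortAllocator`, `Allocate`, and `ReleaseSandbox` are hypothetical, and the in-memory map stands in for the `PortAllocation` table, which the real Control Plane would presumably keep in PostgreSQL.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrNoFreePort is returned when every port in the range is already held.
var ErrNoFreePort = errors.New("no free port in range")

// owner mirrors the sandboxId and serviceName columns of PortAllocation.
type owner struct {
	sandboxID string
	service   string
}

// PortAllocator is a hypothetical in-memory stand-in for the
// PortAllocation table; it hands out ports from one inclusive range.
type PortAllocator struct {
	mu       sync.Mutex
	min, max int
	byPort   map[int]owner
}

// NewPortAllocator creates an allocator for ports min..max inclusive.
func NewPortAllocator(min, max int) *PortAllocator {
	return &PortAllocator{min: min, max: max, byPort: make(map[int]owner)}
}

// Allocate reserves the lowest free port for a sandbox service.
// Asking again for the same pair returns the port it already holds.
func (a *PortAllocator) Allocate(sandboxID, service string) (int, error) {
	a.mu.Lock()
	defer a.mu.Unlock()
	want := owner{sandboxID, service}
	for p := a.min; p <= a.max; p++ {
		if held, ok := a.byPort[p]; ok && held == want {
			return p, nil
		}
	}
	for p := a.min; p <= a.max; p++ {
		if _, taken := a.byPort[p]; !taken {
			a.byPort[p] = want
			return p, nil
		}
	}
	return 0, ErrNoFreePort
}

// ReleaseSandbox frees every port held by the sandbox and
// returns how many ports were released.
func (a *PortAllocator) ReleaseSandbox(sandboxID string) int {
	a.mu.Lock()
	defer a.mu.Unlock()
	released := 0
	for p, held := range a.byPort {
		if held.sandboxID == sandboxID {
			delete(a.byPort, p)
			released++
		}
	}
	return released
}

func main() {
	// Gateway port range taken from the examples in this document.
	gateway := NewPortAllocator(8080, 8082)
	for _, sb := range []string{"achmad", "qa-login", "achmad"} {
		port, err := gateway.Allocate(sb, "gateway")
		fmt.Println(sb, port, err)
	}
	fmt.Println("released:", gateway.ReleaseSandbox("achmad"))
}
```

In this sketch a repeated request for the same sandbox and service is idempotent, which keeps a redeploy on its existing gateway port. A persistent version would likely enforce the same rule with a unique constraint on the port column rather than a mutex.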