infra/.planning/phases/01-infrastructure-and-deployment/01-02-PLAN.md
Viktor Barzin fd0f4a0365 fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip]
6d224861 came from a --no-checkout worktree whose empty index made the
commit drop every file except two. This restores 05b50d2b's full tree and
correctly adds stacks/stem95su/gdrive-sync.tf + the service-catalog stem95su
entry. Forward-only (parent=6d224861, no force-push); [ci skip] since the
live infra was never applied from the broken commit.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 08:45:33 +00:00


---
phase: 01-infrastructure-and-deployment
plan: 02
type: execute
wave: 2
depends_on:
  - "01-01"
files_modified:
  - stacks/f1-stream/main.tf
  - .woodpecker/f1-stream.yml
autonomous: true
requirements:
  - DEPL-01
  - DEPL-02
must_haves:
  truths:
    - "A request to https://f1.viktorbarzin.me/health returns HTTP 200 with JSON {status: ok}"
    - "The Terragrunt stack applies cleanly with no errors"
    - "A file written to /data inside the pod survives a pod restart"
    - "Woodpecker CI pipeline triggers on push for the f1-stream directory"
  artifacts:
    - path: "stacks/f1-stream/main.tf"
      provides: "Kubernetes deployment, service, ingress, TLS for f1-stream"
      contains: "viktorbarzin/f1-stream:v2.0.0"
    - path: ".woodpecker/f1-stream.yml"
      provides: "CI pipeline for f1-stream service"
      contains: "f1-stream"
  key_links:
    - from: "stacks/f1-stream/main.tf"
      to: "Docker Hub viktorbarzin/f1-stream:v2.0.0"
      via: "kubernetes_deployment image reference"
      pattern: "viktorbarzin/f1-stream:v2.0.0"
    - from: "stacks/f1-stream/main.tf"
      to: "NFS /mnt/main/f1-stream"
      via: "inline NFS volume mount"
      pattern: "/mnt/main/f1-stream"
    - from: "stacks/f1-stream/main.tf"
      to: "modules/kubernetes/ingress_factory"
      via: "ingress module call"
      pattern: "ingress_factory"
---
<objective>
Update the Terraform stack to deploy the new Python/FastAPI container, verify NFS mount persistence, and add a Woodpecker CI pipeline. This completes Phase 1 by making the service live on the cluster and reachable at its public URL.
Purpose: The service must be running on the Kubernetes cluster, reachable at f1.viktorbarzin.me, with NFS storage mounted and CI/CD in place -- ready for application development in Phase 2.
Output: Live deployment at f1.viktorbarzin.me, NFS-backed persistent storage, Woodpecker CI pipeline.
</objective>
<execution_context>
@/Users/viktorbarzin/.claude/get-shit-done/workflows/execute-plan.md
@/Users/viktorbarzin/.claude/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@.planning/phases/01-infrastructure-and-deployment/01-01-SUMMARY.md
# Key reference files:
@stacks/f1-stream/main.tf
@stacks/f1-stream/terragrunt.hcl
@.woodpecker/build-cli.yml
@.woodpecker/default.yml
</context>
<tasks>
<task type="auto">
<name>Task 1: Update Terraform deployment for Python/FastAPI and verify NFS mount</name>
<files>stacks/f1-stream/main.tf</files>
<action>
Modify `stacks/f1-stream/main.tf` to update the deployment for the new Python/FastAPI application:
1. **Change the container image** from `viktorbarzin/f1-stream:v1.3.1` to `viktorbarzin/f1-stream:v2.0.0`
2. **Change the container port** from 8080 to 8000 (FastAPI/uvicorn default)
3. **Update the service target_port** from 8080 to 8000
4. **Remove old Go-specific environment variables** that are no longer needed:
- Remove `WEBAUTHN_RPID`
- Remove `WEBAUTHN_ORIGIN`
- Remove `WEBAUTHN_DISPLAY_NAME`
- Remove `HEADLESS_EXTRACT_ENABLED`
- Remove `TURN_URL`
- Remove `TURN_SHARED_SECRET`
- Remove `TURN_INTERNAL_URL`
5. **Remove unused variables** from the top of the file:
- Remove `variable "coturn_turn_secret"` (was for WebRTC/TURN)
- Remove `variable "public_ip"` (was for TURN URL)
- Keep `variable "tls_secret_name"` and `variable "nfs_server"` (still needed)
6. **Keep the NFS volume mount** exactly as-is -- it already follows the inline NFS pattern:
```hcl
volume {
  name = "data"
  nfs {
    server = var.nfs_server
    path   = "/mnt/main/f1-stream"
  }
}
```
The volume_mount at `/data` stays the same.
7. **Update resource limits** for Python:
```hcl
resources {
  limits = {
    cpu    = "500m"
    memory = "256Mi"
  }
  requests = {
    cpu    = "50m"
    memory = "64Mi"
  }
}
```
Python/FastAPI with uvicorn needs less CPU than Go+Chromium but similar memory.
8. **Keep everything else unchanged**: namespace, service, tls_secret module, ingress module.
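Before applying, the edits listed above can be spot-checked with a small script. The following is a minimal sketch and not part of the original plan: the helper name `check_main_tf` is hypothetical, and it does a plain substring search rather than parsing HCL, so a pass is a sanity check only (for example, `public_ip` would also match inside an unrelated identifier).

```python
from pathlib import Path

# Names and values this plan removes from stacks/f1-stream/main.tf (steps 1, 4 and 5).
REMOVED_NAMES = [
    "viktorbarzin/f1-stream:v1.3.1",
    "WEBAUTHN_RPID",
    "WEBAUTHN_ORIGIN",
    "WEBAUTHN_DISPLAY_NAME",
    "HEADLESS_EXTRACT_ENABLED",
    "TURN_URL",
    "TURN_SHARED_SECRET",
    "TURN_INTERNAL_URL",
    "coturn_turn_secret",
    "public_ip",
]

# Strings this plan expects to be present after the edit.
REQUIRED_STRINGS = [
    "viktorbarzin/f1-stream:v2.0.0",
    "/mnt/main/f1-stream",
    "ingress_factory",
    "tls_secret_name",
    "nfs_server",
]


def check_main_tf(text: str) -> list[str]:
    """Return human-readable problems; an empty list means the text search passed."""
    problems = [f"still references {name}" for name in REMOVED_NAMES if name in text]
    problems += [f"missing {s}" for s in REQUIRED_STRINGS if s not in text]
    return problems


if __name__ == "__main__":
    path = Path("stacks/f1-stream/main.tf")
    if path.is_file():
        for problem in check_main_tf(path.read_text()):
            print(problem)
    else:
        print(f"{path} not found; run from the repository root")
```

No output from the script means none of the removed names were found and all required strings are present.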
After editing, apply the Terraform stack:
```bash
# Subshell keeps the working directory at the repo root, so $(pwd)/config below still resolves
(cd stacks/f1-stream && terragrunt apply --non-interactive)
```
Wait for the deployment to roll out:
```bash
kubectl --kubeconfig $(pwd)/config -n f1-stream rollout status deployment/f1-stream --timeout=120s
```
Verify the pod is running:
```bash
kubectl --kubeconfig $(pwd)/config -n f1-stream get pods
```
Verify the health endpoint responds through the public URL:
```bash
curl -s https://f1.viktorbarzin.me/health
```
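`curl -s` prints the response body but not the status code, so an error page served by the ingress could be mistaken for a reply from the app. The sketch below checks both, using only the standard library. The helper names `is_healthy` and `fetch_health` are hypothetical, and the expected body `{"status": "ok"}` is taken from the verification steps in this plan.

```python
import json
import urllib.error
import urllib.request

HEALTH_URL = "https://f1.viktorbarzin.me/health"


def is_healthy(status_code: int, body: str) -> bool:
    """True only for HTTP 200 with a JSON object whose "status" field is "ok"."""
    if status_code != 200:
        return False
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


def fetch_health(url: str = HEALTH_URL, timeout: float = 10.0) -> tuple[int, str]:
    """Fetch the endpoint and return (status_code, body), including for HTTP error responses."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError; surface them as a normal result.
        return err.code, err.read().decode("utf-8", errors="replace")
```

After the rollout finishes, `is_healthy(*fetch_health())` should evaluate to `True`. Connection-level failures (DNS, TLS, timeouts) are deliberately left to raise so they are not confused with an unhealthy response.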
Verify NFS mount persistence by writing a test file, restarting the pod, and reading it back:
```bash
POD=$(kubectl --kubeconfig "$(pwd)/config" -n f1-stream get pods -l app=f1-stream -o jsonpath='{.items[0].metadata.name}')
# Write a timestamped marker to the NFS-backed /data directory and echo it back
kubectl --kubeconfig "$(pwd)/config" -n f1-stream exec "$POD" -- sh -c 'echo "nfs-test-$(date +%s)" > /data/test-file.txt && cat /data/test-file.txt'
kubectl --kubeconfig "$(pwd)/config" -n f1-stream rollout restart deployment/f1-stream
kubectl --kubeconfig "$(pwd)/config" -n f1-stream rollout status deployment/f1-stream --timeout=120s
# Pick the newest pod; the old one may still be listed while it terminates
NEW_POD=$(kubectl --kubeconfig "$(pwd)/config" -n f1-stream get pods -l app=f1-stream --sort-by=.metadata.creationTimestamp -o jsonpath='{.items[-1:].metadata.name}')
# Read the marker back from the replacement pod
kubectl --kubeconfig "$(pwd)/config" -n f1-stream exec "$NEW_POD" -- cat /data/test-file.txt
```
The test file should contain the same content after the pod restart.
</action>
<verify>
1. `terragrunt apply` exits with 0 (no errors)
2. `kubectl get pods -n f1-stream` shows 1/1 Running
3. `curl -s https://f1.viktorbarzin.me/health` returns `{"status":"ok"}`
4. NFS persistence test passes (file survives pod restart)
</verify>
<done>
The f1-stream deployment is running on the cluster with the new Python/FastAPI image, reachable at https://f1.viktorbarzin.me/health, and the NFS volume at /data persists data across pod restarts.
</done>
</task>
<task type="auto">
<name>Task 2: Create Woodpecker CI pipeline for f1-stream</name>
<files>.woodpecker/f1-stream.yml</files>
<action>
Create `.woodpecker/f1-stream.yml` following the pattern from `build-cli.yml`:
```yaml
when:
  event: push
  path:
    include:
      - "stacks/f1-stream/files/**"

clone:
  git:
    image: woodpeckerci/plugin-git
    settings:
      attempts: 5
      backoff: 10s

steps:
  - name: build-image
    image: woodpeckerci/plugin-docker-buildx
    settings:
      username: "viktorbarzin"
      password:
        from_secret: dockerhub-pat
      repo: viktorbarzin/f1-stream
      dockerfile: stacks/f1-stream/files/Dockerfile
      context: stacks/f1-stream/files
      auto_tag: true
```
Key differences from the default pipeline:
- **Path filter**: Only triggers when files under `stacks/f1-stream/files/` change (the application code)
- **Builds and pushes the Docker image** using the same `woodpeckerci/plugin-docker-buildx` pattern as build-cli.yml
- **Docker context** points to the `stacks/f1-stream/files/` directory where the Dockerfile lives
- Does NOT run Terragrunt apply (that is done manually or by the default pipeline for the platform stack)
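
To make the path filter concrete, the sketch below approximates how the `include` pattern sorts a set of changed files. Using Python's `fnmatch` here is an assumption made for illustration only: Woodpecker does its own glob matching, and `fnmatch` lets `*` match `/`, so this mirrors the behaviour only for a simple trailing `**` pattern like the one above. The helper name `triggers_build` and the example file names are hypothetical.

```python
from fnmatch import fnmatchcase

# Pattern copied from the `when.path.include` list in the pipeline above.
INCLUDE_PATTERNS = ["stacks/f1-stream/files/**"]


def triggers_build(changed_paths: list[str]) -> bool:
    """True if at least one changed path falls under an include pattern."""
    return any(
        fnmatchcase(path, pattern)
        for path in changed_paths
        for pattern in INCLUDE_PATTERNS
    )


# A push that touches application code would match the filter.
print(triggers_build(["stacks/f1-stream/files/Dockerfile"]))
# A push that only touches the Terraform stack or the pipeline file would not.
print(triggers_build(["stacks/f1-stream/main.tf", ".woodpecker/f1-stream.yml"]))
```

Under this approximation, edits to `stacks/f1-stream/main.tf` or to the pipeline file itself do not start an image build; only changes under `stacks/f1-stream/files/` do.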
</action>
<verify>
Verify the YAML is valid: `python3 -c "import yaml; yaml.safe_load(open('.woodpecker/f1-stream.yml')); print('YAML OK')"`
Verify the file exists and references f1-stream correctly.
</verify>
<done>
Woodpecker CI pipeline file exists at `.woodpecker/f1-stream.yml`, configured to build and push the Docker image when files under `stacks/f1-stream/files/` change.
</done>
</task>
</tasks>
<verification>
1. `curl -s https://f1.viktorbarzin.me/health` returns `{"status":"ok"}`
2. `cd stacks/f1-stream && terragrunt plan --non-interactive` shows no changes (stack is clean)
3. NFS test file written before pod restart is readable after pod restart
4. `.woodpecker/f1-stream.yml` exists and is valid YAML
5. `kubectl --kubeconfig $(pwd)/config -n f1-stream get pods` shows 1/1 Running
</verification>
<success_criteria>
- The service is live at https://f1.viktorbarzin.me and responds with 200 on /health
- Terragrunt stack applies cleanly with no manual cluster intervention
- NFS volume mount at /data persists data across pod restarts
- Woodpecker CI pipeline exists for automated image builds
</success_criteria>
<output>
After completion, create `.planning/phases/01-infrastructure-and-deployment/01-02-SUMMARY.md`
</output>