diff --git a/.claude/skills/helm-release-force-rerender/SKILL.md b/.claude/skills/helm-release-force-rerender/SKILL.md
new file mode 100644
index 00000000..d0648c15
--- /dev/null
+++ b/.claude/skills/helm-release-force-rerender/SKILL.md
@@ -0,0 +1,166 @@
+---
+name: helm-release-force-rerender
+description: |
+  Fix for Helm releases managed by Terraform where changing Helm values doesn't update
+  the actual Kubernetes resources. Use when: (1) Terraform applies successfully but
+  K8s resources (Service, Deployment) don't reflect new Helm values,
+  (2) New ports/volumes/containers from Helm chart values don't appear in the deployed resources,
+  (3) helm upgrade --reuse-values doesn't re-render templates for structural changes,
+  (4) Terraform thinks Helm release is up-to-date but actual K8s resources are stale.
+  Solution involves removing from Terraform state, reimporting, and force upgrading.
+author: Claude Code
+version: 1.0.0
+date: 2026-02-07
+---
+
+# Helm Release Force Re-render via Terraform
+
+## Problem
+After changing Helm chart values in a Terraform `helm_release` resource, Terraform applies
+successfully but the actual Kubernetes resources (Services, Deployments, etc.) don't reflect
+the new values. For example, adding a new port in Helm values doesn't result in that port
+appearing in the Service spec.
+
+## Context / Trigger Conditions
+- Terraform `helm_release` applies with "1 changed" but `kubectl get svc -o yaml` shows
+  the old configuration
+- Structural changes to Helm values (new ports, new containers, new volumes) are not
+  reflected in deployed resources
+- The Helm chart templates need to be fully re-rendered, not just patched
+- Common with Traefik, ingress-nginx, and other charts where template logic conditionally
+  includes resources based on values
+
+## Root Cause
+Terraform's `helm_release` resource uses `helm upgrade` under the hood.
+When values are changed, the upgrade may behave like `helm upgrade --reuse-values`, merging
+new values into the stored ones. Note that `reuse_values` defaults to `false` on
+`helm_release`, so check whether it is set. For structural changes (like enabling HTTP/3,
+which adds a new UDP port to the Service), the new conditional template branches may not apply.
+
+Additionally, Terraform may see the stored Helm release state as matching the desired state
+even though the actual Kubernetes resources don't reflect it, creating a state drift that
+Terraform doesn't detect.
+
+## Solution
+
+### Step 1: Verify the Discrepancy
+
+Confirm that K8s resources don't match Helm values:
+```bash
+# Check the actual resource
+kubectl get svc <service-name> -n <namespace> -o yaml
+
+# Check what Helm thinks is deployed
+helm get values <release-name> -n <namespace>
+helm get manifest <release-name> -n <namespace> | grep -A10 "<resource-name>"
+```
+
+### Step 2: Remove Helm Release from Terraform State
+
+```bash
+terraform state rm 'module.kubernetes_cluster.module.<module-name>.helm_release.<resource-name>'
+```
+
+**IMPORTANT**: This only removes the release from Terraform state. The actual Helm release
+and K8s resources remain untouched in the cluster.
+
+### Step 3: Import the Helm Release Back
+
+```bash
+terraform import 'module.kubernetes_cluster.module.<module-name>.helm_release.<resource-name>' '<namespace>/<release-name>'
+```
+
+For Helm releases, the import ID format is `namespace/release-name`.
+
+### Step 4: Force Apply with Terraform
+
+After reimporting, run terraform apply. Terraform should now detect the drift between
+the desired Helm values and the actual release state:
+
+```bash
+terraform apply -target=module.kubernetes_cluster.module.<module-name>
+```
+
+If Terraform still shows "no changes", you may need to taint the resource. This forces a
+destroy and recreate of the release (on Terraform v0.15.2+, `terraform apply -replace=...`
+is the preferred equivalent):
+```bash
+terraform taint 'module.kubernetes_cluster.module.<module-name>.helm_release.<resource-name>'
+terraform apply -target=module.kubernetes_cluster.module.<module-name>
+```
+
+### Step 5: Manual Helm Force Upgrade (Last Resort)
+
+If Terraform still doesn't fix it, use Helm directly as a one-time fix, then reimport:
+
+```bash
+# Get the current values file
+helm get values <release-name> -n <namespace> -o yaml > /tmp/values.yaml
+
+# Edit /tmp/values.yaml to include the correct values, or use --set flags
+
+# Force upgrade (--force replaces changed resources instead of patching them)
+helm upgrade --force <release-name> <chart> -n <namespace> -f /tmp/values.yaml
+
+# Then reimport into Terraform
+terraform state rm 'module.kubernetes_cluster.module.<module-name>.helm_release.<resource-name>'
+terraform import 'module.kubernetes_cluster.module.<module-name>.helm_release.<resource-name>' '<namespace>/<release-name>'
+terraform apply -target=module.kubernetes_cluster.module.<module-name>
+```
+
+**WARNING**: Direct Helm operations bypass Terraform. Always reimport into Terraform state
+afterward, and use `terraform apply` to verify Terraform is back in sync.
+
+## Verification
+
+```bash
+# Check the K8s resources now match expected configuration
+kubectl get svc <service-name> -n <namespace> -o yaml
+kubectl get deployment <deployment-name> -n <namespace> -o yaml
+
+# Verify Terraform is in sync
+terraform plan -target=module.kubernetes_cluster.module.<module-name>
+# Should show "No changes" or minimal expected drift
+```
+
+## Example: Traefik HTTP/3 UDP Port Not Appearing
+
+**Problem**: Added `http3.enabled=true` to Traefik Helm values. Terraform applied
+successfully, but the Traefik Service only had TCP port 443, missing the expected
+UDP port 443 (`websecure-http3`).
+
+**Fix**:
+```bash
+# 1. Remove from state
+terraform state rm 'module.kubernetes_cluster.module.traefik.helm_release.traefik'
+
+# 2. Reimport
+terraform import 'module.kubernetes_cluster.module.traefik.helm_release.traefik' 'traefik/traefik'
+
+# 3. Apply (Terraform now detects the drift)
+terraform apply -target=module.kubernetes_cluster.module.traefik
+
+# 4.
+# Verify
+kubectl get svc traefik -n traefik -o yaml | grep -A3 "websecure-http3"
+# Should show: port: 443, protocol: UDP
+```
+
+## Notes
+
+- This issue is more common with structural Helm value changes (new ports, new sidecars,
+  conditional template blocks) than with simple value changes (image tags, replica counts)
+- The `helm upgrade --force` flag replaces resources that have changed instead of patching
+  them, which can cause brief downtime. Use with caution on production ingress controllers.
+- Always verify with `terraform plan` after fixing to ensure Terraform state is consistent
+- This is different from the `terraform-state-identity-mismatch` skill, which covers
+  provider-level identity errors. This skill covers Helm template rendering issues where
+  the state looks correct but the actual resources don't match.
+
+## See Also
+
+- `terraform-state-identity-mismatch` - For Terraform provider identity errors
+- `traefik-http3-quic` - For enabling HTTP/3 on Traefik (common trigger for this issue)
+
+## References
+
+- [Terraform helm_release Resource](https://registry.terraform.io/providers/hashicorp/helm/latest/docs/resources/release)
+- [Helm Upgrade Documentation](https://helm.sh/docs/helm/helm_upgrade/)
+- [Helm --force Flag](https://helm.sh/docs/helm/helm_upgrade/#options)
diff --git a/.claude/skills/traefik-http3-quic/SKILL.md b/.claude/skills/traefik-http3-quic/SKILL.md
new file mode 100644
index 00000000..c4bf33d5
--- /dev/null
+++ b/.claude/skills/traefik-http3-quic/SKILL.md
@@ -0,0 +1,179 @@
+---
+name: traefik-http3-quic
+description: |
+  Enable HTTP/3 (QUIC) on a Traefik ingress controller in Kubernetes, managed via
+  Terraform Helm charts. Use when: (1) you want to enable HTTP/3 on Traefik,
+  (2) Alt-Svc header shows wrong port (e.g., 8443 instead of 443),
+  (3) HTTP/3 is configured in Helm values but not working end-to-end,
+  (4) Cloudflare-proxied domains need HTTP/3 enabled.
+  Covers Traefik Helm chart values, advertisedPort gotcha, Cloudflare zone settings,
+  and end-to-end verification.
+author: Claude Code
+version: 1.0.0
+date: 2026-02-07
+---
+
+# Traefik HTTP/3 (QUIC) Enablement
+
+## Problem
+You want to enable HTTP/3 (QUIC) on a Traefik ingress controller in Kubernetes so that
+clients can negotiate HTTP/3 connections via the `Alt-Svc` response header.
+
+## Context / When to Use
+- Enabling HTTP/3 for the first time on Traefik
+- Troubleshooting HTTP/3 not working despite configuration
+- Alt-Svc header shows internal container port (8443) instead of external port (443)
+- Need to enable HTTP/3 on both origin (Traefik) and CDN (Cloudflare)
+
+## Solution
+
+### Step 1: Configure Traefik Helm Chart Values
+
+In the Traefik Helm release values, add `http3` configuration to the `websecure` entrypoint:
+
+```hcl
+# In modules/kubernetes/traefik/main.tf
+ports = {
+  websecure = {
+    port        = 8443
+    exposedPort = 443
+    protocol    = "TCP"
+    http = {
+      tls = {
+        enabled = true
+      }
+    }
+    # Enable HTTP/3 (QUIC)
+    http3 = {
+      enabled        = true
+      advertisedPort = 443 # CRITICAL: Must match the external port
+    }
+  }
+}
+```
+
+**Key gotcha: `advertisedPort = 443`**
+
+Without `advertisedPort`, Traefik advertises the *internal container port* (8443) in the
+`Alt-Svc` header:
+```
+Alt-Svc: h3=":8443"; ma=2592000
+```
+
+This is wrong because clients connect on external port 443, not 8443. The correct header is:
+```
+Alt-Svc: h3=":443"; ma=2592000
+```
+
+Setting `advertisedPort = 443` fixes this.
+
+### Step 2: Ensure Helm Chart Fully Re-renders
+
+Changing `http3.enabled=true` in values alone may not cause the Helm chart to add the
+required UDP port to the Service and Deployment specs. The Traefik Helm chart templates
+need to re-render to include `websecure-http3: 443/UDP` in the Service.
+
+If the Service doesn't show a UDP port after applying:
+- See the companion skill `helm-release-force-rerender` for fixing this
+- The suspected root cause is that an upgrade with reused values (`helm upgrade --reuse-values`,
+  or `reuse_values = true` on `helm_release`; the provider default is `false`) may not
+  pick up structural changes like adding new ports
+
+After a successful apply, verify the Service has the UDP port:
+```bash
+kubectl get svc traefik -n traefik -o yaml | grep -A5 "443"
+```
+
+Expected output should include both:
+```yaml
+- name: websecure
+  port: 443
+  protocol: TCP
+  targetPort: websecure
+- name: websecure-http3
+  port: 443
+  protocol: UDP
+  targetPort: websecure-http3
+```
+
+### Step 3: Enable HTTP/3 on Cloudflare (if using Cloudflare proxy)
+
+For Cloudflare-proxied domains, HTTP/3 must also be enabled at the Cloudflare zone level.
+
+**Cloudflare Provider v4** (current in this repo):
+```hcl
+resource "cloudflare_zone_settings_override" "http3" {
+  zone_id = var.cloudflare_zone_id
+
+  settings {
+    http3 = "on" # String values: "on" or "off"
+  }
+}
+```
+
+**Note**: In Cloudflare provider v5, this uses `cloudflare_zone_setting` (singular) with
+different syntax. The v4 resource is `cloudflare_zone_settings_override` (plural + override).
+
+### Step 4: Verify End-to-End
+
+#### Testing from macOS
+
+macOS system curl does NOT support HTTP/3.
+Install curl with HTTP/3:
+```bash
+brew install curl
+```
+
+Then use the Homebrew version explicitly:
+```bash
+# /opt/homebrew is the Apple Silicon Homebrew prefix; on Intel Macs use /usr/local/opt/curl/bin/curl
+
+# Test HTTP/3 negotiation (Alt-Svc header)
+/opt/homebrew/opt/curl/bin/curl -sI https://example.viktorbarzin.me 2>&1 | grep -i alt-svc
+# Expected: alt-svc: h3=":443"; ma=2592000
+
+# Test actual HTTP/3 connection
+/opt/homebrew/opt/curl/bin/curl --http3-only -sI https://example.viktorbarzin.me
+# Expected: HTTP/3 200
+```
+
+#### Testing from within the Cluster
+
+```bash
+# Use a curl image with HTTP/3 support (amd64 only)
+kubectl run curl-h3 --rm -it --image=ymuski/curl-http3 --restart=Never -- \
+  curl --http3-only -sI https://example.viktorbarzin.me
+
+# Note: ymuski/curl-http3 is amd64-only; it will fail on arm64 nodes
+```
+
+#### Checking Traefik Logs
+
+```bash
+kubectl logs -n traefik -l app.kubernetes.io/name=traefik --tail=100 | grep -i quic
+```
+
+## Verification Checklist
+
+1. Traefik Service shows UDP port 443 (`websecure-http3`)
+2. `Alt-Svc` response header shows `h3=":443"` (not `h3=":8443"`)
+3. `/opt/homebrew/opt/curl/bin/curl --http3-only` successfully connects
+4. Cloudflare zone has HTTP/3 enabled (for proxied domains)
+
+## Current Configuration (This Repo)
+
+- **Traefik config**: `modules/kubernetes/traefik/main.tf` (lines 89-92)
+- **Cloudflare HTTP/3**: `modules/kubernetes/cloudflared/cloudflare.tf` (line 153)
+- **MetalLB IP**: 10.0.20.202 (Traefik LoadBalancer service)
+
+## Notes
+
+- HTTP/3 uses QUIC over UDP. Firewalls must allow UDP 443 inbound.
+- Traefik automatically handles TLS for HTTP/3 using the same certs as HTTPS.
+- The `Alt-Svc` header is sent on HTTP/1.1 and HTTP/2 responses over TLS to tell clients
+  HTTP/3 is available. Clients then switch to HTTP/3 on subsequent requests.
+- For non-Cloudflare (direct DNS) domains, only the Traefik-side config is needed.
+- Cloudflare handles its own HTTP/3 negotiation with end users; the origin connection
+  between Cloudflare and Traefik uses HTTP/1.1 or HTTP/2 (not HTTP/3).
+
+## References
+
+- [Traefik HTTP/3 Documentation](https://doc.traefik.io/traefik/routing/entrypoints/#http3)
+- [Traefik Helm Chart Values](https://github.com/traefik/traefik-helm-chart/blob/master/traefik/values.yaml)
+- [Cloudflare HTTP/3 Settings](https://developers.cloudflare.com/speed/optimization/protocol/http3/)
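
The `Alt-Svc` check in the `traefik-http3-quic` verification checklist (the header should advertise `h3=":443"`, not `h3=":8443"`) could be scripted roughly as follows. This is only a sketch: the helper names `alt_svc_h3_port` and `check_alt_svc` are hypothetical and not part of either skill, and the parsing assumes the simple single-line header format shown in the skill.

```shell
#!/bin/sh
# Hypothetical helper: print the port advertised for h3 in an Alt-Svc header
# line, or print nothing when the header does not advertise h3.
alt_svc_h3_port() {
  printf '%s\n' "$1" | sed -n 's/.*h3=":\([0-9][0-9]*\)".*/\1/p'
}

# Hypothetical helper: succeed only when the advertised h3 port equals the
# expected external port (443 in the skill's setup).
check_alt_svc() {
  header=$1
  expected=$2
  actual=$(alt_svc_h3_port "$header")
  [ -n "$actual" ] && [ "$actual" = "$expected" ]
}

# Example usage against a live endpoint (assumes a curl that can reach the host):
#   header=$(curl -sI https://example.viktorbarzin.me | grep -i '^alt-svc:')
#   check_alt_svc "$header" 443 && echo "advertisedPort looks correct"
```

Keeping the parsing in a pure function means the port check can be exercised with canned header strings before pointing it at a real endpoint.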