name: traefik-helm-configuration
description: Consolidated Traefik Helm chart configuration skill covering HTTP/3 (QUIC), UDP cross-namespace routing, and plugin download failures. Use when: (1) enabling HTTP/3 on Traefik or Alt-Svc header shows wrong port (e.g., 8443 instead of 443), (2) HTTP/3 is configured in Helm values but not working end-to-end, (3) Cloudflare-proxied domains need HTTP/3 enabled, (4) custom UDP entrypoints don't appear in the LoadBalancer Service, (5) IngressRouteUDP logs show "udp service is not in the parent resource namespace", (6) DNS or other UDP traffic through Traefik times out despite correct IngressRouteUDP config, (7) all Traefik routes suddenly return 404 after a restart or pod recreation, (8) Traefik logs show "Plugins are disabled because an error has occurred", (9) plugin download fails with "context deadline exceeded" for crowdsec-bouncer or rewrite-body.
author: Claude Code
version: 1.0.0
date: 2026-02-22

Traefik Helm Chart Configuration

Consolidated guide for three common Traefik Helm chart issues: HTTP/3 (QUIC) enablement, UDP cross-namespace routing, and plugin download failures causing global 404s.


HTTP/3 (QUIC)

Problem

You want to enable HTTP/3 (QUIC) on a Traefik ingress controller in Kubernetes so that clients can negotiate HTTP/3 connections via the Alt-Svc response header.

Context / When to Use

  • Enabling HTTP/3 for the first time on Traefik
  • Troubleshooting HTTP/3 not working despite configuration
  • Alt-Svc header shows internal container port (8443) instead of external port (443)
  • Need to enable HTTP/3 on both origin (Traefik) and CDN (Cloudflare)

Solution

Step 1: Configure Traefik Helm Chart Values

In the Traefik Helm release values, add http3 configuration to the websecure entrypoint:

# In modules/kubernetes/traefik/main.tf
ports = {
  websecure = {
    port        = 8443
    exposedPort = 443
    protocol    = "TCP"
    http = {
      tls = {
        enabled = true
      }
    }
    # Enable HTTP/3 (QUIC)
    http3 = {
      enabled        = true
      advertisedPort = 443  # CRITICAL: Must match the external port
    }
  }
}

Key gotcha: advertisedPort = 443

Without advertisedPort, Traefik advertises the internal container port (8443) in the Alt-Svc header:

Alt-Svc: h3=":8443"; ma=2592000

This is wrong because clients connect on external port 443, not 8443. The correct header is:

Alt-Svc: h3=":443"; ma=2592000

Setting advertisedPort = 443 fixes this.
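
To see which port is actually being advertised, it can help to pull the h3 port out of the Alt-Svc value. The helper below is a minimal sketch: the function name alt_svc_h3_port is hypothetical (it is not part of Traefik or curl), and it assumes the header carries an h3 entry in the form shown above.

```shell
# Hypothetical helper: print the port advertised for h3 in an Alt-Svc header
# line, or print nothing if the line has no h3 entry.
alt_svc_h3_port() {
  printf '%s\n' "$1" | sed -n 's/.*h3=":\([0-9][0-9]*\)".*/\1/p'
}

# The two header values discussed above:
alt_svc_h3_port 'Alt-Svc: h3=":8443"; ma=2592000'   # prints 8443 (advertisedPort missing)
alt_svc_h3_port 'Alt-Svc: h3=":443"; ma=2592000'    # prints 443 (correct)
```

Anything other than 443 in the output suggests advertisedPort is not set, or has not been applied yet.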

Step 2: Ensure Helm Chart Fully Re-renders

Setting http3.enabled = true in the values may not, on its own, cause the Helm chart to add the required UDP port to the Service and Deployment specs. The Traefik Helm chart templates need to re-render so that the Service includes websecure-http3: 443/UDP.

If the Service doesn't show a UDP port after applying:

  • See the companion skill helm-release-force-rerender for fixing this
  • The root cause is that an upgrade which reuses stored values (helm upgrade --reuse-values, or reuse_values = true on Terraform's helm_release resource) may not trigger template re-rendering for structural changes like adding new ports

After a successful apply, verify the Service has the UDP port:

kubectl get svc traefik -n traefik -o yaml | grep -A5 "443"

Expected output should include both:

- name: websecure
  port: 443
  protocol: TCP
  targetPort: websecure
- name: websecure-http3
  port: 443
  protocol: UDP
  targetPort: websecure-http3
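
The grep above needs a human to read the output. For a scripted check, one option is to print each Service port as "name port/protocol" and test for the entry you expect. This is a sketch under assumptions: service_has_port is a hypothetical helper name, and the kubectl invocation in the comment is shown for illustration rather than executed here.

```shell
# Hypothetical helper: succeed if stdin contains a "name port/protocol" line
# whose second field equals $1 (for example 443/UDP), fail otherwise.
service_has_port() {
  awk -v want="$1" '$2 == want { found = 1 } END { exit(found ? 0 : 1) }'
}

# Usage against a live cluster (not run here):
#   kubectl get svc traefik -n traefik \
#     -o jsonpath='{range .spec.ports[*]}{.name}{" "}{.port}{"/"}{.protocol}{"\n"}{end}' \
#     | service_has_port 443/UDP && echo "HTTP/3 UDP port is exposed"
```

The same helper can be reused for any other port/protocol pair on the Service.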

Step 3: Enable HTTP/3 on Cloudflare (if using Cloudflare proxy)

For Cloudflare-proxied domains, HTTP/3 must also be enabled at the Cloudflare zone level.

Cloudflare Provider v4 (current in this repo):

resource "cloudflare_zone_settings_override" "http3" {
  zone_id = var.cloudflare_zone_id

  settings {
    http3 = "on"  # String values: "on" or "off"
  }
}

Note: In Cloudflare provider v5, this uses cloudflare_zone_setting (singular) with different syntax. The v4 resource is cloudflare_zone_settings_override (plural + override).

Step 4: Verify End-to-End

Testing from macOS

macOS system curl does NOT support HTTP/3. Install curl with HTTP/3:

brew install curl

Then use the Homebrew version explicitly:

# Test HTTP/3 negotiation (Alt-Svc header)
/opt/homebrew/opt/curl/bin/curl -sI https://example.viktorbarzin.me 2>&1 | grep -i alt-svc
# Expected: alt-svc: h3=":443"; ma=2592000

# Test actual HTTP/3 connection
/opt/homebrew/opt/curl/bin/curl --http3-only -sI https://example.viktorbarzin.me
# Expected: HTTP/3 200

Testing from within the Cluster

# Use a curl image with HTTP/3 support (amd64 only)
kubectl run curl-h3 --rm -it --image=ymuski/curl-http3 --restart=Never -- \
  curl --http3-only -sI https://example.viktorbarzin.me

# Note: ymuski/curl-http3 is amd64-only; it will fail on arm64 nodes

Checking Traefik Logs

kubectl logs -n traefik -l app.kubernetes.io/name=traefik --tail=100 | grep -i quic

Verification Checklist

  1. Traefik Service shows UDP port 443 (websecure-http3)
  2. Alt-Svc response header shows h3=":443" (not h3=":8443")
  3. /opt/homebrew/opt/curl/bin/curl --http3-only successfully connects
  4. Cloudflare zone has HTTP/3 enabled (for proxied domains)

Current Configuration (This Repo)

  • Traefik config: modules/kubernetes/traefik/main.tf (lines 89-92)
  • Cloudflare HTTP/3: modules/kubernetes/cloudflared/cloudflare.tf (line 153)
  • MetalLB IP: 10.0.20.202 (Traefik LoadBalancer service)

Notes

  • HTTP/3 uses QUIC over UDP. Firewalls must allow UDP 443 inbound.
  • Traefik automatically handles TLS for HTTP/3 using the same certs as HTTPS.
  • The Alt-Svc header is sent on HTTP/1.1 and HTTP/2 responses to tell clients that HTTP/3 is available. Clients can then switch to HTTP/3 on subsequent connections.
  • For non-Cloudflare (direct DNS) domains, only the Traefik-side config is needed.
  • Cloudflare handles its own HTTP/3 negotiation with end users; the origin connection between Cloudflare and Traefik uses HTTP/1.1 or HTTP/2 (not HTTP/3).

UDP Cross-Namespace Routing

Problem

Adding a custom UDP entrypoint (e.g., DNS on port 53) to Traefik v3 via Helm chart values doesn't work out of the box. Traffic times out even though the Traefik pod listens on the port internally. Two separate issues compound:

  1. The Helm chart defaults expose to false for custom entrypoints -- the port is never added to the LoadBalancer Service
  2. allowCrossNamespace defaults to false -- IngressRouteUDP in namespace A can't reference a Service in namespace B

Context / Trigger Conditions

  • Traefik Helm chart v39.0.0+ (Traefik v3.x)
  • Custom UDP entrypoint defined in ports values
  • IngressRouteUDP referencing a service in a different namespace
  • Symptoms:
    • kubectl get svc traefik doesn't show your custom UDP port
    • UDP traffic to the LoadBalancer IP times out
    • Traefik logs show: "udp service <namespace>/<service> is not in the parent resource namespace <traefik-namespace>"
    • netstat -ulnp inside Traefik pod confirms it IS listening on the port

Solution

Fix 1: Expose the UDP port on the Service

In the Helm values, add expose = { default = true } to the entrypoint:

# Terraform HCL
ports = {
  dns-udp = {
    port        = 5353
    exposedPort = 53
    protocol    = "UDP"
    expose      = { default = true }  # <-- Required for custom entrypoints
  }
}

# Helm values YAML equivalent
ports:
  dns-udp:
    port: 5353
    exposedPort: 53
    protocol: UDP
    expose:
      default: true

Note: The built-in web and websecure entrypoints have expose.default = true by default, but custom entrypoints do NOT.

Fix 2: Enable cross-namespace CRD references

In the Helm values, add allowCrossNamespace = true to the kubernetesCRD provider:

# Terraform HCL
providers = {
  kubernetesCRD = {
    enabled              = true
    allowCrossNamespace  = true  # <-- Required for cross-namespace IngressRouteUDP
  }
}

# Helm values YAML
providers:
  kubernetesCRD:
    enabled: true
    allowCrossNamespace: true

This is required whenever an IngressRouteUDP (or IngressRouteTCP, IngressRoute) references a Kubernetes Service in a different namespace.

Verification

# 1. Verify the port appears in the Service
kubectl get svc -n traefik traefik -o jsonpath='{.spec.ports[*].name}'
# Should include your custom entrypoint name (e.g., "dns-udp")

# 2. Check Traefik logs for cross-namespace errors
kubectl logs -n traefik -l app.kubernetes.io/name=traefik | grep "not in the parent resource namespace"
# Should return nothing after the fix

# 3. Test the UDP service
dig @<traefik-lb-ip> example.com

Example

DNS forwarding through Traefik to Technitium DNS:

  • IngressRouteUDP in traefik namespace routes dns-udp entrypoint to technitium-dns:53 in technitium namespace
  • Without Fix 1: port 53 never exposed on LoadBalancer -- traffic can't reach Traefik
  • Without Fix 2: Traefik rejects the route -- logs error every ~60 seconds
  • With both fixes: DNS queries to LoadBalancer IP:53 -> Traefik -> Technitium
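
For reference, the route described in this example might look like the manifest below. It is a sketch, not the manifest from this repo: the resource name dns-udp is an assumption, while the entrypoint, Service name, port, and namespaces are taken from the bullets above. The function name ingressroute_udp_manifest is hypothetical.

```shell
# Hypothetical helper: print an IngressRouteUDP manifest for the example above.
# metadata.name is an assumption; the other values come from the example.
ingressroute_udp_manifest() {
  cat <<'EOF'
apiVersion: traefik.io/v1alpha1
kind: IngressRouteUDP
metadata:
  name: dns-udp
  namespace: traefik
spec:
  entryPoints:
    - dns-udp
  routes:
    - services:
        - name: technitium-dns
          namespace: technitium   # cross-namespace: needs allowCrossNamespace
          port: 53
EOF
}

# Apply against a cluster (not run here):
#   ingressroute_udp_manifest | kubectl apply -f -
```

The namespace field under services is what makes this a cross-namespace reference.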

Notes

  1. Debugging order matters: Fix 1 (expose) must come first. Without the port on the Service, you can't even test whether the routing works. Fix 2 (cross-namespace) failures are explained only in the Traefik logs; from the client side they look like the same timeout.
  2. allowCrossNamespace is a security consideration: it allows any IngressRoute, IngressRouteTCP, or IngressRouteUDP resource to reference Services in any namespace. If this is too broad, consider moving the IngressRouteUDP into the target namespace so that no cross-namespace reference is needed.
  3. Rolling update: Changing allowCrossNamespace triggers a Traefik pod restart (new CLI args). Changing expose only updates the Service (no pod restart needed).
  4. This applies to TCP too: IngressRouteTCP with cross-namespace services needs the same allowCrossNamespace setting.

Plugin Download Failure (Global 404)

Problem

After a node maintenance operation (containerd restart, node drain/uncordon, etc.), all Traefik-managed routes return 404. Services, Ingresses, and Middlewares all exist and look correct, making this extremely confusing to debug.

Context / Trigger Conditions

  • ALL Traefik routes return 404 simultaneously (not just one service)
  • Traefik pods are Running and Ready
  • Ingress resources exist with correct annotations
  • Middlewares exist in the correct namespaces
  • TLS secrets exist
  • Traefik startup logs contain: Plugins are disabled because an error has occurred
  • Plugin download error: unable to download plugin ... context deadline exceeded
  • Happened after a node restart, containerd restart, or network disruption

Root Cause

Traefik downloads plugins (crowdsec-bouncer, rewrite-body, etc.) from plugins.traefik.io on every pod startup. If the download fails (network unreachable, DNS not ready, timeout), Traefik disables ALL plugins entirely.

Since the crowdsec middleware is a plugin-based middleware referenced in virtually every Ingress annotation (traefik-crowdsec@kubernetescrd), Traefik treats the missing plugin middleware as a fatal routing error and returns 404 for every route that references it -- which is typically all of them.

Solution

# 1. Confirm the diagnosis - check Traefik startup logs
kubectl logs -n traefik -l app.kubernetes.io/name=traefik | head -20
# Look for: "Plugins are disabled because an error has occurred"

# 2. Verify outbound connectivity is restored
kubectl exec -n traefik $(kubectl get pods -n traefik -l app.kubernetes.io/name=traefik \
  -o jsonpath='{.items[0].metadata.name}') -- wget -q -O- --timeout=5 https://plugins.traefik.io

# 3. Rollout restart to retry plugin download
kubectl rollout restart deployment -n traefik traefik

# 4. Verify plugins loaded
kubectl logs -n traefik -l app.kubernetes.io/name=traefik | grep "Plugins"
# Should show: "Plugins loaded."

# 5. Verify routes work
curl -s -o /dev/null -w "%{http_code}" -H "Host: viktorbarzin.me" https://10.0.20.202 -k
# Should return 200 instead of 404

Verification

  • Traefik logs show Plugins loaded. (not Plugins are disabled)
  • Routes return expected HTTP status codes (200, 302, etc.) instead of 404
  • kubectl logs -n traefik <pod> | grep "does not exist" shows no middleware errors
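
The first check can be scripted by classifying the startup logs. This is a minimal sketch that assumes the two log messages appear exactly as quoted in this section; plugin_state is a hypothetical helper name.

```shell
# Hypothetical helper: read Traefik logs on stdin and print one of
# "disabled", "loaded", or "unknown". "disabled" wins if both messages appear,
# for example when logs from several pods are concatenated.
plugin_state() {
  awk '
    /Plugins are disabled because an error has occurred/ { state = "disabled" }
    /Plugins loaded/ && state != "disabled"              { state = "loaded" }
    END { print (state == "" ? "unknown" : state) }
  '
}

# Usage against a live cluster (not run here):
#   kubectl logs -n traefik -l app.kubernetes.io/name=traefik | plugin_state
```

A result of "unknown" usually means the startup lines have already rotated out of the log window being read.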

Why This Is Hard to Debug

  1. Traefik pods show Running/Ready -- health checks pass even without plugins
  2. All Kubernetes resources look correct -- Ingresses, Services, Middlewares all exist
  3. The error is in startup logs only -- not in per-request logs (requests just get 404)
  4. The 404 is Traefik's default -- same as "no route matched", not a backend error
  5. The middleware error is logged once at startup -- easy to miss in a stream of logs

Prevention

  • During planned maintenance (node drain, containerd restart), restart Traefik pods AFTER network connectivity is confirmed restored
  • Consider pre-caching Traefik plugins in the container image or using an init container
  • Monitor for the Plugins are disabled log message in your alerting system
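
The first point can be automated by polling for connectivity before restarting. The sketch below is one way to do that: wait_until is a hypothetical helper, and the retry count (30) and delay (10 seconds) in the usage comment are assumptions to tune for your environment. The probe reuses the wget check from the Solution section.

```shell
# Hypothetical helper: wait_until TRIES DELAY CMD...
# Runs CMD up to TRIES times, sleeping DELAY seconds between attempts, and
# returns 0 as soon as CMD succeeds or 1 if every attempt fails.
# Note: plain sh has no local variables, so tries, delay, and i are global.
wait_until() {
  tries=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$tries" ]; do
    if "$@"; then
      return 0
    fi
    if [ "$i" -lt "$tries" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Usage against a live cluster (not run here):
#   wait_until 30 10 kubectl exec -n traefik deploy/traefik -- \
#     wget -q -O- --timeout=5 https://plugins.traefik.io >/dev/null \
#     && kubectl rollout restart deployment -n traefik traefik
```

Restarting only after the probe succeeds avoids repeating the failed plugin download on the new pod.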

Notes

  • This affects ALL plugin-based middlewares, not just crowdsec
  • The rewrite-body plugin (used for rybbit analytics injection) is also affected
  • Traefik v3.x downloads plugins on every startup; there is no persistent cache
  • If only some routes return 404, the problem is likely different (missing middleware or TLS secret, not a plugin issue)

References

See Also

  • traefik-rewrite-body-troubleshooting -- Traefik rewrite-body plugin troubleshooting (compression, Accept header issues)
  • helm-release-force-rerender -- Force Helm chart re-render when structural changes don't take effect