[ci skip] add Traefik resilience hardening implementation plan

This commit is contained in:
Viktor Barzin 2026-03-01 13:53:50 +00:00
parent 454a48c6ac
commit e50cfa1d19

View file

@ -0,0 +1,941 @@
# Traefik Resilience Hardening Implementation Plan
> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
**Goal:** Make Traefik resilient against downstream dependency failures (ForwardAuth services, hung backends) while preventing pod scheduling and disruption issues.
**Architecture:** Deploy nginx resilience proxies in front of fail-closed ForwardAuth services (Poison Fountain, Authentik), add PodDisruptionBudgets, topology spread constraints, response timeouts, retry middleware, and monitoring alerts.
**Tech Stack:** Terraform/Terragrunt, Kubernetes, Nginx, Traefik CRDs, Prometheus
---
### Task 1: Bump Poison Fountain tier from aux to cluster
This is the simplest change and has no dependencies. Bumping the tier ensures Poison Fountain isn't evicted under memory pressure.
**Files:**
- Modify: `stacks/poison-fountain/main.tf:10` (namespace tier label)
- Modify: `stacks/poison-fountain/main.tf:52` (deployment tier label)
**Step 1: Change namespace tier**
In `stacks/poison-fountain/main.tf`, line 10, change:
```hcl
tier = local.tiers.aux
```
to:
```hcl
tier = local.tiers.cluster
```
**Step 2: Change deployment tier label**
In `stacks/poison-fountain/main.tf`, line 52, change:
```hcl
tier = local.tiers.aux
```
to:
```hcl
tier = local.tiers.cluster
```
**Step 3: Verify the plan**
Run:
```bash
cd stacks/poison-fountain && terragrunt plan --non-interactive 2>&1 | tail -30
```
Expected: Plan shows namespace and deployment label changes from `4-aux` to `1-cluster`. No resource destruction.
**Step 4: Apply**
Run:
```bash
cd stacks/poison-fountain && terragrunt apply --non-interactive
```
**Step 5: Verify the new LimitRange and PriorityClass**
Run:
```bash
kubectl --kubeconfig $(pwd)/config describe limitrange tier-defaults -n poison-fountain
kubectl --kubeconfig $(pwd)/config get pods -n poison-fountain -o jsonpath='{.items[*].spec.priorityClassName}'
```
Expected: LimitRange shows `1-cluster` defaults (512Mi default memory, max 4Gi). Priority class is `tier-1-cluster`.
**Step 6: Commit**
```bash
git add stacks/poison-fountain/main.tf
git commit -m "[ci skip] bump poison-fountain tier from aux to cluster (critical path for all ingress)"
```
---
### Task 2: Deploy bot-block resilience proxy (nginx fail-open in front of Poison Fountain)
Deploy an nginx reverse proxy in the `traefik` namespace that proxies to Poison Fountain's `/auth` endpoint and returns 200 (allow) if Poison Fountain is unreachable.
**Files:**
- Modify: `stacks/platform/modules/traefik/main.tf` (add nginx deployment, service, configmap)
- Modify: `stacks/platform/modules/traefik/middleware.tf:287` (update ai-bot-block ForwardAuth address)
**Step 1: Add nginx configmap for bot-block proxy**
Add to end of `stacks/platform/modules/traefik/main.tf` (before the closing of the file):
```hcl
# Resilience proxy for ai-bot-block ForwardAuth
# Returns 200 (allow all) when Poison Fountain is unreachable
resource "kubernetes_config_map" "bot_block_proxy_config" {
metadata {
name = "bot-block-proxy-config"
namespace = kubernetes_namespace.traefik.metadata[0].name
}
data = {
"default.conf" = <<-EOT
upstream poison_fountain {
server poison-fountain.poison-fountain.svc.cluster.local:8080;
}
server {
listen 8080;
location /auth {
proxy_pass http://poison_fountain;
proxy_connect_timeout 3s;
proxy_read_timeout 5s;
proxy_send_timeout 5s;
proxy_intercept_errors on;
error_page 502 503 504 =200 /fallback-allow;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
}
location = /fallback-allow {
internal;
return 200 "allowed";
}
location /healthz {
access_log off;
return 200 "ok";
}
}
EOT
}
}
```
**Step 2: Add nginx deployment for bot-block proxy**
Add after the configmap:
```hcl
resource "kubernetes_deployment" "bot_block_proxy" {
metadata {
name = "bot-block-proxy"
namespace = kubernetes_namespace.traefik.metadata[0].name
labels = {
app = "bot-block-proxy"
}
}
spec {
replicas = 2
strategy {
type = "RollingUpdate"
rolling_update {
max_unavailable = 0
max_surge = 1
}
}
selector {
match_labels = {
app = "bot-block-proxy"
}
}
template {
metadata {
labels = {
app = "bot-block-proxy"
}
}
spec {
topology_spread_constraint {
max_skew = 1
topology_key = "kubernetes.io/hostname"
when_unsatisfiable = "DoNotSchedule"
label_selector {
match_labels = {
app = "bot-block-proxy"
}
}
}
container {
name = "nginx"
image = "nginx:1-alpine"
port {
container_port = 8080
}
volume_mount {
name = "config"
mount_path = "/etc/nginx/conf.d"
read_only = true
}
liveness_probe {
http_get {
path = "/healthz"
port = 8080
}
initial_delay_seconds = 3
period_seconds = 10
}
readiness_probe {
http_get {
path = "/healthz"
port = 8080
}
initial_delay_seconds = 2
period_seconds = 5
}
resources {
requests = {
cpu = "5m"
memory = "16Mi"
}
limits = {
cpu = "50m"
memory = "32Mi"
}
}
}
volume {
name = "config"
config_map {
name = kubernetes_config_map.bot_block_proxy_config.metadata[0].name
}
}
}
}
}
}
resource "kubernetes_service" "bot_block_proxy" {
metadata {
name = "bot-block-proxy"
namespace = kubernetes_namespace.traefik.metadata[0].name
labels = {
app = "bot-block-proxy"
}
}
spec {
selector = {
app = "bot-block-proxy"
}
port {
name = "http"
port = 8080
target_port = 8080
}
}
}
```
**Step 3: Update ai-bot-block ForwardAuth address**
In `stacks/platform/modules/traefik/middleware.tf`, line 287, change:
```hcl
address = "http://poison-fountain.poison-fountain.svc.cluster.local:8080/auth"
```
to:
```hcl
address = "http://bot-block-proxy.traefik.svc.cluster.local:8080/auth"
```
**Step 4: Plan and verify**
Run:
```bash
cd stacks/platform && terragrunt plan --non-interactive 2>&1 | grep -E "will be created|will be updated|Plan:"
```
Expected: 3 resources created (configmap, deployment, service), 1 resource updated (ai-bot-block middleware).
**Step 5: Apply**
Run:
```bash
cd stacks/platform && terragrunt apply --non-interactive
```
**Step 6: Verify the proxy is running and forwarding correctly**
Run:
```bash
kubectl --kubeconfig $(pwd)/config get pods -n traefik -l app=bot-block-proxy
kubectl --kubeconfig $(pwd)/config exec -n traefik deploy/bot-block-proxy -- wget -qO- http://localhost:8080/healthz
```
Expected: 2 pods Running. Health check returns "ok".
**Step 7: Test fail-open behavior**
Temporarily scale Poison Fountain to 0, verify the proxy returns 200:
```bash
kubectl --kubeconfig $(pwd)/config scale deployment poison-fountain -n poison-fountain --replicas=0
kubectl --kubeconfig $(pwd)/config exec -n traefik deploy/bot-block-proxy -- wget -qO- --timeout=10 http://localhost:8080/auth 2>&1
kubectl --kubeconfig $(pwd)/config scale deployment poison-fountain -n poison-fountain --replicas=2
```
Expected: With Poison Fountain at 0 replicas, the proxy returns 200 (fallback). After scaling back, normal forwarding resumes.
**Step 8: Commit**
```bash
git add stacks/platform/modules/traefik/main.tf stacks/platform/modules/traefik/middleware.tf
git commit -m "[ci skip] add bot-block resilience proxy: fail-open when Poison Fountain is down"
```
---
### Task 3: Deploy auth resilience proxy (nginx basicAuth fallback in front of Authentik)
Deploy an nginx proxy that forwards to Authentik's outpost and falls back to basicAuth when Authentik is unreachable.
**Files:**
- Modify: `stacks/platform/modules/traefik/main.tf` (add nginx deployment, service, configmap, htpasswd secret)
- Modify: `stacks/platform/modules/traefik/middleware.tf:36` (update authentik ForwardAuth address)
- Modify: `stacks/platform/modules/traefik/main.tf:1` (add variable for htpasswd)
**Step 1: Add htpasswd variable**
Add to top of `stacks/platform/modules/traefik/main.tf` (after existing variables):
```hcl
variable "auth_fallback_htpasswd" {
type = string
description = "htpasswd-format string for emergency basicAuth fallback when Authentik is down"
sensitive = true
}
```
**Step 2: Generate htpasswd and add to terraform.tfvars**
Run (to generate a bcrypt htpasswd entry):
```bash
htpasswd -nbB admin "$(openssl rand -base64 16)"
```
Add the output to `terraform.tfvars`:
```hcl
auth_fallback_htpasswd = "admin:$2y$05$..." # Generated value
```
**Step 3: Pass variable through platform module**
In `stacks/platform/main.tf`, find the traefik module block and add:
```hcl
auth_fallback_htpasswd = var.auth_fallback_htpasswd
```
Add to `stacks/platform/main.tf` variables (if not already present):
```hcl
variable "auth_fallback_htpasswd" {
type = string
sensitive = true
default = ""
}
```
**Step 4: Add nginx configmap, secret, deployment, and service for auth proxy**
Add to end of `stacks/platform/modules/traefik/main.tf`:
```hcl
# Resilience proxy for Authentik ForwardAuth
# Falls back to basicAuth when Authentik is unreachable
resource "kubernetes_secret" "auth_proxy_htpasswd" {
metadata {
name = "auth-proxy-htpasswd"
namespace = kubernetes_namespace.traefik.metadata[0].name
}
data = {
"htpasswd" = var.auth_fallback_htpasswd
}
}
resource "kubernetes_config_map" "auth_proxy_config" {
metadata {
name = "auth-proxy-config"
namespace = kubernetes_namespace.traefik.metadata[0].name
}
data = {
"default.conf" = <<-EOT
upstream authentik {
server ak-outpost-authentik-embedded-outpost.authentik.svc.cluster.local:9000;
}
server {
listen 9000;
# Main auth endpoint - proxy to Authentik, fallback to basicAuth
location /outpost.goauthentik.io/auth/traefik {
proxy_pass http://authentik;
proxy_connect_timeout 3s;
proxy_read_timeout 5s;
proxy_send_timeout 5s;
proxy_intercept_errors on;
error_page 502 503 504 = @fallback_auth;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_set_header X-Original-URL $scheme://$http_host$request_uri;
}
location @fallback_auth {
auth_basic "Emergency Access";
auth_basic_user_file /etc/nginx/htpasswd;
add_header X-authentik-username $remote_user always;
add_header X-Auth-Fallback "true" always;
return 200;
}
# Pass through other outpost paths (for OAuth flows when Authentik IS up)
location /outpost.goauthentik.io/ {
proxy_pass http://authentik;
proxy_connect_timeout 3s;
proxy_read_timeout 10s;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
}
location /healthz {
access_log off;
return 200 "ok";
}
}
EOT
}
}
resource "kubernetes_deployment" "auth_proxy" {
metadata {
name = "auth-proxy"
namespace = kubernetes_namespace.traefik.metadata[0].name
labels = {
app = "auth-proxy"
}
}
spec {
replicas = 2
strategy {
type = "RollingUpdate"
rolling_update {
max_unavailable = 0
max_surge = 1
}
}
selector {
match_labels = {
app = "auth-proxy"
}
}
template {
metadata {
labels = {
app = "auth-proxy"
}
}
spec {
topology_spread_constraint {
max_skew = 1
topology_key = "kubernetes.io/hostname"
when_unsatisfiable = "DoNotSchedule"
label_selector {
match_labels = {
app = "auth-proxy"
}
}
}
container {
name = "nginx"
image = "nginx:1-alpine"
port {
container_port = 9000
}
volume_mount {
name = "config"
mount_path = "/etc/nginx/conf.d"
read_only = true
}
volume_mount {
name = "htpasswd"
mount_path = "/etc/nginx/htpasswd"
sub_path = "htpasswd"
read_only = true
}
liveness_probe {
http_get {
path = "/healthz"
port = 9000
}
initial_delay_seconds = 3
period_seconds = 10
}
readiness_probe {
http_get {
path = "/healthz"
port = 9000
}
initial_delay_seconds = 2
period_seconds = 5
}
resources {
requests = {
cpu = "5m"
memory = "16Mi"
}
limits = {
cpu = "50m"
memory = "32Mi"
}
}
}
volume {
name = "config"
config_map {
name = kubernetes_config_map.auth_proxy_config.metadata[0].name
}
}
volume {
name = "htpasswd"
secret {
secret_name = kubernetes_secret.auth_proxy_htpasswd.metadata[0].name
}
}
}
}
}
}
resource "kubernetes_service" "auth_proxy" {
metadata {
name = "auth-proxy"
namespace = kubernetes_namespace.traefik.metadata[0].name
labels = {
app = "auth-proxy"
}
}
spec {
selector = {
app = "auth-proxy"
}
port {
name = "http"
port = 9000
target_port = 9000
}
}
}
```
**Step 5: Update authentik ForwardAuth address**
In `stacks/platform/modules/traefik/middleware.tf`, line 36, change:
```hcl
address = "http://ak-outpost-authentik-embedded-outpost.authentik.svc.cluster.local:9000/outpost.goauthentik.io/auth/traefik"
```
to:
```hcl
address = "http://auth-proxy.traefik.svc.cluster.local:9000/outpost.goauthentik.io/auth/traefik"
```
**Step 6: Plan and verify**
Run:
```bash
cd stacks/platform && terragrunt plan --non-interactive 2>&1 | grep -E "will be created|will be updated|Plan:"
```
Expected: 4 resources created (secret, configmap, deployment, service), 1 resource updated (authentik-forward-auth middleware).
**Step 7: Apply**
Run:
```bash
cd stacks/platform && terragrunt apply --non-interactive
```
**Step 8: Verify proxy is running**
Run:
```bash
kubectl --kubeconfig $(pwd)/config get pods -n traefik -l app=auth-proxy
kubectl --kubeconfig $(pwd)/config exec -n traefik deploy/auth-proxy -- wget -qO- http://localhost:9000/healthz
```
Expected: 2 pods Running. Health check returns "ok".
**Step 9: Commit**
```bash
git add stacks/platform/modules/traefik/main.tf stacks/platform/modules/traefik/middleware.tf stacks/platform/main.tf
git commit -m "[ci skip] add auth resilience proxy: basicAuth fallback when Authentik is down"
```
Note: Do NOT commit terraform.tfvars (it contains the htpasswd secret and is git-crypt encrypted — it will be included in the next push automatically).
---
### Task 4: Add Traefik topology spread, PDB, and response timeout
**Files:**
- Modify: `stacks/platform/modules/traefik/main.tf:26-205` (Helm values)
**Step 1: Add topology spread constraints to Traefik Helm values**
In `stacks/platform/modules/traefik/main.tf`, after the `tolerations = []` line (line 204), add:
```hcl
topologySpreadConstraints = [{
maxSkew = 1
topologyKey = "kubernetes.io/hostname"
whenUnsatisfiable = "DoNotSchedule"
labelSelector = {
matchLabels = {
"app.kubernetes.io/name" = "traefik"
}
}
}]
podDisruptionBudget = {
enabled = true
minAvailable = 2
}
```
**Step 2: Change response header timeout**
In `stacks/platform/modules/traefik/main.tf`, line 184, change:
```hcl
"--serversTransport.forwardingTimeouts.responseHeaderTimeout=0s",
```
to:
```hcl
"--serversTransport.forwardingTimeouts.responseHeaderTimeout=30s",
```
**Step 3: Plan and verify**
Run:
```bash
cd stacks/platform && terragrunt plan --non-interactive 2>&1 | grep -E "will be|Plan:"
```
Expected: Helm release will be updated in-place.
**Step 4: Apply**
Run:
```bash
cd stacks/platform && terragrunt apply --non-interactive
```
**Step 5: Verify topology spread**
Run:
```bash
kubectl --kubeconfig $(pwd)/config get pods -n traefik -l app.kubernetes.io/name=traefik -o wide
```
Expected: 3 pods on 3 different nodes.
**Step 6: Verify PDB**
Run:
```bash
kubectl --kubeconfig $(pwd)/config get pdb -n traefik
```
Expected: PDB with minAvailable=2, currentHealthy=3, allowedDisruptions=1.
**Step 7: Commit**
```bash
git add stacks/platform/modules/traefik/main.tf
git commit -m "[ci skip] add Traefik topology spread, PDB (minAvailable=2), and 30s response timeout"
```
---
### Task 5: Add Authentik PDB
**Files:**
- Modify: `stacks/platform/modules/authentik/values.yaml`
**Step 1: Add PDB configuration to Authentik Helm values**
In `stacks/platform/modules/authentik/values.yaml`, add after the `server:` section (after line 33, before `global:`):
```yaml
pdb:
enabled: true
minAvailable: 2
```
So the server section becomes:
```yaml
server:
replicas: 3
resources:
requests:
cpu: 100m
memory: 512Mi
limits:
cpu: "2"
memory: 1Gi
ingress:
enabled: false
podAnnotations:
diun.enable: true
diun.include_tags: "^202[0-9].[0-9]+.*$"
pdb:
enabled: true
minAvailable: 2
```
**Step 2: Plan and verify**
Run:
```bash
cd stacks/platform && terragrunt plan --non-interactive 2>&1 | grep -E "will be|Plan:"
```
Expected: Helm release will be updated.
**Step 3: Apply**
Run:
```bash
cd stacks/platform && terragrunt apply --non-interactive
```
**Step 4: Verify PDB**
Run:
```bash
kubectl --kubeconfig $(pwd)/config get pdb -n authentik
```
Expected: PDB with minAvailable=2, currentHealthy=3, allowedDisruptions=1.
**Step 5: Commit**
```bash
git add stacks/platform/modules/authentik/values.yaml
git commit -m "[ci skip] add Authentik PDB (minAvailable=2)"
```
---
### Task 6: Add retry middleware to ingress factory
**Files:**
- Modify: `stacks/platform/modules/traefik/middleware.tf` (add retry middleware)
- Modify: `modules/kubernetes/ingress_factory/main.tf:112-113` (add to default chain)
**Step 1: Add retry middleware CRD**
Add to end of `stacks/platform/modules/traefik/middleware.tf`:
```hcl
# Retry middleware for transient backend failures (502/503 during restarts)
resource "kubernetes_manifest" "middleware_retry" {
manifest = {
apiVersion = "traefik.io/v1alpha1"
kind = "Middleware"
metadata = {
name = "retry"
namespace = kubernetes_namespace.traefik.metadata[0].name
}
spec = {
retry = {
attempts = 2
initialInterval = "100ms"
}
}
}
depends_on = [helm_release.traefik]
}
```
**Step 2: Add retry middleware to ingress factory default chain**
In `modules/kubernetes/ingress_factory/main.tf`, line 112, the middleware chain starts with rate-limit. Add retry as the first middleware (retries should wrap the entire chain):
Change line 112-113 from:
```hcl
"traefik.ingress.kubernetes.io/router.middlewares" = join(",", compact(concat([
var.skip_default_rate_limit ? null : "traefik-rate-limit@kubernetescrd",
```
to:
```hcl
"traefik.ingress.kubernetes.io/router.middlewares" = join(",", compact(concat([
"traefik-retry@kubernetescrd",
var.skip_default_rate_limit ? null : "traefik-rate-limit@kubernetescrd",
```
**Step 3: Plan both stacks**
Run:
```bash
cd stacks/platform && terragrunt plan --non-interactive 2>&1 | grep -E "will be|Plan:"
```
Expected: 1 resource created (retry middleware).
Note: The ingress_factory change will take effect the next time any service stack is applied (it's a module used by all stacks). The middleware CRD must exist first.
**Step 4: Apply platform stack**
Run:
```bash
cd stacks/platform && terragrunt apply --non-interactive
```
**Step 5: Verify retry middleware exists**
Run:
```bash
kubectl --kubeconfig $(pwd)/config get middleware -n traefik retry
```
Expected: Middleware `retry` exists.
**Step 6: Commit**
```bash
git add stacks/platform/modules/traefik/middleware.tf modules/kubernetes/ingress_factory/main.tf
git commit -m "[ci skip] add retry middleware (2 attempts, 100ms) to default ingress chain"
```
---
### Task 7: Add Prometheus alerts and inhibition rules
**Files:**
- Modify: `stacks/platform/modules/monitoring/prometheus_chart_values.tpl`
**Step 1: Add PoisonFountainDown alert**
In `stacks/platform/modules/monitoring/prometheus_chart_values.tpl`, in the "Critical Services" alert group (after the AuthentikDown alert, around line 435), add:
```yaml
- alert: PoisonFountainDown
expr: (kube_deployment_status_replicas_available{namespace="poison-fountain", deployment="poison-fountain"} or on() vector(0)) < 1
for: 2m
labels:
severity: critical
annotations:
summary: "Poison Fountain is down - AI bot blocking degraded to fail-open"
```
**Step 2: Add ForwardAuthFallbackActive alert**
In the "Traefik Ingress" alert group (after the TraefikHighOpenConnections alert, around line 587), add:
```yaml
- alert: ForwardAuthFallbackActive
expr: |
(kube_deployment_status_replicas_available{namespace="poison-fountain", deployment="poison-fountain"} or on() vector(0)) < 1
or (kube_deployment_status_replicas_available{namespace="authentik", deployment="goauthentik-server"} or on() vector(0)) < 1
for: 5m
labels:
severity: warning
annotations:
summary: "ForwardAuth resilience proxy is serving fallback responses - check Poison Fountain and Authentik"
```
**Step 3: Add alert inhibition rule**
In the `inhibit_rules` section (around line 63), add after the existing TraefikDown inhibition:
```yaml
# Traefik down makes Poison Fountain alerts redundant
- source_matchers:
- alertname = TraefikDown
target_matchers:
- alertname =~ "PoisonFountainDown|ForwardAuthFallbackActive"
```
**Step 4: Plan and verify**
Run:
```bash
cd stacks/platform && terragrunt plan --non-interactive 2>&1 | grep -E "will be|Plan:"
```
Expected: Helm release updated (Prometheus config changes).
**Step 5: Apply**
Run:
```bash
cd stacks/platform && terragrunt apply --non-interactive
```
**Step 6: Verify alerts are loaded**
Run:
```bash
kubectl --kubeconfig $(pwd)/config exec -n monitoring deploy/prometheus-server -- wget -qO- http://localhost:9090/api/v1/rules 2>&1 | python3 -c "import sys,json; rules=[r['name'] for g in json.load(sys.stdin)['data']['groups'] for r in g['rules']]; print('PoisonFountainDown:', 'PoisonFountainDown' in rules); print('ForwardAuthFallbackActive:', 'ForwardAuthFallbackActive' in rules)"
```
Expected: Both alerts show `True`.
**Step 7: Commit**
```bash
git add stacks/platform/modules/monitoring/prometheus_chart_values.tpl
git commit -m "[ci skip] add PoisonFountainDown and ForwardAuthFallbackActive alerts with inhibition"
```
---
### Task 8: Final verification and push
**Step 1: Run cluster health check**
Run:
```bash
bash scripts/cluster_healthcheck.sh --quiet
```
Expected: No new WARN/FAIL related to our changes.
**Step 2: Verify all resilience proxies are running**
Run:
```bash
kubectl --kubeconfig $(pwd)/config get pods -n traefik -l "app in (bot-block-proxy,auth-proxy)" -o wide
kubectl --kubeconfig $(pwd)/config get pods -n traefik -l app.kubernetes.io/name=traefik -o wide
kubectl --kubeconfig $(pwd)/config get pdb -A
```
Expected: All proxy pods running on different nodes, Traefik pods spread across nodes, PDBs for Traefik and Authentik.
**Step 3: Test a public service is still accessible**
Run:
```bash
curl -s -o /dev/null -w "%{http_code}" https://viktorbarzin.me
```
Expected: 200 (or 301/302 redirect). Not 502.
**Step 4: Push all commits**
Ask user for confirmation, then:
```bash
git push origin master
```