goldmane-trail: polish follow-ups #57/#59/#61/#62/#63 + digest→#alerts

Completes the Goldmane who-talks-to-whom trail (ADR-0014), implemented by a
subagent workflow (distinct stacks in parallel, docs last):

- #57 Whisker gated ingress: ingress_factory (whisker.viktorbarzin.me,
  auth=required, Authentik-gated) + a NetworkPolicy allowing traefik->whisker:8081
  (the operator's whisker NP default-denies ingress). calico stack.
- #61 pipeline health: AggregatorDown + DigestFailing Prometheus alerts
  (prometheus_chart_values.tpl) + cluster-health check #48.
- #59 service-identity labels on the multi-Service namespaces (monitoring's 5
  TF-managed deployments + dbaas), with the KYVERNO_LIFECYCLE_V1 marker so they
  update in-place.
- #62/#63 docs: docs/runbooks/goldmane-flow-trail.md (new), service-catalog,
  security.md + monitoring.md east-west sections, ADR-0014 as-built, CONTEXT.md.
  #62 = the SQL to derive the Wave-1 per-namespace egress allowlist from the
  edge table (feeds code-8ywc; enforce-flips out of scope).
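
For orientation, the traefik->whisker allowance from #57 could look roughly like the sketch below. It is an illustration under stated assumptions, not the resource added to the calico stack: the resource name, the `calico-system` and `traefik` namespace names, and the `k8s-app = whisker` label are all assumed, and it uses a plain Kubernetes NetworkPolicy, which may differ from what the stack actually uses.

```hcl
# Hypothetical sketch only: the resource name, namespaces and labels below are
# assumptions, not copied from the calico stack.
resource "kubernetes_network_policy_v1" "allow_traefik_to_whisker" {
  metadata {
    name      = "allow-traefik-to-whisker"
    namespace = "calico-system" # assumed: namespace where the operator runs whisker
  }

  spec {
    # Select the whisker pods (label is an assumption).
    pod_selector {
      match_labels = {
        "k8s-app" = "whisker"
      }
    }

    policy_types = ["Ingress"]

    ingress {
      # Admit traffic only from pods in the traefik namespace (name assumed).
      from {
        namespace_selector {
          match_labels = {
            "kubernetes.io/metadata.name" = "traefik"
          }
        }
      }

      # The whisker port named in #57.
      ports {
        port     = "8081"
        protocol = "TCP"
      }
    }
  }
}
```

Scoping the rule to a single source namespace and a single port leaves the operator's default-deny in force for every other ingress path.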

Also fixes the digest's Slack target: the #security override fails with 404
channel_not_found because the app behind the shared alertmanager_slack_api_url
webhook isn't a member of #security (this likely also breaks alertmanager's
slack-security receiver; flagged in the runbook). The digest is routed to
#alerts (the webhook's working channel) until the app is invited; verified that
a real digest run posts cleanly (360 edges).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Viktor Barzin 2026-06-25 17:49:25 +00:00
parent 306cdd4cb3
commit 6c5288998f
17 changed files with 626 additions and 11 deletions


@@ -745,7 +745,10 @@ resource "kubernetes_deployment" "phpmyadmin" {
labels = {
"app" = "phpmyadmin"
tier = var.tier
# ADR-0014 service identity: dbaas is a multi-Service namespace, so the
# namespace alone can't attribute Goldmane flows. Value = the fronting
# Service name (kubernetes_service.phpmyadmin is named "pma").
"service-identity" = "pma"
}
annotations = {
"reloader.stakater.com/search" = "true"
@@ -762,6 +765,10 @@ resource "kubernetes_deployment" "phpmyadmin" {
metadata {
labels = {
"app" = "phpmyadmin"
# ADR-0014: Goldmane/Felix stamps POD labels onto flows, so the
# disambiguating identity must live on the pod template (not just
# the Deployment metadata above). It is not in the selector, so no replace.
"service-identity" = "pma"
}
}
spec {
@@ -812,8 +819,19 @@ resource "kubernetes_deployment" "phpmyadmin" {
}
}
lifecycle {
# KYVERNO_LIFECYCLE_V1: Kyverno admission webhook mutates dns_config with ndots=2
ignore_changes = [spec[0].template[0].spec[0].dns_config]
ignore_changes = [
spec[0].template[0].spec[0].dns_config, # KYVERNO_LIFECYCLE_V1: Kyverno admission webhook mutates dns_config with ndots=2
# This Deployment is Keel-enrolled (keel.sh/policy=patch). Ignore the
# attributes Keel/Kyverno mutate at runtime so `terragrunt apply` (incl.
# the daily drift plan) doesn't fight them or revert the live image.
# Canonical KEEL/KYVERNO lifecycle guard; matches linkwarden/chrome-service.
metadata[0].annotations["keel.sh/policy"],
metadata[0].annotations["keel.sh/trigger"],
metadata[0].annotations["keel.sh/pollSchedule"], # KYVERNO_LIFECYCLE_V2
metadata[0].annotations["keel.sh/match-tag"],
spec[0].template[0].spec[0].container[0].image, # KEEL_IGNORE_IMAGE: Keel manages tag updates
spec[0].template[0].metadata[0].annotations["keel.sh/update-time"], # KEEL_LIFECYCLE_V1
]
}
}
@@ -1499,6 +1517,10 @@ resource "kubernetes_deployment" "pgadmin" {
}
labels = {
tier = var.tier
# ADR-0014 service identity: dbaas is a multi-Service namespace, so the
# namespace alone can't attribute Goldmane flows. Value = the fronting
# Service name (kubernetes_service.pgadmin is named "pgadmin").
"service-identity" = "pgadmin"
}
}
spec {
@@ -1514,6 +1536,10 @@ resource "kubernetes_deployment" "pgadmin" {
metadata {
labels = {
app = "pgadmin"
# ADR-0014: Goldmane/Felix stamps POD labels onto flows, so the
# disambiguating identity must live on the pod template (not just
# the Deployment metadata above). It is not in the selector, so no replace.
"service-identity" = "pgadmin"
}
}
spec {
@@ -1568,8 +1594,20 @@ resource "kubernetes_deployment" "pgadmin" {
}
}
lifecycle {
# KYVERNO_LIFECYCLE_V1: Kyverno admission webhook mutates dns_config with ndots=2
ignore_changes = [spec[0].template[0].spec[0].dns_config]
ignore_changes = [
spec[0].template[0].spec[0].dns_config, # KYVERNO_LIFECYCLE_V1: Kyverno admission webhook mutates dns_config with ndots=2
# This Deployment is Keel-enrolled (keel.sh/policy=patch) and Keel has
# bumped the live image (dpage/pgadmin4:9.16). Ignore the Keel/Kyverno
# runtime-mutated attributes so `terragrunt apply` (incl. the daily drift
# plan) doesn't revert the image to bare `dpage/pgadmin4` or strip Keel's
# annotations. Canonical guard; matches linkwarden/chrome-service.
metadata[0].annotations["keel.sh/policy"],
metadata[0].annotations["keel.sh/trigger"],
metadata[0].annotations["keel.sh/pollSchedule"], # KYVERNO_LIFECYCLE_V2
metadata[0].annotations["keel.sh/match-tag"],
spec[0].template[0].spec[0].container[0].image, # KEEL_IGNORE_IMAGE: Keel manages tag updates
spec[0].template[0].metadata[0].annotations["keel.sh/update-time"], # KEEL_LIFECYCLE_V1
]
}
}
resource "kubernetes_service" "pgadmin" {