Builds on the compat gate (previous commit) to finish "auto-upgrade when safe, halt + alert when not":

- monitoring: K8sUpgradeBlocked alert (fires when k8s_upgrade_blocked==1 for 10m, severity warning) in the Upgrade Gates group. This is the clean "a k8s auto-upgrade was refused; see Slack for why" signal. (Until monitoring is applied, a block still surfaces via the already-live K8sUpgradeChainJobFailed.)
- upgrade-step.sh phase_postflight: deeper post-upgrade smoke tests covering apiserver /readyz and /livez, in-cluster DNS (resolve kubernetes.default), and core kube-system pods (apiserver, controller-manager, scheduler, etcd, coredns) Running. Any failure halts + alerts (exit 1; no rollback, because kubeadm can't downgrade). This catches an upgrade where the pods look Running but the cluster is broken.
- runbook: documents the compat gate, the blocked alert, how to clear a block, matrix maintenance, and the detector minor-probe fix.

After deploy, the nightly chain detects 1.35 (minor detection now works) and correctly BLOCKS on Calico 3.26, ESO 0.12, and kyverno 1.16 (all behind), alerting via K8sUpgradeBlocked. That is the autonomy working as designed until the catch-up clears those addons.
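The gate's decision, together with the k8s_upgrade_blocked gauge that K8sUpgradeBlocked watches, might look roughly like the following. This is a minimal Python illustration under stated assumptions and not the repo's implementation. Several things are hypothetical: the matrix shape (each k8s minor mapped to `{addon: minimum version}`), the helper names `parse_version`, `upgrade_blockers` and `blocked_metric`, and the idea that the gate renders the gauge itself in Prometheus text format. No real compatibility data is shown. In the blocked run described above, each behind addon (Calico, ESO, kyverno) would show up as one blocker line.

```python
"""Hypothetical sketch of a k8s auto-upgrade compat gate (not the repo's code)."""

from typing import Mapping


def parse_version(v: str) -> tuple[int, ...]:
    """'v3.26.1' -> (3, 26, 1); '0.12' -> (0, 12, 0). Drops '-rc1'-style suffixes."""
    core = v.strip().lstrip("v").split("-", 1)[0]
    parts = [int(p) for p in core.split(".")]
    # Pad to major.minor.patch so "3.27" and "3.27.0" compare as equal.
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def upgrade_blockers(
    target_minor: str,
    installed: Mapping[str, str],
    matrix: Mapping[str, Mapping[str, str]],
) -> list[str]:
    """Return the reasons an upgrade to target_minor must be refused (empty = safe).

    Assumed matrix shape: {"<k8s minor>": {"<addon>": "<minimum version>"}}.
    Fails closed: an unknown target minor, or an addon whose installed
    version was not detected, blocks instead of being assumed compatible.
    """
    required = matrix.get(target_minor)
    if required is None:
        return [f"no compat-matrix entry for k8s {target_minor}"]
    blockers = []
    for addon, minimum in sorted(required.items()):
        have = installed.get(addon)
        if have is None:
            blockers.append(f"{addon}: installed version unknown")
        elif parse_version(have) < parse_version(minimum):
            blockers.append(f"{addon} {have} is behind required {minimum}")
    return blockers


def blocked_metric(blockers: list[str]) -> str:
    """Render the gauge in Prometheus text format (1 = upgrade refused; assumed semantics)."""
    value = 1 if blockers else 0
    return (
        "# HELP k8s_upgrade_blocked 1 if the last k8s auto-upgrade was refused.\n"
        "# TYPE k8s_upgrade_blocked gauge\n"
        f"k8s_upgrade_blocked {value}\n"
    )
```

The sketch fails closed on purpose. It treats an unknown target minor or an undetected addon version as a blocker, which matches the "halt + alert when not safe" intent. A gate that defaulted to "compatible" would silently allow exactly the upgrades it exists to stop.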
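The postflight checks could be sketched in Python as follows. This is a hypothetical re-expression of phase_postflight, not the contents of upgrade-step.sh, which is a shell script. The function names, the injectable `runner`, and the DNS probe are all assumptions; the probe's pod name `postflight-dns` and image `busybox:1.28` are not taken from the script. The `kubectl get --raw /readyz` and `/livez` calls are standard; the apiserver answers them with the body `ok` when healthy.

```python
"""Hypothetical sketch of post-upgrade smoke checks (not upgrade-step.sh itself)."""

import json
import subprocess
import sys
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], subprocess.CompletedProcess]

# kubeadm static pods are named "<component>-<node-name>"; CoreDNS runs as a
# Deployment, so its pods are "coredns-<hash>-<hash>". Prefix matching covers both.
CORE_PREFIXES = ("kube-apiserver-", "kube-controller-manager-",
                 "kube-scheduler-", "etcd-", "coredns-")


def run(argv: Sequence[str]) -> subprocess.CompletedProcess:
    """Default runner: execute a command, capturing text output (a timeout raises)."""
    return subprocess.run(list(argv), capture_output=True, text=True, timeout=180)


def core_pod_failures(pods: dict, prefixes: Sequence[str] = CORE_PREFIXES) -> list[str]:
    """Check `kubectl get pods -n kube-system -o json` output; empty list = all Running."""
    items = pods.get("items", [])
    failures = []
    for prefix in prefixes:
        matching = [p for p in items if p["metadata"]["name"].startswith(prefix)]
        if not matching:
            failures.append(f"no kube-system pod matching {prefix}*")
        for pod in matching:
            phase = pod.get("status", {}).get("phase", "Unknown")
            if phase != "Running":
                failures.append(f"{pod['metadata']['name']} is {phase}")
    return failures


def postflight(runner: Runner = run) -> list[str]:
    """Run every smoke test and return all failures rather than stopping at the first."""
    failures = []
    # 1. apiserver health endpoints answer with the literal body "ok" when healthy.
    for path in ("/readyz", "/livez"):
        r = runner(["kubectl", "get", "--raw", path])
        if r.returncode != 0 or r.stdout.strip() != "ok":
            failures.append(f"apiserver {path} not ok")
    # 2. In-cluster DNS: a throwaway pod must resolve kubernetes.default.
    #    Pod name and image are assumptions, not taken from upgrade-step.sh.
    r = runner(["kubectl", "run", "postflight-dns", "--rm", "-i", "--restart=Never",
                "--image=busybox:1.28", "--", "nslookup", "kubernetes.default"])
    if r.returncode != 0:
        failures.append("in-cluster DNS could not resolve kubernetes.default")
    # 3. Core control-plane and DNS pods must be Running.
    r = runner(["kubectl", "get", "pods", "-n", "kube-system", "-o", "json"])
    if r.returncode != 0:
        failures.append("could not list kube-system pods")
    else:
        failures.extend(core_pod_failures(json.loads(r.stdout)))
    return failures


def main() -> int:
    """Print failures; return 1 so the chain halts and alerts (no rollback exists)."""
    problems = postflight()
    for p in problems:
        print(f"postflight FAIL: {p}", file=sys.stderr)
    return 1 if problems else 0
```

In a job, you would call this as `sys.exit(main())`. Any failure then exits 1, so the chain halts and alerts. A halt-on-first-failure script stops at the first broken check. This sketch instead collects every failure first, so a single alert can report, for example, a Pending scheduler and a missing etcd pod together. The `runner` parameter lets you exercise the checks without a cluster.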
| Name |
|---|
| dashboards |
| server-power-cycle |
| alert_digest.py |
| alert_digest.tf |
| alloy.yaml |
| authentik_walloff_probe.tf |
| Dockerfile |
| goflow2.tf |
| grafana.tf |
| grafana_chart_values.yaml |
| idrac.tf |
| k8s-monitoring-values.yaml |
| loki.tf |
| loki.yaml |
| loki_ingress.tf |
| main.tf |
| prometheus.tf |
| prometheus_chart_values.tpl |
| prometheus_snmp_chart_values.yaml |
| pve_exporter.tf |
| snmp_exporter.tf |
| ups_snmp_values.yaml |