Commit graph

19 commits

Author SHA1 Message Date
b1a00d0395
fix(observability): 🐛 add missing simple-user-secret to hub observability stack
The hub's VMUser (vmauth.yaml) references simple-user-secret via passwordRef,
but the Secret was never added to the hub's manifests. Without this Secret,
the VM operator cannot reconcile the VMUser into the vmauth config, causing
ALL requests to fall through to the unauthorizedUser catch-all (vmsingle).

Result: Vector log shipping to VictoriaLogs was broken, because vmauth routed
/insert/elasticsearch/_bulk to vmsingle instead of vlogs-victorialogs.
2026-06-19 15:28:14 +02:00
6ea1e798d2
fix(observability): 🐛 add missing manifests to instance stacks
- backup-alerts.yaml → observability.buildth.ing victoria-k8s-stack
- forgejo-scrape.yaml → dev.t09.de vm-client-stack
2026-06-19 13:06:24 +02:00
91db8038e6
feat(observability): custom ArgoCD dashboard with cluster_environment filter 2026-06-19 13:02:48 +02:00
0316eefa43
fix(observability): 🐛 disable false-positive control-plane alerts and fix empty cluster_environment label
The hub defaultRules groups kubernetesSystemControllerManager, kubeScheduler, and
kubernetesSystemScheduler used the wrong key 'enabled: false'; the chart expects 'create: false'.
This caused KubeControllerManagerDown/KubeSchedulerDown to fire as false positives
because OTC CCE managed k8s does not expose the control plane for scraping.

Dev local vmagent had empty externalLabels, so backup-alert rules evaluated by local
vmalert had no cluster_environment label on kube_job_status_failed metrics. Added
cluster_environment=dev to match what the vm-client-stack vmagent adds for hub shipping.
2026-06-19 12:42:21 +02:00
Martin McCaffery
63cdb926b9
fix(sustainability-rules): remove Kepler energy rules since Kepler is incompatible 2026-06-02 16:12:22 +01:00
Martin McCaffery
f98f53a5a0
revert(kepler): remove Kepler, incompatible with OTC CCE proc mount restrictions 2026-06-02 16:12:06 +01:00
Martin McCaffery
b5594a8017
feat(observability): add sustainability metrics, Kepler, 6-month retention, GARM scrape 2026-06-02 15:51:26 +01:00
Martin McCaffery
b98486f445
fix: argocd metrics port name, coredns metrics via headless service 2026-06-02 12:13:38 +01:00
Martin McCaffery
07261b081e
upgrade victoria-metrics-k8s-stack 0.48.1 -> 0.81.0 with values migration 2026-06-02 09:51:49 +01:00
Martin McCaffery
da0ccbd1b5
fix(observability): enable ArgoCD/CoreDNS scraping, add cluster label, fix node dashboard 2026-06-01 16:47:31 +01:00
Automated pipeline
464a9eb22e Automated upload for observability.buildth.ing 2026-03-04 09:55:46 +00:00
Manuel Ganter
c824cd32ed
disabled scrape for kubescheduler 2025-10-22 16:20:27 +02:00
richardrobertreitz
218a1cbff8 chore(alerts): disabled bogus alerts related to kubecontrollermanager and kubescheduler 2025-10-21 08:48:57 +00:00
1696a6f42c Update otc/observability.buildth.ing/stacks/observability/victoria-k8s-stack/values.yaml 2025-10-14 11:41:30 +00:00
Automated pipeline
2820a37e00 Automated upload for observability.buildth.ing 2025-08-12 12:40:19 +00:00
Automated pipeline
bf54e7fe38 Automated upload for observability.buildth.ing 2025-08-12 08:31:20 +00:00
Automated pipeline
625f2e0005 Initial upload 2025-07-21 12:52:28 +00:00
Automated pipeline
fe696adecc Initial upload 2025-07-21 08:08:22 +00:00
Automated pipeline
fdeb8363b6 Initial upload 2025-06-30 08:02:54 +00:00