Commit graph

21 commits

Author SHA1 Message Date
949529eb5c
feat(observability): add cluster_environment dropdown to Forgejo and platform-overview dashboards
- Replace grafanaCom import (17802) with custom inline Forgejo dashboard
  containing cluster_environment query variable (refresh=2, label=Environment)
- Add label, refresh=2, sort=1 to platform-overview cluster_environment variable
- ArgoCD (19993) and CronJob (14279) remain grafanaCom imports (acceptable)
2026-06-19 12:50:32 +02:00
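
For orientation, the inline dashboard variable described in the commit above might look roughly like the fragment below. This is only a sketch: the resource name, the instanceSelector label, the dashboard title, and the label_values query are assumptions and are not taken from the repository. Only the variable name, label=Environment, refresh=2, and sort=1 come from the commit message.

```yaml
# Hypothetical GrafanaDashboard CR with an inline dashboard (grafana-operator v5 API group).
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaDashboard
metadata:
  name: forgejo                  # assumed name
spec:
  instanceSelector:
    matchLabels:
      dashboards: grafana        # assumed selector label
  json: |
    {
      "title": "Forgejo",
      "templating": {
        "list": [
          {
            "name": "cluster_environment",
            "label": "Environment",
            "type": "query",
            "query": "label_values(cluster_environment)",
            "refresh": 2,
            "sort": 1
          }
        ]
      },
      "panels": []
    }
```

In Grafana's dashboard JSON, refresh 2 re-runs the variable query on time range change and sort 1 orders the values alphabetically ascending.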
c2528f6f69
feat(observability): add platform grafana dashboard CRs
- Add forgejo.yaml: Forgejo app dashboard (grafana.com ID 17802)
- Add argocd-operational.yaml: ArgoCD operational dashboard (grafana.com ID 19993)
- Add cronjob-monitoring.yaml: CronJob/backup monitoring dashboard (grafana.com ID 14279)
- Add platform-overview.yaml: custom EDP Platform Overview inline dashboard
  (platform health, forgejo stats, resource usage, backup status rows)
- Fix victoria-logs.yaml: replace broken URL with grafanaCom ID 22698
2026-06-19 12:47:44 +02:00
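
As an illustration of the grafana.com imports listed in the commit above, a dashboard CR of this kind could be sketched as follows. The metadata name and the instanceSelector label are assumptions; only the dashboard ID 17802 comes from the commit message.

```yaml
# Hypothetical forgejo.yaml: imports a dashboard from grafana.com by ID.
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaDashboard
metadata:
  name: forgejo                  # assumed name
spec:
  instanceSelector:
    matchLabels:
      dashboards: grafana        # assumed selector label
  grafanaCom:
    id: 17802                    # Forgejo app dashboard ID named in the commit
```

The ArgoCD (19993), CronJob (14279), and VictoriaLogs (22698) dashboards would presumably follow the same shape with a different id.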
0316eefa43
fix(observability): 🐛 disable false-positive control-plane alerts and fix empty cluster_environment label
Hub defaultRules groups kubernetesSystemControllerManager, kubeScheduler, and
kubernetesSystemScheduler used the wrong key 'enabled: false'; the chart expects 'create: false'.
This caused KubeControllerManagerDown/KubeSchedulerDown to fire as false positives
because OTC CCE managed k8s does not expose the control plane for scraping.

Dev local vmagent had empty externalLabels, so backup-alert rules evaluated by local
vmalert had no cluster_environment label on kube_job_status_failed metrics. Added
cluster_environment=dev to match what the vm-client-stack vmagent adds for hub shipping.
2026-06-19 12:42:21 +02:00
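
The two fixes described in the commit above can be sketched as values fragments. This is a minimal illustration, not the repository's actual files: the group names, the 'create: false' key, and the cluster_environment=dev label come from the commit message, while the exact nesting under vmagent.spec is assumed from the usual victoria-metrics-k8s-stack values layout.

```yaml
# Hub values (sketch): disable the control-plane rule groups.
defaultRules:
  groups:
    kubernetesSystemControllerManager:
      create: false              # per the commit, 'enabled: false' is not the key the chart expects
    kubeScheduler:
      create: false
    kubernetesSystemScheduler:
      create: false
---
# Dev values (sketch): give the local vmagent the same label the hub-shipping vmagent adds.
vmagent:
  spec:
    externalLabels:
      cluster_environment: dev   # assumed nesting; the label and value are from the commit
```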
Martin McCaffery
63cdb926b9
fix(sustainability-rules): remove Kepler energy rules since Kepler is incompatible 2026-06-02 16:12:22 +01:00
Martin McCaffery
f98f53a5a0
revert(kepler): remove Kepler, incompatible with OTC CCE proc mount restrictions 2026-06-02 16:12:06 +01:00
Martin McCaffery
b5594a8017
feat(observability): add sustainability metrics, Kepler, 6-month retention, GARM scrape 2026-06-02 15:51:26 +01:00
Martin McCaffery
b98486f445
fix: argocd metrics port name, coredns metrics via headless service 2026-06-02 12:13:38 +01:00
Martin McCaffery
07261b081e
upgrade victoria-metrics-k8s-stack 0.48.1 -> 0.81.0 with values migration 2026-06-02 09:51:49 +01:00
Martin McCaffery
da0ccbd1b5
fix(observability): enable ArgoCD/CoreDNS scraping, add cluster label, fix node dashboard 2026-06-01 16:47:31 +01:00
Martin McCaffery
e89d48c2a5
Upgrade Grafana to 12.4.0 and add auth.jwt config for useKubeAuth 2026-06-01 13:16:37 +01:00
Martin McCaffery
3b31475552
Fix grafana-operator chart version tag (no v prefix) 2026-06-01 13:02:49 +01:00
Martin McCaffery
1686764b39
Upgrade grafana-operator to v5.23.0 and enable useKubeAuth 2026-06-01 12:58:14 +01:00
Automated pipeline
464a9eb22e Automated upload for observability.buildth.ing 2026-03-04 09:55:46 +00:00
Manuel Ganter
c824cd32ed
disabled scrape for kubescheduler 2025-10-22 16:20:27 +02:00
richardrobertreitz
218a1cbff8 chore(alerts): disabled bogus alerts related to kubecontrollermanager and kubescheduler 2025-10-21 08:48:57 +00:00
1696a6f42c Update otc/observability.buildth.ing/stacks/observability/victoria-k8s-stack/values.yaml 2025-10-14 11:41:30 +00:00
Automated pipeline
2820a37e00 Automated upload for observability.buildth.ing 2025-08-12 12:40:19 +00:00
Automated pipeline
bf54e7fe38 Automated upload for observability.buildth.ing 2025-08-12 08:31:20 +00:00
Automated pipeline
625f2e0005 Initial upload 2025-07-21 12:52:28 +00:00
Automated pipeline
fe696adecc Initial upload 2025-07-21 08:08:22 +00:00
Automated pipeline
fdeb8363b6 Initial upload 2025-06-30 08:02:54 +00:00