4591ee7b14
feat(observability): 🗂️ organize dashboards into Grafana folders
Assigns folder field to all GrafanaDashboard CRs:
- EDP / Overview: platform-overview
- EDP / Applications: forgejo, argocd-operational, garm, argocd
- EDP / Operations: cronjob-monitoring, ingress-nginx, victoria-logs
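
A minimal sketch of what the folder assignment could look like on one CR, assuming the grafana-operator v1beta1 GrafanaDashboard API. The instanceSelector label and the inline JSON body are placeholders, not taken from the repository:

```yaml
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaDashboard
metadata:
  name: platform-overview
spec:
  # Folder name as listed in this commit
  folder: "EDP / Overview"
  instanceSelector:
    matchLabels:
      dashboards: grafana   # assumed label; must match the Grafana CR's labels
  json: |
    {"title": "EDP Platform Overview", "panels": []}
```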
2026-06-19 14:46:41 +02:00
b6fbd3f6eb
feat(observability): ✨ add VictoriaLogs log panels to platform, forgejo, argocd dashboards
2026-06-19 13:34:12 +02:00
076b2a16c9
fix(observability): 🐛 fix datasource UIDs, replace cronjob dashboard, add GARM
- Remove all ${DS_VICTORIAMETRICS} uid refs from platform-overview; use
type-only datasource so grafana-operator resolves default prometheus DS
- Replace grafanaCom id:14279 cronjob dashboard with inline custom version
supporting cluster_environment variable (dev/edp/observability)
- Add new GARM runners dashboard (edp-garm), ready for when GARM metrics
are scraped; uses 'or vector(0)' guards so panels show 0 instead of empty
Note: cluster_environment values confirmed as dev/edp/observability (no benchmark).
GARM metrics not yet present in VictoriaMetrics (0 series found).
2026-06-19 13:11:42 +02:00
6ea1e798d2
fix(observability): 🐛 add missing manifests to instance stacks
- backup-alerts.yaml → observability.buildth.ing victoria-k8s-stack
- forgejo-scrape.yaml → dev.t09.de vm-client-stack
2026-06-19 13:06:24 +02:00
91db8038e6
feat(observability): ✨ custom ArgoCD dashboard with cluster_environment filter
2026-06-19 13:02:48 +02:00
949529eb5c
feat(observability): ✨ add cluster_environment dropdown to Forgejo and platform-overview dashboards
- Replace grafanaCom import (17802) with custom inline Forgejo dashboard
containing cluster_environment query variable (refresh=2, label=Environment)
- Add label, refresh=2, sort=1 to platform-overview cluster_environment variable
- ArgoCD (19993) and CronJob (14279) remain grafanaCom imports (acceptable)
2026-06-19 12:50:32 +02:00
c2528f6f69
feat(observability): ✨ add platform grafana dashboard CRs
- Add forgejo.yaml: Forgejo app dashboard (grafana.com ID 17802)
- Add argocd-operational.yaml: ArgoCD operational dashboard (grafana.com ID 19993)
- Add cronjob-monitoring.yaml: CronJob/backup monitoring dashboard (grafana.com ID 14279)
- Add platform-overview.yaml: custom EDP Platform Overview inline dashboard
(platform health, forgejo stats, resource usage, backup status rows)
- Fix victoria-logs.yaml: replace broken URL with grafanaCom ID 22698
2026-06-19 12:47:44 +02:00
0316eefa43
fix(observability): 🐛 disable false-positive control-plane alerts and fix empty cluster_environment label
Hub defaultRules groups kubernetesSystemControllerManager, kubeScheduler, and
kubernetesSystemScheduler used the wrong key 'enabled: false'; the chart expects 'create: false'.
As a result, KubeControllerManagerDown/KubeSchedulerDown fired as false positives,
because OTC CCE managed k8s does not expose the control plane for scraping.
The dev local vmagent had empty externalLabels, so backup-alert rules evaluated by the local
vmalert had no cluster_environment label on kube_job_status_failed metrics. Added
cluster_environment=dev to match what the vm-client-stack vmagent adds for hub shipping.
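
The two fixes can be sketched as values fragments, assuming the victoria-metrics-k8s-stack values layout. The group names and the 'create' key come from this commit; the vmagent.spec.externalLabels path is an assumption about where the label is set:

```yaml
# Hub values: disable rule groups for control-plane components
# that OTC CCE does not expose for scraping.
defaultRules:
  groups:
    kubernetesSystemControllerManager:
      create: false   # was 'enabled: false', which is not the key the chart reads
    kubeScheduler:
      create: false
    kubernetesSystemScheduler:
      create: false
---
# Dev values: label locally scraped series so rules evaluated by the
# local vmalert carry the same label the hub-shipping vmagent adds.
vmagent:
  spec:
    externalLabels:
      cluster_environment: dev   # assumed path: vmagent.spec.externalLabels
```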
2026-06-19 12:42:21 +02:00
Martin McCaffery
63cdb926b9
fix(sustainability-rules): remove Kepler energy rules since Kepler is incompatible
2026-06-02 16:12:22 +01:00
Martin McCaffery
f98f53a5a0
revert(kepler): remove Kepler, incompatible with OTC CCE proc mount restrictions
2026-06-02 16:12:06 +01:00
Martin McCaffery
b5594a8017
feat(observability): add sustainability metrics, Kepler, 6-month retention, GARM scrape
2026-06-02 15:51:26 +01:00
Martin McCaffery
b98486f445
fix: argocd metrics port name, coredns metrics via headless service
2026-06-02 12:13:38 +01:00
Martin McCaffery
07261b081e
upgrade victoria-metrics-k8s-stack 0.48.1 -> 0.81.0 with values migration
2026-06-02 09:51:49 +01:00
Martin McCaffery
da0ccbd1b5
fix(observability): enable ArgoCD/CoreDNS scraping, add cluster label, fix node dashboard
2026-06-01 16:47:31 +01:00
Martin McCaffery
e89d48c2a5
Upgrade Grafana to 12.4.0 and add auth.jwt config for useKubeAuth
2026-06-01 13:16:37 +01:00
Martin McCaffery
3b31475552
Fix grafana-operator chart version tag (no v prefix)
2026-06-01 13:02:49 +01:00
Martin McCaffery
1686764b39
Upgrade grafana-operator to v5.23.0 and enable useKubeAuth
2026-06-01 12:58:14 +01:00
Automated pipeline
464a9eb22e
Automated upload for observability.buildth.ing
2026-03-04 09:55:46 +00:00
Manuel Ganter
c824cd32ed
disabled scrape for kubescheduler
2025-10-22 16:20:27 +02:00
richardrobertreitz
218a1cbff8
chore(alerts): disabled bogus alerts related to kubecontrollermanager and kubescheduler
2025-10-21 08:48:57 +00:00
1696a6f42c
Update otc/observability.buildth.ing/stacks/observability/victoria-k8s-stack/values.yaml
2025-10-14 11:41:30 +00:00
Automated pipeline
2820a37e00
Automated upload for observability.buildth.ing
2025-08-12 12:40:19 +00:00
Automated pipeline
bf54e7fe38
Automated upload for observability.buildth.ing
2025-08-12 08:31:20 +00:00
Automated pipeline
625f2e0005
Initial upload
2025-07-21 12:52:28 +00:00
Automated pipeline
fe696adecc
Initial upload
2025-07-21 08:08:22 +00:00
Automated pipeline
fdeb8363b6
Initial upload
2025-06-30 08:02:54 +00:00