fix(observability): 🐛 disable false-positive control-plane alerts and fix empty cluster_environment label

The hub defaultRules groups kubernetesSystemControllerManager, kubeScheduler, and
kubernetesSystemScheduler used the wrong key 'enabled: false'; the chart expects
'create: false'. The groups therefore stayed active, and
KubeControllerManagerDown/KubeSchedulerDown fired as false positives because
OTC CCE managed Kubernetes does not expose the control plane for scraping.

The dev local vmagent had empty externalLabels, so the backup-alert rules evaluated
by the local vmalert saw no cluster_environment label on kube_job_status_failed
metrics. Added cluster_environment=dev to match the label the vm-client-stack
vmagent adds when shipping to the hub.
Daniel Sy 2026-06-19 12:42:04 +02:00
parent 32e998df5b
commit 0316eefa43
Signed by untrusted user: danielsy
GPG key ID: 1F39A8BBCD2EE3D3
2 changed files with 5 additions and 4 deletions

@@ -708,7 +708,8 @@ vmagent:
     port: "8429"
     selectAllByDefault: true
     scrapeInterval: 20s
-    externalLabels: {}
+    externalLabels:
+      cluster_environment: "dev"
       # For multi-cluster setups it is useful to use "cluster" label to identify the metrics source.
       # For example:
       # cluster: cluster-name

@@ -201,13 +201,13 @@ defaultRules:
       enabled: true
       rules: {}
     kubernetesSystemControllerManager:
-      enabled: false
+      create: false
       rules: {}
     kubeScheduler:
-      enabled: false
+      create: false
       rules: {}
     kubernetesSystemScheduler:
-      enabled: false
+      create: false
       rules: {}
     kubeStateMetrics:
       enabled: true
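
The key mix-up fixed above is easy to reintroduce, because a key the chart does not read is ignored without any error. Below is a minimal sketch of a check that could catch it, assuming (as the commit message states) that the chart honors only `create` for rule groups. The function name `ignored_disable_flags` and the sample dictionaries are hypothetical and not part of this repository; the input is the already-parsed mapping of group name to settings.

```python
from typing import Any, Mapping


def ignored_disable_flags(groups: Mapping[str, Mapping[str, Any]]) -> list[str]:
    """Return names of rule groups that set `enabled: false` without `create: false`.

    Assumption: the chart only reads `create`, so `enabled: false` alone
    leaves the group active.
    """
    flagged = []
    for name, settings in groups.items():
        wants_disabled = settings.get("enabled") is False
        actually_disabled = settings.get("create") is False
        if wants_disabled and not actually_disabled:
            flagged.append(name)
    return sorted(flagged)


# Hypothetical sample data mirroring the values before and after this commit.
before = {
    "kubernetesSystemControllerManager": {"enabled": False, "rules": {}},
    "kubeScheduler": {"enabled": False, "rules": {}},
    "kubernetesSystemScheduler": {"enabled": False, "rules": {}},
}
after = {name: {"create": False, "rules": {}} for name in before}

print(ignored_disable_flags(before))  # all three group names
print(ignored_disable_flags(after))   # []
```

Feeding it the parsed group mapping from a values file (for example in a CI step) would list every group whose disable flag has no effect.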