Commit graph

  • 7a6f96a8b4
    feat(observability): add cluster heartbeat dead-man switch alerts [main] Daniel Sy 2026-06-22 11:05:43 +02:00
  • eda2812d47
    fix(observability): 🔇 silence managed-K8s false alerts + bump backup deadline to 4h Daniel Sy 2026-06-22 10:45:49 +02:00
  • 3ed3487e97
    fix(observability): 🐛 harden vmagent liveness probe failureThreshold 10→3 Daniel Sy 2026-06-22 10:40:43 +02:00
  • 01c41c9379
    fix(observability): 🐛 use cluster_environment as global clusterLabel for default dashboards Daniel Sy 2026-06-22 10:34:58 +02:00
  • 3141b7bd6c
    feat(observability): comprehensive platform alert rules Daniel Sy 2026-06-19 16:43:21 +02:00
  • 70939149ea
    feat(observability): add read routes to vmauth for dev.t09.de instance Daniel Sy 2026-06-19 16:37:33 +02:00
  • 23edd5d6b4
    feat(observability): add read routes to vmauth for metrics and logs queries Daniel Sy 2026-06-19 16:33:04 +02:00
  • 0a249820de
    fix(observability): 🐛 fix ArgoCD scrape port name http-metrics not metrics Daniel Sy 2026-06-19 16:11:09 +02:00
  • f3931dc550
    fix(observability): 🐛 add ArgoCD + GARM VMServiceScrapes to dev client stack Daniel Sy 2026-06-19 16:07:06 +02:00
  • 8488de0c6f
    fix(observability): 🐛 use plaintext password in hub VMUser to unblock operator reconciliation Daniel Sy 2026-06-19 15:45:48 +02:00
  • b1a00d0395
    fix(observability): 🐛 add missing simple-user-secret to hub observability stack Daniel Sy 2026-06-19 15:28:03 +02:00
  • 4591ee7b14
    feat(observability): 🗂️ organize dashboards into Grafana folders Daniel Sy 2026-06-19 14:46:35 +02:00
  • 7f5c680e19
    fix(observability): 🐛 enable GARM unauthenticated metrics + ArgoCD metrics on all instances Daniel Sy 2026-06-19 13:36:15 +02:00
  • b6fbd3f6eb
    feat(observability): add VictoriaLogs log panels to platform, forgejo, argocd dashboards Daniel Sy 2026-06-19 13:34:08 +02:00
  • bcf583a055
    fix(observability): 🐛 fix Vector log shipping URL on all clusters Daniel Sy 2026-06-19 13:32:13 +02:00
  • 238ef71630
    fix(observability): 🐛 fix remote write URL and add manifests for benchmark + edp clients Daniel Sy 2026-06-19 13:23:37 +02:00
  • 076b2a16c9
    fix(observability): 🐛 fix datasource UIDs, replace cronjob dashboard, add GARM Daniel Sy 2026-06-19 13:11:32 +02:00
  • 6ea1e798d2
    fix(observability): 🐛 add missing manifests to instance stacks Daniel Sy 2026-06-19 13:06:19 +02:00
  • 91db8038e6
    feat(observability): custom ArgoCD dashboard with cluster_environment filter Daniel Sy 2026-06-19 13:02:35 +02:00
  • 949529eb5c
    feat(observability): add cluster_environment dropdown to Forgejo and platform-overview dashboards Daniel Sy 2026-06-19 12:50:20 +02:00
  • c2528f6f69
    feat(observability): add platform grafana dashboard CRs Daniel Sy 2026-06-19 12:47:34 +02:00
  • 0316eefa43
    fix(observability): 🐛 disable false-positive control-plane alerts and fix empty cluster_environment label Daniel Sy 2026-06-19 12:42:04 +02:00
  • 32e998df5b
    fix(forgejo): ⏱️ increase s3-backup activeDeadlineSeconds 1350→7200 Daniel Sy 2026-06-19 12:35:18 +02:00
  • 59eed97263
    fix(observability-client): 🐛 fix remote write URL and add missing manifests dir Daniel Sy 2026-06-19 11:41:20 +02:00
  • 369961a940
    fix(observability): 🐛 enable vmagent, fix grafana auth, disable vmauth on dev Daniel Sy 2026-06-19 10:44:25 +02:00
  • d83945413d
    fix(observability): 🐛 change VLSingle → VLogs in victorialogs manifest Daniel Sy 2026-06-19 10:20:13 +02:00
  • ef4a1d7ce2
    fix(observability): 🐛 disable crds.cleanup hook in victoria-metrics-operator Daniel Sy 2026-06-19 09:58:50 +02:00
  • 29c0a59734
    fix(observability): 🐛 add SkipDryRunOnMissingResource to o12y syncOptions Daniel Sy 2026-06-19 09:56:19 +02:00
  • a52a6691a8
    fix(observability): 🐛 add prune + RespectIgnoreDifferences to o12y syncPolicy Daniel Sy 2026-06-19 09:51:42 +02:00
  • 9ed3ff50d2
    bump(benchmark): ci-sizer-collector sidecar 0.9.0 → 0.9.7 to pick up host-resolved kernel_peak + cgroup_path_count diagnostic [benchmark-cluster-2026-06-19-canonical] Martin McCaffery 2026-06-17 11:38:48 +02:00
  • 57ee5afa62
    feat(observability): add VMServiceScrapes + migrate VLogs → VLSingle Daniel Sy 2026-06-15 21:05:11 +02:00
  • 7949cabb29
    fix(garm): ⬆️ update to v0.1.7-forgejo-24 (fresh multi-arch build) Daniel Sy 2026-06-12 13:41:43 +02:00
  • 8939b4f32b
    fix(secrets-backup): 🔄 sync simplified manifest from template Daniel Sy 2026-06-12 13:12:04 +02:00
  • 900c1f6c80
    fix(dev): 🐛 revert automated-upload damage — restore working image pins + OIDC secrets Daniel Sy 2026-06-12 10:10:50 +02:00
  • 95deeef6a0
    Automated upload for dev.t09.de Automated pipeline 2026-06-12 07:46:00 +00:00
  • 9bbcf4efca
    fix(secrets-backup): 🐛 add openssl install + upgrade image to 1.32.0 Daniel Sy 2026-06-12 09:32:35 +02:00
  • cf8271fd86
    revert(ci-sizer): 🔥 revert image pin — no versioned images in registry Daniel Sy 2026-06-08 18:12:56 +02:00
  • f4aa470894
    fix(ci-sizer): 📌 pin sizer-receiver to v0.8.1 for dev Daniel Sy 2026-06-08 18:08:04 +02:00
  • 3fdfda9da7
    fix(ci-sizer): 📌 pin sizer-receiver to v0.8.2 for dev Daniel Sy 2026-06-08 18:06:00 +02:00
  • 69839f767b
    fix(ci-sizer): 🐛 set RECEIVER_ALLOWED_ORG=giteaAdmin for dev Daniel Sy 2026-06-08 18:00:47 +02:00
  • 925c7416b3
    fix(ci-sizer): 🐛 revert RECEIVER_ALLOWED_ORG to DevFW for dev env Daniel Sy 2026-06-08 17:51:13 +02:00
  • bd82384eb1
    fix(dex): 🔐 correct sizer client secret to match sizer-oidc-client Daniel Sy 2026-06-08 17:11:02 +02:00
  • 967edf0382
    fix(ci-sizer): 🔐 align OIDC client secret with dex config Daniel Sy 2026-06-08 16:59:47 +02:00
  • 9a7544418c
    fix(forgejo): 🐛 use workflow-webhook image matching DB migration level (v15a/v15b) Daniel.Sy 2026-06-08 14:11:31 +00:00
  • a047be3aae
    fix(garm): ⬇️ rollback to v0.1.7-forgejo-22 — -23 has exec format error (wrong arch) Daniel.Sy 2026-06-08 14:11:05 +00:00
  • 422f568c8e
    Automated upload for dev.t09.de Automated pipeline 2026-06-08 12:15:27 +00:00
  • 011f436fb7
    feat(benchmark.t09.de/garm): bump ci-sizer-collector 0.8.3 → 0.9.0 (kernel-peak + cgroup-v1 limit fallback) Martin McCaffery 2026-06-03 15:01:09 +01:00
  • 14873b7941
    fix(garm): bump dev+benchmark to garm-helm v0.0.17 (template-robust readToken); drop now-redundant explicit fields on benchmark Martin McCaffery 2026-06-02 16:21:51 +01:00
  • 63cdb926b9
    fix(sustainability-rules): remove Kepler energy rules since Kepler is incompatible Martin McCaffery 2026-06-02 16:12:22 +01:00
  • f98f53a5a0
    revert(kepler): remove Kepler, incompatible with OTC CCE proc mount restrictions Martin McCaffery 2026-06-02 16:12:06 +01:00
  • 608439697b
    fix(benchmark.t09.de/garm): pin ci-sizer-collector to 0.8.3 (latest tagged release, avoid :latest drift during long runs) Martin McCaffery 2026-06-02 16:08:35 +01:00
  • b5594a8017
    feat(observability): add sustainability metrics, Kepler, 6-month retention, GARM scrape Martin McCaffery 2026-06-02 15:51:26 +01:00
  • bbdca11f00
    fix(benchmark.t09.de/garm): bump ci-sizer-collector to :latest (0.0.4 tag doesn't exist in registry, was unreachable until sizer integration was restored) Martin McCaffery 2026-06-02 15:42:10 +01:00
  • 3be56f5a07
    fix(vm-client): add nodename-to-IP metricRelabelConfig for node-exporter Martin McCaffery 2026-06-02 14:58:36 +01:00
  • e2469e7843
    fix(benchmark.t09.de/garm): explicit sizer readToken mountPath/key/fileName (chart defaults not deep-merging, was rendering broken %!s(<nil>) path that crashed sizer consultation) Martin McCaffery 2026-06-02 14:38:41 +01:00
  • b98486f445
    fix: argocd metrics port name, coredns metrics via headless service Martin McCaffery 2026-06-02 12:13:38 +01:00
  • eca54cb19c
    fix(vm-client): use in-cluster VMSingle URL for remote write Martin McCaffery 2026-06-02 12:03:44 +01:00
  • 71a8fef501
    fix(vm-client): create missing manifests directory Martin McCaffery 2026-06-02 11:59:42 +01:00
  • e95fa403e9
    fix(benchmark.t09.de/garm): wire sizer baseUrl + readToken so edge-connect-k8s provider actually applies sizer recommendations (was silently no-op) Martin McCaffery 2026-06-02 11:56:11 +01:00
  • d0b0c85cf8
    fix: add ServerSideApply for argocd CRDs, remove deprecated vector playground field Martin McCaffery 2026-06-02 09:57:05 +01:00
  • 07261b081e
    upgrade victoria-metrics-k8s-stack 0.48.1 -> 0.81.0 with values migration Martin McCaffery 2026-06-02 09:51:49 +01:00
  • 07d08e5839
    upgrade chart versions: argocd, dex, cloudnative-pg, cert-manager, ingress-nginx, vector, metrics-server Martin McCaffery 2026-06-02 09:50:04 +01:00
  • 342870fa03
    fix(vm-client): add cluster external label for dashboard variable resolution Martin McCaffery 2026-06-02 09:30:24 +01:00
  • da0ccbd1b5
    fix(observability): enable ArgoCD/CoreDNS scraping, add cluster label, fix node dashboard Martin McCaffery 2026-06-01 16:47:31 +01:00
  • 3212016398
    fix(vector): use in-cluster endpoint for VictoriaLogs log shipping Martin McCaffery 2026-06-01 16:47:24 +01:00
  • e89d48c2a5
    Upgrade Grafana to 12.4.0 and add auth.jwt config for useKubeAuth Martin McCaffery 2026-06-01 13:16:37 +01:00
  • 32fd6ffd54
    Remove useKubeAuth temporarily to unblock operator upgrade Martin McCaffery 2026-06-01 13:06:41 +01:00
  • 3b31475552
    Fix grafana-operator chart version tag (no v prefix) Martin McCaffery 2026-06-01 13:02:49 +01:00
  • 1686764b39
    Upgrade grafana-operator to v5.23.0 and enable useKubeAuth Martin McCaffery 2026-06-01 12:57:53 +01:00
  • a7bc25603c
    Added DevFW-CICD users as admins Patrick Sy 2026-05-19 14:01:18 +02:00
  • c927cbd0dc
    bump garm-helm to v0.0.16 for benchmark Martin McCaffery 2026-05-19 09:54:48 +02:00
  • 732a27d5f1
    fix(benchmark): disable 2FA requirement for benchmark cluster Martin McCaffery 2026-05-18 17:22:57 +02:00
  • 3c8850d2e2
    Automated upload for benchmark.t09.de Automated pipeline 2026-05-18 15:20:18 +00:00
  • f12daac048
    Automated upload for benchmark.t09.de Automated pipeline 2026-05-18 14:32:18 +00:00
  • 27475f9cf3
    Automated upload for benchmark.t09.de Automated pipeline 2026-05-18 14:04:23 +00:00
  • 046679e355
    Automated upload for benchmark.t09.de Automated pipeline 2026-05-18 10:29:51 +00:00
  • 7e1b0418f6
    feat(benchmark): add ci-sizer registry app for benchmark.t09.de Daniel Sy 2026-05-18 12:21:27 +02:00
  • f2747ece68
    Automated upload for benchmark.t09.de Automated pipeline 2026-05-18 10:02:58 +00:00
  • 75e4a2384b
    fix(ci-sizer): 🐛 align GARM_URL with template output Daniel Sy 2026-05-18 10:25:58 +02:00
  • 8b9fb6bdd8
    Automated upload for benchmark.t09.de Automated pipeline 2026-05-13 11:39:29 +00:00
  • 2c14713ae5
    feat(benchmark): add ci-sizer registry for benchmark.t09.de [4/4] Daniel.Sy 2026-05-13 10:19:43 +00:00
  • 1a591f1c37
    feat(benchmark): add ci-sizer ingress for benchmark.t09.de [3/4] Daniel.Sy 2026-05-13 10:19:36 +00:00
  • 6977dac98d
    feat(benchmark): add ci-sizer deployment for benchmark.t09.de [2/4] Daniel.Sy 2026-05-13 10:19:29 +00:00
  • b84476f71e
    feat(benchmark): add ci-sizer stacks-instances for benchmark.t09.de [1/4] Daniel.Sy 2026-05-13 10:19:17 +00:00
  • d4b54c854f
    fix: increased pvc size due to out of disk space error Patrick Sy 2026-05-11 10:56:01 +02:00
  • bc086d5c31
    fix: increase smol backup disk Patrick Sy 2026-05-07 17:40:36 +02:00
  • 5be2bf1409
    fix: increased body size by 10x for large image layer uploads Patrick Sy 2026-05-05 14:05:40 +02:00
  • c5191ea18a
    Update otc/dev.t09.de/stacks/forgejo/forgejo-server/manifests/forgejo-ingress.yaml manuel.ganter 2026-05-05 08:29:25 +00:00
  • 556a784beb
    fix(stacks-instances): 🚑 add ci-sizer registry entry for dev.t09.de Daniel Sy 2026-04-29 10:41:29 +02:00
  • 2e90240c81
    refactor(stacks-instances): 🚚 migrate sizer-receiver to ci-sizer namespace (dev.t09.de) Daniel Sy 2026-04-29 10:18:36 +02:00
  • 9d042eee1c
    chore: ⬆️ bump garm image to v0.1.7-forgejo-22 on dev.t09.de Daniel Sy 2026-04-28 10:11:09 +02:00
  • bc96d8d7aa
    chore(garm): ⬆️ bump garm-forgejo to v0.1.7-forgejo-21 Daniel Sy 2026-04-24 15:47:23 +02:00
  • e65abf162e
    chore(garm): ⬆️ bump garm-forgejo to v0.1.7-forgejo-20 Daniel Sy 2026-04-24 14:51:58 +02:00
  • a9dcf29f7a
    chore(garm): ⬆️ bump garm-forgejo to v0.1.7-forgejo-19 Daniel Sy 2026-04-24 13:41:52 +02:00
  • b72e2049e3
    chore: bump garm image to v0.1.7-forgejo-18 for dev.t09.de Daniel Sy 2026-04-22 13:19:32 +02:00
  • 4cea4ffde7
    chore: bump garm to v0.1.7-forgejo-17 (activeDeadlineSeconds) Daniel Sy 2026-04-21 17:15:41 +02:00
  • 0b13b89640
    chore(garm): ⬆️ bump garm-helm to v0.0.15 (startup probe fix) Daniel Sy 2026-04-21 16:27:25 +02:00
  • 4aa8973c91
    chore(garm): ⬆️ bump garm-helm chart to v0.0.14 Daniel Sy 2026-04-21 16:03:06 +02:00
  • c682c48be0
    chore: bump garm image to v0.1.7-forgejo-16 Daniel Sy 2026-04-21 15:53:50 +02:00
  • 61721097d6
    chore(sizer): 🔧 rename forgejo-runner-sizer to ci-sizer in deployment configs Daniel Sy 2026-04-21 14:16:39 +02:00