Compare commits

114 commits

Author SHA1 Message Date
3141b7bd6c
feat(observability): comprehensive platform alert rules
Replace ad-hoc forgejo/disk alerts with structured VMRule covering:
- platform-health: ForgejoDown, IngressHighErrorRate, NodeNotReady, PodCrashLooping
- storage: PVCUsageHigh (>80%), PVCUsageCritical (>90%)
- resources: NodeCPUHigh (>85%), NodeMemoryHigh (>90%)
2026-06-19 16:43:28 +02:00
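A minimal sketch of the storage group from this commit follows. It assumes the standard kubelet volume metrics (`kubelet_volume_stats_used_bytes`, `kubelet_volume_stats_capacity_bytes`) are scraped. The VMRule name, the `for` durations and the severity labels are assumptions. Only the alert names and the 80%/90% thresholds come from the commit message.

```yaml
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMRule
metadata:
  name: platform-alerts            # hypothetical name
spec:
  groups:
    - name: storage
      rules:
        # Warning once a PVC is more than 80% full
        - alert: PVCUsageHigh
          expr: kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes > 0.80
          for: 10m                 # assumed duration
          labels:
            severity: warning      # assumed label
          annotations:
            summary: "PVC {{ $labels.namespace }}/{{ $labels.persistentvolumeclaim }} is over 80% full"
        # Critical once a PVC is more than 90% full
        - alert: PVCUsageCritical
          expr: kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes > 0.90
          for: 5m                  # assumed duration
          labels:
            severity: critical     # assumed label
          annotations:
            summary: "PVC {{ $labels.namespace }}/{{ $labels.persistentvolumeclaim }} is over 90% full"
```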
70939149ea
feat(observability): add read routes to vmauth for dev.t09.de instance 2026-06-19 16:37:37 +02:00
23edd5d6b4
feat(observability): add read routes to vmauth for metrics and logs queries 2026-06-19 16:33:07 +02:00
0a249820de
fix(observability): 🐛 fix ArgoCD scrape port name http-metrics not metrics 2026-06-19 16:11:15 +02:00
f3931dc550
fix(observability): 🐛 add ArgoCD + GARM VMServiceScrapes to dev client stack 2026-06-19 16:07:27 +02:00
8488de0c6f
fix(observability): 🐛 use plaintext password in hub VMUser to unblock operator reconciliation
The hub VMUser was using passwordRef pointing to simple-user-secret, but that
Secret was not present in the cluster (only exists in git now via the previous
commit). VM operator skips VMUser reconciliation when passwordRef cannot resolve,
leaving vmauth with only the unauthorizedUser catch-all (vmsingle).

Switching to inline password ensures immediate operator reconciliation without
waiting for Secret deployment. The simple-user-secret.yaml manifest is kept for
Vector's credential reference.
2026-06-19 15:45:55 +02:00
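The fix described above can be sketched as a VMUser with an inline password that routes Vector's bulk endpoint to VictoriaLogs. This is an illustration, not the hub's actual manifest. The object and user names, the placeholder password and port 9428 (the VictoriaLogs default) are assumptions. `simple-user-secret` and `vlogs-victorialogs` are taken from the commit messages.

```yaml
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMUser
metadata:
  name: simple-user                # hypothetical name
spec:
  username: simple-user            # hypothetical user
  # Inline password: the operator can render this user into vmauth immediately.
  password: change-me              # placeholder, not the real value
  # Previous form: reconciliation stalls while this Secret does not exist.
  # passwordRef:
  #   name: simple-user-secret
  #   key: password                # assumed key
  targetRefs:
    # Without this user, requests fall through to the unauthorizedUser target.
    - static:
        url: http://vlogs-victorialogs:9428
      paths:
        - /insert/elasticsearch/.*
```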
b1a00d0395
fix(observability): 🐛 add missing simple-user-secret to hub observability stack
The hub's VMUser (vmauth.yaml) references simple-user-secret via passwordRef,
but the Secret was never added to the hub's manifests. Without this Secret,
the VM operator cannot reconcile the VMUser into the vmauth config, causing
ALL requests to fall through to the unauthorizedUser catch-all (vmsingle).

Result: Vector log shipping to VictoriaLogs was broken — vmauth routed
/insert/elasticsearch/_bulk to vmsingle instead of vlogs-victorialogs.
2026-06-19 15:28:14 +02:00
4591ee7b14
feat(observability): 🗂️ organize dashboards into Grafana folders
Assigns folder field to all GrafanaDashboard CRs:
- EDP / Overview: platform-overview
- EDP / Applications: forgejo, argocd-operational, garm, argocd
- EDP / Operations: cronjob-monitoring, ingress-nginx, victoria-logs
2026-06-19 14:46:41 +02:00
7f5c680e19
fix(observability): 🐛 enable GARM unauthenticated metrics + ArgoCD metrics on all instances
- GARM dev.t09.de: set garm.metrics.disableAuth=true to unblock Prometheus scraping (was 401)
- ArgoCD dev.t09.de: add controller/server/repoServer/applicationSet metrics blocks
- ArgoCD edp.buildth.ing: add controller/server/repoServer/applicationSet metrics blocks
- ArgoCD benchmark.t09.de: add controller/server/repoServer/applicationSet metrics blocks
- observability.buildth.ing already had metrics enabled (no change needed)
2026-06-19 13:36:26 +02:00
b6fbd3f6eb
feat(observability): add VictoriaLogs log panels to platform, forgejo, argocd dashboards 2026-06-19 13:34:12 +02:00
bcf583a055
fix(observability): 🐛 fix Vector log shipping URL on all clusters
Restores missing '.buildth.ing' domain segment in Vector elasticsearch
endpoint for benchmark, dev, and edp instances.

Template source uses {{{ .Env.DOMAIN_O12Y }}} (correct) — instances
were mis-hydrated, omitting the TLD suffix.
2026-06-19 13:32:23 +02:00
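For reference, here is a hedged sketch of a Vector `elasticsearch` sink pointed at VictoriaLogs' Elasticsearch-compatible bulk API. The host is assumed to be the same `o12y.observability.buildth.ing` name used by the remote-write fixes. The input name, the user and the `VL_PASSWORD` environment variable are hypothetical.

```yaml
# Hypothetical fragment of Vector Helm values (customConfig)
customConfig:
  sinks:
    vlogs:
      type: elasticsearch
      inputs: [k8s_logs]                # hypothetical source name
      endpoints:
        # Full domain, including the .buildth.ing suffix the mis-hydration dropped
        - https://o12y.observability.buildth.ing/insert/elasticsearch/
      mode: bulk
      api_version: v8
      auth:
        strategy: basic
        user: simple-user               # hypothetical user
        password: "${VL_PASSWORD}"      # hypothetical env var, interpolated by Vector
      healthcheck:
        enabled: false
```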
238ef71630
fix(observability): 🐛 fix remote write URL and add manifests for benchmark + edp clients
- Fix broken remote write URL (o12y.observability./ → o12y.observability.buildth.ing/)
- Create manifests/ dirs with .gitkeep for benchmark.t09.de and edp.buildth.ing
- Copy forgejo-scrape.yaml VMServiceScrape manifest to both instances
2026-06-19 13:23:50 +02:00
076b2a16c9
fix(observability): 🐛 fix datasource UIDs, replace cronjob dashboard, add GARM
- Remove all ${DS_VICTORIAMETRICS} uid refs from platform-overview; use
  type-only datasource so grafana-operator resolves default prometheus DS
- Replace grafanaCom id:14279 cronjob dashboard with inline custom version
  supporting cluster_environment variable (dev/edp/observability)
- Add new GARM runners dashboard (edp-garm) ready for when GARM metrics
  are scraped; uses or vector(0) guards so panels show 0 not empty

Note: cluster_environment values confirmed as dev/edp/observability (no benchmark).
GARM metrics not yet present in VictoriaMetrics (0 series found).
2026-06-19 13:11:42 +02:00
6ea1e798d2
fix(observability): 🐛 add missing manifests to instance stacks
- backup-alerts.yaml → observability.buildth.ing victoria-k8s-stack
- forgejo-scrape.yaml → dev.t09.de vm-client-stack
2026-06-19 13:06:24 +02:00
91db8038e6
feat(observability): custom ArgoCD dashboard with cluster_environment filter 2026-06-19 13:02:48 +02:00
949529eb5c
feat(observability): add cluster_environment dropdown to Forgejo and platform-overview dashboards
- Replace grafanaCom import (17802) with custom inline Forgejo dashboard
  containing cluster_environment query variable (refresh=2, label=Environment)
- Add label, refresh=2, sort=1 to platform-overview cluster_environment variable
- ArgoCD (19993) and CronJob (14279) remain grafanaCom imports (acceptable)
2026-06-19 12:50:32 +02:00
c2528f6f69
feat(observability): add platform grafana dashboard CRs
- Add forgejo.yaml: Forgejo app dashboard (grafana.com ID 17802)
- Add argocd-operational.yaml: ArgoCD operational dashboard (grafana.com ID 19993)
- Add cronjob-monitoring.yaml: CronJob/backup monitoring dashboard (grafana.com ID 14279)
- Add platform-overview.yaml: custom EDP Platform Overview inline dashboard
  (platform health, forgejo stats, resource usage, backup status rows)
- Fix victoria-logs.yaml: replace broken URL with grafanaCom ID 22698
2026-06-19 12:47:44 +02:00
0316eefa43
fix(observability): 🐛 disable false-positive control-plane alerts and fix empty cluster_environment label
Hub defaultRules groups kubernetesSystemControllerManager, kubeScheduler, and
kubernetesSystemScheduler used wrong key 'enabled: false' — chart expects 'create: false'.
This caused KubeControllerManagerDown/KubeSchedulerDown to fire as false positives
because OTC CCE managed k8s does not expose control plane for scraping.

Dev local vmagent had empty externalLabels, so backup-alert rules evaluated by local
vmalert had no cluster_environment label on kube_job_status_failed metrics. Added
cluster_environment=dev to match what the vm-client-stack vmagent adds for hub shipping.
2026-06-19 12:42:21 +02:00
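The two changes in this commit map roughly onto victoria-metrics-k8s-stack values like the following. The group names and the `create` key come from the commit message. The exact nesting under `defaultRules.groups` and `vmagent.spec` is an assumption about the chart version in use.

```yaml
defaultRules:
  groups:
    # OTC CCE does not expose the control plane, so skip these rule groups.
    # Note: "enabled: false" is silently ignored here; the chart reads "create".
    kubernetesSystemControllerManager:
      create: false
    kubeScheduler:
      create: false
    kubernetesSystemScheduler:
      create: false
vmagent:
  spec:
    # Same label the vm-client-stack vmagent adds when shipping to the hub
    externalLabels:
      cluster_environment: dev
```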
32e998df5b
fix(forgejo): ⏱️ increase s3-backup activeDeadlineSeconds 1350→7200
Previous 22.5m deadline caused DeadlineExceeded on 2026-06-19 when
rclone sync took >22m (vs 13-16s prior days). Likely triggered by
significant new data in OBS bucket. 2h window accommodates large
incremental syncs while BackupJobTooSlow alert still fires at 5m.
2026-06-19 12:35:41 +02:00
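`activeDeadlineSeconds` sits on the Job spec inside the CronJob's `jobTemplate`. The sketch below shows only where the new value lives. The CronJob name, schedule, image and arguments are hypothetical placeholders, not the Forgejo backup's real definition.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: s3-backup                  # hypothetical name
spec:
  schedule: "0 2 * * *"            # hypothetical schedule
  jobTemplate:
    spec:
      # 7200 s = 2 h (previously 1350 s = 22.5 min)
      activeDeadlineSeconds: 7200
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: rclone
              image: rclone/rclone:latest         # hypothetical image
              args: ["sync", "SOURCE", "DEST"]    # placeholders
```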
59eed97263
fix(observability-client): 🐛 fix remote write URL and add missing manifests dir
- Fix broken remote write URL: o12y.observability. → o12y.observability.buildth.ing
- Create manifests/ directory with .gitkeep for ArgoCD source path
2026-06-19 11:41:26 +02:00
369961a940
fix(observability): 🐛 enable vmagent, fix grafana auth, disable vmauth on dev
- Enable VMAgent (was disabled → no metrics scraped)
- Remove disable_login from Grafana config; add security block so operator can auth via API
- Disable VMAuth (invalid trailing-dot hostname o12y.observability.; not needed on dev)
2026-06-19 10:44:34 +02:00
d83945413d
fix(observability): 🐛 change VLSingle → VLogs in victorialogs manifest
Chart 0.48.1 / operator v0.58.0 uses VLogs CRD for VictoriaLogs, not
VLSingle. The VLSingle kind was introduced in a newer operator version
and is not registered in this chart release. Changing to VLogs which
has identical spec fields (retentionPeriod, removePvcAfterDelete,
storage, storageMetadata, resources all supported).
2026-06-19 10:20:19 +02:00
ef4a1d7ce2
fix(observability): 🐛 disable crds.cleanup hook in victoria-metrics-operator
Pre-upgrade cleanup hook uses bitnami/kubectl and spawns on every ArgoCD
sync. Dev cluster nodes are at 99% CPU / pod limit — hook pod cannot be
scheduled, blocking the entire sync indefinitely.

Disabling cleanup.enabled prevents the hook Job from being created.
CRD cleanup is safe to skip on a fresh bootstrap where no old CRDs exist.
2026-06-19 09:58:55 +02:00
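Set through the k8s-stack's operator subchart, the change might look like this. The `crds.cleanup.enabled` path follows the commit message. Placing it under the `victoria-metrics-operator` key is an assumption about how the stack passes values to the subchart.

```yaml
victoria-metrics-operator:
  crds:
    cleanup:
      # No pre-upgrade hook Job, so no kubectl pod that must be scheduled on every sync
      enabled: false
```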
29c0a59734
fix(observability): 🐛 add SkipDryRunOnMissingResource to o12y syncOptions
VLSingle CRD missing at sync time — ArgoCD pre-validates all resources
before applying any, causing 'synchronization tasks not valid' on CRs
whose CRDs are created by the operator in the same sync wave.
SkipDryRunOnMissingResource=true bypasses dry-run for missing CRDs,
unblocking the CRD bootstrap deadlock.
2026-06-19 09:56:24 +02:00
a52a6691a8
fix(observability): 🐛 add prune + RespectIgnoreDifferences to o12y syncPolicy
Fix CRD bootstrap deadlock on victoria-metrics-k8s-stack ArgoCD app.
Adds prune: true and RespectIgnoreDifferences=true to prevent sync
failures when CRs are applied before CRDs are established.
2026-06-19 09:52:01 +02:00
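This commit and the SkipDryRunOnMissingResource commit listed above it combine into an Application `syncPolicy` along the following lines. Only `spec.syncPolicy` is shown. Keeping `selfHeal: true` mirrors the other Applications in this diff and is an assumption for the o12y app.

```yaml
# Fragment of an ArgoCD Application (spec.syncPolicy only)
syncPolicy:
  automated:
    prune: true
    selfHeal: true                 # assumed, as in the other Applications here
  syncOptions:
    - RespectIgnoreDifferences=true
    # Skip dry-run for CRs whose CRDs the operator creates in the same sync
    - SkipDryRunOnMissingResource=true
```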
Martin McCaffery
9ed3ff50d2
bump(benchmark): ci-sizer-collector sidecar 0.9.0 → 0.9.7 to pick up host-resolved kernel_peak + cgroup_path_count diagnostic 2026-06-17 11:38:55 +02:00
57ee5afa62
feat(observability): add VMServiceScrapes + migrate VLogs → VLSingle
- Migrate VLogs CRD to VLSingle (operator.victoriametrics.com/v1beta1)
- Add VMServiceScrape for Forgejo (gitea ns, port http, /metrics)
- Add VMServiceScrape for ArgoCD (argocd ns, port http-metrics)
- Add VMServiceScrape for GARM (garm ns, port metrics)
- Add VMServiceScrape for CoreDNS (kube-system ns, k8s-app: kube-dns)

Ref: IPCEICIS-4618, IPCEICIS-5066
2026-06-15 21:05:22 +02:00
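As an example of the scrapes added here, the Forgejo one (namespace `gitea`, port `http`, path `/metrics`) could be written as follows. The `app.kubernetes.io/name: forgejo` selector label and the object's own namespace are assumptions. Check them against the actual Forgejo Service.

```yaml
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMServiceScrape
metadata:
  name: forgejo
  namespace: gitea                   # assumed: lives beside the Service
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: forgejo  # assumed Service label
  namespaceSelector:
    matchNames:
      - gitea
  endpoints:
    - port: http                     # named Service port
      path: /metrics
```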
7949cabb29
fix(garm): ⬆️ update to v0.1.7-forgejo-24 (fresh multi-arch build)
Build completed successfully. Fixes exec format error from -23.
Dropped stale NOTE warning — image is clean amd64.
2026-06-12 13:42:23 +02:00
8939b4f32b
fix(secrets-backup): 🔄 sync simplified manifest from template
Remove client-side openssl encryption. OBS SSE-KMS handles encryption at rest.
Updated: no apk add openssl, no openssl enc step, no secrets-backup-config Secret,
upload .tar.gz directly. Image tag bumped to 1.0.1 (built without openssl).

Ref: IPCEICIS-9317
2026-06-12 13:12:20 +02:00
900c1f6c80
fix(dev): 🐛 revert automated-upload damage — restore working image pins + OIDC secrets
Automated upload (95deeef) overwrote 5 manually-pinned values:

- forgejo-server: restore workflow-webhook-20260305 (DB has v15a/v15b
  migrations; rolling back to 14.0.2-edp1-rootless WILL break the DB)
- garm: restore v0.1.7-forgejo-22 (v0.1.7-forgejo-23 has exec format
  error — wrong arch build, crashes on OTC CCE amd64 nodes)
- sizer-receiver/secret.yaml: re-add sizer-oidc-client secret (deleted
  by upload; causes OIDC auth failure on every sizer-receiver login)
- dex/manifests/dex-sizer-client.yaml: re-add (deleted by upload;
  dex cannot resolve sizer OIDC client without this secret)
- dex.yaml: restore manifests source block (removed by upload;
  without it ArgoCD never deploys the dex/manifests/ directory)

backup-alerts.yaml (new VMRule from automated upload) is kept as-is.
2026-06-12 10:11:00 +02:00
Automated pipeline
95deeef6a0 Automated upload for dev.t09.de 2026-06-12 07:46:00 +00:00
9bbcf4efca
fix(secrets-backup): 🐛 add openssl install + upgrade image to 1.32.0
alpine/k8s:1.28.0 does not ship openssl. Script calls openssl enc
on line 116 causing exit 127 on every run since initial deploy.

Fix:
- apk add --no-cache openssl at script start (defensive, idempotent)
- upgrade image 1.28.0 -> 1.32.0 (kubectl client was 5 minor versions
  behind cluster v1.33, outside supported skew of +/-1)
2026-06-12 09:32:48 +02:00
cf8271fd86
revert(ci-sizer): 🔥 revert image pin — no versioned images in registry
GoReleaser config uses 'dockers_v2' (invalid key, should be 'dockers')
so versioned container images were never pushed. Only :latest exists.
Reverting to :latest until CI pipeline is fixed to publish version tags.

Refs: IPCEICIS-9326
2026-06-08 18:12:56 +02:00
f4aa470894
fix(ci-sizer): 📌 pin sizer-receiver to v0.8.1 for dev
v0.8.2 does not exist — tags go v0.8.1 → v0.8.3.
v0.8.3 introduced RequireOrgMatch middleware that breaks dev env where
repos are under giteaAdmin but OIDC org resolves differently.
Pin to v0.8.1 until IPCEICIS-9326 fixes multi-env org support.
2026-06-08 18:08:04 +02:00
3fdfda9da7
fix(ci-sizer): 📌 pin sizer-receiver to v0.8.2 for dev
v0.8.3 introduced RequireOrgMatch middleware that breaks dev env where
repos are under giteaAdmin but OIDC org resolves differently.
Pin to v0.8.2 until IPCEICIS-9326 fixes multi-env org support.
2026-06-08 18:06:00 +02:00
69839f767b
fix(ci-sizer): 🐛 set RECEIVER_ALLOWED_ORG=giteaAdmin for dev
Dev Forgejo repos live under giteaAdmin user, not DevFW org.
Prod will use DevFW-CICD (template default). Dev needs explicit override.
2026-06-08 18:00:47 +02:00
925c7416b3
fix(ci-sizer): 🐛 revert RECEIVER_ALLOWED_ORG to DevFW for dev env
Template default is DevFW-CICD (prod), but dev Forgejo uses DevFW org.
Hydration overwrote the correct value today.
2026-06-08 17:51:14 +02:00
bd82384eb1
fix(dex): 🔐 correct sizer client secret to match sizer-oidc-client
The deploy hydration created dex-sizer-client with wrong value.
Reverting to the original shared secret that sizer expects
(73eda906... - active for 81 days before hydration overwrote it).

Changes:
- sizer-oidc-client: restore correct shared secret
- dex-sizer-client: add managed manifest to prevent future drift
- dex.yaml: add manifests source for ArgoCD to sync the secret

Broken by stacks rehydration pipeline run.
2026-06-08 17:11:10 +02:00
967edf0382
fix(ci-sizer): 🔐 align OIDC client secret with dex config
Secret mismatch caused infinite login loop on sizer.dev.t09.de.
Added sizer-oidc-client secret manifest to GitOps so ArgoCD manages it.
Value now matches dex-runner-sizer-client (dex side).
2026-06-08 17:00:38 +02:00
Daniel.Sy
9a7544418c fix(forgejo): 🐛 use workflow-webhook image matching DB migration level (v15a/v15b)
DB was migrated to v15 schema by this image in March.
The 14.0.2-edp1-rootless image cannot start against it.
Today's automated pipeline sync triggered pod restart, exposing the mismatch.
2026-06-08 14:11:31 +00:00
Daniel.Sy
a047be3aae fix(garm): ⬇️ rollback to v0.1.7-forgejo-22 — -23 has exec format error (wrong arch) 2026-06-08 14:11:05 +00:00
Automated pipeline
422f568c8e Automated upload for dev.t09.de 2026-06-08 12:15:27 +00:00
Martin McCaffery
011f436fb7
feat(benchmark.t09.de/garm): bump ci-sizer-collector 0.8.3 → 0.9.0 (kernel-peak + cgroup-v1 limit fallback) 2026-06-03 15:01:09 +01:00
Martin McCaffery
14873b7941
fix(garm): bump dev+benchmark to garm-helm v0.0.17 (template-robust readToken); drop now-redundant explicit fields on benchmark 2026-06-02 16:21:51 +01:00
Martin McCaffery
63cdb926b9
fix(sustainability-rules): remove Kepler energy rules since Kepler is incompatible 2026-06-02 16:12:22 +01:00
Martin McCaffery
f98f53a5a0
revert(kepler): remove Kepler, incompatible with OTC CCE proc mount restrictions 2026-06-02 16:12:06 +01:00
Martin McCaffery
608439697b
fix(benchmark.t09.de/garm): pin ci-sizer-collector to 0.8.3 (latest tagged release, avoid :latest drift during long runs) 2026-06-02 16:08:35 +01:00
Martin McCaffery
b5594a8017
feat(observability): add sustainability metrics, Kepler, 6-month retention, GARM scrape 2026-06-02 15:51:26 +01:00
Martin McCaffery
bbdca11f00
fix(benchmark.t09.de/garm): bump ci-sizer-collector to :latest (0.0.4 tag doesn't exist in registry, was unreachable until sizer integration was restored) 2026-06-02 15:42:10 +01:00
Martin McCaffery
3be56f5a07
fix(vm-client): add nodename-to-IP metricRelabelConfig for node-exporter
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-02 14:58:36 +01:00
Martin McCaffery
e2469e7843
fix(benchmark.t09.de/garm): explicit sizer readToken mountPath/key/fileName (chart defaults not deep-merging, was rendering broken %!s(<nil>) path that crashed sizer consultation) 2026-06-02 14:38:41 +01:00
Martin McCaffery
b98486f445
fix: argocd metrics port name, coredns metrics via headless service 2026-06-02 12:13:38 +01:00
Martin McCaffery
eca54cb19c
fix(vm-client): use in-cluster VMSingle URL for remote write 2026-06-02 12:03:44 +01:00
Martin McCaffery
71a8fef501
fix(vm-client): create missing manifests directory 2026-06-02 11:59:42 +01:00
Martin McCaffery
e95fa403e9
fix(benchmark.t09.de/garm): wire sizer baseUrl + readToken so edge-connect-k8s provider actually applies sizer recommendations (was silently no-op) 2026-06-02 11:56:11 +01:00
Martin McCaffery
d0b0c85cf8
fix: add ServerSideApply for argocd CRDs, remove deprecated vector playground field 2026-06-02 09:57:05 +01:00
Martin McCaffery
07261b081e
upgrade victoria-metrics-k8s-stack 0.48.1 -> 0.81.0 with values migration 2026-06-02 09:51:49 +01:00
Martin McCaffery
07d08e5839
upgrade chart versions: argocd, dex, cloudnative-pg, cert-manager, ingress-nginx, vector, metrics-server 2026-06-02 09:50:04 +01:00
Martin McCaffery
342870fa03
fix(vm-client): add cluster external label for dashboard variable resolution 2026-06-02 09:30:24 +01:00
Martin McCaffery
da0ccbd1b5
fix(observability): enable ArgoCD/CoreDNS scraping, add cluster label, fix node dashboard 2026-06-01 16:47:31 +01:00
Martin McCaffery
3212016398
fix(vector): use in-cluster endpoint for VictoriaLogs log shipping 2026-06-01 16:47:24 +01:00
Martin McCaffery
e89d48c2a5
Upgrade Grafana to 12.4.0 and add auth.jwt config for useKubeAuth 2026-06-01 13:16:37 +01:00
Martin McCaffery
32fd6ffd54
Remove useKubeAuth temporarily to unblock operator upgrade 2026-06-01 13:08:25 +01:00
Martin McCaffery
3b31475552
Fix grafana-operator chart version tag (no v prefix) 2026-06-01 13:02:49 +01:00
Martin McCaffery
1686764b39
Upgrade grafana-operator to v5.23.0 and enable useKubeAuth 2026-06-01 12:58:14 +01:00
a7bc25603c
Added DevFW-CICD users as admins 2026-05-19 14:01:18 +02:00
Martin McCaffery
c927cbd0dc
bump garm-helm to v0.0.16 for benchmark 2026-05-19 09:54:48 +02:00
Martin McCaffery
732a27d5f1
fix(benchmark): disable 2FA requirement for benchmark cluster 2026-05-18 17:23:11 +02:00
Automated pipeline
3c8850d2e2 Automated upload for benchmark.t09.de 2026-05-18 15:20:18 +00:00
Automated pipeline
f12daac048 Automated upload for benchmark.t09.de 2026-05-18 14:32:18 +00:00
Automated pipeline
27475f9cf3 Automated upload for benchmark.t09.de 2026-05-18 14:04:23 +00:00
Automated pipeline
046679e355 Automated upload for benchmark.t09.de 2026-05-18 10:29:51 +00:00
7e1b0418f6
feat(benchmark): add ci-sizer registry app for benchmark.t09.de 2026-05-18 12:21:49 +02:00
Automated pipeline
f2747ece68 Automated upload for benchmark.t09.de 2026-05-18 10:02:58 +00:00
75e4a2384b
fix(ci-sizer): 🐛 align GARM_URL with template output
Use short service DNS (garm.garm.svc:80) instead of FQDN
(garm.garm.svc.cluster.local:80) to match what the stack template
now generates.

Ref: IPCEICIS-6886
2026-05-18 10:26:23 +02:00
Automated pipeline
8b9fb6bdd8 Automated upload for benchmark.t09.de 2026-05-13 11:39:29 +00:00
Daniel.Sy
2c14713ae5 feat(benchmark): add ci-sizer registry for benchmark.t09.de [4/4] 2026-05-13 10:19:43 +00:00
Daniel.Sy
1a591f1c37 feat(benchmark): add ci-sizer ingress for benchmark.t09.de [3/4] 2026-05-13 10:19:36 +00:00
Daniel.Sy
6977dac98d feat(benchmark): add ci-sizer deployment for benchmark.t09.de [2/4] 2026-05-13 10:19:29 +00:00
Daniel.Sy
b84476f71e feat(benchmark): add ci-sizer stacks-instances for benchmark.t09.de [1/4] 2026-05-13 10:19:17 +00:00
d4b54c854f
fix: increased pvc size due to out of disk space error 2026-05-11 10:56:01 +02:00
bc086d5c31
fix: increase smol backup disk 2026-05-07 17:40:36 +02:00
5be2bf1409
fix: increased body size by 10x for large image layer uploads 2026-05-05 14:05:40 +02:00
manuel.ganter
c5191ea18a Update otc/dev.t09.de/stacks/forgejo/forgejo-server/manifests/forgejo-ingress.yaml 2026-05-05 08:29:25 +00:00
556a784beb
fix(stacks-instances): 🚑 add ci-sizer registry entry for dev.t09.de
Create ci-sizer-reg ArgoCD app-of-apps to manage the sizer-receiver
after migration from garm namespace. Restores sizer.dev.t09.de ingress.
2026-04-29 10:41:29 +02:00
2e90240c81
refactor(stacks-instances): 🚚 migrate sizer-receiver to ci-sizer namespace (dev.t09.de)
Move sizer-receiver from stacks/garm/ to stacks/ci-sizer/ for
dev.t09.de only. edp.buildth.ing stays in garm (not deployed yet).
2026-04-29 10:36:14 +02:00
9d042eee1c
chore: ⬆️ bump garm image to v0.1.7-forgejo-22 on dev.t09.de 2026-04-28 10:11:09 +02:00
bc96d8d7aa
chore(garm): ⬆️ bump garm-forgejo to v0.1.7-forgejo-21 2026-04-24 15:47:23 +02:00
e65abf162e
chore(garm): ⬆️ bump garm-forgejo to v0.1.7-forgejo-20 2026-04-24 14:51:58 +02:00
a9dcf29f7a
chore(garm): ⬆️ bump garm-forgejo to v0.1.7-forgejo-19 2026-04-24 13:41:52 +02:00
b72e2049e3
chore: bump garm image to v0.1.7-forgejo-18 for dev.t09.de 2026-04-22 13:19:32 +02:00
4cea4ffde7
chore: bump garm to v0.1.7-forgejo-17 (activeDeadlineSeconds) 2026-04-21 17:15:41 +02:00
0b13b89640
chore(garm): ⬆️ bump garm-helm to v0.0.15 (startup probe fix) 2026-04-21 16:27:25 +02:00
4aa8973c91
chore(garm): ⬆️ bump garm-helm chart to v0.0.14 2026-04-21 16:03:06 +02:00
c682c48be0
chore: bump garm image to v0.1.7-forgejo-16 2026-04-21 15:53:50 +02:00
61721097d6
chore(sizer): 🔧 rename forgejo-runner-sizer to ci-sizer in deployment configs
- Update container image names to ci-sizer-{receiver,collector}
- Update Dex OIDC client ID and name to ci-sizer
- Template allowed-org as SIZER_ALLOWED_ORG variable
2026-04-21 14:16:39 +02:00
487e1ac15c
chore(garm): ⬆️ bump garm to v0.1.7-forgejo-15 2026-04-20 17:32:22 +02:00
2af607e949
chore(garm): ⬆️ bump garm to v0.1.7-forgejo-14, add CPU sizing mode env vars 2026-04-20 16:08:12 +02:00
f2c885cd84
fix(sizer): 🔧 sync gitops with live deployment — add OIDC config, remove legacy Forgejo tokens 2026-04-16 15:05:53 +02:00
08740eb1da
chore: bump garm image to v0.1.7-forgejo-13 (RunNumber enrichment via WebSocket) 2026-04-16 13:32:12 +02:00
47f99082db
feat(sizer-receiver): add GARM WebSocket event enrichment env vars
Add GARM_URL, GARM_USER, and GARM_PASSWORD environment variables to
the sizer-receiver deployment so it can connect to GARM's WebSocket
event stream for run-status enrichment.

Ref: IPCEICIS-8514
2026-04-15 15:46:55 +02:00
a3bae88ce9
fix(sizer-receiver): 🐛 add fsGroup to pod securityContext for PVC write access
Distroless nonroot container (UID 65534) needs matching fsGroup to write
to the PVC used for SQLite migrations.

Ref: IPCEICIS-8514
2026-04-15 14:45:27 +02:00
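A sketch of the pod-level setting follows, assuming the distroless `nonroot` user 65534 named in the commit. `fsGroup` makes the kubelet apply that group to supported volumes, such as most PVCs, so the non-root process can write the SQLite files.

```yaml
# Fragment of the sizer-receiver Deployment (spec.template.spec)
securityContext:
  runAsNonRoot: true
  runAsUser: 65534
  runAsGroup: 65534
  fsGroup: 65534                   # group ownership applied to mounted volumes
```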
9374d90d1f
chore(garm): ⬆️ bump image to v0.1.7-forgejo-12 (ParseExtraSpecs fix)
Pick up double-encoding fix from garm-provider-edge-connect v2.0.30.

Ref: IPCEICIS-8514
2026-04-15 13:50:54 +02:00
e0f74e9ec4
chore(garm): ⬆️ bump image to v0.1.7-forgejo-11 with fixed provider binary
Ref: IPCEICIS-8514
2026-04-15 12:25:37 +02:00
58c694c9d1
chore(garm): 📦 bump image to v0.1.7-forgejo-10 (GitHub Actions cgroup fix)
Provider v2.0.27 fixes CIProvider-aware CGROUP_PROCESS_MAP for GitHub
Actions runner detection, completing multi-provider support.

Ref: IPCEICIS-8514
2026-04-15 10:23:57 +02:00
d1ab2f6c85
chore(garm): 📦 bump image to v0.1.7-forgejo-9 (multi-provider support)
garm-provider-edge-connect v2.0.26 adds GitHub Actions + Forgejo multi-provider support.
2026-04-14 16:58:24 +02:00
246be79659
chore(garm): bump to v0.1.7-forgejo-8 (revert buildkitd wrapper) 2026-04-14 13:01:17 +02:00
6f9a6372f1
chore(garm): bump garm image to v0.1.7-forgejo-7
- Includes provider v2.0.24 with pod cleanup fixes:
  - GetPod returns terminal pods for proper GARM lifecycle
  - ListInstances prefix mismatch fixed
  - ProviderID consistency fix
  - buildkitd SIGTERM graceful shutdown
2026-04-14 11:23:53 +02:00
d116313afe
chore(garm): bump to v0.1.7-forgejo-6 (provider nil map fix) 2026-04-13 18:02:37 +02:00
ee8b2f0e9c
chore(garm): bump helm chart to v0.0.13 for nodes RBAC 2026-04-13 16:35:44 +02:00
dedebf1747
chore(garm): update image to v0.1.7-forgejo-5 and add pending_timeout config 2026-04-13 15:23:48 +02:00
46a1c1aa33
feat(dex): add forgejo-runner-sizer OIDC static client
Register forgejo-runner-sizer as a Dex static client for OIDC
authentication on sizer.dev.t09.de. Adds the client secret env var
injection and the staticClients entry with secretEnv reference.
2026-04-10 13:22:45 +02:00
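A Dex static client that reads its secret from an environment variable (`secretEnv`) might be wired up like this in the Dex Helm values. The environment variable name, redirect path and Secret key are hypothetical. The Secret name `dex-sizer-client` appears in later commits of this log.

```yaml
config:
  staticClients:
    - id: forgejo-runner-sizer
      name: forgejo-runner-sizer
      secretEnv: SIZER_CLIENT_SECRET          # hypothetical env var name
      redirectURIs:
        - https://sizer.dev.t09.de/callback   # hypothetical callback path
envVars:
  # Inject the client secret so Dex can resolve secretEnv at startup
  - name: SIZER_CLIENT_SECRET
    valueFrom:
      secretKeyRef:
        name: dex-sizer-client
        key: clientSecret                     # hypothetical key
```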
Automated pipeline
2f15b6b373 Automated upload for edp.buildth.ing 2026-03-17 13:25:52 +00:00
Automated pipeline
4b11db5668
Automated upload for dev.t09.de 2026-03-17 14:16:23 +01:00
146 changed files with 6918 additions and 240 deletions

@ -0,0 +1,24 @@
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: edfbuilder
  namespace: argocd
  labels:
    env: dev
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  destination:
    name: in-cluster
    namespace: argocd
  source:
    path: "otc/benchmark.t09.de/registry"
    repoURL: "https://edp.buildth.ing/DevFW-CICD/stacks-instances"
    targetRevision: HEAD
  project: default
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true

@ -0,0 +1,24 @@
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: ci-sizer-reg
  namespace: argocd
  labels:
    env: dev
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  destination:
    name: in-cluster
    namespace: argocd
  source:
    path: "otc/benchmark.t09.de/stacks/ci-sizer"
    repoURL: "https://edp.buildth.ing/DevFW-CICD/stacks-instances"
    targetRevision: HEAD
  project: default
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true

@ -0,0 +1,24 @@
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: coder-reg
  namespace: argocd
  labels:
    env: dev
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  destination:
    name: in-cluster
    namespace: argocd
  source:
    path: "otc/benchmark.t09.de/stacks/coder"
    repoURL: "https://edp.buildth.ing/DevFW-CICD/stacks-instances"
    targetRevision: HEAD
  project: default
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true

@ -0,0 +1,24 @@
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: core
  namespace: argocd
  labels:
    env: dev
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  destination:
    name: in-cluster
    namespace: argocd
  source:
    path: "otc/benchmark.t09.de/stacks/core"
    repoURL: "https://edp.buildth.ing/DevFW-CICD/stacks-instances"
    targetRevision: HEAD
  project: default
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true

@ -0,0 +1,24 @@
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: docs-reg
  namespace: argocd
  labels:
    env: dev
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  destination:
    name: in-cluster
    namespace: argocd
  source:
    path: argocd-stack
    repoURL: "https://edp.buildth.ing/DevFW-CICD/website-and-documentation"
    targetRevision: HEAD
  project: default
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true

@ -0,0 +1,24 @@
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: forgejo
  namespace: argocd
  labels:
    env: dev
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  destination:
    name: in-cluster
    namespace: argocd
  source:
    path: "otc/benchmark.t09.de/stacks/forgejo"
    repoURL: "https://edp.buildth.ing/DevFW-CICD/stacks-instances"
    targetRevision: HEAD
  project: default
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true

@ -0,0 +1,24 @@
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: garm-reg
  namespace: argocd
  labels:
    env: dev
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  destination:
    name: in-cluster
    namespace: argocd
  source:
    path: "otc/benchmark.t09.de/stacks/garm"
    repoURL: "https://edp.buildth.ing/DevFW-CICD/stacks-instances"
    targetRevision: HEAD
  project: default
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true

@ -0,0 +1,24 @@
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: observability-client
  namespace: argocd
  labels:
    env: dev
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  destination:
    name: in-cluster
    namespace: argocd
  source:
    path: "otc/benchmark.t09.de/stacks/observability-client"
    repoURL: "https://edp.buildth.ing/DevFW-CICD/stacks-instances"
    targetRevision: HEAD
  project: default
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true

@ -0,0 +1,24 @@
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: observability
  namespace: argocd
  labels:
    env: dev
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  destination:
    name: in-cluster
    namespace: argocd
  source:
    path: "otc/benchmark.t09.de/stacks/observability"
    repoURL: "https://edp.buildth.ing/DevFW-CICD/stacks-instances"
    targetRevision: HEAD
  project: default
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true

@ -0,0 +1,24 @@
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: otc
  namespace: argocd
  labels:
    env: dev
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  destination:
    name: in-cluster
    namespace: argocd
  source:
    path: "otc/benchmark.t09.de/stacks/otc"
    repoURL: "https://edp.buildth.ing/DevFW-CICD/stacks-instances"
    targetRevision: HEAD
  project: default
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true

@ -0,0 +1,24 @@
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: terralist-reg
  namespace: argocd
  labels:
    env: dev
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  destination:
    name: in-cluster
    namespace: argocd
  source:
    path: "otc/benchmark.t09.de/stacks/terralist"
    repoURL: "https://edp.buildth.ing/DevFW-CICD/stacks-instances"
    targetRevision: HEAD
  project: default
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true

@ -0,0 +1,29 @@
# Optional: GitLab CI integration
# Only hydrate this app for clusters that run GitLab Runner.
# For Forgejo/GitHub-only deployments, omit this app from stacks-instances.
# See: ci-sizer/docs/deployment-modes.md
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: gitlab-sizer-webhook
  namespace: argocd
  labels:
    env: dev
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  project: default
  syncPolicy:
    automated:
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
    retry:
      limit: -1
  destination:
    name: in-cluster
    namespace: ci-sizer
  source:
    repoURL: https://edp.buildth.ing/DevFW-CICD/stacks-instances
    targetRevision: HEAD
    path: "otc/benchmark.t09.de/stacks/ci-sizer/gitlab-webhook"

@ -0,0 +1,27 @@
# Self-signed Issuer for webhook TLS.
# For production, replace with a ClusterIssuer backed by a real CA.
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: selfsigned-issuer
spec:
  selfSigned: {}
---
# cert-manager Certificate for the webhook TLS.
# The resulting Secret (gitlab-sizer-webhook-tls) is mounted into the webhook pod.
# cert-manager also injects the CA into the MutatingWebhookConfiguration via the
# cert-manager.io/inject-ca-from annotation.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: gitlab-sizer-webhook-cert
spec:
  secretName: gitlab-sizer-webhook-tls
  issuerRef:
    name: selfsigned-issuer
    kind: Issuer
  dnsNames:
    - gitlab-sizer-webhook.ci-sizer.svc
    - gitlab-sizer-webhook.ci-sizer.svc.cluster.local
  duration: 8760h
  renewBefore: 720h
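The MutatingWebhookConfiguration that consumes this CA is not part of the excerpt shown. The sketch below is one plausible form. The annotation value uses cert-manager's `<namespace>/<certificate-name>` format, built from the names above. The webhook name, `/mutate` path, Service port 443 (assumed to forward to containerPort 8443), failure policy and pod CREATE rule are all assumptions.

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: gitlab-sizer-webhook
  annotations:
    # cert-manager's CA injector writes this Certificate's CA into caBundle
    cert-manager.io/inject-ca-from: ci-sizer/gitlab-sizer-webhook-cert
webhooks:
  - name: gitlab-sizer-webhook.ci-sizer.svc   # hypothetical webhook name
    admissionReviewVersions: ["v1"]
    sideEffects: None
    failurePolicy: Ignore                     # assumed: never block pod creation
    clientConfig:
      service:
        name: gitlab-sizer-webhook
        namespace: ci-sizer
        path: /mutate                         # hypothetical path
        port: 443                             # assumed Service port
    rules:
      - operations: ["CREATE"]
        apiGroups: [""]
        apiVersions: ["v1"]
        resources: ["pods"]
```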

@ -0,0 +1,141 @@
apiVersion: v1
kind: ServiceAccount
metadata:
  name: gitlab-sizer-webhook
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: gitlab-sizer-webhook
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: gitlab-sizer-webhook
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: gitlab-sizer-webhook
subjects:
  - kind: ServiceAccount
    name: gitlab-sizer-webhook
    namespace: ci-sizer
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: gitlab-sizer-webhook
labels:
app: gitlab-sizer-webhook
spec:
replicas: 2
selector:
matchLabels:
app: gitlab-sizer-webhook
template:
metadata:
labels:
app: gitlab-sizer-webhook
spec:
serviceAccountName: gitlab-sizer-webhook
securityContext:
runAsNonRoot: true
runAsUser: 65534
runAsGroup: 65534
seccompProfile:
type: RuntimeDefault
containers:
- name: webhook
image: edp.buildth.ing/devfw-cicd/gitlab-webhook-edge-connect:latest
imagePullPolicy: Always
securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
capabilities:
drop:
- ALL
ports:
- containerPort: 8443
protocol: TCP
args:
- --listen-addr=:8443
- --tls-cert-file=/etc/webhook/tls/tls.crt
- --tls-key-file=/etc/webhook/tls/tls.key
- --sizer-url=http://sizer-receiver.ci-sizer.svc:8080
- --sizer-sidecar-image=edp.buildth.ing/devfw-cicd/ci-sizer-collector:latest
env:
- name: WEBHOOK_SIZER_READ_TOKEN
valueFrom:
secretKeyRef:
name: gitlab-sizer-webhook-tokens
key: sizer-read-token
- name: WEBHOOK_SIZER_PUSH_TOKEN
valueFrom:
secretKeyRef:
name: gitlab-sizer-webhook-tokens
key: sizer-push-token
- name: HTTP_PROXY
valueFrom:
configMapKeyRef:
name: gitlab-sizer-webhook-config
key: HTTP_PROXY
optional: true
- name: HTTPS_PROXY
valueFrom:
configMapKeyRef:
name: gitlab-sizer-webhook-config
key: HTTPS_PROXY
optional: true
- name: NO_PROXY
valueFrom:
configMapKeyRef:
name: gitlab-sizer-webhook-config
key: NO_PROXY
optional: true
volumeMounts:
- name: webhook-tls
mountPath: /etc/webhook/tls
readOnly: true
livenessProbe:
httpGet:
path: /healthz
port: 8443
scheme: HTTPS
initialDelaySeconds: 5
periodSeconds: 10
readinessProbe:
httpGet:
path: /healthz
port: 8443
scheme: HTTPS
initialDelaySeconds: 5
periodSeconds: 10
resources:
requests:
cpu: 100m
memory: 128Mi
limits:
cpu: 200m
memory: 128Mi
volumes:
- name: webhook-tls
secret:
secretName: gitlab-sizer-webhook-tls
---
apiVersion: v1
kind: Service
metadata:
name: gitlab-sizer-webhook
labels:
app: gitlab-sizer-webhook
spec:
selector:
app: gitlab-sizer-webhook
ports:
- port: 443
targetPort: 8443
protocol: TCP

@ -0,0 +1,30 @@
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: gitlab-sizer-webhook
  annotations:
    cert-manager.io/inject-ca-from: ci-sizer/gitlab-sizer-webhook-cert
webhooks:
  - name: gitlab-sizer-webhook.ci-sizer.svc
    admissionReviewVersions: ["v1"]
    sideEffects: NoneOnDryRun
    failurePolicy: Ignore
    timeoutSeconds: 5
    reinvocationPolicy: Never
    clientConfig:
      service:
        name: gitlab-sizer-webhook
        namespace: ci-sizer
        path: /mutate
    rules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE"]
        resources: ["pods"]
    namespaceSelector:
      matchLabels:
        ci-sizer.devfw.io/watch: "true"
    objectSelector:
      matchExpressions:
        - key: job.runner.gitlab.com/pod
          operator: Exists
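
The API server only sends a pod to this webhook when three conditions hold: the request is a pod CREATE, the pod's namespace carries `ci-sizer.devfw.io/watch: "true"`, and the pod has a `job.runner.gitlab.com/pod` label with any value. A minimal Python sketch of that filter, assuming plain label dictionaries (`webhook_applies` is a hypothetical helper that covers only the selector features used here, not the full Kubernetes selector semantics):

```python
WATCH_LABEL = "ci-sizer.devfw.io/watch"  # namespaceSelector.matchLabels key
POD_LABEL = "job.runner.gitlab.com/pod"  # objectSelector key (operator: Exists)


def webhook_applies(operation: str, resource: str,
                    namespace_labels: dict[str, str],
                    pod_labels: dict[str, str]) -> bool:
    """Approximate the rules, namespaceSelector and objectSelector above."""
    # rules: operations ["CREATE"] on core/v1 "pods"
    if operation != "CREATE" or resource != "pods":
        return False
    # matchLabels: the namespace label must exist with exactly this value
    if namespace_labels.get(WATCH_LABEL) != "true":
        return False
    # operator Exists: the pod label only has to be present, its value is ignored
    return POD_LABEL in pod_labels


opted_in = {WATCH_LABEL: "true"}
print(webhook_applies("CREATE", "pods", opted_in, {POD_LABEL: "yes"}))  # True
print(webhook_applies("CREATE", "pods", {}, {POD_LABEL: "yes"}))        # False
print(webhook_applies("UPDATE", "pods", opted_in, {POD_LABEL: "yes"}))  # False
```

A namespace therefore has to be opted in explicitly, for example with `kubectl label namespace <ns> ci-sizer.devfw.io/watch=true`. Because `failurePolicy` is `Ignore`, pods are still admitted (unmutated) if the webhook fails or exceeds its 5 s timeout.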

@ -0,0 +1,29 @@
# Required: CI Sizer receiver
# Always deploy this — it stores metrics and computes sizing recommendations.
# Works standalone or with GARM (Forgejo/GitHub) and/or GitLab webhook.
# See: ci-sizer/docs/deployment-modes.md
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: sizer-receiver
  namespace: argocd
  labels:
    env: dev
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  project: default
  syncPolicy:
    automated:
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
    retry:
      limit: -1
  destination:
    name: in-cluster
    namespace: ci-sizer
  source:
    repoURL: https://edp.buildth.ing/DevFW-CICD/stacks-instances
    targetRevision: HEAD
    path: "otc/benchmark.t09.de/stacks/ci-sizer/sizer-receiver"

@ -0,0 +1,126 @@
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sizer-receiver
  labels:
    app: sizer-receiver
spec:
  strategy:
    type: Recreate
  replicas: 1
  selector:
    matchLabels:
      app: sizer-receiver
  template:
    metadata:
      labels:
        app: sizer-receiver
    spec:
      securityContext:
        fsGroup: 65534
      containers:
        - name: receiver
          image: edp.buildth.ing/devfw-cicd/ci-sizer-receiver:latest
          imagePullPolicy: Always
          args:
            - --db=/data/metrics.db
          ports:
            - name: http
              containerPort: 8080
              protocol: TCP
          env:
            - name: RECEIVER_READ_TOKEN
              valueFrom:
                secretKeyRef:
                  name: sizer-tokens
                  key: read-token
            - name: RECEIVER_HMAC_KEY
              valueFrom:
                secretKeyRef:
                  name: sizer-tokens
                  key: hmac-key
            - name: GARM_URL
              value: "http://garm.garm.svc:80"
            - name: GARM_USER
              value: "admin"
            - name: GARM_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: garm-fixed-credentials
                  key: admin_password
            - name: RECEIVER_OIDC_ISSUER
              value: "https://dex.benchmark.t09.de"
            - name: RECEIVER_OIDC_CLIENT_ID
              value: "ci-sizer"
            - name: RECEIVER_OIDC_CLIENT_SECRET
              valueFrom:
                secretKeyRef:
                  name: sizer-oidc-client
                  key: client-secret
            - name: RECEIVER_OIDC_REDIRECT_URI
              value: "https://sizer.benchmark.t09.de/ui/callback"
            - name: RECEIVER_SESSION_TTL
              value: "12h"
            - name: RECEIVER_ALLOWED_ORG
              value: "giteaAdmin"
            - name: RECEIVER_CPU_SIZING_MODE
              value: "observe"
            - name: RECEIVER_MEMORY_QOS
              value: "guaranteed"
          volumeMounts:
            - name: data
              mountPath: /data
          livenessProbe:
            httpGet:
              path: /health
              port: http
            initialDelaySeconds: 5
            periodSeconds: 30
          readinessProbe:
            httpGet:
              path: /health
              port: http
            initialDelaySeconds: 2
            periodSeconds: 10
          resources:
            requests:
              cpu: 50m
              memory: 64Mi
            limits:
              cpu: 200m
              memory: 128Mi
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: sizer-receiver-data
---
apiVersion: v1
kind: Service
metadata:
  name: sizer-receiver
  labels:
    app: sizer-receiver
spec:
  selector:
    app: sizer-receiver
  ports:
    - name: http
      port: 8080
      targetPort: http
      protocol: TCP
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: sizer-receiver-data
  labels:
    app: sizer-receiver
  annotations:
    everest.io/disk-volume-type: GPSSD
spec:
  storageClassName: csi-disk
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi

@ -0,0 +1,36 @@
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
    cert-manager.io/cluster-issuer: main
  name: sizer-receiver
  namespace: ci-sizer
spec:
  ingressClassName: nginx
  rules:
    - host: sizer.benchmark.t09.de
      http:
        paths:
          - backend:
              service:
                name: sizer-receiver
                port:
                  number: 8080
            path: /
            pathType: Prefix
    - host: ci-sizer.benchmark.t09.de
      http:
        paths:
          - backend:
              service:
                name: sizer-receiver
                port:
                  number: 8080
            path: /
            pathType: Prefix
  tls:
    - hosts:
        - sizer.benchmark.t09.de
      secretName: sizer-receiver-tls

@ -0,0 +1,32 @@
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: coder
  namespace: argocd
  labels:
    env: dev
spec:
  project: default
  syncPolicy:
    automated:
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
    retry:
      limit: -1
  destination:
    name: in-cluster
    namespace: coder
  sources:
    - repoURL: https://helm.coder.com/v2
      chart: coder
      targetRevision: 2.28.3
      helm:
        valueFiles:
          - $values/otc/benchmark.t09.de/stacks/coder/coder/values.yaml
    - repoURL: https://edp.buildth.ing/DevFW-CICD/stacks-instances
      targetRevision: HEAD
      ref: values
    - repoURL: https://edp.buildth.ing/DevFW-CICD/stacks-instances
      targetRevision: HEAD
      path: "otc/benchmark.t09.de/stacks/coder/coder/manifests"

@ -0,0 +1,38 @@
---
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: coder-db
  namespace: coder
spec:
  instances: 1
  primaryUpdateStrategy: unsupervised
  resources:
    requests:
      memory: "1Gi"
      cpu: "1"
    limits:
      memory: "1Gi"
      cpu: "1"
  managed:
    roles:
      - name: coder
        createdb: true
        login: true
        passwordSecret:
          name: coder-db-user
  storage:
    size: 10Gi
    storageClass: csi-disk
---
apiVersion: postgresql.cnpg.io/v1
kind: Database
metadata:
  name: coder
  namespace: coder
spec:
  cluster:
    name: coder-db
  name: coder
  owner: coder
---

@ -0,0 +1,61 @@
coder:
  # You can specify any environment variables you'd like to pass to Coder
  # here. Coder consumes environment variables listed in
  # `coder server --help`, and these environment variables are also passed
  # to the workspace provisioner (so you can consume them in your Terraform
  # templates for auth keys etc.).
  #
  # Please keep in mind that you should not set `CODER_HTTP_ADDRESS`,
  # `CODER_TLS_ENABLE`, `CODER_TLS_CERT_FILE` or `CODER_TLS_KEY_FILE` as
  # they are already set by the Helm chart and will cause conflicts.
  env:
    - name: CODER_ACCESS_URL
      value: https://coder.benchmark.t09.de
    - name: CODER_PG_CONNECTION_URL
      valueFrom:
        secretKeyRef:
          # The coder-db-user secret needs a `url` key holding your
          # Postgres connection URL, like:
          # postgres://coder:password@postgres:5432/coder?sslmode=disable
          name: coder-db-user
          key: url
    # For production deployments, we recommend configuring your own GitHub
    # OAuth2 provider and disabling the default one.
    - name: CODER_OAUTH2_GITHUB_DEFAULT_PROVIDER_ENABLE
      value: "false"
    - name: EDGE_CONNECT_ENDPOINT
      valueFrom:
        secretKeyRef:
          name: edge-credential
          key: endpoint
    - name: EDGE_CONNECT_USERNAME
      valueFrom:
        secretKeyRef:
          name: edge-credential
          key: username
    - name: EDGE_CONNECT_PASSWORD
      valueFrom:
        secretKeyRef:
          name: edge-credential
          key: password
    # (Optional) For production deployments the access URL should be set.
    # If you're just trying Coder, access the dashboard via the service IP.
    # - name: CODER_ACCESS_URL
    #   value: "https://coder.example.com"
  #tls:
  #  secretNames:
  #    - my-tls-secret-name
  service:
    type: ClusterIP
  ingress:
    enable: true
    className: nginx
    host: coder.benchmark.t09.de
    annotations:
      cert-manager.io/cluster-issuer: main
    tls:
      enable: true
      secretName: coder-tls-secret
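
The `url` key in `coder-db-user` is not generated by the CNPG role above and has to be provided alongside the password. A minimal sketch of assembling it, assuming CloudNativePG's read-write Service for the `coder-db` cluster (CNPG names it `<cluster>-rw`, here `coder-db-rw` in namespace `coder`) and `sslmode=require` as a chosen default; the password is a placeholder and `postgres_url` is a hypothetical helper. Percent-encoding keeps characters such as `@` or `/` in the password from breaking the URL:

```python
from urllib.parse import quote, urlencode


def postgres_url(user: str, password: str, host: str, port: int,
                 database: str, sslmode: str = "require") -> str:
    """Build a libpq-style connection URL with percent-encoded credentials."""
    userinfo = f"{quote(user, safe='')}:{quote(password, safe='')}"
    query = urlencode({"sslmode": sslmode})
    return f"postgres://{userinfo}@{host}:{port}/{quote(database, safe='')}?{query}"


# Assumed host: CNPG's read-write Service "coder-db-rw" in namespace "coder".
url = postgres_url("coder", "p@ss/word", "coder-db-rw.coder.svc", 5432, "coder")
print(url)
```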

@ -0,0 +1,35 @@
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: argocd
  namespace: argocd
  labels:
    env: dev
spec:
  project: default
  syncPolicy:
    automated:
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
    retry:
      limit: -1
  destination:
    name: in-cluster
    namespace: argocd
  sources:
    - repoURL: https://github.com/argoproj/argo-helm.git
      path: charts/argo-cd
      # TODO: RIRE Can be updated when https://github.com/argoproj/argo-cd/issues/20790 is fixed and merged.
      # Because logout causes problems, switching from path-based routing to a dedicated Argo CD domain is suggested,
      # similar to the CNOE Amazon reference implementation and, in our case, Forgejo.
      targetRevision: argo-cd-9.4.6
      helm:
        valueFiles:
          - $values/otc/benchmark.t09.de/stacks/core/argocd/values.yaml
    - repoURL: https://edp.buildth.ing/DevFW-CICD/stacks-instances
      targetRevision: HEAD
      ref: values
    - repoURL: https://edp.buildth.ing/DevFW-CICD/stacks-instances
      targetRevision: HEAD
      path: "otc/benchmark.t09.de/stacks/core/argocd/manifests"

@ -0,0 +1,27 @@
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    nginx.ingress.kubernetes.io/backend-protocol: HTTP
    nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
    cert-manager.io/cluster-issuer: main
  name: argocd-server
  namespace: argocd
spec:
  ingressClassName: nginx
  rules:
    - host: argocd.benchmark.t09.de
      http:
        paths:
          - backend:
              service:
                name: argocd-server
                port:
                  number: 80
            path: /
            pathType: Prefix
  tls:
    - hosts:
        - argocd.benchmark.t09.de
      secretName: argocd-net-tls

@ -0,0 +1,66 @@
global:
  domain: argocd.benchmark.t09.de
configs:
  params:
    server.insecure: true
  cm:
    oidc.config: |
      name: FORGEJO
      issuer: https://dex.benchmark.t09.de
      clientID: controller-argocd-dex
      clientSecret: $dex-argo-client:clientSecret
      requestedScopes:
        - openid
        - profile
        - email
        - groups
    application.resourceTrackingMethod: annotation
    timeout.reconciliation: 60s
    resource.exclusions: |
      - apiGroups:
          - "*"
        kinds:
          - ProviderConfigUsage
      - apiGroups:
          - cilium.io
        kinds:
          - CiliumIdentity
        clusters:
          - "*"
    url: https://argocd.benchmark.t09.de
  rbac:
    policy.csv: 'g, DevFW, role:admin'
  tls:
    certificates:
controller:
  metrics:
    enabled: true
    serviceMonitor:
      enabled: false
server:
  metrics:
    enabled: true
    serviceMonitor:
      enabled: false
repoServer:
  metrics:
    enabled: true
    serviceMonitor:
      enabled: false
applicationSet:
  metrics:
    enabled: true
    serviceMonitor:
      enabled: false
notifications:
  enabled: false
dex:
  enabled: false

@ -0,0 +1,30 @@
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: cloudnative-pg
  namespace: argocd
  labels:
    env: dev
spec:
  project: default
  syncPolicy:
    automated:
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
      - ServerSideApply=true
    retry:
      limit: -1
  destination:
    name: in-cluster
    namespace: cloudnative-pg
  sources:
    - repoURL: https://cloudnative-pg.github.io/charts
      chart: cloudnative-pg
      targetRevision: 0.26.1
      helm:
        valueFiles:
          - $values/otc/benchmark.t09.de/stacks/core/cloudnative-pg/values.yaml
    - repoURL: https://edp.buildth.ing/DevFW-CICD/stacks-instances
      targetRevision: HEAD
      ref: values

@ -0,0 +1 @@
# No need for values here.

@ -0,0 +1,29 @@
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: dex
  namespace: argocd
  labels:
    env: dev
spec:
  project: default
  syncPolicy:
    automated:
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
    retry:
      limit: -1
  destination:
    name: in-cluster
    namespace: dex
  sources:
    - repoURL: https://charts.dexidp.io
      chart: dex
      targetRevision: 0.23.0
      helm:
        valueFiles:
          - $values/otc/benchmark.t09.de/stacks/core/dex/values.yaml
    - repoURL: https://edp.buildth.ing/DevFW-CICD/stacks-instances
      targetRevision: HEAD
      ref: values

@ -0,0 +1,86 @@
ingress:
  enabled: true
  className: nginx
  annotations:
    cert-manager.io/cluster-issuer: main
  hosts:
    - host: dex.benchmark.t09.de
      paths:
        - path: /
          pathType: Prefix
  tls:
    - hosts:
        - dex.benchmark.t09.de
      secretName: dex-cert
envVars:
  - name: FORGEJO_CLIENT_SECRET
    valueFrom:
      secretKeyRef:
        name: dex-forgejo-client
        key: clientSecret
  - name: FORGEJO_CLIENT_ID
    valueFrom:
      secretKeyRef:
        name: dex-forgejo-client
        key: clientID
  - name: OIDC_DEX_GRAFANA_CLIENT_SECRET
    valueFrom:
      secretKeyRef:
        name: dex-grafana-client
        key: clientSecret
  - name: OIDC_DEX_ARGO_CLIENT_SECRET
    valueFrom:
      secretKeyRef:
        name: dex-argo-client
        key: clientSecret
  - name: FORGEJO_RUNNER_SIZER_CLIENT_SECRET
    valueFrom:
      secretKeyRef:
        name: dex-sizer-client
        key: clientSecret
  - name: LOG_LEVEL
    value: debug
config:
  # Set it to a valid URL
  issuer: https://dex.benchmark.t09.de
  # See https://dexidp.io/docs/storage/ for more options
  storage:
    type: memory
  oauth2:
    skipApprovalScreen: true
    alwaysShowLoginScreen: false
  connectors:
    - type: gitea
      id: gitea
      name: Forgejo
      config:
        clientID: "$FORGEJO_CLIENT_ID"
        clientSecret: "$FORGEJO_CLIENT_SECRET"
        redirectURI: https://dex.benchmark.t09.de/callback
        baseURL: https://edp.buildth.ing
        # loadAllGroups: true
        orgs:
          - name: DevFW
  enablePasswordDB: false
  staticClients:
    - id: controller-argocd-dex
      name: ArgoCD Client
      redirectURIs:
        - "https://argocd.benchmark.t09.de/auth/callback"
      secretEnv: "OIDC_DEX_ARGO_CLIENT_SECRET"
    - id: grafana
      redirectURIs:
        - "https://grafana.benchmark.t09.de/login/generic_oauth"
      name: "Grafana"
      secretEnv: "OIDC_DEX_GRAFANA_CLIENT_SECRET"
    - id: ci-sizer
      name: "CI Sizer"
      redirectURIs:
        - "https://sizer.benchmark.t09.de/ui/callback"
      secretEnv: "FORGEJO_RUNNER_SIZER_CLIENT_SECRET"
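
Dex rejects an authorization request whose `redirect_uri` is not registered for the client, so each `staticClients` entry has to list the exact callback URL its application sends. A small cross-check sketch; the two dictionaries are copied by hand from this change (Dex `staticClients`, Argo CD's `url` plus its `/auth/callback` path, Grafana's `redirect_uri`, and the receiver's `RECEIVER_OIDC_REDIRECT_URI`) rather than loaded from the manifests, and `mismatched_clients` is a hypothetical helper:

```python
# Redirect URIs registered in Dex staticClients (copied from the values above).
DEX_CLIENTS = {
    "controller-argocd-dex": ["https://argocd.benchmark.t09.de/auth/callback"],
    "grafana": ["https://grafana.benchmark.t09.de/login/generic_oauth"],
    "ci-sizer": ["https://sizer.benchmark.t09.de/ui/callback"],
}

# Callback URL each application is configured to send for its client ID.
APP_CALLBACKS = {
    "controller-argocd-dex": "https://argocd.benchmark.t09.de/auth/callback",
    "grafana": "https://grafana.benchmark.t09.de/login/generic_oauth",
    "ci-sizer": "https://sizer.benchmark.t09.de/ui/callback",
}


def mismatched_clients(clients: dict[str, list[str]],
                       callbacks: dict[str, str]) -> list[str]:
    """Return client IDs whose callback is not registered verbatim."""
    return sorted(cid for cid, uri in callbacks.items()
                  if uri not in clients.get(cid, []))


print(mismatched_clients(DEX_CLIENTS, APP_CALLBACKS))  # []
```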

@ -1,7 +1,7 @@
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: sizer-receiver
  name: forgejo-runner
  namespace: argocd
  labels:
    env: dev
@ -17,9 +17,8 @@ spec:
    retry:
      limit: -1
  destination:
    name: in-cluster
    namespace: garm
    server: "https://kubernetes.default.svc"
  source:
    repoURL: https://edp.buildth.ing/DevFW-CICD/stacks-instances
    targetRevision: HEAD
    path: "otc/dev.t09.de/stacks/garm/sizer-receiver"
    path: "otc/benchmark.t09.de/stacks/forgejo/forgejo-runner"

@ -0,0 +1,104 @@
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: forgejo-runner
  name: forgejo-runner
  namespace: gitea
spec:
  # Several replicas mean that if one runner is busy, another can pick up jobs.
  replicas: 3
  selector:
    matchLabels:
      app: forgejo-runner
  strategy: {}
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: forgejo-runner
    spec:
      restartPolicy: Always
      volumes:
        - name: docker-certs
          emptyDir: {}
        - name: runner-data
          emptyDir: {}
      # Initialise our configuration file using offline registration
      # https://forgejo.org/docs/v1.21/admin/actions/#offline-registration
      initContainers:
        - name: runner-register
          image: code.forgejo.org/forgejo/runner:12.6.4
          command:
            - "sh"
            - "-c"
            - |
              forgejo-runner \
                register \
                --no-interactive \
                --token ${RUNNER_SECRET} \
                --name ${RUNNER_NAME} \
                --instance ${FORGEJO_INSTANCE_URL} \
                --labels docker:docker://node:24-bookworm,ubuntu-22.04:docker://ghcr.io/catthehacker/ubuntu:act-22.04,ubuntu-latest:docker://ghcr.io/catthehacker/ubuntu:act-24.04,ubuntu-24.04:docker://ghcr.io/catthehacker/ubuntu:act-24.04
          env:
            - name: RUNNER_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: RUNNER_SECRET
              valueFrom:
                secretKeyRef:
                  name: forgejo-runner-token
                  key: token
            - name: FORGEJO_INSTANCE_URL
              value: https://benchmark.t09.de
          volumeMounts:
            - name: runner-data
              mountPath: /data
      containers:
        - name: runner
          image: code.forgejo.org/forgejo/runner:12.6.4
          command:
            - "sh"
            - "-c"
            - |
              while ! nc -z 127.0.0.1 2376 </dev/null; do
                echo 'waiting for docker daemon...';
                sleep 5;
              done
              forgejo-runner generate-config > config.yml ;
              sed -i -e "s|privileged: .*|privileged: true|" config.yml
              sed -i -e "s|network: .*|network: host|" config.yml ;
              sed -i -e "s|^  envs:$$|  envs:\n    DOCKER_HOST: tcp://127.0.0.1:2376\n    DOCKER_TLS_VERIFY: 1\n    DOCKER_CERT_PATH: /certs/client|" config.yml ;
              sed -i -e "s|^  options:|  options: -v /certs/client:/certs/client|" config.yml ;
              sed -i -e "s|  valid_volumes: \[\]$$|  valid_volumes:\n    - /certs/client|" config.yml ;
              /bin/forgejo-runner --config config.yml daemon
          securityContext:
            allowPrivilegeEscalation: true
            privileged: true
            readOnlyRootFilesystem: false
            runAsGroup: 0
            runAsNonRoot: false
            runAsUser: 0
          env:
            - name: DOCKER_HOST
              value: tcp://localhost:2376
            - name: DOCKER_CERT_PATH
              value: /certs/client
            - name: DOCKER_TLS_VERIFY
              value: "1"
          volumeMounts:
            - name: docker-certs
              mountPath: /certs
            - name: runner-data
              mountPath: /data
        - name: daemon
          image: docker:28.0.4-dind
          env:
            - name: DOCKER_TLS_CERTDIR
              value: /certs
          securityContext:
            privileged: true
          volumeMounts:
            - name: docker-certs
              mountPath: /certs
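
Each comma-separated entry in the `--labels` argument of `forgejo-runner register` has the form `<label>:<scheme>://<image>`: a job declaring `runs-on: <label>` runs in a container started from that image. Since image references contain colons themselves, only the first colon separates the label from its target. A minimal parsing sketch (`parse_runner_labels` is a hypothetical helper, not part of forgejo-runner):

```python
def parse_runner_labels(spec: str) -> dict[str, str]:
    """Map each runner label to its execution target (e.g. docker://image)."""
    labels = {}
    for entry in spec.split(","):
        # Split at the FIRST colon only; the image reference keeps its own colons.
        name, sep, target = entry.partition(":")
        if not sep:
            # This sketch only accepts entries with an explicit target.
            raise ValueError(f"label entry without target: {entry!r}")
        labels[name] = target
    return labels


LABELS = ("docker:docker://node:24-bookworm,"
          "ubuntu-22.04:docker://ghcr.io/catthehacker/ubuntu:act-22.04,"
          "ubuntu-latest:docker://ghcr.io/catthehacker/ubuntu:act-24.04,"
          "ubuntu-24.04:docker://ghcr.io/catthehacker/ubuntu:act-24.04")

for name, target in parse_runner_labels(LABELS).items():
    print(f"{name:15} -> {target}")
```

With this label set, `ubuntu-latest` and `ubuntu-24.04` resolve to the same `act-24.04` image.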

@ -0,0 +1,32 @@
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: forgejo-server
  namespace: argocd
  labels:
    env: dev
spec:
  project: default
  syncPolicy:
    automated:
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
    retry:
      limit: -1
  destination:
    name: in-cluster
    namespace: gitea
  sources:
    - repoURL: https://code.forgejo.org/forgejo-helm/forgejo-helm.git
      path: .
      targetRevision: v16.2.0
      helm:
        valueFiles:
          - $values/otc/benchmark.t09.de/stacks/forgejo/forgejo-server/values.yaml
    - repoURL: https://edp.buildth.ing/DevFW-CICD/stacks-instances
      targetRevision: HEAD
      ref: values
    - repoURL: https://edp.buildth.ing/DevFW-CICD/stacks-instances
      targetRevision: HEAD
      path: "otc/benchmark.t09.de/stacks/forgejo/forgejo-server/manifests"

@ -0,0 +1,27 @@
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
    nginx.ingress.kubernetes.io/proxy-body-size: 5120m
    cert-manager.io/cluster-issuer: main
  name: forgejo-server
  namespace: gitea
spec:
  ingressClassName: nginx
  rules:
    - host: benchmark.t09.de
      http:
        paths:
          - backend:
              service:
                name: forgejo-server-http
                port:
                  number: 3000
            path: /
            pathType: Prefix
  tls:
    - hosts:
        - benchmark.t09.de
      secretName: forgejo-net-tls

@ -0,0 +1,91 @@
apiVersion: batch/v1
kind: CronJob
metadata:
  name: forgejo-s3-backup
  namespace: gitea
spec:
  schedule: "0 1 * * *"
  concurrencyPolicy: "Forbid"
  successfulJobsHistoryLimit: 5
  failedJobsHistoryLimit: 5
  startingDeadlineSeconds: 600 # 10 minutes
  jobTemplate:
    spec:
      # 2h window: handles large incremental syncs after repo growth or OBS slowness; BackupJobTooSlow alert fires at 5m
      activeDeadlineSeconds: 7200
      backoffLimit: 2
      ttlSecondsAfterFinished: 259200 # 3 days
      template:
        spec:
          containers:
            - name: rclone
              image: rclone/rclone:1.70
              imagePullPolicy: IfNotPresent
              env:
                - name: SOURCE_BUCKET
                  valueFrom:
                    secretKeyRef:
                      name: forgejo-cloud-credentials
                      key: bucket-name
                - name: AWS_ACCESS_KEY_ID
                  valueFrom:
                    secretKeyRef:
                      name: forgejo-cloud-credentials
                      key: access-key
                - name: AWS_SECRET_ACCESS_KEY
                  valueFrom:
                    secretKeyRef:
                      name: forgejo-cloud-credentials
                      key: secret-key
              volumeMounts:
                - name: rclone-config
                  mountPath: /config/rclone
                  readOnly: true
                - name: backup-dir
                  mountPath: /backup
                  readOnly: false
              command:
                - /bin/sh
                - -c
                - |
                  rclone sync source:/${SOURCE_BUCKET} /backup -v --ignore-checksum
          restartPolicy: OnFailure
          volumes:
            - name: rclone-config
              secret:
                secretName: forgejo-s3-backup
            - name: backup-dir
              persistentVolumeClaim:
                claimName: s3-backup
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: s3-backup
  namespace: gitea
  annotations:
    everest.io/disk-volume-type: GPSSD
    everest.io/crypt-key-id: ac5a45e8-c705-445e-8026-e643e3f2525d
spec:
  storageClassName: csi-disk
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 500Gi
---
apiVersion: v1
kind: Secret
metadata:
  name: forgejo-s3-backup
  namespace: gitea
type: Opaque
stringData:
  rclone.conf: |
    [source]
    type = s3
    provider = HuaweiOBS
    env_auth = true
    endpoint = obs.eu-de.otc.t-systems.com
    region = eu-de
    acl = private
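
The CronJob's timing fields combine as follows. Without a `timeZone` field, `0 1 * * *` fires daily at 01:00 in the kube-controller-manager's time zone; a run that cannot start within `startingDeadlineSeconds` (600 s) is skipped; a started Job is stopped once `activeDeadlineSeconds` (7200 s) have elapsed, retries included; and a finished Job is deleted `ttlSecondsAfterFinished` (259200 s, i.e. 3 days) later. A worked sketch of the latest possible timestamps for one night's run (the date is arbitrary and Job start-up latency is ignored):

```python
from datetime import datetime, timedelta

STARTING_DEADLINE = timedelta(seconds=600)      # startingDeadlineSeconds
ACTIVE_DEADLINE = timedelta(seconds=7200)       # activeDeadlineSeconds
TTL_AFTER_FINISHED = timedelta(seconds=259200)  # ttlSecondsAfterFinished

scheduled = datetime(2026, 6, 20, 1, 0)          # "0 1 * * *": daily at 01:00
latest_start = scheduled + STARTING_DEADLINE     # later than this, the run is skipped
latest_end = latest_start + ACTIVE_DEADLINE      # Job is cut off at this point
latest_cleanup = latest_end + TTL_AFTER_FINISHED  # finished Job is garbage-collected

print(latest_start.time(), latest_end.time())  # 01:10:00 03:10:00
print(TTL_AFTER_FINISHED.days)                 # 3
print(latest_cleanup)                          # 2026-06-23 03:10:00
```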

@ -0,0 +1,180 @@
# We use Recreate so that only one instance, running one version, is active at a time; otherwise Forgejo might break or data could become inconsistent.
strategy:
  type: Recreate
redis-cluster:
  enabled: false
redis:
  enabled: false
postgresql:
  enabled: false
postgresql-ha:
  enabled: false
persistence:
  enabled: true
  size: 200Gi
  storageClass: csi-disk
  annotations:
    everest.io/crypt-key-id: ac5a45e8-c705-445e-8026-e643e3f2525d
    everest.io/disk-volume-type: GPSSD
test:
  enabled: false
deployment:
  env:
    - name: SSL_CERT_DIR
      value: /etc/ssl/forgejo
extraVolumeMounts:
  - mountPath: /etc/ssl/forgejo
    name: custom-database-certs-volume
    readOnly: true
extraVolumes:
  - name: custom-database-certs-volume
    secret:
      secretName: custom-database-certs
gitea:
  metrics:
    enabled: true
    serviceMonitor:
      enabled: true
  additionalConfigFromEnvs:
    - name: FORGEJO__storage__MINIO_ACCESS_KEY_ID
      valueFrom:
        secretKeyRef:
          name: forgejo-cloud-credentials
          key: access-key
    - name: FORGEJO__storage__MINIO_SECRET_ACCESS_KEY
      valueFrom:
        secretKeyRef:
          name: forgejo-cloud-credentials
          key: secret-key
    - name: FORGEJO__queue__CONN_STR
      valueFrom:
        secretKeyRef:
          name: redis-forgejo-cloud-credentials
          key: connection-string
    - name: FORGEJO__session__PROVIDER_CONFIG
      valueFrom:
        secretKeyRef:
          name: redis-forgejo-cloud-credentials
          key: connection-string
    - name: FORGEJO__cache__HOST
      valueFrom:
        secretKeyRef:
          name: redis-forgejo-cloud-credentials
          key: connection-string
    - name: FORGEJO__database__HOST
      valueFrom:
        secretKeyRef:
          name: postgres-forgejo-cloud-credentials
          key: host_port
    - name: FORGEJO__database__NAME
      valueFrom:
        secretKeyRef:
          name: postgres-forgejo-cloud-credentials
          key: database
    - name: FORGEJO__database__USER
      valueFrom:
        secretKeyRef:
          name: postgres-forgejo-cloud-credentials
          key: username
    - name: FORGEJO__database__PASSWD
      valueFrom:
        secretKeyRef:
          name: postgres-forgejo-cloud-credentials
          key: password
    # Either 'elasticsearch' or 'bleve' (Go in-memory search engine)
    - name: FORGEJO__indexer__ISSUE_INDEXER_TYPE
      valueFrom:
        secretKeyRef:
          name: elasticsearch-cloud-credentials
          key: type
    - name: FORGEJO__indexer__ISSUE_INDEXER_CONN_STR
      valueFrom:
        secretKeyRef:
          name: elasticsearch-cloud-credentials
          key: connection-string
    - name: FORGEJO__indexer__ISSUE_INDEXER_ENABLED
      valueFrom:
        secretKeyRef:
          name: elasticsearch-cloud-credentials
          key: enabled
    - name: FORGEJO__mailer__PASSWD
      valueFrom:
        secretKeyRef:
          name: email-user-credentials
          key: connection-string
  admin:
    existingSecret: gitea-credential
  config:
    APP_NAME: 'EDP'
    APP_SLOGAN: 'Build your thing in minutes'
    storage:
      MINIO_ENDPOINT: obs.eu-de.otc.t-systems.com:443
      STORAGE_TYPE: minio
      MINIO_LOCATION: eu-de
      MINIO_BUCKET: "edp-forgejo-non-prod-benchmark"
      MINIO_USE_SSL: true
    queue:
      TYPE: redis
    session:
      PROVIDER: redis
    cache:
      ENABLED: true
      ADAPTER: redis
    service:
      DISABLE_REGISTRATION: true
      ENABLE_NOTIFY_MAIL: true
    other:
      SHOW_FOOTER_VERSION: false
      SHOW_FOOTER_TEMPLATE_LOAD_TIME: false
    database:
      DB_TYPE: postgres
      SSL_MODE: verify-ca
    server:
      DOMAIN: 'benchmark.t09.de'
      ROOT_URL: 'https://benchmark.t09.de:443'
    mailer:
      ENABLED: true
      USER: ipcei-cis-devfw@mms-support.de
      PROTOCOL: smtps
      FROM: '"IPCEI CIS DevFW" <ipcei-cis-devfw@mms-support.de>'
      SMTP_ADDR: mail.mms-support.de
      SMTP_PORT: 465
service:
  ssh:
    type: LoadBalancer
    nodePort: 32222
    externalTrafficPolicy: Cluster
    annotations:
      kubernetes.io/elb.id: db60c1a9-312c-42b7-847b-781d950a0e7a
image:
  pullPolicy: "IfNotPresent"
  # Overrides the image tag whose default is the chart appVersion.
  #tag: "8.0.3"
  # Adds -rootless suffix to image name
  # rootless: true
  fullOverride: edp.buildth.ing/devfw-cicd/edp-forgejo:workflow-webhook-20260305
forgejo: {}
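
Forgejo maps environment variables named `FORGEJO__<section>__<KEY>` onto the corresponding `app.ini` setting, and `additionalConfigFromEnvs` relies on that convention to pull secrets such as the database password into the configuration without writing them into `gitea.config`. A simplified sketch of the naming convention (`env_to_ini` is a hypothetical helper; it does not decode escape sequences such as `_0X2E_`, which stands for a dot in section names):

```python
def env_to_ini(name: str, prefix: str = "FORGEJO") -> tuple[str, str]:
    """Split FORGEJO__<section>__<KEY> into its (section, key) pair."""
    parts = name.split("__")
    # Simplification: exactly prefix, section and key; escapes are not decoded.
    if len(parts) != 3 or parts[0] != prefix or not parts[1] or not parts[2]:
        raise ValueError(f"not a {prefix}__section__KEY variable: {name!r}")
    return parts[1], parts[2]


for var in ("FORGEJO__database__PASSWD",
            "FORGEJO__storage__MINIO_ACCESS_KEY_ID",
            "FORGEJO__indexer__ISSUE_INDEXER_CONN_STR"):
    section, key = env_to_ini(var)
    print(f"[{section}] {key}")
```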

@ -0,0 +1,33 @@
# Default: Forgejo/GitHub Actions runner manager
# Deploys GARM with the ci-sizer provider for automatic sizing + collector injection.
# For GitLab-only deployments, omit this and use gitlab-webhook instead.
# See: ci-sizer/docs/deployment-modes.md
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: garm
  namespace: argocd
  labels:
    env: dev
spec:
  project: default
  syncPolicy:
    automated:
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
    retry:
      limit: -1
  destination:
    name: in-cluster
    namespace: garm
  sources:
    - repoURL: https://edp.buildth.ing/DevFW-CICD/garm-helm
      path: charts/garm
      targetRevision: v0.0.17
      helm:
        valueFiles:
          - $values/otc/benchmark.t09.de/stacks/garm/garm/values.yaml
    - repoURL: https://edp.buildth.ing/DevFW-CICD/stacks-instances
      targetRevision: HEAD
      ref: values

@ -0,0 +1,51 @@
ingress:
enabled: true
className: nginx
annotations:
cert-manager.io/cluster-issuer: main
nginx.ingress.kubernetes.io/backend-protocol: HTTP
nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
hosts:
- host: garm.benchmark.t09.de
paths:
- path: /
pathType: Prefix
tls:
- secretName: garm-net-tls
hosts:
- garm.benchmark.t09.de
# Credentials and Secrets
credentials:
edgeConnect:
existingSecretName: "edge-credential"
gitea:
url: "https://benchmark.t09.de" # Required
db:
existingSecretName: garm-fixed-credentials
image:
repository: edp.buildth.ing/devfw-cicd/garm-forgejo
tag: v0.1.7-forgejo-22
providerConfig:
edgeConnect:
organization: edp2
region: EU
edgeConnectUrl: "https://hub.apps.edge.platform.mg3.mdb.osc.live"
cloudlet:
name: Hamburg
organization: TelekomOP
edgeConnectK8s:
pendingTimeout: "5m"
sizer:
sidecarImage: edp.buildth.ing/devfw-cicd/ci-sizer-collector:0.9.7
sidecarPushEndpoint: https://sizer.benchmark.t09.de/api/v1/metrics
baseUrl: "https://sizer.benchmark.t09.de"
readToken:
existingSecretName: sizer-tokens
# key/mountPath/fileName default sanely in garm-helm ≥v0.0.17
garm:
logging:
logLevel: info

@ -0,0 +1,29 @@
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: metrics-server
  namespace: argocd
  labels:
    env: dev
spec:
  project: default
  syncPolicy:
    automated:
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
    retry:
      limit: -1
  destination:
    name: in-cluster
    namespace: observability
  sources:
    - chart: metrics-server
      repoURL: https://kubernetes-sigs.github.io/metrics-server/
      targetRevision: 3.12.2
      helm:
        valueFiles:
          - $values/otc/benchmark.t09.de/stacks/observability-client/metrics-server/values.yaml
    - repoURL: https://edp.buildth.ing/DevFW-CICD/stacks-instances
      targetRevision: HEAD
      ref: values

@ -0,0 +1,4 @@
metrics:
  enabled: true
serviceMonitor:
  enabled: true

@ -0,0 +1,29 @@
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: vector
  namespace: argocd
  labels:
    env: dev
spec:
  project: default
  syncPolicy:
    automated:
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
    retry:
      limit: -1
  destination:
    name: in-cluster
    namespace: observability
  sources:
    - chart: vector
      repoURL: https://helm.vector.dev
      targetRevision: 0.43.0
      helm:
        valueFiles:
          - $values/otc/benchmark.t09.de/stacks/observability-client/vector/values.yaml
    - repoURL: https://edp.buildth.ing/DevFW-CICD/stacks-instances
      targetRevision: HEAD
      ref: values

@ -0,0 +1,68 @@
# -- Enable deployment of vector
role: Agent
dataDir: /vector-data-dir
resources: {}
args:
  - -w
  - --config-dir
  - /etc/vector/
env:
  - name: VECTOR_USER
    valueFrom:
      secretKeyRef:
        name: simple-user-secret
        key: username
  - name: VECTOR_PASSWORD
    valueFrom:
      secretKeyRef:
        name: simple-user-secret
        key: password
containerPorts:
  - name: prom-exporter
    containerPort: 9090
    protocol: TCP
service:
  enabled: false
customConfig:
  data_dir: /vector-data-dir
  api:
    enabled: false
    address: 0.0.0.0:8686
    playground: true
  sources:
    k8s:
      type: kubernetes_logs
    internal_metrics:
      type: internal_metrics
  transforms:
    parser:
      type: remap
      inputs: [k8s]
      source: |
        ._msg = parse_json(.message) ?? .message
        del(.message)
        # Add the cluster environment to the log event
        .cluster_environment = "benchmark"
  sinks:
    vlogs:
      type: elasticsearch
      inputs: [parser]
      endpoints:
        - https://o12y.observability.buildth.ing/insert/elasticsearch/
      auth:
        strategy: basic
        user: ${VECTOR_USER}
        password: ${VECTOR_PASSWORD}
      mode: bulk
      api_version: v8
      compression: gzip
      healthcheck:
        enabled: false
      request:
        headers:
          AccountID: "0"
          ProjectID: "0"
      query:
        _msg_field: _msg
        _time_field: _time
        _stream_fields: cluster_environment,kubernetes.container_name,kubernetes.namespace
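
The `parser` remap transform keeps structured application logs queryable: if `.message` is valid JSON it is stored as a parsed object in `_msg`, otherwise the raw string is kept, and every event is tagged with `cluster_environment`; the sink's `_msg_field: _msg` query parameter then tells VictoriaLogs which field holds the message. A rough Python equivalent of that VRL program, for illustration only (VRL's `parse_json` and `??` error coalescing are approximated with `json.loads` and an exception handler; `remap` is a hypothetical name):

```python
import json
from typing import Any


def remap(event: dict[str, Any], cluster_environment: str = "benchmark") -> dict[str, Any]:
    """Rough Python equivalent of the `parser` transform's VRL source."""
    out = dict(event)
    message = out.pop("message", None)  # del(.message)
    try:
        # ._msg = parse_json(.message) ?? .message
        out["_msg"] = json.loads(message)
    except (TypeError, ValueError):
        # Not JSON (or not a string): keep the raw message unchanged.
        out["_msg"] = message
    # .cluster_environment = "benchmark"
    out["cluster_environment"] = cluster_environment
    return out


print(remap({"message": '{"level": "info", "msg": "ok"}'}))
print(remap({"message": "plain text line"}))
```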

@ -0,0 +1,30 @@
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: vm-client
  namespace: argocd
  labels:
    env: dev
spec:
  project: default
  syncPolicy:
    automated:
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
  destination:
    name: in-cluster
    namespace: observability
  sources:
    - chart: victoria-metrics-k8s-stack
      repoURL: https://victoriametrics.github.io/helm-charts/
      targetRevision: 0.48.1
      helm:
        valueFiles:
          - $values/otc/benchmark.t09.de/stacks/observability-client/vm-client-stack/values.yaml
    - repoURL: https://edp.buildth.ing/DevFW-CICD/stacks-instances
      targetRevision: HEAD
      ref: values
    - repoURL: https://edp.buildth.ing/DevFW-CICD/stacks-instances
      targetRevision: HEAD
      path: "otc/benchmark.t09.de/stacks/observability-client/vm-client-stack/manifests"

@ -0,0 +1,15 @@
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMServiceScrape
metadata:
  name: forgejo
  namespace: observability
spec:
  namespaceSelector:
    matchNames:
      - gitea
  selector:
    matchLabels:
      app.kubernetes.io/name: forgejo
  endpoints:
    - port: http
      path: /metrics

File diff suppressed because it is too large

@ -0,0 +1,25 @@
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: grafana-operator
  namespace: argocd
  labels:
    env: dev
spec:
  project: default
  syncPolicy:
    automated:
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
      - ServerSideApply=true
  destination:
    name: in-cluster
    namespace: observability
  sources:
    - chart: grafana-operator
      repoURL: ghcr.io/grafana/helm-charts
      targetRevision: v5.18.0
    - repoURL: https://edp.buildth.ing/DevFW-CICD/stacks-instances
      targetRevision: HEAD
      path: "otc/benchmark.t09.de/stacks/observability/grafana-operator/manifests"

@ -0,0 +1,9 @@
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaDashboard
metadata:
  name: argocd
spec:
  instanceSelector:
    matchLabels:
      dashboards: "grafana"
  url: "https://raw.githubusercontent.com/argoproj/argo-cd/refs/heads/master/examples/dashboard.json"

@ -0,0 +1,75 @@
apiVersion: grafana.integreatly.org/v1beta1
kind: Grafana
metadata:
name: grafana
labels:
dashboards: "grafana"
spec:
persistentVolumeClaim:
metadata:
annotations:
everest.io/disk-volume-type: GPSSD
everest.io/crypt-key-id: ac5a45e8-c705-445e-8026-e643e3f2525d
spec:
storageClassName: csi-disk
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 10Gi
deployment:
spec:
template:
spec:
containers:
- name: grafana
env:
- name: OAUTH_CLIENT_SECRET
valueFrom:
secretKeyRef:
key: clientSecret
name: dex-grafana-client
config:
log.console:
level: debug
server:
root_url: "https://grafana.benchmark.t09.de"
auth:
disable_login: "true"
disable_login_form: "true"
auth.generic_oauth:
enabled: "true"
name: Forgejo
allow_sign_up: "true"
use_refresh_token: "true"
client_id: grafana
client_secret: $__env{OAUTH_CLIENT_SECRET}
scopes: openid email profile offline_access groups
auth_url: https://dex.benchmark.t09.de/auth
token_url: https://dex.benchmark.t09.de/token
api_url: https://dex.benchmark.t09.de/userinfo
redirect_uri: https://grafana.benchmark.t09.de/login/generic_oauth
role_attribute_path: "contains(groups[*], 'DevFW') && 'GrafanaAdmin' || 'None'"
allow_assign_grafana_admin: "true"
ingress:
metadata:
annotations:
cert-manager.io/cluster-issuer: main
nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
spec:
ingressClassName: nginx
rules:
- host: grafana.benchmark.t09.de
http:
paths:
- backend:
service:
name: grafana-service
port:
number: 3000
path: /
pathType: Prefix
tls:
- hosts:
- grafana.benchmark.t09.de
secretName: grafana-net-tls
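
The `role_attribute_path` above is a JMESPath expression that Grafana evaluates against the claims returned by Dex. A minimal sketch of its effect, emulating JMESPath's `&&`/`||` operators in plain Python rather than calling Grafana's evaluator; the claim dictionaries, the helper names, and the choice to treat a missing `groups` claim as an empty list are assumptions for illustration:

```python
def jmes_truthy(value) -> bool:
    """JMESPath falsy values: false, null, "", [] and {}; numbers (even 0) are truthy."""
    if value is None or value is False:
        return False
    if isinstance(value, (str, list, dict)) and len(value) == 0:
        return False
    return True


def jmes_and(left, right):
    # `a && b` yields `a` when it is falsy, otherwise `b`.
    return right if jmes_truthy(left) else left


def jmes_or(left, right):
    # `a || b` yields `a` when it is truthy, otherwise `b`.
    return left if jmes_truthy(left) else right


def grafana_role(claims: dict) -> str:
    """Emulates: contains(groups[*], 'DevFW') && 'GrafanaAdmin' || 'None'."""
    groups = claims.get("groups", [])  # assumption: a missing claim behaves like []
    in_devfw = "DevFW" in groups
    # `&&` binds tighter than `||`, matching the grouping below.
    return jmes_or(jmes_and(in_devfw, "GrafanaAdmin"), "None")


if __name__ == "__main__":
    print(grafana_role({"email": "alice@example.com", "groups": ["DevFW", "ops"]}))
    print(grafana_role({"email": "bob@example.com", "groups": ["guests"]}))
```

Members of `DevFW` map to `GrafanaAdmin` (which `allow_assign_grafana_admin: "true"` permits Grafana to apply); every other user maps to the `None` role.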

@@ -0,0 +1,9 @@
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaDashboard
metadata:
  name: ingress-nginx
spec:
  instanceSelector:
    matchLabels:
      dashboards: "grafana"
  url: "https://raw.githubusercontent.com/adinhodovic/ingress-nginx-mixin/refs/heads/main/dashboards_out/ingress-nginx-overview.json"

@@ -0,0 +1,9 @@
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaDashboard
metadata:
  name: victoria-logs
spec:
  instanceSelector:
    matchLabels:
      dashboards: "grafana"
  url: "https://raw.githubusercontent.com/VictoriaMetrics/VictoriaMetrics/refs/heads/master/dashboards/vm/victorialogs.json"

@@ -0,0 +1,31 @@
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: o12y
  namespace: argocd
  labels:
    env: dev
spec:
  project: default
  syncPolicy:
    automated:
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
      - ServerSideApply=true
  destination:
    name: in-cluster
    namespace: observability
  sources:
    - chart: victoria-metrics-k8s-stack
      repoURL: https://victoriametrics.github.io/helm-charts/
      targetRevision: 0.48.1
      helm:
        valueFiles:
          - $values/otc/benchmark.t09.de/stacks/observability/victoria-k8s-stack/values.yaml
    - repoURL: https://edp.buildth.ing/DevFW-CICD/stacks-instances
      targetRevision: HEAD
      ref: values
    - repoURL: https://edp.buildth.ing/DevFW-CICD/stacks-instances
      targetRevision: HEAD
      path: "otc/benchmark.t09.de/stacks/observability/victoria-k8s-stack/manifests"

@@ -0,0 +1,40 @@
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMRule
metadata:
  name: forgejo-alerts
  namespace: observability
spec:
  groups:
    - name: forgejo
      rules:
        - alert: forgejo down
          expr: sum by(cluster_environment) (up{pod=~"forgejo-server-.*"}) < 1
          for: 30s
          labels:
            severity: critical
            job: "{{ $labels.job }}"
          annotations:
            value: "{{ $value }}"
            description: 'forgejo is down in cluster environment {{ $labels.cluster_environment }}'
    - name: forgejo-backup
      rules:
        - alert: forgejo s3 backup job failed
          expr: max by(cluster_environment) (kube_job_status_failed{job_name=~"forgejo-s3-backup-.*"}) != 0
          for: 30s
          labels:
            severity: critical
            job: "{{ $labels.job }}"
          annotations:
            value: "{{ $value }}"
            description: 'forgejo s3 backup job failed in cluster environment {{ $labels.cluster_environment }}'
    - name: disk-consumption-high
      rules:
        - alert: disk consumption high
          expr: 1-(kubelet_volume_stats_available_bytes / kubelet_volume_stats_capacity_bytes) > 0.6
          for: 30s
          labels:
            severity: major
            job: "{{ $labels.job }}"
          annotations:
            value: "{{ $value }}"
            description: 'disk consumption of pvc {{ $labels.namespace }}/{{ $labels.persistentvolumeclaim }} is high in cluster environment {{ $labels.cluster_environment }}'
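
The `disk consumption high` rule compares the used fraction of each PVC, `1 - available / capacity`, against `0.6`. A small sketch of that arithmetic, using hypothetical byte counts and ignoring the `for: 30s` pending period that the alert engine applies on top:

```python
GIB = 1024 ** 3
THRESHOLD = 0.6  # right-hand side of the rule's `> 0.6`


def used_fraction(available_bytes: float, capacity_bytes: float) -> float:
    """Same arithmetic as 1-(kubelet_volume_stats_available_bytes / kubelet_volume_stats_capacity_bytes)."""
    return 1 - available_bytes / capacity_bytes


def disk_alert_condition(available_bytes: float, capacity_bytes: float) -> bool:
    """True when the expression would return a sample (before `for:` is applied)."""
    return used_fraction(available_bytes, capacity_bytes) > THRESHOLD


if __name__ == "__main__":
    # Hypothetical 50Gi volume with 15Gi free, i.e. roughly 70% used.
    print(round(used_fraction(15 * GIB, 50 * GIB), 2), disk_alert_condition(15 * GIB, 50 * GIB))
```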

@@ -0,0 +1,26 @@
apiVersion: operator.victoriametrics.com/v1beta1
kind: VLogs
metadata:
  name: victorialogs
  namespace: observability
spec:
  retentionPeriod: "12"
  removePvcAfterDelete: true
  storageMetadata:
    annotations:
      everest.io/crypt-key-id: ac5a45e8-c705-445e-8026-e643e3f2525d
      everest.io/disk-volume-type: GPSSD
  storage:
    storageClassName: csi-disk
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 50Gi
  resources:
    requests:
      memory: 500Mi
      cpu: 500m
    limits:
      memory: 10Gi
      cpu: 2

@@ -0,0 +1,17 @@
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMUser
metadata:
  name: simple-user
  namespace: observability
spec:
  username: simple-user
  passwordRef:
    key: password
    name: simple-user-secret
  targetRefs:
    - static:
        url: http://vmsingle-o12y:8429
      paths: ["/api/v1/write"]
    - static:
        url: http://vlogs-victorialogs:9428
      paths: ["/insert/elasticsearch/.*"]

File diff suppressed because it is too large

@@ -0,0 +1,14 @@
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: main
spec:
  acme:
    email: admin@think-ahead.tech
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: cluster-issuer-account-key
    solvers:
      - http01:
          ingress:
            ingressClassName: nginx

@@ -0,0 +1,4 @@
crds:
  enabled: true
replicaCount: 1

@@ -0,0 +1,32 @@
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: cert-manager
  namespace: argocd
  labels:
    env: dev
spec:
  project: default
  syncPolicy:
    automated:
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
    retry:
      limit: -1
  destination:
    name: in-cluster
    namespace: cert-manager
  sources:
    - chart: cert-manager
      repoURL: https://charts.jetstack.io
      targetRevision: v1.17.2
      helm:
        valueFiles:
          - $values/otc/benchmark.t09.de/stacks/otc/cert-manager/values.yaml
    - repoURL: https://edp.buildth.ing/DevFW-CICD/stacks-instances
      targetRevision: HEAD
      ref: values
    - repoURL: https://edp.buildth.ing/DevFW-CICD/stacks-instances
      targetRevision: HEAD
      path: "otc/benchmark.t09.de/stacks/otc/cert-manager/manifests"

@@ -0,0 +1,29 @@
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: ingress-nginx
  namespace: argocd
  labels:
    env: dev
spec:
  project: default
  syncPolicy:
    automated:
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
    retry:
      limit: -1
  destination:
    name: in-cluster
    namespace: ingress-nginx
  sources:
    - repoURL: https://github.com/kubernetes/ingress-nginx.git
      path: charts/ingress-nginx
      targetRevision: helm-chart-4.12.1
      helm:
        valueFiles:
          - $values/otc/benchmark.t09.de/stacks/otc/ingress-nginx/values.yaml
    - repoURL: https://edp.buildth.ing/DevFW-CICD/stacks-instances
      targetRevision: HEAD
      ref: values

@@ -0,0 +1,31 @@
controller:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
  service:
    annotations:
      kubernetes.io/elb.class: union
      kubernetes.io/elb.port: '80'
      kubernetes.io/elb.id: db60c1a9-312c-42b7-847b-781d950a0e7a
      kubernetes.io/elb.ip: 164.30.20.78
  ingressClassResource:
    name: nginx
  # added for idpbuilder
  allowSnippetAnnotations: true
  # added for idpbuilder
  config:
    proxy-buffer-size: 32k
    use-forwarded-headers: "true"
  # monitoring nginx
  metrics:
    enabled: true
    serviceMonitor:
      additionalLabels:
        release: "ingress-nginx"
      enabled: true

@@ -0,0 +1,25 @@
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: storageclass
  namespace: argocd
  labels:
    example: otc
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  destination:
    namespace: default
    server: "https://kubernetes.default.svc"
  source:
    repoURL: https://edp.buildth.ing/DevFW-CICD/stacks-instances
    targetRevision: HEAD
    path: "otc/benchmark.t09.de/stacks/otc/storageclass"
  project: default
  syncPolicy:
    automated:
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
    retry:
      limit: -1

@@ -0,0 +1,18 @@
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.beta.kubernetes.io/is-default-class: "true"
  labels:
    kubernetes.io/cluster-service: "true"
  name: default
parameters:
  kubernetes.io/description: ""
  kubernetes.io/hw:passthrough: "true"
  kubernetes.io/storagetype: BS
  kubernetes.io/volumetype: SATA
  kubernetes.io/zone: eu-de-02
provisioner: flexvolume-huawei.com/fuxivol
reclaimPolicy: Delete
volumeBindingMode: Immediate
allowVolumeExpansion: true

@@ -0,0 +1,30 @@
# helm upgrade --install --create-namespace --namespace terralist terralist oci://ghcr.io/terralist/helm-charts/terralist -f terralist-values.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: terralist
  namespace: argocd
  labels:
    env: dev
spec:
  project: default
  syncPolicy:
    automated:
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
    retry:
      limit: -1
  destination:
    name: in-cluster
    namespace: terralist
  sources:
    - repoURL: https://github.com/terralist/helm-charts
      path: charts/terralist
      targetRevision: terralist-0.8.1
      helm:
        valueFiles:
          - $values/otc/benchmark.t09.de/stacks/terralist/terralist/values.yaml
    - repoURL: https://edp.buildth.ing/DevFW-CICD/stacks-instances
      targetRevision: HEAD
      ref: values

@@ -0,0 +1,87 @@
controllers:
  main:
    strategy: Recreate
    containers:
      app:
        env:
          - name: TERRALIST_OAUTH_PROVIDER
            value: oidc
          - name: TERRALIST_OI_CLIENT_ID
            valueFrom:
              secretKeyRef:
                name: oidc-credentials
                key: client-id
          - name: TERRALIST_OI_CLIENT_SECRET
            valueFrom:
              secretKeyRef:
                name: oidc-credentials
                key: client-secret
          - name: TERRALIST_OI_AUTHORIZE_URL
            valueFrom:
              secretKeyRef:
                name: oidc-credentials
                key: authorize-url
          - name: TERRALIST_OI_TOKEN_URL
            valueFrom:
              secretKeyRef:
                name: oidc-credentials
                key: token-url
          - name: TERRALIST_OI_USERINFO_URL
            valueFrom:
              secretKeyRef:
                name: oidc-credentials
                key: userinfo-url
          - name: TERRALIST_OI_SCOPE
            valueFrom:
              secretKeyRef:
                name: oidc-credentials
                key: scope
          - name: TERRALIST_TOKEN_SIGNING_SECRET
            valueFrom:
              secretKeyRef:
                name: terralist-secret
                key: token-signing-secret
          - name: TERRALIST_COOKIE_SECRET
            valueFrom:
              secretKeyRef:
                name: terralist-secret
                key: cookie-secret
          - name: TERRALIST_URL
            value: https://terralist.benchmark.t09.de
          - name: TERRALIST_SQLITE_PATH
            value: /data/db.sqlite
          - name: TERRALIST_LOCAL_STORE
            value: /data/modules
          - name: TERRALIST_PROVIDERS_ANONYMOUS_READ
            value: "true"
ingress:
  main:
    enabled: true
    className: nginx
    annotations:
      cert-manager.io/cluster-issuer: main
    hosts:
      - host: terralist.benchmark.t09.de
        paths:
          - path: /
            pathType: Prefix
            service:
              identifier: main
              port: http
    tls:
      - hosts:
          - terralist.benchmark.t09.de
        secretName: terralist-tls-secret
persistence:
  data:
    enabled: true
    accessMode: ReadWriteOnce
    size: 10Gi
    retain: false
    storageClass: "csi-disk"
    annotations:
      everest.io/disk-volume-type: GPSSD
    globalMounts:
      - path: /data

@@ -0,0 +1,24 @@
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: ci-sizer-reg
  namespace: argocd
  labels:
    env: dev
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  destination:
    name: in-cluster
    namespace: argocd
  source:
    path: "otc/dev.t09.de/stacks/ci-sizer"
    repoURL: "https://edp.buildth.ing/DevFW-CICD/stacks-instances"
    targetRevision: HEAD
  project: default
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true

@@ -0,0 +1,29 @@
# Optional: GitLab CI integration
# Only hydrate this app for clusters that run GitLab Runner.
# For Forgejo/GitHub-only deployments, omit this app from stacks-instances.
# See: ci-sizer/docs/deployment-modes.md
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: gitlab-sizer-webhook
  namespace: argocd
  labels:
    env: dev
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  project: default
  syncPolicy:
    automated:
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
    retry:
      limit: -1
  destination:
    name: in-cluster
    namespace: ci-sizer
  source:
    repoURL: https://edp.buildth.ing/DevFW-CICD/stacks-instances
    targetRevision: HEAD
    path: "otc/dev.t09.de/stacks/ci-sizer/gitlab-webhook"

@@ -0,0 +1,27 @@
# Self-signed Issuer for webhook TLS.
# For production, replace with a ClusterIssuer backed by a real CA.
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: selfsigned-issuer
spec:
  selfSigned: {}
---
# cert-manager Certificate for the webhook TLS.
# The resulting Secret (gitlab-sizer-webhook-tls) is mounted into the webhook pod.
# cert-manager also injects the CA into the MutatingWebhookConfiguration via the
# cert-manager.io/inject-ca-from annotation.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: gitlab-sizer-webhook-cert
spec:
  secretName: gitlab-sizer-webhook-tls
  issuerRef:
    name: selfsigned-issuer
    kind: Issuer
  dnsNames:
    - gitlab-sizer-webhook.ci-sizer.svc
    - gitlab-sizer-webhook.ci-sizer.svc.cluster.local
  duration: 8760h
  renewBefore: 720h
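
The Certificate requests a one-year certificate (`8760h`) and asks cert-manager to renew it `720h` (30 days) before expiry. The worked arithmetic, as a small sketch with a hypothetical issue date:

```python
from datetime import datetime, timedelta, timezone

DURATION = timedelta(hours=8760)     # spec.duration
RENEW_BEFORE = timedelta(hours=720)  # spec.renewBefore


def renewal_time(not_before: datetime) -> datetime:
    """Renewal is due renewBefore ahead of notAfter (notBefore + duration)."""
    not_after = not_before + DURATION
    return not_after - RENEW_BEFORE


if __name__ == "__main__":
    issued = datetime(2026, 1, 1, tzinfo=timezone.utc)  # hypothetical issue date
    print(DURATION.days, RENEW_BEFORE.days)  # 365 30
    print(renewal_time(issued))
```

So renewal falls 335 days after issuance. Whether the webhook pods pick up the rotated Secret without a restart depends on how the webhook binary loads its certificate, which this diff does not show.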

@@ -0,0 +1,141 @@
apiVersion: v1
kind: ServiceAccount
metadata:
  name: gitlab-sizer-webhook
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: gitlab-sizer-webhook
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: gitlab-sizer-webhook
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: gitlab-sizer-webhook
subjects:
  - kind: ServiceAccount
    name: gitlab-sizer-webhook
    namespace: ci-sizer
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gitlab-sizer-webhook
  labels:
    app: gitlab-sizer-webhook
spec:
  replicas: 2
  selector:
    matchLabels:
      app: gitlab-sizer-webhook
  template:
    metadata:
      labels:
        app: gitlab-sizer-webhook
    spec:
      serviceAccountName: gitlab-sizer-webhook
      securityContext:
        runAsNonRoot: true
        runAsUser: 65534
        runAsGroup: 65534
        seccompProfile:
          type: RuntimeDefault
      containers:
        - name: webhook
          image: edp.buildth.ing/devfw-cicd/gitlab-webhook-edge-connect:latest
          imagePullPolicy: Always
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop:
                - ALL
          ports:
            - containerPort: 8443
              protocol: TCP
          args:
            - --listen-addr=:8443
            - --tls-cert-file=/etc/webhook/tls/tls.crt
            - --tls-key-file=/etc/webhook/tls/tls.key
            - --sizer-url=http://sizer-receiver.ci-sizer.svc:8080
            - --sizer-sidecar-image=edp.buildth.ing/devfw-cicd/ci-sizer-collector:latest
          env:
            - name: WEBHOOK_SIZER_READ_TOKEN
              valueFrom:
                secretKeyRef:
                  name: gitlab-sizer-webhook-tokens
                  key: sizer-read-token
            - name: WEBHOOK_SIZER_PUSH_TOKEN
              valueFrom:
                secretKeyRef:
                  name: gitlab-sizer-webhook-tokens
                  key: sizer-push-token
            - name: HTTP_PROXY
              valueFrom:
                configMapKeyRef:
                  name: gitlab-sizer-webhook-config
                  key: HTTP_PROXY
                  optional: true
            - name: HTTPS_PROXY
              valueFrom:
                configMapKeyRef:
                  name: gitlab-sizer-webhook-config
                  key: HTTPS_PROXY
                  optional: true
            - name: NO_PROXY
              valueFrom:
                configMapKeyRef:
                  name: gitlab-sizer-webhook-config
                  key: NO_PROXY
                  optional: true
          volumeMounts:
            - name: webhook-tls
              mountPath: /etc/webhook/tls
              readOnly: true
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8443
              scheme: HTTPS
            initialDelaySeconds: 5
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /healthz
              port: 8443
              scheme: HTTPS
            initialDelaySeconds: 5
            periodSeconds: 10
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 200m
              memory: 128Mi
      volumes:
        - name: webhook-tls
          secret:
            secretName: gitlab-sizer-webhook-tls
---
apiVersion: v1
kind: Service
metadata:
  name: gitlab-sizer-webhook
  labels:
    app: gitlab-sizer-webhook
spec:
  selector:
    app: gitlab-sizer-webhook
  ports:
    - port: 443
      targetPort: 8443
      protocol: TCP

@@ -0,0 +1,30 @@
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: gitlab-sizer-webhook
  annotations:
    cert-manager.io/inject-ca-from: ci-sizer/gitlab-sizer-webhook-cert
webhooks:
  - name: gitlab-sizer-webhook.ci-sizer.svc
    admissionReviewVersions: ["v1"]
    sideEffects: NoneOnDryRun
    failurePolicy: Ignore
    timeoutSeconds: 5
    reinvocationPolicy: Never
    clientConfig:
      service:
        name: gitlab-sizer-webhook
        namespace: ci-sizer
        path: /mutate
    rules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE"]
        resources: ["pods"]
    namespaceSelector:
      matchLabels:
        ci-sizer.devfw.io/watch: "true"
    objectSelector:
      matchExpressions:
        - key: job.runner.gitlab.com/pod
          operator: Exists
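
The API server only calls this webhook for pod `CREATE` requests in namespaces labelled `ci-sizer.devfw.io/watch: "true"` whose pods carry a `job.runner.gitlab.com/pod` label. A simplified sketch of that selection logic; it illustrates the selectors above rather than reproducing the API server's implementation, and the function name is hypothetical:

```python
NAMESPACE_LABEL = ("ci-sizer.devfw.io/watch", "true")  # namespaceSelector.matchLabels
POD_LABEL_KEY = "job.runner.gitlab.com/pod"            # objectSelector, operator: Exists


def webhook_applies(operation: str, resource: str,
                    namespace_labels: dict, pod_labels: dict) -> bool:
    """Return True when a request would be sent to gitlab-sizer-webhook."""
    # rules: core API group, v1, CREATE on pods only
    if operation != "CREATE" or resource != "pods":
        return False
    # namespaceSelector: exact label match on the namespace
    key, value = NAMESPACE_LABEL
    if namespace_labels.get(key) != value:
        return False
    # objectSelector: the label key only has to exist; its value is irrelevant
    return POD_LABEL_KEY in pod_labels
```

Because `failurePolicy: Ignore` is set, a matching request that cannot reach the webhook within the 5-second `timeoutSeconds` is admitted unmodified, so such a job pod would presumably run without the collector sidecar rather than fail to start.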

@@ -0,0 +1,29 @@
# Required: CI Sizer receiver
# Always deploy this — it stores metrics and computes sizing recommendations.
# Works standalone or with GARM (Forgejo/GitHub) and/or GitLab webhook.
# See: ci-sizer/docs/deployment-modes.md
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: sizer-receiver
  namespace: argocd
  labels:
    env: dev
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  project: default
  syncPolicy:
    automated:
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
    retry:
      limit: -1
  destination:
    name: in-cluster
    namespace: ci-sizer
  source:
    repoURL: https://edp.buildth.ing/DevFW-CICD/stacks-instances
    targetRevision: HEAD
    path: "otc/dev.t09.de/stacks/ci-sizer/sizer-receiver"

@@ -1,22 +1,27 @@
apiVersion: apps/v1
kind: Deployment
metadata:
name: optimiser-receiver
name: sizer-receiver
labels:
app: optimiser-receiver
app: sizer-receiver
spec:
strategy:
type: Recreate
replicas: 1
selector:
matchLabels:
app: optimiser-receiver
app: sizer-receiver
template:
metadata:
labels:
app: optimiser-receiver
app: sizer-receiver
spec:
securityContext:
fsGroup: 65534
containers:
- name: receiver
image: edp.buildth.ing/devfw-cicd/forgejo-runner-optimiser-receiver:0.0.3
image: edp.buildth.ing/devfw-cicd/ci-sizer-receiver:latest
imagePullPolicy: Always
args:
- --db=/data/metrics.db
ports:
@@ -27,13 +32,41 @@ spec:
- name: RECEIVER_READ_TOKEN
valueFrom:
secretKeyRef:
name: optimiser-tokens
name: sizer-tokens
key: read-token
- name: RECEIVER_HMAC_KEY
valueFrom:
secretKeyRef:
name: optimiser-tokens
name: sizer-tokens
key: hmac-key
- name: GARM_URL
value: "http://garm.garm.svc:80"
- name: GARM_USER
value: "admin"
- name: GARM_PASSWORD
valueFrom:
secretKeyRef:
name: garm-fixed-credentials
key: admin_password
- name: RECEIVER_OIDC_ISSUER
value: "https://dex.dev.t09.de"
- name: RECEIVER_OIDC_CLIENT_ID
value: "ci-sizer"
- name: RECEIVER_OIDC_CLIENT_SECRET
valueFrom:
secretKeyRef:
name: sizer-oidc-client
key: client-secret
- name: RECEIVER_OIDC_REDIRECT_URI
value: "https://sizer.dev.t09.de/ui/callback"
- name: RECEIVER_SESSION_TTL
value: "12h"
- name: RECEIVER_ALLOWED_ORG
value: "DevFW-CICD"
- name: RECEIVER_CPU_SIZING_MODE
value: "observe"
- name: RECEIVER_MEMORY_QOS
value: "guaranteed"
volumeMounts:
- name: data
mountPath: /data
@@ -59,17 +92,17 @@ spec:
volumes:
- name: data
persistentVolumeClaim:
claimName: optimiser-receiver-data
claimName: sizer-receiver-data
---
apiVersion: v1
kind: Service
metadata:
name: optimiser-receiver
name: sizer-receiver
labels:
app: optimiser-receiver
app: sizer-receiver
spec:
selector:
app: optimiser-receiver
app: sizer-receiver
ports:
- name: http
port: 8080
@@ -79,9 +112,9 @@ spec:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: optimiser-receiver-data
name: sizer-receiver-data
labels:
app: optimiser-receiver
app: sizer-receiver
annotations:
everest.io/disk-volume-type: GPSSD
spec:

@@ -0,0 +1,36 @@
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
    cert-manager.io/cluster-issuer: main
  name: sizer-receiver
  namespace: ci-sizer
spec:
  ingressClassName: nginx
  rules:
    - host: sizer.dev.t09.de
      http:
        paths:
          - backend:
              service:
                name: sizer-receiver
                port:
                  number: 8080
            path: /
            pathType: Prefix
    - host: ci-sizer.dev.t09.de
      http:
        paths:
          - backend:
              service:
                name: sizer-receiver
                port:
                  number: 8080
            path: /
            pathType: Prefix
  tls:
    - hosts:
        - sizer.dev.t09.de
      secretName: sizer-receiver-tls

@@ -0,0 +1,9 @@
apiVersion: v1
kind: Secret
metadata:
  name: sizer-oidc-client
  labels:
    app: sizer-receiver
type: Opaque
stringData:
  client-secret: "73eda9068bd00dfe67d29f087b5540cb1cd82cc1dd2ac0f838558ac8bbcfcb3a"

@@ -35,6 +35,30 @@ configs:
tls:
certificates:
controller:
metrics:
enabled: true
serviceMonitor:
enabled: false
server:
metrics:
enabled: true
serviceMonitor:
enabled: false
repoServer:
metrics:
enabled: true
serviceMonitor:
enabled: false
applicationSet:
metrics:
enabled: true
serviceMonitor:
enabled: false
notifications:
enabled: false

@@ -27,3 +27,6 @@ spec:
- repoURL: https://edp.buildth.ing/DevFW-CICD/stacks-instances
targetRevision: HEAD
ref: values
- repoURL: https://edp.buildth.ing/DevFW-CICD/stacks-instances
targetRevision: HEAD
path: "otc/dev.t09.de/stacks/core/dex/manifests"

@@ -0,0 +1,8 @@
apiVersion: v1
kind: Secret
metadata:
  name: dex-sizer-client
  namespace: dex
type: Opaque
stringData:
  clientSecret: "73eda9068bd00dfe67d29f087b5540cb1cd82cc1dd2ac0f838558ac8bbcfcb3a"

@@ -34,6 +34,11 @@ envVars:
secretKeyRef:
name: dex-argo-client
key: clientSecret
- name: FORGEJO_RUNNER_SIZER_CLIENT_SECRET
valueFrom:
secretKeyRef:
name: dex-sizer-client
key: clientSecret
- name: LOG_LEVEL
value: debug
@@ -74,3 +79,8 @@ config:
- "https://grafana.dev.t09.de/login/generic_oauth"
name: "Grafana"
secretEnv: "OIDC_DEX_GRAFANA_CLIENT_SECRET"
- id: ci-sizer
name: "CI Sizer"
redirectURIs:
- "https://sizer.dev.t09.de/ui/callback"
secretEnv: "FORGEJO_RUNNER_SIZER_CLIENT_SECRET"

@@ -0,0 +1,23 @@
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: secrets-backup
  namespace: argocd
  labels:
    env: dev
spec:
  project: default
  syncPolicy:
    automated:
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
    retry:
      limit: -1
  destination:
    name: in-cluster
    namespace: gitea
  sources:
    - repoURL: https://edp.buildth.ing/DevFW-CICD/stacks-instances
      targetRevision: HEAD
      path: "otc/dev.t09.de/stacks/core/secrets-backup/manifests"

@@ -0,0 +1,107 @@
apiVersion: v1
kind: ServiceAccount
metadata:
  name: secrets-backup
  namespace: gitea
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: secrets-backup-reader
rules:
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["get", "list"]
  - apiGroups: [""]
    resources: ["namespaces"]
    verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: secrets-backup-reader
subjects:
  - kind: ServiceAccount
    name: secrets-backup
    namespace: gitea
roleRef:
  kind: ClusterRole
  name: secrets-backup-reader
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: secrets-backup
  namespace: gitea
spec:
  schedule: "30 3 * * *"
  concurrencyPolicy: "Forbid"
  successfulJobsHistoryLimit: 5
  failedJobsHistoryLimit: 5
  startingDeadlineSeconds: 600 # 10 minutes
  jobTemplate:
    spec:
      activeDeadlineSeconds: 900
      backoffLimit: 2
      ttlSecondsAfterFinished: 259200
      template:
        spec:
          serviceAccountName: secrets-backup
          containers:
            - name: secrets-backup
              image: edp.buildth.ing/devfw-cicd/secrets-backup:1.0.1
              imagePullPolicy: IfNotPresent
              env:
                - name: AWS_ACCESS_KEY_ID
                  valueFrom:
                    secretKeyRef:
                      name: forgejo-cloud-credentials
                      key: access-key
                - name: AWS_SECRET_ACCESS_KEY
                  valueFrom:
                    secretKeyRef:
                      name: forgejo-cloud-credentials
                      key: secret-key
                - name: SOURCE_BUCKET
                  valueFrom:
                    secretKeyRef:
                      name: forgejo-cloud-credentials
                      key: bucket-name
                - name: OBS_ENDPOINT
                  value: "obs.eu-de.otc.t-systems.com"
              command:
                - /bin/sh
                - -c
                - |
                  set -euo pipefail
                  TIMESTAMP=$(date +%Y%m%d-%H%M%S)
                  BACKUP_DIR="/tmp/secrets-backup-${TIMESTAMP}"
                  NAMESPACES="argocd cert-manager external-secrets"
                  mkdir -p "${BACKUP_DIR}"
                  echo "=== Exporting secrets from critical namespaces ==="
                  for NS in ${NAMESPACES}; do
                    echo "Exporting namespace: ${NS}"
                    kubectl get secrets -n "${NS}" \
                      -o json \
                      --field-selector type!=kubernetes.io/service-account-token \
                      > "${BACKUP_DIR}/${NS}-secrets.json"
                  done
                  echo "=== Creating compressed archive ==="
                  ARCHIVE="${BACKUP_DIR}/secrets-backup-${TIMESTAMP}.tar.gz"
                  tar -czf "${ARCHIVE}" -C "${BACKUP_DIR}" \
                    $(ls "${BACKUP_DIR}"/*.json 2>/dev/null | xargs -n1 basename)
                  echo "=== Uploading to OBS (SSE-KMS encryption at rest) ==="
                  aws s3 cp "${ARCHIVE}" \
                    "s3://${SOURCE_BUCKET}/cluster-secrets-backup/${TIMESTAMP}/secrets-backup.tar.gz" \
                    --endpoint-url "https://${OBS_ENDPOINT}"
                  echo "=== Cleanup ==="
                  rm -rf "${BACKUP_DIR}"
                  echo "Backup completed: ${TIMESTAMP}"
          restartPolicy: OnFailure
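
Each run uploads one archive under a timestamped prefix, `cluster-secrets-backup/<YYYYmmdd-HHMMSS>/secrets-backup.tar.gz`, so restore tooling has to find the newest of those keys. A minimal sketch, assuming the key layout is exactly as written in the script and that the key list has already been fetched from the bucket (for example with `aws s3 ls --recursive`); `object_key` and `latest_backup` are hypothetical helper names:

```python
from datetime import datetime

PREFIX = "cluster-secrets-backup/"
SUFFIX = "/secrets-backup.tar.gz"
TS_FORMAT = "%Y%m%d-%H%M%S"  # same layout as `date +%Y%m%d-%H%M%S` in the script


def object_key(ts: datetime) -> str:
    """Build the object key the CronJob would upload for timestamp `ts`."""
    return f"{PREFIX}{ts.strftime(TS_FORMAT)}{SUFFIX}"


def latest_backup(keys):
    """Return the newest backup key, or None if no key matches the layout."""
    candidates = []
    for key in keys:
        if not (key.startswith(PREFIX) and key.endswith(SUFFIX)):
            continue  # skip unrelated objects under the bucket
        stamp = key[len(PREFIX):].split("/", 1)[0]
        try:
            candidates.append((datetime.strptime(stamp, TS_FORMAT), key))
        except ValueError:
            continue  # segment is not a timestamp
    return max(candidates)[1] if candidates else None
```

Because the timestamp is fixed-width and zero-padded, sorting the keys lexicographically would give the same order; parsing mainly serves to skip objects that do not follow the layout.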

@@ -7,7 +7,7 @@ metadata:
namespace: gitea
spec:
# Two replicas means that if one is busy, the other can pick up jobs.
replicas: 0
replicas: 3
selector:
matchLabels:
app: forgejo-runner

@@ -3,7 +3,7 @@ kind: Ingress
metadata:
annotations:
nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
nginx.ingress.kubernetes.io/proxy-body-size: 512m
nginx.ingress.kubernetes.io/proxy-body-size: 5120m
cert-manager.io/cluster-issuer: main
name: forgejo-server

@@ -11,8 +11,8 @@ spec:
startingDeadlineSeconds: 600 # 10 minutes
jobTemplate:
spec:
# 60 min until backup - 10 min start - (backoffLimit * activeDeadlineSeconds) - some time sync buffer
activeDeadlineSeconds: 1350
# 2h window: handles large incremental syncs after repo growth or OBS slowness; BackupJobTooSlow alert fires at 5m
activeDeadlineSeconds: 7200
backoffLimit: 2
ttlSecondsAfterFinished: 259200 #
template:
@@ -72,7 +72,7 @@ spec:
- ReadWriteOnce
resources:
requests:
storage: 100Gi
storage: 500Gi
---
apiVersion: v1
kind: Secret

@@ -137,6 +137,9 @@ gitea:
ENABLED: true
ADAPTER: redis
security:
GLOBAL_TWO_FACTOR_REQUIREMENT: admin
service:
DISABLE_REGISTRATION: true
ENABLE_NOTIFY_MAIL: true
@@ -171,10 +174,9 @@ service:
image:
pullPolicy: "IfNotPresent"
# Overrides the image tag whose default is the chart appVersion.
#tag: "8.0.3"
# Adds -rootless suffix to image name
# rootless: true
# DB has v15a/v15b migrations from workflow-webhook build.
# Using that image until a proper v15+ EDP release is cut.
# DO NOT revert — automated upload will break the DB schema.
fullOverride: edp.buildth.ing/devfw-cicd/edp-forgejo:workflow-webhook-20260305
forgejo: {}

@@ -1,3 +1,7 @@
# Default: Forgejo/GitHub Actions runner manager
# Deploys GARM with the ci-sizer provider for automatic sizing + collector injection.
# For GitLab-only deployments, omit this and use gitlab-webhook instead.
# See: ci-sizer/docs/deployment-modes.md
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
@@ -20,7 +24,7 @@ spec:
sources:
- repoURL: https://edp.buildth.ing/DevFW-CICD/garm-helm
path: charts/garm
targetRevision: v0.0.12
targetRevision: v0.0.16
helm:
valueFiles:
- $values/otc/dev.t09.de/stacks/garm/garm/values.yaml

@@ -26,7 +26,7 @@ credentials:
image:
repository: edp.buildth.ing/devfw-cicd/garm-forgejo
tag: v0.1.7-forgejo-3
tag: v0.1.7-forgejo-24
providerConfig:
edgeConnect:
@@ -38,12 +38,11 @@ providerConfig:
organization: TelekomOP
edgeConnectK8s:
sizer:
sidecarImage: edp.buildth.ing/devfw-cicd/forgejo-runner-sizer-collector:latest
sidecarPushEndpoint: https://sizer.dev.t09.de/api/v1/metrics
baseUrl: "https://sizer.dev.t09.de"
readToken:
existingSecretName: sizer-tokens
sidecarImage: edp.buildth.ing/devfw-cicd/ci-sizer-collector:0.0.4
garm:
metrics:
enable: true
disableAuth: true
logging:
logLevel: debug
logLevel: info

@@ -48,7 +48,7 @@ customConfig:
type: elasticsearch
inputs: [parser]
endpoints:
- https://o12y.observability./insert/elasticsearch/
- https://o12y.observability.buildth.ing/insert/elasticsearch/
auth:
strategy: basic
user: ${VECTOR_USER}

@@ -0,0 +1,14 @@
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMServiceScrape
metadata:
  name: argocd
  namespace: observability
spec:
  namespaceSelector:
    matchNames:
      - argocd
  selector:
    matchLabels:
      app.kubernetes.io/part-of: argocd
  endpoints:
    - port: http-metrics

@@ -0,0 +1,15 @@
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMServiceScrape
metadata:
  name: forgejo
  namespace: observability
spec:
  namespaceSelector:
    matchNames:
      - gitea
  selector:
    matchLabels:
      app.kubernetes.io/name: forgejo
  endpoints:
    - port: http
      path: /metrics

@@ -0,0 +1,15 @@
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMServiceScrape
metadata:
  name: garm
  namespace: observability
spec:
  namespaceSelector:
    matchNames:
      - garm
  selector:
    matchLabels:
      app.kubernetes.io/name: garm
  endpoints:
    - port: http
      path: /metrics

@@ -778,7 +778,7 @@ vmagent:
# -- Remote write configuration of VMAgent, allowed parameters defined in a [spec](https://docs.victoriametrics.com/operator/api#vmagentremotewritespec)
additionalRemoteWrites:
# []
- url: https://o12y.observability./api/v1/write
- url: https://o12y.observability.buildth.ing/api/v1/write
basicAuth:
username:
name: simple-user-secret

@@ -35,8 +35,10 @@ spec:
server:
root_url: "https://grafana.dev.t09.de"
auth:
disable_login: "true"
disable_login_form: "true"
security:
admin_user: admin
admin_password: admin
auth.generic_oauth:
enabled: "true"
name: Forgejo

@@ -9,10 +9,13 @@ spec:
project: default
syncPolicy:
automated:
prune: true
selfHeal: true
syncOptions:
- CreateNamespace=true
- ServerSideApply=true
- RespectIgnoreDifferences=true
- SkipDryRunOnMissingResource=true
destination:
name: in-cluster
namespace: observability

@@ -0,0 +1,14 @@
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMServiceScrape
metadata:
  name: argocd
  namespace: observability
spec:
  namespaceSelector:
    matchNames:
      - argocd
  selector:
    matchLabels:
      app.kubernetes.io/part-of: argocd
  endpoints:
    - port: http-metrics

@@ -0,0 +1,78 @@
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMRule
metadata:
  name: backup-alerts
  namespace: observability
spec:
  groups:
    - name: backup-schedule-staleness
      rules:
        - alert: BackupCronJobNotScheduled
          expr: |
            time() - kube_cronjob_status_last_schedule_time{cronjob=~"forgejo-s3-backup|secrets-backup", namespace="gitea"}
            > 26 * 3600
          for: 5m
          labels:
            severity: critical
            cronjob: "{{ $labels.cronjob }}"
          annotations:
            value: "{{ $value | humanizeDuration }}"
            description: >-
              CronJob {{ $labels.namespace }}/{{ $labels.cronjob }} has not been
              scheduled for over 26 hours in cluster {{ $labels.cluster_environment }}.
              Last schedule was {{ $value | humanizeDuration }} ago.
            summary: "Backup CronJob {{ $labels.cronjob }} is stale"
        - alert: BackupCronJobNeverScheduled
          expr: |
            kube_cronjob_status_last_schedule_time{cronjob=~"forgejo-s3-backup|secrets-backup", namespace="gitea"}
            == 0
          for: 30m
          labels:
            severity: critical
            cronjob: "{{ $labels.cronjob }}"
          annotations:
            description: >-
              CronJob {{ $labels.namespace }}/{{ $labels.cronjob }} has never been
              scheduled in cluster {{ $labels.cluster_environment }}.
            summary: "Backup CronJob {{ $labels.cronjob }} never ran"
    - name: backup-job-failures
      rules:
        - alert: BackupJobFailed
          expr: |
            max by(cluster_environment, namespace, job_name) (
              kube_job_status_failed{job_name=~"forgejo-s3-backup-.*|secrets-backup-.*", namespace="gitea"}
            ) > 0
          for: 30s
          labels:
            severity: critical
            job_name: "{{ $labels.job_name }}"
          annotations:
            value: "{{ $value }}"
            description: >-
              Backup job {{ $labels.namespace }}/{{ $labels.job_name }} has
              {{ $value }} failed pod(s) in cluster {{ $labels.cluster_environment }}.
            summary: "Backup job {{ $labels.job_name }} failed"
    - name: backup-job-duration
      rules:
        - alert: BackupJobTooSlow
          expr: |
            (
              time() - kube_job_status_start_time{job_name=~"forgejo-s3-backup-.*|secrets-backup-.*", namespace="gitea"}
            ) > 300
            and
            kube_job_status_active{job_name=~"forgejo-s3-backup-.*|secrets-backup-.*", namespace="gitea"} > 0
          for: 1m
          labels:
            severity: major
            job_name: "{{ $labels.job_name }}"
          annotations:
            value: "{{ $value | humanizeDuration }}"
            description: >-
              Backup job {{ $labels.namespace }}/{{ $labels.job_name }} has been
              running for {{ $value | humanizeDuration }} (threshold: 5m)
              in cluster {{ $labels.cluster_environment }}. This may indicate a
              hung process or connectivity issue.
            summary: "Backup job {{ $labels.job_name }} running too long"
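
The staleness and duration rules reduce to two comparisons on Unix timestamps. A sketch of those conditions, ignoring PromQL label matching and the `for:` pending periods; the function names and timestamps are hypothetical:

```python
NOT_SCHEDULED_AFTER = 26 * 3600  # BackupCronJobNotScheduled threshold, in seconds
TOO_SLOW_AFTER = 300             # BackupJobTooSlow threshold, in seconds


def cronjob_stale(now: float, last_schedule: float) -> bool:
    """time() - kube_cronjob_status_last_schedule_time > 26 * 3600"""
    return now - last_schedule > NOT_SCHEDULED_AFTER


def job_too_slow(now: float, start_time: float, active_pods: int) -> bool:
    """(time() - kube_job_status_start_time) > 300 and kube_job_status_active > 0"""
    return (now - start_time) > TOO_SLOW_AFTER and active_pods > 0
```

For a daily schedule such as the secrets-backup CronJob's `30 3 * * *`, the 26-hour window leaves two hours of slack beyond the 24-hour cadence. In the `BackupJobTooSlow` expression, `and` keeps the left-hand samples, so `$value` is the elapsed time in seconds that `humanizeDuration` formats.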

@@ -0,0 +1,14 @@
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMServiceScrape
metadata:
  name: coredns
  namespace: observability
spec:
  namespaceSelector:
    matchNames:
      - kube-system
  selector:
    matchLabels:
      k8s-app: kube-dns
  endpoints:
    - port: metrics

@@ -0,0 +1,15 @@
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMServiceScrape
metadata:
  name: forgejo
  namespace: observability
spec:
  namespaceSelector:
    matchNames:
      - gitea
  selector:
    matchLabels:
      app.kubernetes.io/name: forgejo
  endpoints:
    - port: http
      path: /metrics

@@ -0,0 +1,14 @@
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMServiceScrape
metadata:
  name: garm
  namespace: observability
spec:
  namespaceSelector:
    matchNames:
      - garm
  selector:
    matchLabels:
      app.kubernetes.io/name: garm
  endpoints:
    - port: http

@@ -12,6 +12,12 @@ spec:
- static:
url: http://vmsingle-o12y:8429
paths: ["/api/v1/write"]
- static:
url: http://vmsingle-o12y:8429
paths: ["/api/v1/.*"]
- static:
url: http://vlogs-victorialogs:9428
paths: ["/insert/elasticsearch/.*"]
- static:
url: http://vlogs-victorialogs:9428
paths: ["/select/.*"]
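
With the read routes added, this VMUser sends metric writes and queries to `vmsingle-o12y` and log ingestion and `/select/` queries to `vlogs-victorialogs`. The sketch below models that path routing for the four routes visible in this hunk (any targetRefs earlier in the file are not shown). It assumes that each `paths` entry is a regular expression that must match the whole request path and that the first matching route wins; both are assumptions about how the operator renders the config for vmauth, not something this diff shows. The `route` helper is hypothetical:

```python
import re

# Routes transcribed from the targetRefs visible in the hunk, in file order.
ROUTES = [
    ("http://vmsingle-o12y:8429", ["/api/v1/write"]),
    ("http://vmsingle-o12y:8429", ["/api/v1/.*"]),
    ("http://vlogs-victorialogs:9428", ["/insert/elasticsearch/.*"]),
    ("http://vlogs-victorialogs:9428", ["/select/.*"]),
]


def route(path: str):
    """Return the backend URL for `path`, or None if no route matches.

    Assumes anchored (full-path) regex matching and first-match-wins order.
    """
    for backend, patterns in ROUTES:
        for pattern in patterns:
            if re.fullmatch(pattern, path):
                return backend
    return None
```

Under these assumptions the existing `/api/v1/write` route is also covered by `/api/v1/.*`, which points at the same backend, so the route order does not change where writes land.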

@@ -28,10 +28,7 @@ victoria-metrics-operator:
crds:
plain: true
cleanup:
enabled: true
image:
repository: bitnami/kubectl
pullPolicy: IfNotPresent
enabled: false # disabled: cleanup hook can't schedule on resource-constrained nodes (Insufficient cpu / Too many pods)
serviceMonitor:
enabled: true
operator:
@@ -676,7 +673,7 @@ vmalert:
vmauth:
# -- Enable VMAuth CR
enabled: true
enabled: false
# -- VMAuth annotations
annotations: {}
# -- (object) Full spec for VMAuth CRD. Allowed values described [here](https://docs.victoriametrics.com/operator/api#vmauthspec)
@@ -699,7 +696,7 @@ vmauth:
vmagent:
# -- Create VMAgent CR
enabled: false
enabled: true
# -- VMAgent annotations
annotations: {}
# -- Remote write configuration of VMAgent, allowed parameters defined in a [spec](https://docs.victoriametrics.com/operator/api#vmagentremotewritespec)
@@ -711,7 +708,8 @@ vmagent:
port: "8429"
selectAllByDefault: true
scrapeInterval: 20s
externalLabels: {}
externalLabels:
cluster_environment: "dev"
# For multi-cluster setups it is useful to use "cluster" label to identify the metrics source.
# For example:
# cluster: cluster-name

@@ -35,6 +35,30 @@ configs:
tls:
certificates:
controller:
metrics:
enabled: true
serviceMonitor:
enabled: false
server:
metrics:
enabled: true
serviceMonitor:
enabled: false
repoServer:
metrics:
enabled: true
serviceMonitor:
enabled: false
applicationSet:
metrics:
enabled: true
serviceMonitor:
enabled: false
notifications:
enabled: false

Some files were not shown because too many files have changed in this diff