Compare commits

108 commits

Author SHA1 Message Date
0fa46dbf16
feat(observability): add cluster heartbeat dead-man switch alerts
ClusterMetricsSilent: fires if no kubelet metrics for >10m (catches vmagent outages).
ClusterAPIServerDown: fires if apiserver scrape fails for >5m.
Replaces silenced KubeControllerManagerDown/KubeSchedulerDown which never fire on managed K8s.
2026-06-22 11:06:06 +02:00
c2b1c18ad1
fix(observability): 🔇 silence managed-K8s false alerts + bump backup deadline to 4h
- Disable kubernetesSystemControllerManager, kubeScheduler, kubernetesSystemScheduler
  alert rules in template (unreachable on managed K8s)
- Bump forgejo s3 backup activeDeadlineSeconds 7200→14400 (2h→4h) in template;
  deadline hit Jun 20-21 on heavy sync
2026-06-22 10:46:03 +02:00
ab220fd0e0
fix(observability): 🐛 harden vmagent liveness probe failureThreshold 10→3
Silent outage for 72h went undetected due to lenient probe.
Add startupProbe (failureThreshold=30) to allow slow starts.
2026-06-22 10:40:50 +02:00
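As a rough illustration of why the threshold matters: the kubelet acts on a probe only after `failureThreshold` consecutive failures, so the worst-case reaction time is approximately `failureThreshold × periodSeconds`. The sketch below assumes the Kubernetes default `periodSeconds` of 10, which the commit message does not state, and the function name is hypothetical.

```python
def detection_window_seconds(failure_threshold: int, period_seconds: int = 10) -> int:
    """Approximate time before the kubelet acts on a persistently failing probe."""
    return failure_threshold * period_seconds


# Liveness probe before and after this commit (periodSeconds assumed to be 10).
before = detection_window_seconds(10)
after = detection_window_seconds(3)

# Startup probe budget: failureThreshold=30 leaves room for slow starts.
startup_budget = detection_window_seconds(30)

print(before, after, startup_budget)
```

Under that assumption the stricter liveness probe reacts in about a third of the time, while the startup probe keeps a separate, larger budget so that slow starts are not killed early.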
d59ddd80b6
fix(observability): 🐛 use cluster_environment as global clusterLabel for default dashboards
Default Victoria Metrics k8s dashboards were filtering on 'cluster' label
which only contained 'observability'. Our metrics use 'cluster_environment'
label which contains the actual cluster values: dev, edp, observability.
2026-06-22 10:35:09 +02:00
726ea45f29
feat(observability): comprehensive platform alert rules
Replace ad-hoc forgejo/disk alerts with structured VMRule covering:
- platform-health: ForgejoDown, IngressHighErrorRate, NodeNotReady, PodCrashLooping
- storage: PVCUsageHigh (>80%), PVCUsageCritical (>90%)
- resources: NodeCPUHigh (>85%), NodeMemoryHigh (>90%)
2026-06-19 16:43:44 +02:00
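The storage thresholds above can be read as a simple two-tier check. The following is a minimal sketch of that logic only; the actual VMRule expressions are not shown in this commit message, and the function name and byte-based inputs are assumptions for illustration.

```python
from typing import List


def pvc_alerts(used_bytes: float, capacity_bytes: float) -> List[str]:
    """Return the names of the storage alerts whose threshold is exceeded."""
    ratio = used_bytes / capacity_bytes
    firing = []
    if ratio > 0.80:
        firing.append("PVCUsageHigh")
    if ratio > 0.90:
        firing.append("PVCUsageCritical")
    return firing


print(pvc_alerts(85, 100))
print(pvc_alerts(95, 100))
```

In this sketch both alerts fire above 90%; whether the real rules inhibit the lower-severity alert in that case is not stated in the source.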
5b4ea9c5e8
feat(observability): add read routes to vmauth for metrics and logs queries 2026-06-19 16:33:25 +02:00
c7b0c6825c
fix(observability): 🐛 fix ArgoCD scrape port name http-metrics not metrics 2026-06-19 16:11:15 +02:00
fa0fa4fc8f
fix(observability): 🐛 add ArgoCD + GARM VMServiceScrapes to client stack template 2026-06-19 16:07:27 +02:00
Martin McCaffery
2aa6f0c9ce
garm: bump chart v0.0.16→v0.0.17 + ci-sizer-collector sidecar 0.0.4→0.9.7 (sync with benchmark.t09.de canonical state) 2026-06-19 15:10:03 +02:00
86495f292f
feat(observability): 🗂️ organize dashboards into Grafana folders
Assigns folder field to all GrafanaDashboard CRs:
- EDP / Overview: platform-overview
- EDP / Applications: forgejo, argocd-operational, garm, argocd
- EDP / Operations: cronjob-monitoring, ingress-nginx, victoria-logs
2026-06-19 14:46:53 +02:00
9a28a6e9b8
feat(observability): add VictoriaLogs log panels to platform, forgejo, argocd dashboards 2026-06-19 13:34:29 +02:00
9850fb8696
fix(observability): 🐛 fix datasource UIDs, replace cronjob dashboard, add GARM
- Remove all ${DS_VICTORIAMETRICS} uid refs from platform-overview
- Replace grafanaCom id:14279 cronjob dashboard with inline custom version
- Add new GARM runners dashboard (edp-garm)
2026-06-19 13:12:07 +02:00
4680b465fa
feat(observability): custom ArgoCD dashboard with cluster_environment filter 2026-06-19 13:02:49 +02:00
3bc8a7444b
feat(observability): add dashboards, scrape configs, and fix victoria-logs to template
Add new Grafana dashboard CRs to grafana-operator/manifests:
- platform-overview, forgejo, argocd-operational, cronjob-monitoring

Fix victoria-logs dashboard to use grafana.com marketplace (id: 22698)
instead of raw GitHub URL

Add hub-side scrape configs to victoria-k8s-stack/manifests:
- argocd-scrape, garm-scrape, coredns-scrape, ci-sustainability-rules

Add client-side forgejo VMServiceScrape to observability-client/vm-client-stack/manifests

Enable ArgoCD metrics endpoints in core/argocd/values.yaml (required by argocd-scrape)
2026-06-19 12:58:26 +02:00
69e4d1b3dc
fix(forgejo): ⏱️ increase s3-backup activeDeadlineSeconds 1350→7200
Previous 22.5m deadline caused DeadlineExceeded on 2026-06-19 when
rclone sync took >22m (vs 13-16s prior days). Likely triggered by
significant new data in OBS bucket. 2h window accommodates large
incremental syncs while BackupJobTooSlow alert still fires at 5m.
2026-06-19 12:35:40 +02:00
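The deadline values quoted in these commits are plain second counts. A small sketch to check the conversions (1350 s, 7200 s, and the later 14400 s from the commit list above); the helper name is hypothetical.

```python
def describe_deadline(seconds: int) -> str:
    """Render an activeDeadlineSeconds value in minutes for easier reading."""
    minutes = seconds / 60
    return f"{seconds}s = {minutes:g} min"


for value in (1350, 7200, 14400):
    print(describe_deadline(value))
```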
834baf1a55
feat(otc): 🛡️ default StorageClass reclaimPolicy to Retain
Prevents accidental data loss on PVC deletion. Volumes persist even
when PVC is removed. Ephemeral environments can override via
STORAGE_RECLAIM_POLICY=Delete env var.

Ref: IPCEICIS-2810
2026-06-15 11:48:27 +02:00
81b721bb5a
fix(secrets-backup): 🔥 remove client-side openssl encryption
CI status: some checks failed (Build secrets-backup image / build-and-push (push): failing after 3s)
OBS bucket has server-side KMS encryption. Client-side openssl was
redundant and caused failures (Alpine CDN unreachable at 03:30 UTC).

Changes:
- Dockerfile: remove openssl apk install (no longer needed)
- CronJob: remove openssl enc step, upload .tar.gz directly
- CronJob: remove secrets-backup-config Secret (encryption passphrase)
- CronJob: remove ENCRYPTION_PASSPHRASE env var
- Bump image tag to 1.0.1, update workflow and manifest reference

Flow: kubectl export → tar.gz → upload to OBS (SSE-KMS handles rest)

Ref: IPCEICIS-9317
2026-06-12 13:02:11 +02:00
6b29aa3916
fix(garm): ⬆️ bump image tag to v0.1.7-forgejo-24
v0.1.7-forgejo-23 had exec format error on amd64 nodes.
-24 is a rebuild from the same commit to produce correct multi-arch manifests.
2026-06-12 11:06:29 +02:00
5f4032bea6
fix(garm): 📌 pin to v0.1.7-forgejo-22 — -23 has wrong arch
v0.1.7-forgejo-23 produces exec format error on amd64 nodes.
Temporary fix until -24 is built correctly.
2026-06-12 10:23:51 +02:00
1f6e91b6ac
fix(secrets-backup): 🐛 add openssl install + upgrade image to 1.32.0
alpine/k8s:1.28.0 does not ship openssl. Script calls openssl enc
on line 116 causing exit 127 on every run.

Fix:
- apk add --no-cache openssl at script start (defensive, idempotent)
- upgrade image 1.28.0 -> 1.32.0 (kubectl client 5 minor versions behind
  cluster v1.33, outside supported skew of +/-1)
2026-06-12 09:33:20 +02:00
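The skew argument in this commit can be checked mechanically. A minimal sketch, assuming version strings of the form `1.28.0` or `v1.33` and the ±1 minor-version window quoted in the message; both function names are hypothetical.

```python
def minor_version(version: str) -> int:
    """Extract the minor component from a string such as 'v1.33' or '1.28.0'."""
    return int(version.lstrip("v").split(".")[1])


def within_skew(client: str, server: str, max_skew: int = 1) -> bool:
    """True if the client's minor version is within max_skew of the server's."""
    return abs(minor_version(client) - minor_version(server)) <= max_skew


print(within_skew("1.28.0", "v1.33"))  # old image
print(within_skew("1.32.0", "v1.33"))  # upgraded image
```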
053acd7596
feat(observability): 📊 add backup failure alerting rules
VMRule alerts for forgejo-s3-backup and secrets-backup CronJobs:
- BackupCronJobNotScheduled (>26h since last run)
- BackupCronJobNeverScheduled (never ran)
- BackupJobFailed (job failed)
- BackupJobTooSlow (running >5min)

Ref: IPCEICIS-9313
Ref: IPCEICIS-2810
2026-06-08 15:07:14 +02:00
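The two scheduling alerts listed above distinguish a CronJob that never ran from one that has gone stale. The sketch below models only that decision as described in the commit message; the real VMRule expressions are not part of this page, and the function name and return values are assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_AGE = timedelta(hours=26)  # threshold quoted in the commit message


def backup_schedule_state(last_schedule: Optional[datetime], now: datetime) -> str:
    """Classify a backup CronJob by the time of its last scheduled run."""
    if last_schedule is None:
        return "BackupCronJobNeverScheduled"
    if now - last_schedule > MAX_AGE:
        return "BackupCronJobNotScheduled"
    return "ok"


now = datetime(2026, 6, 8, 15, 0, tzinfo=timezone.utc)
print(backup_schedule_state(now - timedelta(hours=27), now))
```

A 26-hour window gives a daily job two hours of slack before the alert fires.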
b087dac0f1
fix(core): 🐛 remove template vars from secrets-backup — use K8s secrets directly
The deploy workflow does not have BACKUP_ENCRYPTION_KEY/BACKUP_BUCKET/OBS_ENDPOINT
env vars. Redesigned to reference existing forgejo-cloud-credentials K8s secret
and hardcode OBS endpoint, matching the pattern of forgejo-s3-backup-cronjob.

Ref: IPCEICIS-9317
2026-06-08 14:02:04 +02:00
863bcd4883
feat(core): add secrets-backup CronJob as ArgoCD Application
Backs up critical K8s secrets (argocd, cert-manager, external-secrets)
to OBS. Uses template variables for environment-specific values.

Ref: IPCEICIS-9317
2026-06-08 13:12:18 +02:00
02308cf633
chore: bump garm image to v0.1.7-forgejo-23 (OOM detection) 2026-05-19 16:14:31 +02:00
Martin McCaffery
aaf9e6eade
bump garm-helm to v0.0.16 (RBAC fix)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-19 09:51:38 +02:00
Martin McCaffery
d857f155a8
feat(dex): add ci-sizer OIDC static client to template 2026-05-18 17:18:44 +02:00
Martin McCaffery
707a7b933a
feat(registry): add ci-sizer registry template 2026-05-18 16:27:56 +02:00
a8ce4c5c38
fix(sizer): 🐛 use internal K8s service URL for GARM connection
Switch GARM conditional from explicit GARM_URL env var to DOMAIN_GITEA
presence check. When Forgejo is deployed, GARM is always available at
its cluster-internal service (http://garm.garm.svc:80). Hardcode admin
user since GARM always uses that. GitLab-only deploys skip the block.

Ref: IPCEICIS-6886
2026-05-18 10:22:26 +02:00
32665ff620
fix(ci-sizer): 🐛 use safe map access for optional GARM_URL env var
`index .Env "GARM_URL"` returns empty string for missing keys instead
of panicking with "map has no entry for key".

Ref: IPCEICIS-6886
2026-05-18 10:15:58 +02:00
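The difference described here resembles indexing a map directly versus asking for a key with a fallback. The following is a Python analogy only, not the gomplate template itself; the `env` contents and both function names are hypothetical.

```python
env = {"DOMAIN": "example.test"}  # hypothetical environment without GARM_URL


def strict_lookup(env: dict, key: str) -> str:
    """Direct indexing: fails for a missing key, like `.Env.GARM_URL` in the template."""
    return env[key]


def safe_lookup(env: dict, key: str) -> str:
    """Lookup with a fallback: yields "" for a missing key, like `index .Env "GARM_URL"`."""
    return env.get(key, "")


print(repr(safe_lookup(env, "GARM_URL")))
```

The safe form lets a template test the value for emptiness instead of aborting the whole render.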
2a12a568ce
docs(stacks): 📝 add clarifying comments to stack templates
Document which components are required vs opt-in for deployment modes.

Ref: IPCEICIS-6886
2026-05-15 16:35:02 +02:00
d161b8ea4d
docs(ci-sizer): 📝 add opt-in comment to gitlab webhook app
Clarifies that the GitLab webhook ArgoCD app is optional and should
only be hydrated for clusters running GitLab Runner.

Ref: IPCEICIS-6886
2026-05-15 16:33:52 +02:00
fe51e8588c
feat(ci-sizer): add gitlab-webhook ArgoCD app to stacks template
Adds the mutating webhook deployment as a managed ArgoCD application
alongside the existing sizer-receiver. Includes deployment, service,
RBAC, cert-manager certificates, and webhook configuration.

Ref: IPCEICIS-6886
2026-05-15 16:30:42 +02:00
adf7f23685
fix(sizer): 🐛 make GARM env vars conditional in receiver deployment
Clusters without GARM lack the garm-fixed-credentials secret, causing
pod crash loops. The receiver already handles empty GARM_URL gracefully.

Ref: IPCEICIS-6886
2026-05-15 16:30:42 +02:00
Daniel.Sy
1f4489bd70 fix(ci-sizer): use getenv with default for SIZER_ALLOWED_ORG
Prevents gomplate crash when SIZER_ALLOWED_ORG is not set in environment.
Falls back to DevFW-CICD as default org.
2026-05-13 10:18:43 +00:00
5eaf4a761a
fix: increased s3 backup disk size 2026-05-07 17:48:17 +02:00
a957ca14b7 Merge pull request 'Update template/stacks/forgejo/forgejo-server/manifests/forgejo-ingress.yaml' (#36) from proxy-body-size into main
Reviewed-on: https://edp.buildth.ing/DevFW-CICD/stacks/pulls/36
2026-05-05 12:04:23 +00:00
manuel.ganter
4f04de2543 Update template/stacks/forgejo/forgejo-server/manifests/forgejo-ingress.yaml
Requested to push bigger images
https://teams.microsoft.com/l/message/19:8cbad0f19e894c9296838715ef5ce72a@thread.v2/1777969188676?context=%7B%22contextType%22%3A%22chat%22%7D
2026-05-05 08:27:38 +00:00
52cb25a6f9
refactor(stacks): 🚚 migrate sizer-receiver from garm to ci-sizer namespace
Move sizer-receiver ArgoCD app and manifests from stacks/garm/ to
stacks/ci-sizer/. The sizer is provider-agnostic and no longer
belongs in the GARM-specific stack.

- destination namespace: garm → ci-sizer
- ArgoCD source path: stacks/garm/ → stacks/ci-sizer/
- ingress namespace: garm → ci-sizer
- GARM_URL unchanged (garm.garm.svc.cluster.local) — GARM server stays in its namespace
- Secrets (sizer-tokens, sizer-oidc-client, garm-fixed-credentials) must exist in ci-sizer namespace
2026-04-29 10:16:45 +02:00
54dfd0831d
chore: ⬆️ bump garm image to v0.1.7-forgejo-22 2026-04-28 10:11:09 +02:00
44fc9ace56
chore(garm): ⬆️ bump garm-forgejo to v0.1.7-forgejo-21
Fix orphaned runner pods — instance name mismatch resolved.
2026-04-24 15:47:15 +02:00
2185d7962a
chore(garm): ⬆️ bump garm-forgejo to v0.1.7-forgejo-20
Remove activeDeadlineSeconds — was killing legitimate long CI jobs.
2026-04-24 14:51:43 +02:00
864494ffad
chore(garm): ⬆️ bump garm-forgejo to v0.1.7-forgejo-19
Includes provider v2.0.41 with ci-sizer v0.0.71:
- Fix workflow/job detail view showing all historical runs
- CI workflow fixes (Forgejo action URLs, SBOM skip)
- REUSE compliance
2026-04-24 13:41:50 +02:00
cbf8ab891d
chore: bump garm image to v0.1.7-forgejo-18 2026-04-22 13:19:10 +02:00
cfacb67789
chore: bump garm to v0.1.7-forgejo-17 (activeDeadlineSeconds) 2026-04-21 17:15:26 +02:00
ec1c1bec74
chore(garm): ⬆️ bump garm-helm to v0.0.15 (startup probe fix) 2026-04-21 16:27:25 +02:00
db76e7a517
chore(garm): ⬆️ bump garm-helm chart to v0.0.14 2026-04-21 16:03:06 +02:00
3c5c9ecbbc
chore: bump garm image to v0.1.7-forgejo-16 2026-04-21 15:53:40 +02:00
0dbd286615
chore(sizer): 🔧 rename forgejo-runner-sizer to ci-sizer in deployment configs
- Update container image names to ci-sizer-{receiver,collector}
- Update Dex OIDC client ID and name to ci-sizer
- Template allowed-org as SIZER_ALLOWED_ORG variable
2026-04-21 14:16:38 +02:00
1b3bb0061e
feat(ci-sizer): 🚀 Added ci-sizer subdomain to sizer-receiver
Ref: IPCEICIS-8516
2026-04-21 14:06:52 +02:00
336796995d
chore(garm): ⬆️ bump garm to v0.1.7-forgejo-15 2026-04-20 17:31:44 +02:00
50c62b2ce0
chore(garm): ⬆️ bump garm to v0.1.7-forgejo-14, add sizing policy env vars 2026-04-20 16:08:27 +02:00
a2c635ae6e
fix(garm): 🔧 sync sizer-receiver template with production config and bump garm tag to v0.1.7-forgejo-13 2026-04-16 15:11:54 +02:00
Martin McCaffery
7bf72a39d8
Enforce MFA for all admin users 2026-03-17 14:06:06 +01:00
Martin McCaffery
7eed0cd5f8
Rename optimiser to sizer 2026-03-10 10:08:11 +01:00
fb7c64ab2f
refactor: Rename optimiser-receiver to sizer-receiver and update related configurations 2026-03-06 14:03:03 +01:00
martin.mccaffery
1de5edd974 Pin GARM image version 2026-03-04 16:43:30 +00:00
Martin McCaffery
426b8cd5b2
Update garm-helm version to v0.0.7 2026-03-04 17:04:53 +01:00
d522461bc1
chore(config): ⬆️ Bump Forgejo Helm chart to v16.2.0
Updates the Helm chart version to incorporate the latest features,
improvements, and bug fixes from upstream. Ensures deployment uses a
more recent and supported release.
2026-03-03 17:29:34 +01:00
aa8ab8c63f
chore(core): ⬆️ Bump Argo CD version to 9.4.6
Updates the Argo CD deployment to use a newer version, improving compatibility and potentially resolving issues tied to older releases.

Relates to ongoing maintenance and upstream bug tracking.
2026-03-03 11:52:40 +01:00
Martin McCaffery
b36613ae87
Point Garm to new fixed-credentials secret 2026-02-26 14:17:28 +01:00
Martin McCaffery
9bff9bd628
Add new images for static forgejo runners 2026-02-17 13:28:02 +01:00
Martin McCaffery
cd49aadaa5
Fix argocd: stop cloudnative-pg creating too-long annotation 2026-02-17 09:48:31 +01:00
Martin McCaffery
ceca7d4e82
Add ingress for optimiser-receiver 2026-02-13 09:30:12 +01:00
Martin McCaffery
f7bab3b2c6
Add optimiser deployment to garm stack 2026-02-12 16:33:21 +01:00
Martin McCaffery
19b1c120e2
Add empty stacks file for cloudnative-pg 2026-01-30 11:47:22 +01:00
Martin McCaffery
fa93ba9163
Add more provider config to GARM helm values 2026-01-30 11:24:55 +01:00
Martin McCaffery
7eb0cdff9d
Re-enable dex 2026-01-29 15:57:06 +01:00
Martin McCaffery
0effbce5cf
Add docs registry 2026-01-29 10:54:53 +01:00
Martin McCaffery
95e86b2711
Disable dex (not yet functional) to save node capacity 2026-01-29 09:29:20 +01:00
Manuel Ganter
ce8865007c
bumped garm to v0.0.4 2025-12-08 11:06:51 +01:00
Manuel Ganter
5b438097bb
bumped argo to argo-cd-9.1.5 2025-12-02 15:37:45 +01:00
Manuel Ganter
89f92fdabc
bumped garm version 2025-12-02 14:57:37 +01:00
Manuel Ganter
97709eff30
added garm to stacks 2025-12-02 13:56:47 +01:00
Manuel Ganter
44fecf67c2
added oidc env vars for terralist 2025-12-01 15:03:31 +01:00
Manuel Ganter
45da6fc210
added FORGEJO_IMAGE_TAG env var 2025-11-28 11:27:50 +01:00
Manuel Ganter
94c51a4d77
added terralist 2025-11-28 10:51:23 +01:00
Manuel Ganter
115e8f27f6
added coder stack 2025-11-27 16:28:22 +01:00
richardrobertreitz
4d1621b783 chore(alerts): disabled bogus alerts related to kubecontrollermanager and kubescheduler 2025-10-21 08:47:29 +00:00
47c16eeafd feat(vmuser): use secret instead of hardcoded value for authentication 2025-08-18 10:38:08 +02:00
2eab9bd80b feat(sso): configure sso for ArgoCD 2025-08-15 15:10:55 +02:00
699b6cedcb
fix(backup): Increased s3 backup volume size to 100GB
Refs: DevFW/infra-deploy#116
2025-08-15 10:56:36 +02:00
c8d5195dc7 feat(sso): introduced grafana OAUTH config 2025-08-15 10:01:04 +02:00
Richard Robert Reitz
b3f77644e9 feat(sso): using secret references in dex to not put secrets in git 2025-08-14 16:22:11 +02:00
Richard Robert Reitz
d677b4b0e7 feat(sso): added dex and added template parameters for grafana and dex 2025-08-14 15:55:03 +02:00
Daniel Sy
67c513d1a5
feat(alerts): 🎉 Add disk consumption high alert rule
Introduce a new alert rule for monitoring high disk consumption in Kubernetes. This enhances observability by providing alerts when disk usage exceeds 60%, helping to maintain storage health in the cluster environment.

Refs: DevFW/infra-deploy#109
2025-08-13 13:38:31 +02:00
3a666e718f feat(edp): changed disk-volume-type from SATA to GPSSD 2025-08-13 10:55:15 +02:00
richardrobertreitz
b3582b9929 fix(backup): Fixed syntax problem related to forgejo s3 backups 2025-08-13 08:00:52 +00:00
Manuel Ganter
3277d6d854
introduced control parameter for cronjob 2025-08-12 16:16:55 +02:00
a92ed86c4d
fix(observability): Disabled scraping of kube controller manager and scheduler
They are managed by OTC
2025-08-12 15:06:14 +02:00
fb64314fb2
feat(observability): Introduced alert priority for notifications 2025-08-12 14:20:01 +02:00
975bb6b982
feat(observability): Introduced alert for failed s3 backup jobs 2025-08-12 14:07:38 +02:00
e0f6cc77dd
fix(observability): Added missing encryption to grafana volume 2025-08-12 13:37:56 +02:00
dbda3d4ab5 fix(cronjob): fix bug where only packages were backed up 2025-08-11 15:34:38 +02:00
28c23b9f08
chore: set default storage class to csi-disk driver 2025-08-08 15:25:25 +02:00
richardrobertreitz
f19b294b26 chore(OTC): changed obsolete disk type 2025-08-07 11:30:27 +00:00
Daniel Sy
643176228e
Revert "feat(grafana alerts): add notification channel (email) for grafana alerts"
This reverts commit c9d14d451f.
2025-08-05 15:25:42 +02:00
Daniel Sy
ea6b18b7ea
feat(alertmanager): 🎉 Enable managed configuration for alerts
Updates the Alertmanager configuration to use managed settings, enabling streamlined alert handling. Removes outdated configurations and introduces a new email receiver for Grafana alerts.
2025-08-05 15:24:37 +02:00
c9d14d451f feat(grafana alerts): add notification channel (email) for grafana alerts 2025-08-05 15:01:12 +02:00
6af5ce71cd feat(forgejo): updated secret ref for a bucket name 2025-08-01 10:31:04 +02:00
55d9a06dc7 feat(forgejo): backup s3 directly to pvc 2025-08-01 10:31:04 +02:00
Richard Robert Reitz
491be80842 fix(s3backup): doing a local backup first and then push it to remote, which is still on the same OBS store 2025-08-01 10:31:04 +02:00
Daniel Sy
e7d14a89cd feat(manifest): 🎉 WIP Add CronJob and Secret for S3 backups
Adds a new CronJob for scheduled S3 backups using rclone, along with a corresponding Secret for AWS credentials. This introduces automated backup functionality for the Forgejo server, enhancing data protection and recovery capabilities.
2025-08-01 10:31:04 +02:00
richardrobertreitz
51a55b5ed4 fix(forgejo): Enable email notifications for common things like PRs 2025-07-31 09:31:00 +00:00
richardrobertreitz
30c2ec054b chore(pipeline): Remove use for our three helm mirrors 2025-07-30 13:55:38 +00:00
richardrobertreitz
fb03ded960 chore(pipeline): Remove use for our three helm mirrors 2025-07-30 13:54:53 +00:00
richardrobertreitz
278c832cb4 chore(pipeline): Remove use for our three helm mirrors 2025-07-30 13:54:04 +00:00
richardrobertreitz
a2324a16b7 test(pipeline): Revert of general test of OSC dependencies
helm-chart-4.12.4 will require an update of argocd to version >=3
2025-07-30 12:39:18 +00:00
richardrobertreitz
d79653cc64 test(pipeline): Revert of general test of OSC dependencies
Only v1.1.0-edp-v11.0.3 works currently
2025-07-30 12:38:10 +00:00
59 changed files with 2551 additions and 221 deletions

@@ -0,0 +1,35 @@
name: Build secrets-backup image
on:
  push:
    paths:
      - 'build/secrets-backup/Dockerfile'
    branches:
      - main
  workflow_dispatch:
jobs:
  build-and-push:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Log in to registry
        run: |
          echo "${{ secrets.PACKAGES_TOKEN }}" | \
          docker login edp.buildth.ing \
          -u "${{ env.FORGEJO_REPOSITORY_OWNER }}" \
          --password-stdin
      - name: Build image
        run: |
          docker build \
          -t edp.buildth.ing/devfw-cicd/secrets-backup:1.0.1 \
          -t edp.buildth.ing/devfw-cicd/secrets-backup:latest \
          build/secrets-backup/
      - name: Push image
        run: |
          docker push edp.buildth.ing/devfw-cicd/secrets-backup:1.0.1
          docker push edp.buildth.ing/devfw-cicd/secrets-backup:latest

@@ -0,0 +1,3 @@
FROM alpine/k8s:1.32.0
# No extra packages needed — kubectl and aws CLI are bundled in alpine/k8s
# OBS SSE-KMS handles encryption at rest; no openssl required

@@ -0,0 +1,24 @@
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: ci-sizer-reg
  namespace: argocd
  labels:
    env: dev
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  destination:
    name: in-cluster
    namespace: argocd
  source:
    path: "{{{ .Env.CLIENT_REPO_ID }}}/{{{ .Env.DOMAIN }}}/stacks/ci-sizer"
    repoURL: "https://{{{ .Env.CLIENT_REPO_DOMAIN }}}/{{{ .Env.CLIENT_REPO_ORG_NAME }}}"
    targetRevision: HEAD
  project: default
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true

@@ -0,0 +1,24 @@
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: coder-reg
  namespace: argocd
  labels:
    env: dev
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  destination:
    name: in-cluster
    namespace: argocd
  source:
    path: "{{{ .Env.CLIENT_REPO_ID }}}/{{{ .Env.DOMAIN }}}/stacks/coder"
    repoURL: "https://{{{ .Env.CLIENT_REPO_DOMAIN }}}/{{{ .Env.CLIENT_REPO_ORG_NAME }}}"
    targetRevision: HEAD
  project: default
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true

@@ -0,0 +1,24 @@
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: docs-reg
  namespace: argocd
  labels:
    env: dev
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  destination:
    name: in-cluster
    namespace: argocd
  source:
    path: argocd-stack
    repoURL: "https://edp.buildth.ing/DevFW-CICD/website-and-documentation"
    targetRevision: HEAD
  project: default
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true

@@ -0,0 +1,24 @@
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: garm-reg
  namespace: argocd
  labels:
    env: dev
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  destination:
    name: in-cluster
    namespace: argocd
  source:
    path: "{{{ .Env.CLIENT_REPO_ID }}}/{{{ .Env.DOMAIN }}}/stacks/garm"
    repoURL: "https://{{{ .Env.CLIENT_REPO_DOMAIN }}}/{{{ .Env.CLIENT_REPO_ORG_NAME }}}"
    targetRevision: HEAD
  project: default
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true

@@ -0,0 +1,24 @@
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: terralist-reg
  namespace: argocd
  labels:
    env: dev
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  destination:
    name: in-cluster
    namespace: argocd
  source:
    path: "{{{ .Env.CLIENT_REPO_ID }}}/{{{ .Env.DOMAIN }}}/stacks/terralist"
    repoURL: "https://{{{ .Env.CLIENT_REPO_DOMAIN }}}/{{{ .Env.CLIENT_REPO_ORG_NAME }}}"
    targetRevision: HEAD
  project: default
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true

@@ -0,0 +1,29 @@
# Optional: GitLab CI integration
# Only hydrate this app for clusters that run GitLab Runner.
# For Forgejo/GitHub-only deployments, omit this app from stacks-instances.
# See: ci-sizer/docs/deployment-modes.md
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: gitlab-sizer-webhook
  namespace: argocd
  labels:
    env: dev
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  project: default
  syncPolicy:
    automated:
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
    retry:
      limit: -1
  destination:
    name: in-cluster
    namespace: ci-sizer
  source:
    repoURL: https://{{{ .Env.CLIENT_REPO_DOMAIN }}}/{{{ .Env.CLIENT_REPO_ORG_NAME }}}
    targetRevision: HEAD
    path: "{{{ .Env.CLIENT_REPO_ID }}}/{{{ .Env.DOMAIN }}}/stacks/ci-sizer/gitlab-webhook"

@@ -0,0 +1,27 @@
# Self-signed Issuer for webhook TLS.
# For production, replace with a ClusterIssuer backed by a real CA.
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: selfsigned-issuer
spec:
  selfSigned: {}
---
# cert-manager Certificate for the webhook TLS.
# The resulting Secret (gitlab-sizer-webhook-tls) is mounted into the webhook pod.
# cert-manager also injects the CA into the MutatingWebhookConfiguration via the
# cert-manager.io/inject-ca-from annotation.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: gitlab-sizer-webhook-cert
spec:
  secretName: gitlab-sizer-webhook-tls
  issuerRef:
    name: selfsigned-issuer
    kind: Issuer
  dnsNames:
    - gitlab-sizer-webhook.ci-sizer.svc
    - gitlab-sizer-webhook.ci-sizer.svc.cluster.local
  duration: 8760h
  renewBefore: 720h
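For readers who think in days rather than hours, the `duration` and `renewBefore` values of the Certificate can be converted with a few lines. This is only a sketch of the arithmetic; the variable names are illustrative and not part of the manifest.

```python
from datetime import timedelta

duration = timedelta(hours=8760)      # Certificate lifetime from the manifest
renew_before = timedelta(hours=720)   # renewal lead time from the manifest

# Point in the certificate's life at which renewal is attempted.
renew_at = duration - renew_before

print(duration.days, renew_before.days, renew_at.days)
```

So the certificate is valid for 365 days and renewal is attempted 30 days before expiry, that is, around day 335.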

@@ -0,0 +1,141 @@
apiVersion: v1
kind: ServiceAccount
metadata:
  name: gitlab-sizer-webhook
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: gitlab-sizer-webhook
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: gitlab-sizer-webhook
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: gitlab-sizer-webhook
subjects:
  - kind: ServiceAccount
    name: gitlab-sizer-webhook
    namespace: ci-sizer
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gitlab-sizer-webhook
  labels:
    app: gitlab-sizer-webhook
spec:
  replicas: 2
  selector:
    matchLabels:
      app: gitlab-sizer-webhook
  template:
    metadata:
      labels:
        app: gitlab-sizer-webhook
    spec:
      serviceAccountName: gitlab-sizer-webhook
      securityContext:
        runAsNonRoot: true
        runAsUser: 65534
        runAsGroup: 65534
        seccompProfile:
          type: RuntimeDefault
      containers:
        - name: webhook
          image: edp.buildth.ing/devfw-cicd/gitlab-webhook-edge-connect:latest
          imagePullPolicy: Always
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop:
                - ALL
          ports:
            - containerPort: 8443
              protocol: TCP
          args:
            - --listen-addr=:8443
            - --tls-cert-file=/etc/webhook/tls/tls.crt
            - --tls-key-file=/etc/webhook/tls/tls.key
            - --sizer-url=http://sizer-receiver.ci-sizer.svc:8080
            - --sizer-sidecar-image=edp.buildth.ing/devfw-cicd/ci-sizer-collector:latest
          env:
            - name: WEBHOOK_SIZER_READ_TOKEN
              valueFrom:
                secretKeyRef:
                  name: gitlab-sizer-webhook-tokens
                  key: sizer-read-token
            - name: WEBHOOK_SIZER_PUSH_TOKEN
              valueFrom:
                secretKeyRef:
                  name: gitlab-sizer-webhook-tokens
                  key: sizer-push-token
            - name: HTTP_PROXY
              valueFrom:
                configMapKeyRef:
                  name: gitlab-sizer-webhook-config
                  key: HTTP_PROXY
                  optional: true
            - name: HTTPS_PROXY
              valueFrom:
                configMapKeyRef:
                  name: gitlab-sizer-webhook-config
                  key: HTTPS_PROXY
                  optional: true
            - name: NO_PROXY
              valueFrom:
                configMapKeyRef:
                  name: gitlab-sizer-webhook-config
                  key: NO_PROXY
                  optional: true
          volumeMounts:
            - name: webhook-tls
              mountPath: /etc/webhook/tls
              readOnly: true
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8443
              scheme: HTTPS
            initialDelaySeconds: 5
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /healthz
              port: 8443
              scheme: HTTPS
            initialDelaySeconds: 5
            periodSeconds: 10
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 200m
              memory: 128Mi
      volumes:
        - name: webhook-tls
          secret:
            secretName: gitlab-sizer-webhook-tls
---
apiVersion: v1
kind: Service
metadata:
  name: gitlab-sizer-webhook
  labels:
    app: gitlab-sizer-webhook
spec:
  selector:
    app: gitlab-sizer-webhook
  ports:
    - port: 443
      targetPort: 8443
      protocol: TCP

@@ -0,0 +1,30 @@
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: gitlab-sizer-webhook
  annotations:
    cert-manager.io/inject-ca-from: ci-sizer/gitlab-sizer-webhook-cert
webhooks:
  - name: gitlab-sizer-webhook.ci-sizer.svc
    admissionReviewVersions: ["v1"]
    sideEffects: NoneOnDryRun
    failurePolicy: Ignore
    timeoutSeconds: 5
    reinvocationPolicy: Never
    clientConfig:
      service:
        name: gitlab-sizer-webhook
        namespace: ci-sizer
        path: /mutate
    rules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE"]
        resources: ["pods"]
    namespaceSelector:
      matchLabels:
        ci-sizer.devfw.io/watch: "true"
    objectSelector:
      matchExpressions:
        - key: job.runner.gitlab.com/pod
          operator: Exists
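The rule and the two selectors in this webhook configuration combine with a logical AND. The sketch below models that filter in plain Python to make the matching conditions explicit; it is a simplified model rather than the API server's implementation, and the function name and sample labels are hypothetical.

```python
def webhook_applies(operation: str, namespace_labels: dict, pod_labels: dict) -> bool:
    """Model of which pod admission requests are sent to the mutating webhook."""
    return (
        operation == "CREATE"                                            # rules.operations
        and namespace_labels.get("ci-sizer.devfw.io/watch") == "true"    # namespaceSelector
        and "job.runner.gitlab.com/pod" in pod_labels                    # objectSelector: Exists
    )


watched_ns = {"ci-sizer.devfw.io/watch": "true"}
runner_pod = {"job.runner.gitlab.com/pod": "runner-abc"}  # hypothetical label value

print(webhook_applies("CREATE", watched_ns, runner_pod))
print(webhook_applies("CREATE", {}, runner_pod))
```

Because `failurePolicy` is `Ignore`, a pod that matches is still admitted if the webhook is unreachable or exceeds the 5-second timeout; it is simply created without the mutation.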

@@ -0,0 +1,29 @@
# Required: CI Sizer receiver
# Always deploy this — it stores metrics and computes sizing recommendations.
# Works standalone or with GARM (Forgejo/GitHub) and/or GitLab webhook.
# See: ci-sizer/docs/deployment-modes.md
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: sizer-receiver
  namespace: argocd
  labels:
    env: dev
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  project: default
  syncPolicy:
    automated:
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
    retry:
      limit: -1
  destination:
    name: in-cluster
    namespace: ci-sizer
  source:
    repoURL: https://{{{ .Env.CLIENT_REPO_DOMAIN }}}/{{{ .Env.CLIENT_REPO_ORG_NAME }}}
    targetRevision: HEAD
    path: "{{{ .Env.CLIENT_REPO_ID }}}/{{{ .Env.DOMAIN }}}/stacks/ci-sizer/sizer-receiver"

@@ -0,0 +1,128 @@
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sizer-receiver
  labels:
    app: sizer-receiver
spec:
  strategy:
    type: Recreate
  replicas: 1
  selector:
    matchLabels:
      app: sizer-receiver
  template:
    metadata:
      labels:
        app: sizer-receiver
    spec:
      securityContext:
        fsGroup: 65534
      containers:
        - name: receiver
          image: edp.buildth.ing/devfw-cicd/ci-sizer-receiver:latest
          imagePullPolicy: Always
          args:
            - --db=/data/metrics.db
          ports:
            - name: http
              containerPort: 8080
              protocol: TCP
          env:
            - name: RECEIVER_READ_TOKEN
              valueFrom:
                secretKeyRef:
                  name: sizer-tokens
                  key: read-token
            - name: RECEIVER_HMAC_KEY
              valueFrom:
                secretKeyRef:
                  name: sizer-tokens
                  key: hmac-key
            {{{- if index .Env "DOMAIN_GITEA" }}}
            - name: GARM_URL
              value: "http://garm.garm.svc:80"
            - name: GARM_USER
              value: "admin"
            - name: GARM_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: garm-fixed-credentials
                  key: admin_password
            {{{- end }}}
            - name: RECEIVER_OIDC_ISSUER
              value: "https://dex.{{{ .Env.DOMAIN }}}"
            - name: RECEIVER_OIDC_CLIENT_ID
              value: "ci-sizer"
            - name: RECEIVER_OIDC_CLIENT_SECRET
              valueFrom:
                secretKeyRef:
                  name: sizer-oidc-client
                  key: client-secret
            - name: RECEIVER_OIDC_REDIRECT_URI
              value: "https://sizer.{{{ .Env.DOMAIN }}}/ui/callback"
            - name: RECEIVER_SESSION_TTL
              value: "12h"
            - name: RECEIVER_ALLOWED_ORG
              value: "{{{ getenv "SIZER_ALLOWED_ORG" "DevFW-CICD" }}}"
            - name: RECEIVER_CPU_SIZING_MODE
              value: "observe"
            - name: RECEIVER_MEMORY_QOS
              value: "guaranteed"
          volumeMounts:
            - name: data
              mountPath: /data
          livenessProbe:
            httpGet:
              path: /health
              port: http
            initialDelaySeconds: 5
            periodSeconds: 30
          readinessProbe:
            httpGet:
              path: /health
              port: http
            initialDelaySeconds: 2
            periodSeconds: 10
          resources:
            requests:
              cpu: 50m
              memory: 64Mi
            limits:
              cpu: 200m
              memory: 128Mi
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: sizer-receiver-data
---
apiVersion: v1
kind: Service
metadata:
  name: sizer-receiver
  labels:
    app: sizer-receiver
spec:
  selector:
    app: sizer-receiver
  ports:
    - name: http
      port: 8080
      targetPort: http
      protocol: TCP
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: sizer-receiver-data
  labels:
    app: sizer-receiver
  annotations:
    everest.io/disk-volume-type: GPSSD
spec:
  storageClassName: csi-disk
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi

@@ -0,0 +1,40 @@
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
    cert-manager.io/cluster-issuer: main
    {{{ if eq .Env.CLUSTER_TYPE "osc" }}}
    dns.gardener.cloud/class: garden
    dns.gardener.cloud/dnsnames: sizer.{{{ .Env.DOMAIN }}}
    dns.gardener.cloud/ttl: "600"
    {{{ end }}}
  name: sizer-receiver
  namespace: ci-sizer
spec:
  ingressClassName: nginx
  rules:
    - host: sizer.{{{ .Env.DOMAIN }}}
      http:
        paths:
          - backend:
              service:
                name: sizer-receiver
                port:
                  number: 8080
            path: /
            pathType: Prefix
    - host: ci-sizer.{{{ .Env.DOMAIN }}}
      http:
        paths:
          - backend:
              service:
                name: sizer-receiver
                port:
                  number: 8080
            path: /
            pathType: Prefix
  tls:
    - hosts:
        - sizer.{{{ .Env.DOMAIN }}}
      secretName: sizer-receiver-tls

@@ -0,0 +1,32 @@
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: coder
  namespace: argocd
  labels:
    env: dev
spec:
  project: default
  syncPolicy:
    automated:
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
    retry:
      limit: -1
  destination:
    name: in-cluster
    namespace: coder
  sources:
    - repoURL: https://helm.coder.com/v2
      chart: coder
      targetRevision: 2.28.3
      helm:
        valueFiles:
          - $values/{{{ .Env.CLIENT_REPO_ID }}}/{{{ .Env.DOMAIN }}}/stacks/coder/coder/values.yaml
    - repoURL: https://{{{ .Env.CLIENT_REPO_DOMAIN }}}/{{{ .Env.CLIENT_REPO_ORG_NAME }}}
      targetRevision: HEAD
      ref: values
    - repoURL: https://{{{ .Env.CLIENT_REPO_DOMAIN }}}/{{{ .Env.CLIENT_REPO_ORG_NAME }}}
      targetRevision: HEAD
      path: "{{{ .Env.CLIENT_REPO_ID }}}/{{{ .Env.DOMAIN }}}/stacks/coder/coder/manifests"


@ -0,0 +1,38 @@
---
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
name: coder-db
namespace: coder
spec:
instances: 1
primaryUpdateStrategy: unsupervised
resources:
requests:
memory: "1Gi"
cpu: "1"
limits:
memory: "1Gi"
cpu: "1"
managed:
roles:
- name: coder
createdb: true
login: true
passwordSecret:
name: coder-db-user
storage:
size: 10Gi
storageClass: csi-disk
---
apiVersion: postgresql.cnpg.io/v1
kind: Database
metadata:
name: coder
namespace: coder
spec:
cluster:
name: coder-db
name: coder
owner: coder
---


@ -0,0 +1,61 @@
coder:
# You can specify any environment variables you'd like to pass to Coder
# here. Coder consumes environment variables listed in
# `coder server --help`, and these environment variables are also passed
# to the workspace provisioner (so you can consume them in your Terraform
# templates for auth keys etc.).
#
# Please keep in mind that you should not set `CODER_HTTP_ADDRESS`,
# `CODER_TLS_ENABLE`, `CODER_TLS_CERT_FILE` or `CODER_TLS_KEY_FILE` as
# they are already set by the Helm chart and will cause conflicts.
env:
- name: CODER_ACCESS_URL
value: https://coder.{{{ .Env.DOMAIN_GITEA }}}
- name: CODER_PG_CONNECTION_URL
valueFrom:
secretKeyRef:
# You'll need to create a secret called coder-db-user with a `url` key holding your
# Postgres connection URL like:
# postgres://coder:password@postgres:5432/coder?sslmode=disable
name: coder-db-user
key: url
# For production deployments, we recommend configuring your own GitHub
# OAuth2 provider and disabling the default one.
- name: CODER_OAUTH2_GITHUB_DEFAULT_PROVIDER_ENABLE
value: "false"
- name: EDGE_CONNECT_ENDPOINT
valueFrom:
secretKeyRef:
name: edge-credential
key: endpoint
- name: EDGE_CONNECT_USERNAME
valueFrom:
secretKeyRef:
name: edge-credential
key: username
- name: EDGE_CONNECT_PASSWORD
valueFrom:
secretKeyRef:
name: edge-credential
key: password
# (Optional) For production deployments the access URL should be set.
# If you're just trying Coder, access the dashboard via the service IP.
# - name: CODER_ACCESS_URL
# value: "https://coder.example.com"
#tls:
# secretNames:
# - my-tls-secret-name
service:
type: ClusterIP
ingress:
enable: true
className: nginx
host: coder.{{{ .Env.DOMAIN_GITEA }}}
annotations:
cert-manager.io/cluster-issuer: main
tls:
enable: true
secretName: coder-tls-secret
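
The values above read `CODER_PG_CONNECTION_URL` from the `url` key of `coder-db-user`, the same Secret the CloudNativePG `Cluster` references as `passwordSecret` for the managed `coder` role. That Secret is not part of this change; a minimal sketch of what it might contain is shown below. The password placeholder and the `sslmode` value are assumptions. The host name follows the `<cluster>-rw` read-write Service that CloudNativePG creates for a cluster.

```yaml
# Hypothetical Secret; the real one is managed outside this diff.
apiVersion: v1
kind: Secret
metadata:
  name: coder-db-user
  namespace: coder
type: kubernetes.io/basic-auth
stringData:
  username: coder                 # must match the managed role name
  password: <generated-password>  # placeholder
  # Consumed by CODER_PG_CONNECTION_URL; coder-db-rw is the read-write
  # Service for the Cluster named coder-db.
  url: postgres://coder:<generated-password>@coder-db-rw.coder.svc:5432/coder?sslmode=require
```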


@ -18,12 +18,12 @@ spec:
name: in-cluster
namespace: argocd
sources:
- repoURL: https://{{{ .Env.CLIENT_REPO_DOMAIN }}}/DevFW-CICD/argocd-helm.git
- repoURL: https://github.com/argoproj/argo-helm.git
path: charts/argo-cd
# TODO: RIRE Can be updated when https://github.com/argoproj/argo-cd/issues/20790 is fixed and merged
# Because logout causes problems, it is suggested to switch from path-based routing to a dedicated Argo CD domain,
# similar to the CNOE Amazon reference implementation and, in our case, Forgejo
targetRevision: argo-cd-7.8.28-depends
targetRevision: argo-cd-9.4.6
helm:
valueFiles:
- $values/{{{ .Env.CLIENT_REPO_ID }}}/{{{ .Env.DOMAIN }}}/stacks/core/argocd/values.yaml


@ -5,6 +5,16 @@ configs:
params:
server.insecure: true
cm:
oidc.config: |
name: FORGEJO
issuer: https://{{{ .Env.DOMAIN_DEX }}}
clientID: controller-argocd-dex
clientSecret: $dex-argo-client:clientSecret
requestedScopes:
- openid
- profile
- email
- groups
application.resourceTrackingMethod: annotation
timeout.reconciliation: 60s
resource.exclusions: |
@ -18,10 +28,9 @@ configs:
- CiliumIdentity
clusters:
- "*"
accounts.provider-argocd: apiKey
url: https://{{{ .Env.DOMAIN_ARGOCD }}}
rbac:
policy.csv: 'g, provider-argocd, role:admin'
policy.csv: 'g, DevFW, role:admin'
tls:
certificates:
@ -31,3 +40,19 @@ notifications:
dex:
enabled: false
controller:
metrics:
enabled: true
server:
metrics:
enabled: true
repoServer:
metrics:
enabled: true
applicationSet:
metrics:
enabled: true
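
The `oidc.config` entry above uses Argo CD's `$<secret-name>:<key>` syntax for `clientSecret`. Argo CD resolves such references only for Secrets in its own namespace that carry the `app.kubernetes.io/part-of: argocd` label. The Secret itself is not in this diff; a sketch under that assumption, with a placeholder value:

```yaml
# Hypothetical Secret backing `$dex-argo-client:clientSecret`.
apiVersion: v1
kind: Secret
metadata:
  name: dex-argo-client
  namespace: argocd
  labels:
    app.kubernetes.io/part-of: argocd  # required for $secret:key references
type: Opaque
stringData:
  clientSecret: <shared-with-dex>  # placeholder; must equal the secret Dex uses for this client
```

Because `policy.csv` maps the `DevFW` group to `role:admin`, the `groups` scope requested above has to yield that group name in the token for the mapping to take effect.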


@ -0,0 +1,30 @@
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: cloudnative-pg
namespace: argocd
labels:
env: dev
spec:
project: default
syncPolicy:
automated:
selfHeal: true
syncOptions:
- CreateNamespace=true
- ServerSideApply=true
retry:
limit: -1
destination:
name: in-cluster
namespace: cloudnative-pg
sources:
- repoURL: https://cloudnative-pg.github.io/charts
chart: cloudnative-pg
targetRevision: 0.26.1
helm:
valueFiles:
- $values/{{{ .Env.CLIENT_REPO_ID }}}/{{{ .Env.DOMAIN }}}/stacks/core/cloudnative-pg/values.yaml
- repoURL: https://{{{ .Env.CLIENT_REPO_DOMAIN }}}/{{{ .Env.CLIENT_REPO_ORG_NAME }}}
targetRevision: HEAD
ref: values


@ -0,0 +1 @@
# No need for values here.


@ -0,0 +1,29 @@
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: dex
namespace: argocd
labels:
env: dev
spec:
project: default
syncPolicy:
automated:
selfHeal: true
syncOptions:
- CreateNamespace=true
retry:
limit: -1
destination:
name: in-cluster
namespace: dex
sources:
- repoURL: https://charts.dexidp.io
chart: dex
targetRevision: 0.23.0
helm:
valueFiles:
- $values/{{{ .Env.CLIENT_REPO_ID }}}/{{{ .Env.DOMAIN }}}/stacks/core/dex/values.yaml
- repoURL: https://{{{ .Env.CLIENT_REPO_DOMAIN }}}/{{{ .Env.CLIENT_REPO_ORG_NAME }}}
targetRevision: HEAD
ref: values


@ -0,0 +1,86 @@
ingress:
enabled: true
className: nginx
annotations:
cert-manager.io/cluster-issuer: main
hosts:
- host: {{{ .Env.DOMAIN_DEX }}}
paths:
- path: /
pathType: Prefix
tls:
- hosts:
- {{{ .Env.DOMAIN_DEX }}}
secretName: dex-cert
envVars:
- name: FORGEJO_CLIENT_SECRET
valueFrom:
secretKeyRef:
name: dex-forgejo-client
key: clientSecret
- name: FORGEJO_CLIENT_ID
valueFrom:
secretKeyRef:
name: dex-forgejo-client
key: clientID
- name: OIDC_DEX_GRAFANA_CLIENT_SECRET
valueFrom:
secretKeyRef:
name: dex-grafana-client
key: clientSecret
- name: OIDC_DEX_ARGO_CLIENT_SECRET
valueFrom:
secretKeyRef:
name: dex-argo-client
key: clientSecret
- name: FORGEJO_RUNNER_SIZER_CLIENT_SECRET
valueFrom:
secretKeyRef:
name: dex-sizer-client
key: clientSecret
- name: LOG_LEVEL
value: debug
config:
# Set it to a valid URL
issuer: https://{{{ .Env.DOMAIN_DEX }}}
# See https://dexidp.io/docs/storage/ for more options
storage:
type: memory
oauth2:
skipApprovalScreen: true
alwaysShowLoginScreen: false
connectors:
- type: gitea
id: gitea
name: Forgejo
config:
clientID: "$FORGEJO_CLIENT_ID"
clientSecret: "$FORGEJO_CLIENT_SECRET"
redirectURI: https://{{{ .Env.DOMAIN_DEX }}}/callback
baseURL: https://edp.buildth.ing
# loadAllGroups: true
orgs:
- name: DevFW
enablePasswordDB: false
staticClients:
- id: controller-argocd-dex
name: ArgoCD Client
redirectURIs:
- "https://{{{ .Env.DOMAIN_ARGOCD }}}/auth/callback"
secretEnv: "OIDC_DEX_ARGO_CLIENT_SECRET"
- id: grafana
redirectURIs:
- "https://{{{ .Env.DOMAIN_GRAFANA }}}/login/generic_oauth"
name: "Grafana"
secretEnv: "OIDC_DEX_GRAFANA_CLIENT_SECRET"
- id: ci-sizer
name: "CI Sizer"
redirectURIs:
- "https://sizer.{{{ .Env.DOMAIN }}}/ui/callback"
secretEnv: "FORGEJO_RUNNER_SIZER_CLIENT_SECRET"
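
The Dex values above use `storage.type: memory`, so signing keys, refresh tokens and pending auth requests are lost whenever the pod restarts. If that becomes a problem, Dex's Kubernetes (CRD-backed) storage is one alternative. The fragment below is a sketch only and assumes the chart's service account is allowed to manage the Dex custom resources.

```yaml
# Hypothetical replacement for the storage block in the values above.
config:
  storage:
    type: kubernetes
    config:
      inCluster: true   # use the pod's service account to reach the API server
```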


@ -0,0 +1,23 @@
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: secrets-backup
namespace: argocd
labels:
env: dev
spec:
project: default
syncPolicy:
automated:
selfHeal: true
syncOptions:
- CreateNamespace=true
retry:
limit: -1
destination:
name: in-cluster
namespace: gitea
sources:
- repoURL: https://{{{ .Env.CLIENT_REPO_DOMAIN }}}/{{{ .Env.CLIENT_REPO_ORG_NAME }}}
targetRevision: HEAD
path: "{{{ .Env.CLIENT_REPO_ID }}}/{{{ .Env.DOMAIN }}}/stacks/core/secrets-backup/manifests"


@ -0,0 +1,107 @@
apiVersion: v1
kind: ServiceAccount
metadata:
name: secrets-backup
namespace: gitea
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: secrets-backup-reader
rules:
- apiGroups: [""]
resources: ["secrets"]
verbs: ["get", "list"]
- apiGroups: [""]
resources: ["namespaces"]
verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: secrets-backup-reader
subjects:
- kind: ServiceAccount
name: secrets-backup
namespace: gitea
roleRef:
kind: ClusterRole
name: secrets-backup-reader
apiGroup: rbac.authorization.k8s.io
---
apiVersion: batch/v1
kind: CronJob
metadata:
name: secrets-backup
namespace: gitea
spec:
schedule: "30 3 * * *"
concurrencyPolicy: "Forbid"
successfulJobsHistoryLimit: 5
failedJobsHistoryLimit: 5
startingDeadlineSeconds: 600 # 10 minutes
jobTemplate:
spec:
activeDeadlineSeconds: 900
backoffLimit: 2
ttlSecondsAfterFinished: 259200
template:
spec:
serviceAccountName: secrets-backup
containers:
- name: secrets-backup
image: edp.buildth.ing/devfw-cicd/secrets-backup:1.0.1
imagePullPolicy: IfNotPresent
env:
- name: AWS_ACCESS_KEY_ID
valueFrom:
secretKeyRef:
name: forgejo-cloud-credentials
key: access-key
- name: AWS_SECRET_ACCESS_KEY
valueFrom:
secretKeyRef:
name: forgejo-cloud-credentials
key: secret-key
- name: SOURCE_BUCKET
valueFrom:
secretKeyRef:
name: forgejo-cloud-credentials
key: bucket-name
- name: OBS_ENDPOINT
value: "obs.eu-de.otc.t-systems.com"
command:
- /bin/sh
- -c
- |
set -euo pipefail
TIMESTAMP=$(date +%Y%m%d-%H%M%S)
BACKUP_DIR="/tmp/secrets-backup-${TIMESTAMP}"
NAMESPACES="argocd cert-manager external-secrets"
mkdir -p "${BACKUP_DIR}"
echo "=== Exporting secrets from critical namespaces ==="
for NS in ${NAMESPACES}; do
echo "Exporting namespace: ${NS}"
kubectl get secrets -n "${NS}" \
-o json \
--field-selector type!=kubernetes.io/service-account-token \
> "${BACKUP_DIR}/${NS}-secrets.json"
done
echo "=== Creating compressed archive ==="
ARCHIVE="${BACKUP_DIR}/secrets-backup-${TIMESTAMP}.tar.gz"
tar -czf "${ARCHIVE}" -C "${BACKUP_DIR}" \
$(ls "${BACKUP_DIR}"/*.json 2>/dev/null | xargs -n1 basename)
echo "=== Uploading to OBS (SSE-KMS encryption at rest) ==="
aws s3 cp "${ARCHIVE}" \
"s3://${SOURCE_BUCKET}/cluster-secrets-backup/${TIMESTAMP}/secrets-backup.tar.gz" \
--endpoint-url "https://${OBS_ENDPOINT}"
echo "=== Cleanup ==="
rm -rf "${BACKUP_DIR}"
echo "Backup completed: ${TIMESTAMP}"
restartPolicy: OnFailure
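
The `ClusterRole` above grants `get`/`list` on Secrets in every namespace, while the script only reads `argocd`, `cert-manager` and `external-secrets`. A narrower alternative, sketched here as an assumption about what the job needs, is one `Role` and `RoleBinding` per backed-up namespace:

```yaml
# Hypothetical per-namespace alternative to the ClusterRole; repeat for
# cert-manager and external-secrets.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: secrets-backup-reader
  namespace: argocd
rules:
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: secrets-backup-reader
  namespace: argocd
subjects:
  - kind: ServiceAccount
    name: secrets-backup
    namespace: gitea        # the CronJob's service account lives in gitea
roleRef:
  kind: Role
  name: secrets-backup-reader
  apiGroup: rbac.authorization.k8s.io
```

The trade-off is more objects to maintain whenever the `NAMESPACES` list in the script changes.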


@ -28,7 +28,7 @@ spec:
# https://forgejo.org/docs/v1.21/admin/actions/#offline-registration
initContainers:
- name: runner-register
image: code.forgejo.org/forgejo/runner:6.4.0
image: code.forgejo.org/forgejo/runner:12.6.4
command:
- "sh"
- "-c"
@ -39,7 +39,7 @@ spec:
--token ${RUNNER_SECRET} \
--name ${RUNNER_NAME} \
--instance ${FORGEJO_INSTANCE_URL} \
--labels docker:docker://node:20-bookworm,ubuntu-22.04:docker://ghcr.io/catthehacker/ubuntu:act-22.04,ubuntu-latest:docker://ghcr.io/catthehacker/ubuntu:act-22.04
--labels docker:docker://node:24-bookworm,ubuntu-22.04:docker://ghcr.io/catthehacker/ubuntu:act-22.04,ubuntu-latest:docker://ghcr.io/catthehacker/ubuntu:act-24.04,ubuntu-24.04:docker://ghcr.io/catthehacker/ubuntu:act-24.04
env:
- name: RUNNER_NAME
valueFrom:
@ -57,7 +57,7 @@ spec:
mountPath: /data
containers:
- name: runner
image: code.forgejo.org/forgejo/runner:6.4.0
image: code.forgejo.org/forgejo/runner:12.6.4
command:
- "sh"
- "-c"


@ -18,15 +18,9 @@ spec:
name: in-cluster
namespace: gitea
sources:
- repoURL: https://{{{ .Env.CLIENT_REPO_DOMAIN }}}/DevFW-CICD/forgejo-helm.git
- repoURL: https://code.forgejo.org/forgejo-helm/forgejo-helm.git
path: .
# first check out the desired version (example v9.0.0): https://code.forgejo.org/forgejo-helm/forgejo-helm/src/tag/v9.0.0/Chart.yaml
# (note that the chart version is not the same as the forgejo application version, which is specified in the above Chart.yaml file)
# then use the devops pipeline and select development, forgejo and the desired version (example v9.0.0):
# https://{{{ .Env.CLIENT_REPO_DOMAIN }}}/DevFW-CICD/devops-pipelines/actions?workflow=update-helm-depends.yaml&actor=0&status=0
# finally update the desired version here and include "-depends", it is created by the devops pipeline.
# why do we have an added "-depends" tag? it resolves rate limitings when downloading helm OCI dependencies
targetRevision: v12.0.0-depends
targetRevision: v16.2.0
helm:
valueFiles:
- $values/{{{ .Env.CLIENT_REPO_ID }}}/{{{ .Env.DOMAIN }}}/stacks/forgejo/forgejo-server/values.yaml


@ -3,7 +3,7 @@ kind: Ingress
metadata:
annotations:
nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
nginx.ingress.kubernetes.io/proxy-body-size: 512m
nginx.ingress.kubernetes.io/proxy-body-size: 5120m
cert-manager.io/cluster-issuer: main
{{{ if eq .Env.CLUSTER_TYPE "osc" }}}
dns.gardener.cloud/class: garden


@ -5,8 +5,16 @@ metadata:
namespace: gitea
spec:
schedule: "0 1 * * *"
concurrencyPolicy: "Forbid"
successfulJobsHistoryLimit: 5
failedJobsHistoryLimit: 5
startingDeadlineSeconds: 600 # 10 minutes
jobTemplate:
spec:
# 4h window: bumped from 2h after Jun 20-21 deadline hit on heavy sync; BackupJobTooSlow alert fires at 5m
activeDeadlineSeconds: 14400
backoffLimit: 2
ttlSecondsAfterFinished: 259200 # 3 days
template:
spec:
containers:
@ -40,7 +48,7 @@ spec:
- /bin/sh
- -c
- |
rclone sync source:/${SOURCE_BUCKET}/packages /backup -v --ignore-checksum
rclone sync source:/${SOURCE_BUCKET} /backup -v --ignore-checksum
restartPolicy: OnFailure
volumes:
- name: rclone-config
@ -56,7 +64,7 @@ metadata:
name: s3-backup
namespace: gitea
annotations:
everest.io/disk-volume-type: SATA
everest.io/disk-volume-type: GPSSD
everest.io/crypt-key-id: {{{ .Env.PVC_KMS_KEY_ID }}}
spec:
storageClassName: csi-disk
@ -64,7 +72,7 @@ spec:
- ReadWriteOnce
resources:
requests:
storage: 50Gi
storage: 500Gi
---
apiVersion: v1
kind: Secret


@ -1,13 +1,3 @@
# This is only used for deploying older versions of infra-catalogue where the bucket name is not an output of the terragrunt modules
{{{- define "BUCKET_NAME" -}}}
{{{- if (getenv "FORGEJO_BUCKET_NAME") -}}}
{{{ getenv "FORGEJO_BUCKET_NAME" }}}
{{{- else -}}}
edp-forgejo-{{{ getenv "CLUSTER_ENVIRONMENT" }}}
{{{- end -}}}
{{{- end -}}}
# We use the Recreate strategy so that only one instance with one version is running; otherwise Forgejo might break or its data might become inconsistent.
strategy:
@ -31,7 +21,7 @@ persistence:
storageClass: csi-disk
annotations:
everest.io/crypt-key-id: {{{ .Env.PVC_KMS_KEY_ID }}}
everest.io/disk-volume-type: SATA
everest.io/disk-volume-type: GPSSD
test:
enabled: false
@ -134,7 +124,7 @@ gitea:
MINIO_ENDPOINT: obs.eu-de.otc.t-systems.com:443
STORAGE_TYPE: minio
MINIO_LOCATION: eu-de
MINIO_BUCKET: "{{{ template "BUCKET_NAME" }}}"
MINIO_BUCKET: "{{{ getenv "FORGEJO_BUCKET_NAME" }}}"
MINIO_USE_SSL: true
queue:
@ -147,8 +137,12 @@ gitea:
ENABLED: true
ADAPTER: redis
security:
GLOBAL_TWO_FACTOR_REQUIREMENT: admin
service:
DISABLE_REGISTRATION: true
ENABLE_NOTIFY_MAIL: true
other:
SHOW_FOOTER_VERSION: false
@ -184,19 +178,6 @@ image:
#tag: "8.0.3"
# Adds -rootless suffix to image name
# rootless: true
#fullOverride: {{{ getenv "CLIENT_REPO_DOMAIN" }}}/devfw-cicd/edp-forgejo:v1.1.0-edp-v11.0.3
fullOverride: {{{ getenv "CLIENT_REPO_DOMAIN" }}}/devfw-cicd/edp-forgejo:osctest
fullOverride: {{{ getenv "CLIENT_REPO_DOMAIN" }}}/devfw-cicd/edp-forgejo:{{{ .Env.FORGEJO_IMAGE_TAG }}}
forgejo:
runner:
enabled: true
image:
tag: latest
# replicas: 3
config:
runner:
labels:
- docker:docker://node:16-bullseye
- self-hosted:docker://ghcr.io/catthehacker/ubuntu:act-22.04
- ubuntu-22.04:docker://ghcr.io/catthehacker/ubuntu:act-22.04
- ubuntu-latest:docker://ghcr.io/catthehacker/ubuntu:act-22.04
forgejo: {}


@ -0,0 +1,33 @@
# Default: Forgejo/GitHub Actions runner manager
# Deploys GARM with the ci-sizer provider for automatic sizing + collector injection.
# For GitLab-only deployments, omit this and use gitlab-webhook instead.
# See: ci-sizer/docs/deployment-modes.md
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: garm
namespace: argocd
labels:
env: dev
spec:
project: default
syncPolicy:
automated:
selfHeal: true
syncOptions:
- CreateNamespace=true
retry:
limit: -1
destination:
name: in-cluster
namespace: garm
sources:
- repoURL: https://edp.buildth.ing/DevFW-CICD/garm-helm
path: charts/garm
targetRevision: v0.0.17
helm:
valueFiles:
- $values/{{{ .Env.CLIENT_REPO_ID }}}/{{{ .Env.DOMAIN }}}/stacks/garm/garm/values.yaml
- repoURL: https://{{{ .Env.CLIENT_REPO_DOMAIN }}}/{{{ .Env.CLIENT_REPO_ORG_NAME }}}
targetRevision: HEAD
ref: values


@ -0,0 +1,45 @@
ingress:
enabled: true
className: nginx
annotations:
cert-manager.io/cluster-issuer: main
nginx.ingress.kubernetes.io/backend-protocol: HTTP
nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
hosts:
- host: garm.{{{ .Env.DOMAIN_GITEA }}}
paths:
- path: /
pathType: Prefix
tls:
- secretName: garm-net-tls
hosts:
- garm.{{{ .Env.DOMAIN_GITEA }}}
# Credentials and Secrets
credentials:
edgeConnect:
existingSecretName: "edge-credential"
gitea:
url: "https://{{{ .Env.DOMAIN_GITEA }}}" # Required
db:
existingSecretName: garm-fixed-credentials
image:
repository: {{{ .Env.CLIENT_REPO_DOMAIN }}}/devfw-cicd/garm-forgejo
tag: v0.1.7-forgejo-24
providerConfig:
edgeConnect:
organization: edp2
region: EU
edgeConnectUrl: "https://hub.apps.edge.platform.mg3.mdb.osc.live"
cloudlet:
name: Hamburg
organization: TelekomOP
edgeConnectK8s:
sizer:
sidecarImage: edp.buildth.ing/devfw-cicd/ci-sizer-collector:0.9.7
garm:
logging:
logLevel: info


@ -0,0 +1,14 @@
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMServiceScrape
metadata:
name: argocd
namespace: observability
spec:
namespaceSelector:
matchNames:
- argocd
selector:
matchLabels:
app.kubernetes.io/part-of: argocd
endpoints:
- port: http-metrics


@ -0,0 +1,15 @@
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMServiceScrape
metadata:
name: forgejo
namespace: observability
spec:
namespaceSelector:
matchNames:
- gitea
selector:
matchLabels:
app.kubernetes.io/name: forgejo
endpoints:
- port: http
path: /metrics


@ -0,0 +1,15 @@
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMServiceScrape
metadata:
name: garm
namespace: observability
spec:
namespaceSelector:
matchNames:
- garm
selector:
matchLabels:
app.kubernetes.io/name: garm
endpoints:
- port: http
path: /metrics


@ -1,9 +0,0 @@
apiVersion: v1
kind: Secret
metadata:
name: simple-user-secret
namespace: observability
type: Opaque
stringData:
username: simple-user
password: simple-password


@ -201,13 +201,13 @@ defaultRules:
create: true
rules: {}
kubernetesSystemControllerManager:
create: true
create: false
rules: {}
kubeScheduler:
create: true
create: false
rules: {}
kubernetesSystemScheduler:
create: true
create: false
rules: {}
kubeStateMetrics:
create: true
@ -801,6 +801,20 @@ vmagent:
# Do not store original labels in vmagent's memory by default. This reduces the amount of memory used by vmagent
# but makes vmagent debugging UI less informative. See: https://docs.victoriametrics.com/vmagent/#relabel-debug
promscrape.dropOriginalLabels: "true"
# Harden liveness probe: default failureThreshold=10 masked a 72h silent outage
livenessProbe:
httpGet:
path: /health
port: http
failureThreshold: 3
periodSeconds: 5
timeoutSeconds: 5
startupProbe:
httpGet:
path: /health
port: http
failureThreshold: 30
periodSeconds: 5
# -- (object) VMAgent ingress configuration
ingress:
enabled: false
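
For orientation, the time budgets implied by the probe settings added in this hunk can be worked out from `failureThreshold` × `periodSeconds`. The annotated copy below repeats the same values; the resulting durations are approximate, since probe timeouts and scheduling add some slack.

```yaml
# Same values as in the hunk above, annotated with the resulting budgets.
livenessProbe:
  httpGet:
    path: /health
    port: http
  failureThreshold: 3   # 3 consecutive failures x 5s period: restart after roughly 15s
  periodSeconds: 5
  timeoutSeconds: 5
startupProbe:
  httpGet:
    path: /health
    port: http
  failureThreshold: 30  # 30 failures x 5s period: up to about 150s allowed for startup
  periodSeconds: 5
```

While the startup probe has not yet succeeded, the kubelet does not run the liveness probe, which is why the tight liveness threshold does not penalise slow starts.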


@ -0,0 +1,153 @@
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaDashboard
metadata:
name: argocd-operational
spec:
instanceSelector:
matchLabels:
dashboards: "grafana"
folder: "EDP / Applications"
json: |
{
"annotations": {"list": []},
"editable": true,
"graphTooltip": 1,
"panels": [
{
"collapsed": false,
"gridPos": {"h": 1, "w": 24, "x": 0, "y": 0},
"title": "Application Status",
"type": "row"
},
{
"datasource": {"type": "prometheus"},
"fieldConfig": {"defaults": {"unit": "short", "thresholds": {"mode": "absolute", "steps": [{"color": "green", "value": null}]}}},
"gridPos": {"h": 4, "w": 4, "x": 0, "y": 1},
"title": "Total Apps",
"type": "stat",
"targets": [{"expr": "count(argocd_app_info{cluster_environment=~\"$cluster_environment\"})", "legendFormat": ""}]
},
{
"datasource": {"type": "prometheus"},
"fieldConfig": {"defaults": {"unit": "short", "thresholds": {"mode": "absolute", "steps": [{"color": "green", "value": null}]}}},
"gridPos": {"h": 4, "w": 4, "x": 4, "y": 1},
"title": "Healthy",
"type": "stat",
"targets": [{"expr": "count(argocd_app_info{cluster_environment=~\"$cluster_environment\", health_status=\"Healthy\"})", "legendFormat": ""}]
},
{
"datasource": {"type": "prometheus"},
"fieldConfig": {"defaults": {"unit": "short", "thresholds": {"mode": "absolute", "steps": [{"color": "red", "value": null}]}}},
"gridPos": {"h": 4, "w": 4, "x": 8, "y": 1},
"title": "Degraded",
"type": "stat",
"targets": [{"expr": "count(argocd_app_info{cluster_environment=~\"$cluster_environment\", health_status=\"Degraded\"}) or vector(0)", "legendFormat": ""}]
},
{
"datasource": {"type": "prometheus"},
"fieldConfig": {"defaults": {"unit": "short", "thresholds": {"mode": "absolute", "steps": [{"color": "green", "value": null}]}}},
"gridPos": {"h": 4, "w": 4, "x": 12, "y": 1},
"title": "Synced",
"type": "stat",
"targets": [{"expr": "count(argocd_app_info{cluster_environment=~\"$cluster_environment\", sync_status=\"Synced\"})", "legendFormat": ""}]
},
{
"datasource": {"type": "prometheus"},
"fieldConfig": {"defaults": {"unit": "short", "thresholds": {"mode": "absolute", "steps": [{"color": "yellow", "value": null}]}}},
"gridPos": {"h": 4, "w": 4, "x": 16, "y": 1},
"title": "OutOfSync",
"type": "stat",
"targets": [{"expr": "count(argocd_app_info{cluster_environment=~\"$cluster_environment\", sync_status=\"OutOfSync\"}) or vector(0)", "legendFormat": ""}]
},
{
"datasource": {"type": "prometheus"},
"fieldConfig": {"defaults": {"unit": "short", "thresholds": {"mode": "absolute", "steps": [{"color": "orange", "value": null}]}}},
"gridPos": {"h": 4, "w": 4, "x": 20, "y": 1},
"title": "Progressing",
"type": "stat",
"targets": [{"expr": "count(argocd_app_info{cluster_environment=~\"$cluster_environment\", health_status=\"Progressing\"}) or vector(0)", "legendFormat": ""}]
},
{
"collapsed": false,
"gridPos": {"h": 1, "w": 24, "x": 0, "y": 5},
"title": "Application Details",
"type": "row"
},
{
"datasource": {"type": "prometheus"},
"fieldConfig": {
"defaults": {"custom": {"filterable": true}},
"overrides": [
{"matcher": {"id": "byName", "options": "Health"}, "properties": [{"id": "custom.cellOptions", "value": {"type": "color-text"}}, {"id": "mappings", "value": [{"options": {"Healthy": {"color": "green", "text": "Healthy"}, "Degraded": {"color": "red", "text": "Degraded"}, "Progressing": {"color": "yellow", "text": "Progressing"}, "Missing": {"color": "purple", "text": "Missing"}}, "type": "value"}]}]},
{"matcher": {"id": "byName", "options": "Sync"}, "properties": [{"id": "custom.cellOptions", "value": {"type": "color-text"}}, {"id": "mappings", "value": [{"options": {"Synced": {"color": "green", "text": "Synced"}, "OutOfSync": {"color": "orange", "text": "OutOfSync"}}, "type": "value"}]}]}
]
},
"gridPos": {"h": 12, "w": 24, "x": 0, "y": 6},
"title": "All Applications",
"type": "table",
"targets": [{"expr": "argocd_app_info{cluster_environment=~\"$cluster_environment\"}", "format": "table", "instant": true, "legendFormat": ""}],
"transformations": [
{"id": "filterFieldsByName", "options": {"include": {"names": ["cluster_environment", "name", "dest_namespace", "health_status", "sync_status", "repo"]}}},
{"id": "organize", "options": {"renameByName": {"cluster_environment": "Environment", "name": "Application", "dest_namespace": "Namespace", "health_status": "Health", "sync_status": "Sync", "repo": "Repository"}}}
]
},
{
"collapsed": false,
"gridPos": {"h": 1, "w": 24, "x": 0, "y": 18},
"title": "Sync Activity",
"type": "row"
},
{
"datasource": {"type": "prometheus"},
"fieldConfig": {"defaults": {"unit": "ops"}},
"gridPos": {"h": 8, "w": 12, "x": 0, "y": 19},
"title": "Sync Operations (rate)",
"type": "timeseries",
"targets": [{"expr": "sum(rate(argocd_app_sync_total{cluster_environment=~\"$cluster_environment\"}[5m])) by (name, phase)", "legendFormat": "{{name}} ({{phase}})"}]
},
{
"datasource": {"type": "prometheus"},
"fieldConfig": {"defaults": {"unit": "ops"}},
"gridPos": {"h": 8, "w": 12, "x": 12, "y": 19},
"title": "Reconciliation Rate",
"type": "timeseries",
"targets": [{"expr": "sum(rate(argocd_app_reconcile_count{cluster_environment=~\"$cluster_environment\"}[5m])) by (namespace)", "legendFormat": "{{namespace}}"}]
},
{
"collapsed": false,
"gridPos": {"h": 1, "w": 24, "x": 0, "y": 27},
"title": "ArgoCD Logs",
"type": "row"
},
{
"datasource": {"type": "victoriametrics-logs-datasource"},
"gridPos": {"h": 10, "w": 24, "x": 0, "y": 28},
"title": "ArgoCD Logs",
"type": "logs",
"targets": [{"expr": "{cluster_environment=~\"$cluster_environment\", kubernetes.namespace=\"argocd\"}", "refId": "A"}],
"options": {"showTime": true, "showLabels": true, "wrapLogMessage": true, "enableLogDetails": true, "sortOrder": "Descending"}
}
],
"schemaVersion": 39,
"tags": ["edp", "argocd", "gitops"],
"templating": {
"list": [
{
"current": {"selected": true, "text": "All", "value": "$__all"},
"datasource": {"type": "prometheus"},
"definition": "label_values(argocd_app_info, cluster_environment)",
"includeAll": true,
"multi": true,
"name": "cluster_environment",
"label": "Environment",
"query": "label_values(argocd_app_info, cluster_environment)",
"refresh": 2,
"sort": 1,
"type": "query"
}
]
},
"time": {"from": "now-6h", "to": "now"},
"title": "ArgoCD Operations",
"uid": "edp-argocd-ops"
}


@ -6,4 +6,5 @@ spec:
instanceSelector:
matchLabels:
dashboards: "grafana"
folder: "EDP / Applications"
url: "https://raw.githubusercontent.com/argoproj/argo-cd/refs/heads/master/examples/dashboard.json"


@ -0,0 +1,103 @@
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaDashboard
metadata:
name: cronjob-monitoring
spec:
instanceSelector:
matchLabels:
dashboards: "grafana"
folder: "EDP / Operations"
json: |
{
"annotations": {"list": []},
"editable": true,
"graphTooltip": 1,
"panels": [
{
"collapsed": false,
"gridPos": {"h": 1, "w": 24, "x": 0, "y": 0},
"title": "Backup Job Status",
"type": "row"
},
{
"datasource": {"type": "prometheus"},
"fieldConfig": {"defaults": {"unit": "s", "thresholds": {"mode": "absolute", "steps": [{"color": "green", "value": null}, {"color": "yellow", "value": 86400}, {"color": "red", "value": 172800}]}}},
"gridPos": {"h": 5, "w": 12, "x": 0, "y": 1},
"title": "Time Since Last Schedule",
"type": "stat",
"targets": [{"expr": "time() - kube_cronjob_status_last_schedule_time{cluster_environment=~\"$cluster_environment\"}", "legendFormat": "{{cronjob}} ({{cluster_environment}})"}]
},
{
"datasource": {"type": "prometheus"},
"fieldConfig": {"defaults": {"unit": "short", "thresholds": {"mode": "absolute", "steps": [{"color": "green", "value": null}, {"color": "red", "value": 1}]}}},
"gridPos": {"h": 5, "w": 12, "x": 12, "y": 1},
"title": "Failed Jobs (Active)",
"type": "stat",
"targets": [{"expr": "sum by(cluster_environment, job_name) (kube_job_status_failed{cluster_environment=~\"$cluster_environment\"}) > 0", "legendFormat": "{{job_name}} ({{cluster_environment}})"}]
},
{
"collapsed": false,
"gridPos": {"h": 1, "w": 24, "x": 0, "y": 6},
"title": "CronJob Overview",
"type": "row"
},
{
"datasource": {"type": "prometheus"},
"fieldConfig": {"defaults": {"custom": {"filterable": true}}, "overrides": [{"matcher": {"id": "byName", "options": "Suspended"}, "properties": [{"id": "mappings", "value": [{"options": {"0": {"text": "No", "color": "green"}, "1": {"text": "YES", "color": "red"}}, "type": "value"}]}]}]},
"gridPos": {"h": 8, "w": 24, "x": 0, "y": 7},
"title": "All CronJobs",
"type": "table",
"targets": [
{"expr": "kube_cronjob_info{cluster_environment=~\"$cluster_environment\"}", "format": "table", "instant": true, "refId": "A"}
],
"transformations": [
{"id": "filterFieldsByName", "options": {"include": {"names": ["cluster_environment", "cronjob", "namespace", "schedule"]}}},
{"id": "organize", "options": {"renameByName": {"cluster_environment": "Environment", "cronjob": "CronJob", "namespace": "Namespace", "schedule": "Schedule"}}}
]
},
{
"collapsed": false,
"gridPos": {"h": 1, "w": 24, "x": 0, "y": 15},
"title": "Job History",
"type": "row"
},
{
"datasource": {"type": "prometheus"},
"fieldConfig": {"defaults": {"unit": "short"}},
"gridPos": {"h": 8, "w": 12, "x": 0, "y": 16},
"title": "Job Completions (24h)",
"type": "timeseries",
"targets": [{"expr": "sum(kube_job_status_succeeded{cluster_environment=~\"$cluster_environment\"}) by (job_name, cluster_environment)", "legendFormat": "{{job_name}} ({{cluster_environment}})"}]
},
{
"datasource": {"type": "prometheus"},
"fieldConfig": {"defaults": {"unit": "short", "color": {"mode": "palette-classic"}}},
"gridPos": {"h": 8, "w": 12, "x": 12, "y": 16},
"title": "Job Failures (24h)",
"type": "timeseries",
"targets": [{"expr": "sum(kube_job_status_failed{cluster_environment=~\"$cluster_environment\"}) by (job_name, cluster_environment)", "legendFormat": "{{job_name}} ({{cluster_environment}})"}]
}
],
"schemaVersion": 39,
"tags": ["edp", "backup", "cronjob"],
"templating": {
"list": [
{
"current": {"selected": true, "text": "All", "value": "$__all"},
"datasource": {"type": "prometheus"},
"definition": "label_values(kube_cronjob_info, cluster_environment)",
"includeAll": true,
"multi": true,
"name": "cluster_environment",
"label": "Environment",
"query": "label_values(kube_cronjob_info, cluster_environment)",
"refresh": 2,
"sort": 1,
"type": "query"
}
]
},
"time": {"from": "now-24h", "to": "now"},
"title": "CronJob & Backup Monitoring",
"uid": "edp-cronjobs"
}


@ -0,0 +1,207 @@
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaDashboard
metadata:
name: forgejo
spec:
instanceSelector:
matchLabels:
dashboards: "grafana"
folder: "EDP / Applications"
json: |
{
"annotations": {"list": []},
"editable": true,
"graphTooltip": 1,
"panels": [
{
"collapsed": false,
"gridPos": {"h": 1, "w": 24, "x": 0, "y": 0},
"title": "Forgejo Health",
"type": "row"
},
{
"datasource": {"type": "prometheus"},
"fieldConfig": {"defaults": {"mappings": [{"options": {"0": {"text": "DOWN", "color": "red"}, "1": {"text": "UP", "color": "green"}}, "type": "value"}], "thresholds": {"mode": "absolute", "steps": [{"color": "red", "value": null}, {"color": "green", "value": 1}]}}},
"gridPos": {"h": 4, "w": 4, "x": 0, "y": 1},
"title": "Status",
"type": "stat",
"targets": [{"expr": "up{job=\"forgejo-server-http\", cluster_environment=~\"$cluster_environment\"}", "legendFormat": "{{cluster_environment}}"}]
},
{
"datasource": {"type": "prometheus"},
"fieldConfig": {"defaults": {"unit": "short"}},
"gridPos": {"h": 4, "w": 4, "x": 4, "y": 1},
"title": "Version",
"type": "stat",
"targets": [{"expr": "gitea_build_info{cluster_environment=~\"$cluster_environment\"}", "legendFormat": "{{version}}"}],
"options": {"reduceOptions": {"calcs": ["lastNotNull"]}, "textMode": "name"}
},
{
"datasource": {"type": "prometheus"},
"fieldConfig": {"defaults": {"unit": "short"}},
"gridPos": {"h": 4, "w": 4, "x": 8, "y": 1},
"title": "Repositories",
"type": "stat",
"targets": [{"expr": "gitea_repositories{cluster_environment=~\"$cluster_environment\"}", "legendFormat": "{{cluster_environment}}"}]
},
{
"datasource": {"type": "prometheus"},
"fieldConfig": {"defaults": {"unit": "short"}},
"gridPos": {"h": 4, "w": 4, "x": 12, "y": 1},
"title": "Users",
"type": "stat",
"targets": [{"expr": "gitea_users{cluster_environment=~\"$cluster_environment\"}", "legendFormat": "{{cluster_environment}}"}]
},
{
"datasource": {"type": "prometheus"},
"fieldConfig": {"defaults": {"unit": "short"}},
"gridPos": {"h": 4, "w": 4, "x": 16, "y": 1},
"title": "Organizations",
"type": "stat",
"targets": [{"expr": "gitea_organizations{cluster_environment=~\"$cluster_environment\"}", "legendFormat": "{{cluster_environment}}"}]
},
{
"datasource": {"type": "prometheus"},
"fieldConfig": {"defaults": {"unit": "short"}},
"gridPos": {"h": 4, "w": 4, "x": 20, "y": 1},
"title": "Teams",
"type": "stat",
"targets": [{"expr": "gitea_teams{cluster_environment=~\"$cluster_environment\"}", "legendFormat": "{{cluster_environment}}"}]
},
{
"collapsed": false,
"gridPos": {"h": 1, "w": 24, "x": 0, "y": 5},
"title": "Activity",
"type": "row"
},
{
"datasource": {"type": "prometheus"},
"fieldConfig": {"defaults": {"unit": "short"}},
"gridPos": {"h": 4, "w": 6, "x": 0, "y": 6},
"title": "Open Issues",
"type": "stat",
"targets": [{"expr": "gitea_issues_open{cluster_environment=~\"$cluster_environment\"}", "legendFormat": "{{cluster_environment}}"}]
},
{
"datasource": {"type": "prometheus"},
"fieldConfig": {"defaults": {"unit": "short"}},
"gridPos": {"h": 4, "w": 6, "x": 6, "y": 6},
"title": "Closed Issues",
"type": "stat",
"targets": [{"expr": "gitea_issues_closed{cluster_environment=~\"$cluster_environment\"}", "legendFormat": "{{cluster_environment}}"}]
},
{
"datasource": {"type": "prometheus"},
"fieldConfig": {"defaults": {"unit": "short"}},
"gridPos": {"h": 4, "w": 6, "x": 12, "y": 6},
"title": "Webhooks",
"type": "stat",
"targets": [{"expr": "gitea_webhooks{cluster_environment=~\"$cluster_environment\"}", "legendFormat": "{{cluster_environment}}"}]
},
{
"datasource": {"type": "prometheus"},
"fieldConfig": {"defaults": {"unit": "short"}},
"gridPos": {"h": 4, "w": 6, "x": 18, "y": 6},
"title": "Hook Tasks",
"type": "stat",
"targets": [{"expr": "gitea_hooktasks{cluster_environment=~\"$cluster_environment\"}", "legendFormat": "{{cluster_environment}}"}]
},
{
"collapsed": false,
"gridPos": {"h": 1, "w": 24, "x": 0, "y": 10},
"title": "Content & Auth",
"type": "row"
},
{
"datasource": {"type": "prometheus"},
"fieldConfig": {"defaults": {"unit": "short"}},
"gridPos": {"h": 4, "w": 4, "x": 0, "y": 11},
"title": "Stars",
"type": "stat",
"targets": [{"expr": "gitea_stars{cluster_environment=~\"$cluster_environment\"}", "legendFormat": "{{cluster_environment}}"}]
},
{
"datasource": {"type": "prometheus"},
"fieldConfig": {"defaults": {"unit": "short"}},
"gridPos": {"h": 4, "w": 4, "x": 4, "y": 11},
"title": "Watches",
"type": "stat",
"targets": [{"expr": "gitea_watches{cluster_environment=~\"$cluster_environment\"}", "legendFormat": "{{cluster_environment}}"}]
},
{
"datasource": {"type": "prometheus"},
"fieldConfig": {"defaults": {"unit": "short"}},
"gridPos": {"h": 4, "w": 4, "x": 8, "y": 11},
"title": "Releases",
"type": "stat",
"targets": [{"expr": "gitea_releases{cluster_environment=~\"$cluster_environment\"}", "legendFormat": "{{cluster_environment}}"}]
},
{
"datasource": {"type": "prometheus"},
"fieldConfig": {"defaults": {"unit": "short"}},
"gridPos": {"h": 4, "w": 4, "x": 12, "y": 11},
"title": "Mirrors",
"type": "stat",
"targets": [{"expr": "gitea_mirrors{cluster_environment=~\"$cluster_environment\"}", "legendFormat": "{{cluster_environment}}"}]
},
{
"datasource": {"type": "prometheus"},
"fieldConfig": {"defaults": {"unit": "short"}},
"gridPos": {"h": 4, "w": 4, "x": 16, "y": 11},
"title": "Public Keys",
"type": "stat",
"targets": [{"expr": "gitea_publickeys{cluster_environment=~\"$cluster_environment\"}", "legendFormat": "{{cluster_environment}}"}]
},
{
"datasource": {"type": "prometheus"},
"fieldConfig": {"defaults": {"unit": "short"}},
"gridPos": {"h": 4, "w": 4, "x": 20, "y": 11},
"title": "OAuth Apps",
"type": "stat",
"targets": [{"expr": "gitea_oauths{cluster_environment=~\"$cluster_environment\"}", "legendFormat": "{{cluster_environment}}"}]
},
{
"collapsed": false,
"gridPos": {"h": 1, "w": 24, "x": 0, "y": 15},
"title": "Forgejo Logs",
"type": "row"
},
{
"datasource": {"type": "victoriametrics-logs-datasource"},
"gridPos": {"h": 10, "w": 12, "x": 0, "y": 16},
"title": "Forgejo Server Logs",
"type": "logs",
"targets": [{"expr": "{cluster_environment=~\"$cluster_environment\", kubernetes.namespace=\"gitea\"}", "refId": "A"}],
"options": {"showTime": true, "showLabels": true, "wrapLogMessage": true, "enableLogDetails": true, "sortOrder": "Descending"}
},
{
"datasource": {"type": "victoriametrics-logs-datasource"},
"gridPos": {"h": 10, "w": 12, "x": 12, "y": 16},
"title": "Forgejo Errors",
"type": "logs",
"targets": [{"expr": "{cluster_environment=~\"$cluster_environment\", kubernetes.namespace=\"gitea\"} error OR Error OR ERROR OR panic", "refId": "A"}],
"options": {"showTime": true, "showLabels": true, "wrapLogMessage": true, "enableLogDetails": true, "sortOrder": "Descending"}
}
],
"schemaVersion": 39,
"tags": ["edp", "forgejo", "gitea"],
"templating": {
"list": [
{
"current": {"selected": true, "text": "All", "value": "$__all"},
"datasource": {"type": "prometheus"},
"definition": "label_values(gitea_repositories, cluster_environment)",
"includeAll": true,
"multi": true,
"name": "cluster_environment",
"label": "Environment",
"query": "label_values(gitea_repositories, cluster_environment)",
"refresh": 2,
"type": "query"
}
]
},
"time": {"from": "now-6h", "to": "now"},
"title": "Forgejo",
"uid": "edp-forgejo"
}

@@ -0,0 +1,117 @@
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaDashboard
metadata:
name: garm
spec:
instanceSelector:
matchLabels:
dashboards: "grafana"
folder: "EDP / Applications"
json: |
{
"annotations": {"list": []},
"editable": true,
"graphTooltip": 1,
"panels": [
{
"collapsed": false,
"gridPos": {"h": 1, "w": 24, "x": 0, "y": 0},
"title": "GARM Runner Status",
"type": "row"
},
{
"datasource": {"type": "prometheus"},
"fieldConfig": {"defaults": {"unit": "short", "thresholds": {"mode": "absolute", "steps": [{"color": "green", "value": null}]}}},
"gridPos": {"h": 5, "w": 6, "x": 0, "y": 1},
"title": "Total Runners",
"type": "stat",
"targets": [{"expr": "count(garm_runner_status{cluster_environment=~\"$cluster_environment\"}) or vector(0)", "legendFormat": ""}]
},
{
"datasource": {"type": "prometheus"},
"fieldConfig": {"defaults": {"unit": "short", "thresholds": {"mode": "absolute", "steps": [{"color": "green", "value": null}]}}},
"gridPos": {"h": 5, "w": 6, "x": 6, "y": 1},
"title": "Idle Runners",
"type": "stat",
"targets": [{"expr": "count(garm_runner_status{cluster_environment=~\"$cluster_environment\", status=\"idle\"}) or vector(0)", "legendFormat": ""}]
},
{
"datasource": {"type": "prometheus"},
"fieldConfig": {"defaults": {"unit": "short", "thresholds": {"mode": "absolute", "steps": [{"color": "yellow", "value": null}]}}},
"gridPos": {"h": 5, "w": 6, "x": 12, "y": 1},
"title": "Creating",
"type": "stat",
"targets": [{"expr": "count(garm_runner_status{cluster_environment=~\"$cluster_environment\", status=\"creating\"}) or vector(0)", "legendFormat": ""}]
},
{
"datasource": {"type": "prometheus"},
"fieldConfig": {"defaults": {"unit": "short", "thresholds": {"mode": "absolute", "steps": [{"color": "red", "value": null}]}}},
"gridPos": {"h": 5, "w": 6, "x": 18, "y": 1},
"title": "Errors",
"type": "stat",
"targets": [{"expr": "sum(rate(garm_runner_errors_total{cluster_environment=~\"$cluster_environment\"}[5m])) or vector(0)", "legendFormat": ""}]
},
{
"collapsed": false,
"gridPos": {"h": 1, "w": 24, "x": 0, "y": 6},
"title": "GitHub API Rate Limits",
"type": "row"
},
{
"datasource": {"type": "prometheus"},
"fieldConfig": {"defaults": {"unit": "short", "min": 0}},
"gridPos": {"h": 8, "w": 12, "x": 0, "y": 7},
"title": "Rate Limit Remaining",
"type": "timeseries",
"targets": [{"expr": "garm_github_rate_limit_remaining{cluster_environment=~\"$cluster_environment\"}", "legendFormat": "{{cluster_environment}}"}]
},
{
"datasource": {"type": "prometheus"},
"fieldConfig": {"defaults": {"unit": "ops"}},
"gridPos": {"h": 8, "w": 12, "x": 12, "y": 7},
"title": "Runner Operations Rate",
"type": "timeseries",
"targets": [{"expr": "sum(rate(garm_runner_operations_total{cluster_environment=~\"$cluster_environment\"}[5m])) by (cluster_environment)", "legendFormat": "{{cluster_environment}}"}]
},
{
"collapsed": false,
"gridPos": {"h": 1, "w": 24, "x": 0, "y": 15},
"title": "Runner Details",
"type": "row"
},
{
"datasource": {"type": "prometheus"},
"fieldConfig": {"defaults": {"custom": {"filterable": true}}},
"gridPos": {"h": 8, "w": 24, "x": 0, "y": 16},
"title": "Runner Pool Status",
"type": "table",
"targets": [{"expr": "garm_runner_status{cluster_environment=~\"$cluster_environment\"}", "format": "table", "instant": true}],
"transformations": [
{"id": "filterFieldsByName", "options": {"include": {"names": ["cluster_environment", "name", "status", "pool_owner", "pool_type", "provider"]}}},
{"id": "organize", "options": {"renameByName": {"cluster_environment": "Environment", "name": "Runner", "status": "Status", "pool_owner": "Pool Owner", "pool_type": "Type", "provider": "Provider"}}}
]
}
],
"schemaVersion": 39,
"tags": ["edp", "garm", "ci-cd", "runners"],
"templating": {
"list": [
{
"current": {"selected": true, "text": "All", "value": "$__all"},
"datasource": {"type": "prometheus"},
"definition": "label_values(garm_runner_status, cluster_environment)",
"includeAll": true,
"multi": true,
"name": "cluster_environment",
"label": "Environment",
"query": "label_values(garm_runner_status, cluster_environment)",
"refresh": 2,
"sort": 1,
"type": "query"
}
]
},
"time": {"from": "now-6h", "to": "now"},
"title": "GARM Runners",
"uid": "edp-garm"
}

@@ -8,7 +8,8 @@ spec:
 persistentVolumeClaim:
 metadata:
 annotations:
-everest.io/disk-volume-type: SATA
+everest.io/disk-volume-type: GPSSD
+everest.io/crypt-key-id: {{{ .Env.PVC_KMS_KEY_ID }}}
 spec:
 storageClassName: csi-disk
 accessModes:
@@ -16,6 +17,40 @@ spec:
 resources:
 requests:
 storage: 10Gi
+deployment:
+spec:
+template:
+spec:
+containers:
+- name: grafana
+env:
+- name: OAUTH_CLIENT_SECRET
+valueFrom:
+secretKeyRef:
+key: clientSecret
+name: dex-grafana-client
+config:
+log.console:
+level: debug
+server:
+root_url: "https://{{{ .Env.DOMAIN_GRAFANA }}}"
+auth:
+disable_login: "true"
+disable_login_form: "true"
+auth.generic_oauth:
+enabled: "true"
+name: Forgejo
+allow_sign_up: "true"
+use_refresh_token: "true"
+client_id: grafana
+client_secret: $__env{OAUTH_CLIENT_SECRET}
+scopes: openid email profile offline_access groups
+auth_url: https://{{{ .Env.DOMAIN_DEX }}}/auth
+token_url: https://{{{ .Env.DOMAIN_DEX }}}/token
+api_url: https://{{{ .Env.DOMAIN_DEX }}}/userinfo
+redirect_uri: https://{{{ .Env.DOMAIN_GRAFANA }}}/login/generic_oauth
+role_attribute_path: "contains(groups[*], 'DevFW') && 'GrafanaAdmin' || 'None'"
+allow_assign_grafana_admin: "true"
 ingress:
 metadata:
 annotations:
@@ -24,7 +59,7 @@ spec:
 spec:
 ingressClassName: nginx
 rules:
-- host: grafana.{{{ .Env.DOMAIN }}}
+- host: {{{ .Env.DOMAIN_GRAFANA }}}
 http:
 paths:
 - backend:
@@ -36,5 +71,5 @@ spec:
 pathType: Prefix
 tls:
 - hosts:
-- grafana.{{{ .Env.DOMAIN }}}
+- {{{ .Env.DOMAIN_GRAFANA }}}
 secretName: grafana-net-tls

@@ -6,4 +6,5 @@ spec:
 instanceSelector:
 matchLabels:
 dashboards: "grafana"
+folder: "EDP / Operations"
 url: "https://raw.githubusercontent.com/adinhodovic/ingress-nginx-mixin/refs/heads/main/dashboards_out/ingress-nginx-overview.json"

@@ -0,0 +1,245 @@
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaDashboard
metadata:
name: platform-overview
spec:
instanceSelector:
matchLabels:
dashboards: "grafana"
folder: "EDP / Overview"
json: |
{
"annotations": {"list": []},
"editable": true,
"fiscalYearStartMonth": 0,
"graphTooltip": 1,
"links": [],
"panels": [
{
"collapsed": false,
"gridPos": {"h": 1, "w": 24, "x": 0, "y": 0},
"title": "Platform Health",
"type": "row"
},
{
"datasource": {"type": "prometheus"},
"fieldConfig": {
"defaults": {
"mappings": [{"options": {"0": {"text": "DOWN", "color": "red"}, "1": {"text": "UP", "color": "green"}}, "type": "value"}],
"thresholds": {"mode": "absolute", "steps": [{"color": "red", "value": null}, {"color": "green", "value": 1}]}
}
},
"gridPos": {"h": 4, "w": 4, "x": 0, "y": 1},
"title": "Forgejo",
"type": "stat",
"targets": [{"expr": "sum(up{job=\"forgejo-server-http\", cluster_environment=~\"$cluster_environment\"})", "legendFormat": ""}]
},
{
"datasource": {"type": "prometheus"},
"fieldConfig": {
"defaults": {
"thresholds": {"mode": "absolute", "steps": [{"color": "green", "value": null}, {"color": "yellow", "value": 1}, {"color": "red", "value": 3}]}
}
},
"gridPos": {"h": 4, "w": 4, "x": 4, "y": 1},
"title": "Ingress 5xx (5m)",
"type": "stat",
"targets": [{"expr": "sum(rate(nginx_ingress_controller_requests{status=~\"5..\", cluster_environment=~\"$cluster_environment\"}[5m]))", "legendFormat": ""}]
},
{
"datasource": {"type": "prometheus"},
"fieldConfig": {
"defaults": {
"unit": "short",
"thresholds": {"mode": "absolute", "steps": [{"color": "green", "value": null}, {"color": "red", "value": 1}]}
}
},
"gridPos": {"h": 4, "w": 4, "x": 8, "y": 1},
"title": "Failed Jobs (24h)",
"type": "stat",
"targets": [{"expr": "sum(kube_job_status_failed{namespace=\"gitea\", cluster_environment=~\"$cluster_environment\"}) or vector(0)", "legendFormat": ""}]
},
{
"datasource": {"type": "prometheus"},
"fieldConfig": {
"defaults": {
"unit": "percentunit",
"thresholds": {"mode": "absolute", "steps": [{"color": "green", "value": null}, {"color": "yellow", "value": 0.7}, {"color": "red", "value": 0.85}]}
}
},
"gridPos": {"h": 4, "w": 4, "x": 12, "y": 1},
"title": "Cluster CPU Usage",
"type": "stat",
"targets": [{"expr": "1 - avg(rate(node_cpu_seconds_total{mode=\"idle\", cluster_environment=~\"$cluster_environment\"}[5m]))", "legendFormat": ""}]
},
{
"datasource": {"type": "prometheus"},
"fieldConfig": {
"defaults": {
"unit": "percentunit",
"thresholds": {"mode": "absolute", "steps": [{"color": "green", "value": null}, {"color": "yellow", "value": 0.7}, {"color": "red", "value": 0.85}]}
}
},
"gridPos": {"h": 4, "w": 4, "x": 16, "y": 1},
"title": "Cluster Memory Usage",
"type": "stat",
"targets": [{"expr": "1 - sum(node_memory_MemAvailable_bytes{cluster_environment=~\"$cluster_environment\"}) / sum(node_memory_MemTotal_bytes{cluster_environment=~\"$cluster_environment\"})", "legendFormat": ""}]
},
{
"datasource": {"type": "prometheus"},
"fieldConfig": {
"defaults": {
"unit": "percentunit",
"thresholds": {"mode": "absolute", "steps": [{"color": "green", "value": null}, {"color": "yellow", "value": 0.6}, {"color": "red", "value": 0.8}]}
}
},
"gridPos": {"h": 4, "w": 4, "x": 20, "y": 1},
"title": "Max PVC Usage",
"type": "stat",
"targets": [{"expr": "max(1 - kubelet_volume_stats_available_bytes{cluster_environment=~\"$cluster_environment\"} / kubelet_volume_stats_capacity_bytes{cluster_environment=~\"$cluster_environment\"})", "legendFormat": ""}]
},
{
"collapsed": false,
"gridPos": {"h": 1, "w": 24, "x": 0, "y": 5},
"title": "Forgejo",
"type": "row"
},
{
"datasource": {"type": "prometheus"},
"fieldConfig": {"defaults": {"unit": "short"}},
"gridPos": {"h": 4, "w": 4, "x": 0, "y": 6},
"title": "Repositories",
"type": "stat",
"targets": [{"expr": "gitea_repositories{cluster_environment=~\"$cluster_environment\"}", "legendFormat": "{{cluster_environment}}"}]
},
{
"datasource": {"type": "prometheus"},
"fieldConfig": {"defaults": {"unit": "short"}},
"gridPos": {"h": 4, "w": 4, "x": 4, "y": 6},
"title": "Users",
"type": "stat",
"targets": [{"expr": "gitea_users{cluster_environment=~\"$cluster_environment\"}", "legendFormat": "{{cluster_environment}}"}]
},
{
"datasource": {"type": "prometheus"},
"fieldConfig": {"defaults": {"unit": "short"}},
"gridPos": {"h": 4, "w": 4, "x": 8, "y": 6},
"title": "Organizations",
"type": "stat",
"targets": [{"expr": "gitea_organizations{cluster_environment=~\"$cluster_environment\"}", "legendFormat": "{{cluster_environment}}"}]
},
{
"datasource": {"type": "prometheus"},
"fieldConfig": {"defaults": {"unit": "short"}},
"gridPos": {"h": 4, "w": 4, "x": 12, "y": 6},
"title": "Open Issues",
"type": "stat",
"targets": [{"expr": "gitea_issues_open{cluster_environment=~\"$cluster_environment\"}", "legendFormat": "{{cluster_environment}}"}]
},
{
"datasource": {"type": "prometheus"},
"fieldConfig": {"defaults": {"unit": "short"}},
"gridPos": {"h": 4, "w": 4, "x": 16, "y": 6},
"title": "Webhooks",
"type": "stat",
"targets": [{"expr": "gitea_webhooks{cluster_environment=~\"$cluster_environment\"}", "legendFormat": "{{cluster_environment}}"}]
},
{
"datasource": {"type": "prometheus"},
"fieldConfig": {"defaults": {"unit": "short"}},
"gridPos": {"h": 4, "w": 4, "x": 20, "y": 6},
"title": "Mirrors",
"type": "stat",
"targets": [{"expr": "gitea_mirrors{cluster_environment=~\"$cluster_environment\"}", "legendFormat": "{{cluster_environment}}"}]
},
{
"collapsed": false,
"gridPos": {"h": 1, "w": 24, "x": 0, "y": 10},
"title": "Resources",
"type": "row"
},
{
"datasource": {"type": "prometheus"},
"fieldConfig": {"defaults": {"unit": "percentunit", "min": 0, "max": 1}},
"gridPos": {"h": 8, "w": 12, "x": 0, "y": 11},
"title": "Node CPU Usage",
"type": "timeseries",
"targets": [{"expr": "1 - rate(node_cpu_seconds_total{mode=\"idle\", cluster_environment=~\"$cluster_environment\"}[5m])", "legendFormat": "{{instance}}"}]
},
{
"datasource": {"type": "prometheus"},
"fieldConfig": {"defaults": {"unit": "percentunit", "min": 0, "max": 1}},
"gridPos": {"h": 8, "w": 12, "x": 12, "y": 11},
"title": "PVC Usage by Claim",
"type": "timeseries",
"targets": [{"expr": "1 - (kubelet_volume_stats_available_bytes{cluster_environment=~\"$cluster_environment\"} / kubelet_volume_stats_capacity_bytes{cluster_environment=~\"$cluster_environment\"})", "legendFormat": "{{namespace}}/{{persistentvolumeclaim}}"}]
},
{
"collapsed": false,
"gridPos": {"h": 1, "w": 24, "x": 0, "y": 19},
"title": "Backups",
"type": "row"
},
{
"datasource": {"type": "prometheus"},
"fieldConfig": {"defaults": {"unit": "s", "thresholds": {"mode": "absolute", "steps": [{"color": "green", "value": null}, {"color": "yellow", "value": 86400}, {"color": "red", "value": 172800}]}}},
"gridPos": {"h": 4, "w": 8, "x": 0, "y": 20},
"title": "Time Since Last Backup Schedule",
"type": "stat",
"targets": [{"expr": "time() - kube_cronjob_status_last_schedule_time{cronjob=~\"forgejo-s3-backup|secrets-backup\", cluster_environment=~\"$cluster_environment\"}", "legendFormat": "{{cronjob}} ({{cluster_environment}})"}]
},
{
"datasource": {"type": "prometheus"},
"fieldConfig": {"defaults": {"unit": "s"}},
"gridPos": {"h": 4, "w": 8, "x": 8, "y": 20},
"title": "Backup Job Duration (Last 7d)",
"type": "timeseries",
"targets": [{"expr": "kube_job_status_completion_time{job_name=~\"forgejo-s3-backup.*|secrets-backup.*\", cluster_environment=~\"$cluster_environment\"} - kube_job_status_start_time{job_name=~\"forgejo-s3-backup.*|secrets-backup.*\", cluster_environment=~\"$cluster_environment\"}", "legendFormat": "{{job_name}}"}],
"options": {"legend": {"displayMode": "table"}}
},
{
"datasource": {"type": "prometheus"},
"fieldConfig": {"defaults": {"unit": "short", "thresholds": {"mode": "absolute", "steps": [{"color": "green", "value": null}, {"color": "red", "value": 1}]}}},
"gridPos": {"h": 4, "w": 8, "x": 16, "y": 20},
"title": "Failed Backup Jobs (Active)",
"type": "stat",
"targets": [{"expr": "sum by(cluster_environment, job_name) (kube_job_status_failed{job_name=~\"forgejo-s3-backup.*|secrets-backup.*\", cluster_environment=~\"$cluster_environment\"})", "legendFormat": "{{job_name}} ({{cluster_environment}})"}]
},
{
"collapsed": false,
"gridPos": {"h": 1, "w": 24, "x": 0, "y": 24},
"title": "Logs",
"type": "row"
},
{
"datasource": {"type": "victoriametrics-logs-datasource"},
"gridPos": {"h": 10, "w": 24, "x": 0, "y": 25},
"title": "Recent Errors (all namespaces)",
"type": "logs",
"targets": [{"expr": "{cluster_environment=~\"$cluster_environment\"} error OR Error OR ERROR OR panic OR PANIC", "refId": "A"}],
"options": {"showTime": true, "showLabels": true, "showCommonLabels": false, "wrapLogMessage": true, "prettifyLogMessage": false, "enableLogDetails": true, "sortOrder": "Descending", "dedupStrategy": "none"}
}
],
"schemaVersion": 39,
"tags": ["edp", "platform", "overview"],
"templating": {
"list": [
{
"current": {"selected": true, "text": "All", "value": "$__all"},
"datasource": {"type": "prometheus"},
"definition": "label_values(up, cluster_environment)",
"includeAll": true,
"multi": true,
"name": "cluster_environment",
"label": "Environment",
"query": "label_values(up, cluster_environment)",
"refresh": 2,
"sort": 1,
"type": "query"
}
]
},
"time": {"from": "now-6h", "to": "now"},
"title": "EDP Platform Overview",
"uid": "edp-platform-overview"
}

@@ -6,4 +6,7 @@ spec:
 instanceSelector:
 matchLabels:
 dashboards: "grafana"
-url: "https://raw.githubusercontent.com/VictoriaMetrics/VictoriaMetrics/refs/heads/master/dashboards/vm/victorialogs.json"
+folder: "EDP / Operations"
+grafanaCom:
+id: 22698
+revision: 1

@@ -1,18 +1,119 @@
 apiVersion: operator.victoriametrics.com/v1beta1
 kind: VMRule
 metadata:
-name: forgejo-alerts
+name: edp-platform-alerts
 namespace: observability
 spec:
 groups:
-- name: forgejo
+- name: platform-health
 rules:
-- alert: forgejo down
-expr: sum by(cluster_environment) (up{pod=~"forgejo-server-.*"}) < 1
-for: 30s
+- alert: ForgejoDown
+expr: sum by(cluster_environment) (up{job="forgejo-server-http"}) < 1
+for: 1m
+labels:
+severity: critical
+annotations:
+summary: "Forgejo is down on {{ $labels.cluster_environment }}"
+description: "Forgejo server has been unreachable for more than 1 minute in cluster {{ $labels.cluster_environment }}."
+- alert: IngressHighErrorRate
+expr: |
+sum by(cluster_environment) (rate(nginx_ingress_controller_requests{status=~"5.."}[5m]))
+/ sum by(cluster_environment) (rate(nginx_ingress_controller_requests[5m])) > 0.05
+for: 5m
 labels:
 severity: major
-job: "{{ $labels.job }}"
 annotations:
-value: "{{ $value }}"
-description: 'forgejo is down in cluster environment {{ $labels.cluster_environment }}'
+summary: "High ingress 5xx rate on {{ $labels.cluster_environment }}"
+description: "More than 5% of ingress requests are returning 5xx errors for over 5 minutes."
+value: "{{ $value | humanizePercentage }}"
+- alert: NodeNotReady
+expr: kube_node_status_condition{condition="Ready", status="true"} == 0
+for: 5m
+labels:
+severity: critical
+annotations:
+summary: "Node {{ $labels.node }} not ready on {{ $labels.cluster_environment }}"
+description: "Node {{ $labels.node }} has been in NotReady state for more than 5 minutes."
+- alert: PodCrashLooping
+expr: rate(kube_pod_container_status_restarts_total[15m]) * 60 * 15 > 3
+for: 5m
+labels:
+severity: major
+annotations:
+summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} crash-looping on {{ $labels.cluster_environment }}"
+description: "Pod has restarted more than 3 times in the last 15 minutes."
+- name: storage
+rules:
+- alert: PVCUsageHigh
+expr: |
+1 - (kubelet_volume_stats_available_bytes / kubelet_volume_stats_capacity_bytes) > 0.80
+for: 5m
+labels:
+severity: major
+annotations:
+summary: "PVC {{ $labels.namespace }}/{{ $labels.persistentvolumeclaim }} usage >80%"
+description: "PVC usage is at {{ $value | humanizePercentage }} on {{ $labels.cluster_environment }}."
+value: "{{ $value | humanizePercentage }}"
+- alert: PVCUsageCritical
+expr: |
+1 - (kubelet_volume_stats_available_bytes / kubelet_volume_stats_capacity_bytes) > 0.90
+for: 5m
+labels:
+severity: critical
+annotations:
+summary: "PVC {{ $labels.namespace }}/{{ $labels.persistentvolumeclaim }} usage >90%"
+description: "PVC is almost full at {{ $value | humanizePercentage }} on {{ $labels.cluster_environment }}. Immediate action required."
+value: "{{ $value | humanizePercentage }}"
+- name: resources
+rules:
+- alert: NodeCPUHigh
+expr: |
+1 - avg by(instance, cluster_environment) (rate(node_cpu_seconds_total{mode="idle"}[5m])) > 0.85
+for: 15m
+labels:
+severity: major
+annotations:
+summary: "Node {{ $labels.instance }} CPU >85% on {{ $labels.cluster_environment }}"
+description: "Node CPU utilization has been above 85% for 15 minutes."
+value: "{{ $value | humanizePercentage }}"
+- alert: NodeMemoryHigh
+expr: |
+1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) > 0.90
+for: 10m
+labels:
+severity: major
+annotations:
+summary: "Node memory >90% on {{ $labels.cluster_environment }}"
+description: "Node memory utilization above 90% for 10 minutes."
+value: "{{ $value | humanizePercentage }}"
+- name: cluster-health
+rules:
+- alert: ClusterMetricsSilent
+expr: |
+count(up{job="kubelet"}) by (cluster_environment) < 1
+or
+absent(up{job="kubelet", cluster_environment="dev"})
+for: 10m
+labels:
+severity: critical
+annotations:
+summary: "Cluster {{ $labels.cluster_environment }} stopped sending metrics"
+description: "No kubelet metrics received from cluster {{ $labels.cluster_environment }} for over 10 minutes. Either vmagent is dead or the cluster is unreachable."
+- alert: ClusterAPIServerDown
+expr: |
+up{job="apiserver", cluster_environment=~".+"} == 0
+for: 5m
+labels:
+severity: critical
+annotations:
+summary: "API server down on {{ $labels.cluster_environment }}"
+description: "Kubernetes API server scrape is failing on cluster {{ $labels.cluster_environment }}."

@@ -0,0 +1,13 @@
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMServiceScrape
metadata:
name: argocd
spec:
namespaceSelector:
matchNames:
- argocd
selector:
matchLabels:
app.kubernetes.io/part-of: argocd
endpoints:
- port: http-metrics

@@ -0,0 +1,78 @@
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMRule
metadata:
name: backup-alerts
namespace: observability
spec:
groups:
- name: backup-schedule-staleness
rules:
- alert: BackupCronJobNotScheduled
expr: |
time() - kube_cronjob_status_last_schedule_time{cronjob=~"forgejo-s3-backup|secrets-backup", namespace="gitea"}
> 26 * 3600
for: 5m
labels:
severity: critical
cronjob: "{{ $labels.cronjob }}"
annotations:
value: "{{ $value | humanizeDuration }}"
description: >-
CronJob {{ $labels.namespace }}/{{ $labels.cronjob }} has not been
scheduled for over 26 hours in cluster {{ $labels.cluster_environment }}.
Last schedule was {{ $value | humanizeDuration }} ago.
summary: "Backup CronJob {{ $labels.cronjob }} is stale"
- alert: BackupCronJobNeverScheduled
expr: |
kube_cronjob_status_last_schedule_time{cronjob=~"forgejo-s3-backup|secrets-backup", namespace="gitea"}
== 0
for: 30m
labels:
severity: critical
cronjob: "{{ $labels.cronjob }}"
annotations:
description: >-
CronJob {{ $labels.namespace }}/{{ $labels.cronjob }} has never been
scheduled in cluster {{ $labels.cluster_environment }}.
summary: "Backup CronJob {{ $labels.cronjob }} never ran"
- name: backup-job-failures
rules:
- alert: BackupJobFailed
expr: |
max by(cluster_environment, namespace, job_name) (
kube_job_status_failed{job_name=~"forgejo-s3-backup-.*|secrets-backup-.*", namespace="gitea"}
) > 0
for: 30s
labels:
severity: critical
job_name: "{{ $labels.job_name }}"
annotations:
value: "{{ $value }}"
description: >-
Backup job {{ $labels.namespace }}/{{ $labels.job_name }} has
{{ $value }} failed pod(s) in cluster {{ $labels.cluster_environment }}.
summary: "Backup job {{ $labels.job_name }} failed"
- name: backup-job-duration
rules:
- alert: BackupJobTooSlow
expr: |
(
time() - kube_job_status_start_time{job_name=~"forgejo-s3-backup-.*|secrets-backup-.*", namespace="gitea"}
) > 300
and
kube_job_status_active{job_name=~"forgejo-s3-backup-.*|secrets-backup-.*", namespace="gitea"} > 0
for: 1m
labels:
severity: major
job_name: "{{ $labels.job_name }}"
annotations:
value: "{{ $value | humanizeDuration }}"
description: >-
Backup job {{ $labels.namespace }}/{{ $labels.job_name }} has been
running for {{ $value | humanizeDuration }} (threshold: 5m)
in cluster {{ $labels.cluster_environment }}. This may indicate a
hung process or connectivity issue.
summary: "Backup job {{ $labels.job_name }} running too long"

@@ -0,0 +1,61 @@
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMRule
metadata:
name: ci-sustainability
spec:
groups:
- name: ci.sustainability.daily
interval: 5m
rules:
- record: ci:cpu_seconds:increase1d
expr: |
sum by(namespace, cluster) (
increase(container_cpu_usage_seconds_total{
namespace=~"gitea|garm",
pod=~"forgejo-runner.*|garm-.*",
container!=""
}[1d])
)
- record: ci:memory_bytes_seconds:avg1d
expr: |
avg_over_time(
sum by(namespace, cluster) (
container_memory_working_set_bytes{
namespace=~"gitea|garm",
pod=~"forgejo-runner.*|garm-.*",
container!=""
}
)[1d:5m]
)
- record: ci:pod_count:avg1d
expr: |
avg_over_time(
count by(namespace, cluster) (
kube_pod_info{
namespace=~"gitea|garm",
pod=~"forgejo-runner.*|garm-.*"
}
)[1d:5m]
)
- record: ci:pod_creations:increase1d
expr: |
sum by(namespace, cluster) (
changes(kube_pod_start_time{
namespace=~"gitea|garm",
pod=~"forgejo-runner.*|garm-.*"
}[1d])
)
- name: ci.sustainability.cluster
interval: 5m
rules:
- record: cluster:cpu_seconds:rate5m
expr: |
sum by(cluster) (
rate(node_cpu_seconds_total{mode!="idle"}[5m])
)
- record: cluster:memory_used_bytes:sum
expr: |
sum by(cluster) (
node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes
)

@@ -0,0 +1,30 @@
apiVersion: v1
kind: Service
metadata:
name: coredns-metrics
namespace: kube-system
labels:
k8s-app: coredns-metrics
spec:
clusterIP: None
selector:
k8s-app: coredns
ports:
- name: metrics
port: 9153
targetPort: 9153
protocol: TCP
---
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMServiceScrape
metadata:
name: coredns
spec:
namespaceSelector:
matchNames:
- kube-system
selector:
matchLabels:
k8s-app: coredns-metrics
endpoints:
- port: metrics

@@ -0,0 +1,13 @@
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMServiceScrape
metadata:
name: garm
spec:
namespaceSelector:
matchNames:
- garm
selector:
matchLabels:
app.kubernetes.io/name: garm
endpoints:
- port: http

@@ -9,7 +9,7 @@ spec:
 storageMetadata:
 annotations:
 everest.io/crypt-key-id: {{{ .Env.PVC_KMS_KEY_ID }}}
-everest.io/disk-volume-type: SATA
+everest.io/disk-volume-type: GPSSD
 storage:
 storageClassName: csi-disk
 accessModes:

@@ -5,11 +5,17 @@ metadata:
 namespace: observability
 spec:
 username: simple-user
-password: simple-password
+password: sx5gC7ooWaWOODwD
 targetRefs:
 - static:
 url: http://vmsingle-o12y:8429
 paths: ["/api/v1/write"]
+- static:
+url: http://vmsingle-o12y:8429
+paths: ["/api/v1/.*"]
 - static:
 url: http://vlogs-victorialogs:9428
 paths: ["/insert/elasticsearch/.*"]
+- static:
+url: http://vlogs-victorialogs:9428
+paths: ["/select/.*"]

@@ -1,6 +1,6 @@
 global:
 # -- Cluster label to use for dashboards and rules
-clusterLabel: cluster
+clusterLabel: cluster_environment
 # -- Global license configuration
 license:
 key: ""
@@ -201,13 +201,13 @@ defaultRules:
 create: true
 rules: {}
 kubernetesSystemControllerManager:
-create: true
+create: false
 rules: {}
 kubeScheduler:
-create: true
+create: false
 rules: {}
 kubernetesSystemScheduler:
-create: true
+create: false
 rules: {}
 kubeStateMetrics:
 create: true
@@ -289,7 +289,7 @@ vmsingle:
 storageMetadata:
 annotations:
 everest.io/crypt-key-id: {{{ .Env.PVC_KMS_KEY_ID }}}
-everest.io/disk-volume-type: SATA
+everest.io/disk-volume-type: GPSSD
 storage:
 storageClassName: csi-disk
 accessModes:
@@ -538,108 +538,30 @@ alertmanager:
# If you're migrating existing config, please make sure that `.Values.alertmanager.config`:
# - with `useManagedConfig: false` has structure described [here](https://prometheus.io/docs/alerting/latest/configuration/).
# - with `useManagedConfig: true` has structure described [here](https://docs.victoriametrics.com/operator/api/#vmalertmanagerconfig).
useManagedConfig: false
useManagedConfig: true
# -- (object) Alertmanager configuration
config:
route:
receiver: "blackhole"
# group_by: ["alertgroup", "job"]
# group_wait: 30s
# group_interval: 5m
# repeat_interval: 12h
# routes:
#
# # Duplicate code_owner routes to teams
# # These will send alerts to team channels but continue
# # processing through the rest of the tree to handled by on-call
# - matchers:
# - code_owner_channel!=""
# - severity=~"info|warning|critical"
# group_by: ["code_owner_channel", "alertgroup", "job"]
# receiver: slack-code-owners
#
# # Standard on-call routes
# - matchers:
# - severity=~"info|warning|critical"
# receiver: slack-monitoring
# continue: true
#
# inhibit_rules:
# - target_matchers:
# - severity=~"warning|info"
# source_matchers:
# - severity=critical
# equal:
# - cluster
# - namespace
# - alertname
# - target_matchers:
# - severity=info
# source_matchers:
# - severity=warning
# equal:
# - cluster
# - namespace
# - alertname
# - target_matchers:
# - severity=info
# source_matchers:
# - alertname=InfoInhibitor
# equal:
# - cluster
# - namespace
routes:
- matchers:
- severity=~"critical|major"
receiver: outlook
receivers:
- name: blackhole
# - name: "slack-monitoring"
# slack_configs:
# - channel: "#channel"
# send_resolved: true
# title: '{{ template "slack.monzo.title" . }}'
# icon_emoji: '{{ template "slack.monzo.icon_emoji" . }}'
# color: '{{ template "slack.monzo.color" . }}'
# text: '{{ template "slack.monzo.text" . }}'
# actions:
# - type: button
# text: "Runbook :green_book:"
# url: "{{ (index .Alerts 0).Annotations.runbook_url }}"
# - type: button
# text: "Query :mag:"
# url: "{{ (index .Alerts 0).GeneratorURL }}"
# - type: button
# text: "Dashboard :grafana:"
# url: "{{ (index .Alerts 0).Annotations.dashboard }}"
# - type: button
# text: "Silence :no_bell:"
# url: '{{ template "__alert_silence_link" . }}'
# - type: button
# text: '{{ template "slack.monzo.link_button_text" . }}'
# url: "{{ .CommonAnnotations.link_url }}"
# - name: slack-code-owners
# slack_configs:
# - channel: "#{{ .CommonLabels.code_owner_channel }}"
# send_resolved: true
# title: '{{ template "slack.monzo.title" . }}'
# icon_emoji: '{{ template "slack.monzo.icon_emoji" . }}'
# color: '{{ template "slack.monzo.color" . }}'
# text: '{{ template "slack.monzo.text" . }}'
# actions:
# - type: button
# text: "Runbook :green_book:"
# url: "{{ (index .Alerts 0).Annotations.runbook }}"
# - type: button
# text: "Query :mag:"
# url: "{{ (index .Alerts 0).GeneratorURL }}"
# - type: button
# text: "Dashboard :grafana:"
# url: "{{ (index .Alerts 0).Annotations.dashboard }}"
# - type: button
# text: "Silence :no_bell:"
# url: '{{ template "__alert_silence_link" . }}'
# - type: button
# text: '{{ template "slack.monzo.link_button_text" . }}'
# url: "{{ .CommonAnnotations.link_url }}"
#
- name: outlook
email_configs:
- smarthost: 'mail.mms-support.de:465'
auth_username: 'ipcei-cis-devfw@mms-support.de'
auth_password:
name: email-user-credentials
key: connection-string
from: '"IPCEI CIS DevFW" <ipcei-cis-devfw@mms-support.de>'
to: 'f9f9953a.mg.telekom.de@de.teams.ms'
headers:
subject: 'Grafana Mail Alerts'
require_tls: false
# -- Better alert templates for [slack source](https://gist.github.com/milesbxf/e2744fc90e9c41b47aa47925f8ff6512)
monzoTemplate:
enabled: true
@@ -1098,7 +1020,7 @@ kubeApiServer:
 # Component scraping the kube controller manager
 kubeControllerManager:
   # -- Enable kube controller manager metrics scraping
-  enabled: true
+  enabled: false
   # -- If your kube controller manager is not deployed as a pod, specify IPs it can be found on
   endpoints: []
@@ -1231,7 +1153,7 @@ kubeEtcd:
 # Component scraping kube scheduler
 kubeScheduler:
   # -- Enable KubeScheduler metrics scraping
-  enabled: true
+  enabled: false
   # -- If your kube scheduler is not deployed as a pod, specify IPs it can be found on
   endpoints: []

@@ -18,9 +18,9 @@ spec:
     name: in-cluster
     namespace: ingress-nginx
   sources:
-    - repoURL: https://{{{ .Env.CLIENT_REPO_DOMAIN }}}/DevFW-CICD/ingress-nginx-helm.git
+    - repoURL: https://github.com/kubernetes/ingress-nginx.git
       path: charts/ingress-nginx
-      targetRevision: helm-chart-4.12.4-depends
+      targetRevision: helm-chart-4.12.1
       helm:
         valueFiles:
           - $values/{{{ .Env.CLIENT_REPO_ID }}}/{{{ .Env.DOMAIN }}}/stacks/otc/ingress-nginx/values.yaml

@@ -13,6 +13,6 @@ parameters:
   kubernetes.io/volumetype: SATA
   kubernetes.io/zone: eu-de-02
 provisioner: flexvolume-huawei.com/fuxivol
-reclaimPolicy: Delete
+reclaimPolicy: {{{ getenv "STORAGE_RECLAIM_POLICY" "Retain" }}}
 volumeBindingMode: Immediate
 allowVolumeExpansion: true

@@ -0,0 +1,30 @@
+# helm upgrade --install --create-namespace --namespace terralist terralist oci://ghcr.io/terralist/helm-charts/terralist -f terralist-values.yaml
+apiVersion: argoproj.io/v1alpha1
+kind: Application
+metadata:
+  name: terralist
+  namespace: argocd
+  labels:
+    env: dev
+spec:
+  project: default
+  syncPolicy:
+    automated:
+      selfHeal: true
+    syncOptions:
+      - CreateNamespace=true
+    retry:
+      limit: -1
+  destination:
+    name: in-cluster
+    namespace: terralist
+  sources:
+    - repoURL: https://github.com/terralist/helm-charts
+      path: charts/terralist
+      targetRevision: terralist-0.8.1
+      helm:
+        valueFiles:
+          - $values/{{{ .Env.CLIENT_REPO_ID }}}/{{{ .Env.DOMAIN }}}/stacks/terralist/terralist/values.yaml
+    - repoURL: https://{{{ .Env.CLIENT_REPO_DOMAIN }}}/{{{ .Env.CLIENT_REPO_ORG_NAME }}}
+      targetRevision: HEAD
+      ref: values

@@ -0,0 +1,87 @@
+controllers:
+  main:
+    strategy: Recreate
+    containers:
+      app:
+        env:
+          - name: TERRALIST_OAUTH_PROVIDER
+            value: oidc
+          - name: TERRALIST_OI_CLIENT_ID
+            valueFrom:
+              secretKeyRef:
+                name: oidc-credentials
+                key: client-id
+          - name: TERRALIST_OI_CLIENT_SECRET
+            valueFrom:
+              secretKeyRef:
+                name: oidc-credentials
+                key: client-secret
+          - name: TERRALIST_OI_AUTHORIZE_URL
+            valueFrom:
+              secretKeyRef:
+                name: oidc-credentials
+                key: authorize-url
+          - name: TERRALIST_OI_TOKEN_URL
+            valueFrom:
+              secretKeyRef:
+                name: oidc-credentials
+                key: token-url
+          - name: TERRALIST_OI_USERINFO_URL
+            valueFrom:
+              secretKeyRef:
+                name: oidc-credentials
+                key: userinfo-url
+          - name: TERRALIST_OI_SCOPE
+            valueFrom:
+              secretKeyRef:
+                name: oidc-credentials
+                key: scope
+          - name: TERRALIST_TOKEN_SIGNING_SECRET
+            valueFrom:
+              secretKeyRef:
+                name: terralist-secret
+                key: token-signing-secret
+          - name: TERRALIST_COOKIE_SECRET
+            valueFrom:
+              secretKeyRef:
+                name: terralist-secret
+                key: cookie-secret
+          - name: TERRALIST_URL
+            value: https://terralist.{{{ .Env.DOMAIN_GITEA }}}
+          - name: TERRALIST_SQLITE_PATH
+            value: /data/db.sqlite
+          - name: TERRALIST_LOCAL_STORE
+            value: /data/modules
+          - name: TERRALIST_PROVIDERS_ANONYMOUS_READ
+            value: "true"
+ingress:
+  main:
+    enabled: true
+    className: nginx
+    annotations:
+      cert-manager.io/cluster-issuer: main
+    hosts:
+      - host: terralist.{{{ .Env.DOMAIN_GITEA }}}
+        paths:
+          - path: /
+            pathType: Prefix
+            service:
+              identifier: main
+              port: http
+    tls:
+      - hosts:
+          - terralist.{{{ .Env.DOMAIN_GITEA }}}
+        secretName: terralist-tls-secret
+persistence:
+  data:
+    enabled: true
+    accessMode: ReadWriteOnce
+    size: 10Gi
+    retain: false
+    storageClass: "csi-disk"
+    annotations:
+      everest.io/disk-volume-type: GPSSD
+    globalMounts:
+      - path: /data