GitOps Schema — it-ops.yaml reference
One required field, everything else opt-in. The agent discovers ConfigMaps labelled itops.io/config: "true" and reads the it-ops.yaml data key.
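The discovery rule above can be sketched as a filter over ConfigMap manifests that have already been loaded as dictionaries. This is only an illustration of the selection criteria, not the agent's actual code: the function name and the `"default"` namespace fallback are assumptions.

```python
from typing import Any

LABEL_KEY = "itops.io/config"
DATA_KEY = "it-ops.yaml"


def select_itops_configmaps(configmaps: list[dict[str, Any]]) -> list[tuple[str, str, str]]:
    """Return (namespace, name, raw_yaml) for every ConfigMap that qualifies."""
    selected = []
    for cm in configmaps:
        meta = cm.get("metadata") or {}
        labels = meta.get("labels") or {}
        data = cm.get("data") or {}
        # Label values are strings, so the match is against "true", not a boolean.
        if labels.get(LABEL_KEY) == "true" and DATA_KEY in data:
            # Hypothetical fallback: treat a missing namespace as "default".
            selected.append((meta.get("namespace", "default"), meta.get("name", ""), data[DATA_KEY]))
    return selected
```

A ConfigMap that carries the label but lacks the it-ops.yaml key, or the reverse, is skipped in this sketch.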
Minimum
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-app-itops
  labels:
    itops.io/config: "true"
data:
  it-ops.yaml: |
    path: "myorg/myplatform/prod/cluster1/my-app"
One line. The 5-level path (organization/platform/environment/cluster/service) uniquely identifies the service across every cluster connected to the platform. The first four segments become the hierarchy node; the fifth becomes the service name.
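As a rough illustration of how the path splits, the sketch below separates the hierarchy node from the service name. The `ServicePath` type and the choice to reject malformed paths with `ValueError` are assumptions made for the example; the document does not say how the agent handles a path with the wrong number of segments.

```python
from typing import NamedTuple

LEVELS = ("organization", "platform", "environment", "cluster", "service")


class ServicePath(NamedTuple):
    hierarchy_node: str  # first four segments, joined with "/"
    service_name: str    # fifth segment


def parse_service_path(path: str) -> ServicePath:
    """Split a 5-level path into its hierarchy node and service name."""
    segments = path.split("/")
    # Assumption: anything other than five non-empty segments is rejected.
    if len(segments) != len(LEVELS) or any(not s for s in segments):
        raise ValueError(
            f"expected {len(LEVELS)} non-empty segments ({'/'.join(LEVELS)}), got {path!r}"
        )
    return ServicePath("/".join(segments[:4]), segments[4])
```

For the minimum example, this yields the node `myorg/myplatform/prod/cluster1` and the service name `my-app`.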
Progressive disclosure
Add fields to unlock features. Every entry below is optional.
Presentation
displayName: "My App" # default: titlecased last path segment
description: "Payment API"
type: api # free-form category (api, database, cache, storage, ...)
tags: [api, payment] # storage/database/cache/s3/ebs/rds tags route to Storage tab
team: backend # stored under ownership.team
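Two of the presentation rules are mechanical enough to sketch: the displayName default and the Storage tab routing. Both helper names are hypothetical. The exact title-casing rule is an assumption (separators become spaces, which matches "my-app" turning into "My App"), as is the case-sensitive tag comparison.

```python
STORAGE_TAGS = {"storage", "database", "cache", "s3", "ebs", "rds"}


def default_display_name(path: str) -> str:
    """Derive a display name from the last path segment."""
    last = path.rsplit("/", 1)[-1]
    # Assumption: hyphens and underscores become spaces before title-casing.
    return last.replace("-", " ").replace("_", " ").title()


def shows_on_storage_tab(tags: list[str]) -> bool:
    """True when at least one tag is in the storage-routing set."""
    # Assumption: tags are compared exactly, without case folding.
    return any(tag in STORAGE_TAGS for tag in tags)
```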
SLA tracking
slaGroup: "payment-system" # membership in a cross-cluster SLA group
criticality: critical # critical | high | medium | low
# Auto-assigns a default SLA tier on first sight.
Workload override (K8s)
workload:
  type: statefulset # deployment (default) | statefulset | daemonset
  name: my-release-my-app # default: last path segment
  # Set when Helm release prefix differs from service name.
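The workload defaults can be sketched as a small resolver that takes the parsed it-ops.yaml document as a dictionary. The function name is hypothetical, and raising `ValueError` for an unlisted type is an assumption; the document only lists the three accepted values.

```python
WORKLOAD_TYPES = ("deployment", "statefulset", "daemonset")


def resolve_workload(doc: dict) -> tuple[str, str]:
    """Return (workload_type, workload_name) after applying the documented defaults."""
    workload = doc.get("workload") or {}
    kind = workload.get("type", "deployment")  # documented default
    if kind not in WORKLOAD_TYPES:
        # Assumption: unknown types are treated as an error.
        raise ValueError(f"unsupported workload type: {kind!r}")
    # Documented default: the last path segment.
    name = workload.get("name") or doc["path"].rsplit("/", 1)[-1]
    return kind, name
```

With no workload block at all, the service at `.../my-app` resolves to a Deployment named `my-app`.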
Backup monitoring
backup:
  expected: true
  maxAgeDays: 1
  storageSize: "100Gi" # informational
  schedule: "0 2 * * *" # informational (display only)
  retention: "30d"
When expected: true, the service appears on the Backup tab and SLA alerts fire if no /api/v1/backup/report push arrives within maxAgeDays.
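The staleness rule can be sketched as a pure function over the backup block and the timestamp of the most recent report. The function name is hypothetical. Treating a service that has never reported as overdue is an assumption; the document does not say from which moment the window is counted in that case.

```python
from datetime import datetime, timedelta
from typing import Optional


def backup_is_overdue(backup: dict, last_report: Optional[datetime], now: datetime) -> bool:
    """True when a backup is expected and no report arrived within maxAgeDays."""
    if not backup.get("expected", False):
        return False  # backups are not monitored for this service
    if last_report is None:
        return True  # assumption: never having reported counts as overdue
    return now - last_report > timedelta(days=backup["maxAgeDays"])
```

With maxAgeDays: 1, a report from 23 hours ago is still fresh, while one from 25 hours ago would trigger the alert in this sketch.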
HTTP health probe
health:
  enabled: true
  port: 8080 # default 80
  path: "/healthz" # default /healthz
  interval: "30s" # default 30s
  timeout: "5s" # default 5s
  # Or absolute URL override:
  # endpoint: "http://some-external-host:9000/healthz"
The agent probes the URL in-cluster at interval. A 2xx response upgrades the service to OPERATIONAL (overrides K8s-native status). Non-2xx or timeout marks it DOWN with a named reason (HealthProbeFailed, HealthProbeTimeout). The service detail dialog shows an "active" badge with the probe target.
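The probe behaviour can be sketched without any network access: one helper builds the target URL from the health block, the other maps a probe outcome to a status and reason. All function names are hypothetical. The `service_host` argument, the `http` scheme for the in-cluster URL, and mapping a connection failure (no status code, no timeout) to HealthProbeFailed are assumptions.

```python
from typing import Optional


def probe_url(health: dict, service_host: str) -> str:
    """Build the probe target; an absolute endpoint overrides port and path."""
    if health.get("endpoint"):
        return health["endpoint"]
    port = health.get("port", 80)           # documented default
    path = health.get("path", "/healthz")   # documented default
    # Assumption: plain HTTP against a host name the agent already knows.
    return f"http://{service_host}:{port}{path}"


def classify_probe(status_code: Optional[int], timed_out: bool) -> tuple[str, Optional[str]]:
    """Map a probe outcome to (status, reason)."""
    if timed_out:
        return "DOWN", "HealthProbeTimeout"
    if status_code is not None and 200 <= status_code < 300:
        return "OPERATIONAL", None
    # Non-2xx responses; connection errors are folded in here by assumption.
    return "DOWN", "HealthProbeFailed"
```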
Dependencies
Every reference is a single path: value that is globally unique, clickable in the UI, and never ambiguous.
dependencies:
  requires:
    - path: "myorg/myplatform/prod/cluster1/postgresql"
      type: storage # optional classification
      critical: true # default false
    - path: "myorg/myplatform/prod/cluster1/redis"
      type: cache
  usedBy:
    - path: "myorg/myplatform/prod/cluster1/web-frontend"
      type: frontend
requires and usedBy are symmetric: either side can declare the edge, and the platform renders the resulting chips on both services' detail pages.
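One way to picture the symmetry is to normalize both declarations into a single set of directed edges, so that an edge declared on either side (or on both) appears once. This is a sketch of the idea only; the function name and the (dependent, dependency) edge representation are assumptions about how the platform might store the graph.

```python
def dependency_edges(services: dict[str, dict]) -> set[tuple[str, str]]:
    """Return directed (dependent, dependency) edges from all declarations.

    `services` maps each service path to its parsed it-ops.yaml document.
    """
    edges: set[tuple[str, str]] = set()
    for path, doc in services.items():
        deps = doc.get("dependencies") or {}
        for ref in deps.get("requires") or []:
            edges.add((path, ref["path"]))  # this service depends on ref
        for ref in deps.get("usedBy") or []:
            edges.add((ref["path"], path))  # ref depends on this service
    return edges
```

If my-app declares requires: postgresql and postgresql declares usedBy: my-app, the set still contains a single edge.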
Links
links:
  runbook: "https://wiki.corp/runbooks/my-app"
  dashboard: "https://grafana.corp/d/my-app"
  documentation: "https://docs.corp/my-app"
  repository: "https://github.com/mycorp/my-app"
  logs: "https://elastic.corp/app/my-app"
  alerts: "https://alertmanager.corp/#/my-app"
  api: "https://api.corp/my-app"
  custom: "https://whatever.corp/my-app"
Contacts
contacts:
  owner: "alice@example.com"
  slack: "#backend-oncall"
  contact: "+36-30-1234567"
  escalation: "cto@example.com"
Merged into the service's ownership block alongside team.
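A minimal sketch of that merge, assuming the ownership block is a flat mapping that keeps the contact keys under their original names. The function name and the resulting shape are assumptions; only the `team` key location (ownership.team) is stated in this reference.

```python
def build_ownership(doc: dict) -> dict:
    """Combine the top-level team field and the contacts block into one mapping."""
    ownership: dict = {}
    if "team" in doc:
        ownership["team"] = doc["team"]  # documented: stored under ownership.team
    # Assumption: contact keys are copied as-is next to team.
    ownership.update(doc.get("contacts") or {})
    return ownership
```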
Relations (parent/children graph)
relations:
  parent: "retail-stack-umbrella"
  children:
    - sub-component-a
    - sub-component-b
Free-form logical hierarchy beyond the cluster/namespace tree. Rendered under "Relations" on the service detail.
Monitoring hints
monitoring:
  enabled: true
  prometheusJob: "my-app"
  alerts:
    - name: "HighErrorRate"
      severity: warning
      expression: 'rate(http_errors[5m]) > 0.01'
Informational only: the platform does not evaluate these rules; it surfaces them on the service detail.
Custom key/value bag
custom:
  costCenter: "CC-42"
  dataClass: "pii"
  complianceTier: "gdpr-restricted"
Free-form, passthrough, no validation. Rendered as a key/value table on the service detail. Use this for organization-specific annotations without schema changes.
Applying it
Commit the ConfigMap to Git next to the service's Helm chart. Apply via ArgoCD (recommended) or kubectl apply. The agent rescans for labelled ConfigMaps every 30 seconds and pushes changes to the platform automatically.
Mixing kubectl apply with an ArgoCD-managed path causes drift. Pick one.
Parse errors
If the YAML can't be parsed, the agent pushes a visible configmap-parse-error-<ns>-<name> service under the owning cluster with status UNKNOWN and the parse error in the message. This makes bad GitOps input loud, not silent.
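The fallback can be sketched as a wrapper around whatever YAML parser the agent uses. The parser is passed in as a callable so the sketch stays independent of any particular library. The function name and the shape of the returned dictionaries are assumptions; only the service name pattern, the UNKNOWN status, and the error-in-message behaviour come from this reference.

```python
from typing import Callable


def load_or_placeholder(namespace: str, name: str, raw: str,
                        parse: Callable[[str], dict]) -> dict:
    """Parse it-ops.yaml, or describe the placeholder service for a parse failure."""
    try:
        return {"ok": True, "config": parse(raw)}
    except Exception as exc:  # any parser error becomes a visible placeholder
        return {
            "ok": False,
            "service": {
                "name": f"configmap-parse-error-{namespace}-{name}",
                "status": "UNKNOWN",
                "message": str(exc),
            },
        }
```

A broken ConfigMap named my-app-itops in namespace prod would surface as configmap-parse-error-prod-my-app-itops.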