GitOps Schema — it-ops.yaml reference

One required field, everything else opt-in. The agent discovers ConfigMaps labelled itops.io/config: "true" and reads the it-ops.yaml data key.
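
The discovery rule above can be sketched as a filter over ConfigMap manifests. This is an illustration only, not the agent's code: select_itops_configs is a hypothetical helper, the manifests are plain dicts, and skipping a labelled ConfigMap that lacks the it-ops.yaml key is an assumption.

```python
# Hypothetical helper for illustration; the real agent may behave differently.
LABEL_KEY = "itops.io/config"
DATA_KEY = "it-ops.yaml"


def select_itops_configs(configmaps):
    """Return {ConfigMap name: raw it-ops.yaml text} for labelled ConfigMaps."""
    selected = {}
    for cm in configmaps:
        metadata = cm.get("metadata") or {}
        labels = metadata.get("labels") or {}
        data = cm.get("data") or {}
        # Kubernetes label values are strings, hence the quoted "true".
        # Assumption: a labelled ConfigMap without the data key is skipped.
        if labels.get(LABEL_KEY) == "true" and DATA_KEY in data:
            selected[metadata.get("name")] = data[DATA_KEY]
    return selected


configmaps = [
    {
        "metadata": {"name": "my-app-itops", "labels": {LABEL_KEY: "true"}},
        "data": {DATA_KEY: 'path: "myorg/myplatform/prod/cluster1/my-app"\n'},
    },
    {"metadata": {"name": "unrelated", "labels": {}}, "data": {"foo": "bar"}},
]
print(select_itops_configs(configmaps))
```

Keying the result by name alone is a simplification; a real implementation would likely key by namespace and name.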

Minimum

apiVersion: v1
kind: ConfigMap
metadata:
  name: my-app-itops
  labels:
    itops.io/config: "true"
data:
  it-ops.yaml: |
    path: "myorg/myplatform/prod/cluster1/my-app"

One line. The 5-level path (organization/platform/environment/cluster/service) uniquely identifies the service across every cluster connected to the platform. The first four segments become the hierarchy node; the fifth becomes the service name.
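
A minimal sketch of how the path breaks down into hierarchy node and service name. split_service_path and ServicePath are hypothetical names, and rejecting paths that do not have exactly five non-empty segments is an assumption about validation, not documented behaviour.

```python
from typing import NamedTuple

LEVELS = ("organization", "platform", "environment", "cluster", "service")


class ServicePath(NamedTuple):
    hierarchy_node: str  # first four segments, joined with "/"
    service: str         # fifth segment


def split_service_path(path: str) -> ServicePath:
    """Split a 5-level path into its hierarchy node and service name."""
    segments = path.split("/")
    # Assumption: anything other than five non-empty segments is invalid.
    if len(segments) != len(LEVELS) or not all(segments):
        raise ValueError(
            f"expected {len(LEVELS)} non-empty segments "
            f"({'/'.join(LEVELS)}), got {path!r}"
        )
    return ServicePath("/".join(segments[:4]), segments[4])


print(split_service_path("myorg/myplatform/prod/cluster1/my-app"))
```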

Progressive disclosure

Add fields to unlock features. Every entry below is optional.

Presentation

displayName: "My App"         # default: titlecased last path segment
description: "Payment API"
type: api                     # free-form category (api, database, cache, storage, ...)
tags: [api, payment]          # storage/database/cache/s3/ebs/rds tags route to Storage tab
team: backend                 # stored under ownership.team

SLA tracking

slaGroup: "payment-system"    # membership in a cross-cluster SLA group
criticality: critical         # critical | high | medium | low
                              # Auto-assigns a default SLA tier on first sight.

Workload override (K8s)

workload:
  type: statefulset           # deployment (default) | statefulset | daemonset
  name: my-release-my-app     # default: last path segment
                              # Set when Helm release prefix differs from service name.
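
The defaults in the workload block above can be read as the following resolution rule. resolve_workload is a hypothetical helper, and raising on an unknown workload type is an assumption.

```python
WORKLOAD_TYPES = ("deployment", "statefulset", "daemonset")


def resolve_workload(path, workload=None):
    """Return (type, name) of the K8s workload backing a service path."""
    workload = workload or {}
    kind = workload.get("type", "deployment")  # documented default
    if kind not in WORKLOAD_TYPES:
        # Assumption: unknown types are rejected rather than ignored.
        raise ValueError(f"unsupported workload type: {kind!r}")
    # Documented default: the last path segment is the workload name.
    name = workload.get("name") or path.rsplit("/", 1)[-1]
    return kind, name


path = "myorg/myplatform/prod/cluster1/my-app"
print(resolve_workload(path))
print(resolve_workload(path, {"type": "statefulset", "name": "my-release-my-app"}))
```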

Backup monitoring

backup:
  expected: true
  maxAgeDays: 1
  storageSize: "100Gi"        # informational
  schedule: "0 2 * * *"       # informational (display only)
  retention: "30d"

When expected: true, the service appears on the Backup tab, and SLA alerts fire if no /api/v1/backup/report push arrives within maxAgeDays days.
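
The staleness rule can be sketched as a simple age check. backup_is_overdue is a hypothetical helper. Treating a service with no report at all as overdue, and requiring maxAgeDays to be present, are assumptions.

```python
from datetime import datetime, timedelta, timezone


def backup_is_overdue(backup, last_report, now):
    """True if a backup is expected and the last report is older than maxAgeDays."""
    if not backup.get("expected"):
        return False  # backup monitoring is not switched on for this service
    if last_report is None:
        return True   # assumption: never having reported counts as overdue
    return now - last_report > timedelta(days=backup["maxAgeDays"])


backup = {"expected": True, "maxAgeDays": 1}
now = datetime(2024, 1, 3, 2, 0, tzinfo=timezone.utc)
fresh = datetime(2024, 1, 2, 2, 30, tzinfo=timezone.utc)  # 23.5 hours old
stale = datetime(2024, 1, 1, 2, 0, tzinfo=timezone.utc)   # 48 hours old

print(backup_is_overdue(backup, fresh, now))
print(backup_is_overdue(backup, stale, now))
```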

HTTP health probe

health:
  enabled: true
  port: 8080                  # default 80
  path: "/healthz"            # default /healthz
  interval: "30s"             # default 30s
  timeout: "5s"               # default 5s
  # Or absolute URL override:
  # endpoint: "http://some-external-host:9000/healthz"

The agent probes the URL from inside the cluster once per interval. A 2xx response upgrades the service to OPERATIONAL, overriding the K8s-native status. A non-2xx response or a timeout marks it DOWN with a named reason (HealthProbeFailed or HealthProbeTimeout). The service detail dialog shows an "active" badge with the probe target.
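
A sketch of how the probe target and the resulting status could be derived from the health block. probe_url and classify_probe are hypothetical helpers. The in-cluster host name, the http scheme for non-overridden targets, and the pairing of non-2xx with HealthProbeFailed and timeout with HealthProbeTimeout are assumptions.

```python
DEFAULT_PORT = 80
DEFAULT_PATH = "/healthz"


def probe_url(health, host):
    """Build the probe target; an absolute endpoint overrides port and path."""
    if health.get("endpoint"):
        return health["endpoint"]
    port = health.get("port", DEFAULT_PORT)
    path = health.get("path", DEFAULT_PATH)
    # Assumption: plain http against a host the agent resolves in-cluster.
    return f"http://{host}:{port}{path}"


def classify_probe(status_code, timed_out=False):
    """Map a probe outcome to (status, reason)."""
    if timed_out:
        return "DOWN", "HealthProbeTimeout"
    if status_code is not None and 200 <= status_code < 300:
        return "OPERATIONAL", None
    return "DOWN", "HealthProbeFailed"


health = {"enabled": True, "port": 8080, "path": "/healthz"}
print(probe_url(health, "my-app.default.svc"))  # host name is illustrative
print(classify_probe(503))
```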

Dependencies

Every reference is a single path: entry, which is globally unique, clickable in the UI, and never ambiguous.

dependencies:
  requires:
    - path: "myorg/myplatform/prod/cluster1/postgresql"
      type: storage            # optional classification
      critical: true           # default false
    - path: "myorg/myplatform/prod/cluster1/redis"
      type: cache
  usedBy:
    - path: "myorg/myplatform/prod/cluster1/web-frontend"
      type: frontend

requires and usedBy are symmetric: either side can declare the edge, and the platform renders the matching chips on both services' detail pages.
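
One way to picture the symmetry is to normalise both declarations into the same directed edge. collect_edges is a hypothetical helper and the (consumer, provider) tuple layout is an assumption made for this sketch.

```python
def collect_edges(declarations):
    """Normalise requires/usedBy declarations into (consumer, provider) edges.

    declarations maps a service path to its dependencies block.
    """
    edges = set()
    for service, deps in declarations.items():
        for ref in deps.get("requires", []):
            edges.add((service, ref["path"]))  # this service consumes ref
        for ref in deps.get("usedBy", []):
            edges.add((ref["path"], service))  # ref consumes this service
    return edges


BASE = "myorg/myplatform/prod/cluster1"
declarations = {
    f"{BASE}/my-app": {"requires": [{"path": f"{BASE}/postgresql"}]},
    f"{BASE}/postgresql": {"usedBy": [{"path": f"{BASE}/my-app"}]},
}
edges = collect_edges(declarations)
print(edges)
```

Both services declare the same relationship here, and the set collapses them into a single edge.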

Links

links:
  runbook: "https://wiki.corp/runbooks/my-app"
  dashboard: "https://grafana.corp/d/my-app"
  documentation: "https://docs.corp/my-app"
  repository: "https://github.com/mycorp/my-app"
  logs: "https://elastic.corp/app/my-app"
  alerts: "https://alertmanager.corp/#/my-app"
  api: "https://api.corp/my-app"
  custom: "https://whatever.corp/my-app"

Contacts

contacts:
  owner: "alice@example.com"
  slack: "#backend-oncall"
  contact: "+36-30-1234567"
  escalation: "cto@example.com"

Merged into the service's ownership block alongside team.
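
A rough sketch of that merge, assuming the ownership block is a flat mapping that reuses the contacts keys as-is. build_ownership is a hypothetical helper and the resulting key names are not confirmed by the schema.

```python
def build_ownership(config):
    """Combine the contacts block and the top-level team into one mapping."""
    ownership = dict(config.get("contacts") or {})
    if "team" in config:
        ownership["team"] = config["team"]  # documented as ownership.team
    return ownership


config = {
    "team": "backend",
    "contacts": {"owner": "alice@example.com", "slack": "#backend-oncall"},
}
print(build_ownership(config))
```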

Relations (parent/children graph)

relations:
  parent: "retail-stack-umbrella"
  children:
    - sub-component-a
    - sub-component-b

Free-form logical hierarchy beyond the cluster/namespace tree. Rendered under "Relations" on the service detail.

Monitoring hints

monitoring:
  enabled: true
  prometheusJob: "my-app"
  alerts:
    - name: "HighErrorRate"
      severity: warning
      expression: 'rate(http_errors[5m]) > 0.01'

Informational — the platform doesn't evaluate rules, just surfaces them in the service detail.

Custom key/value bag

custom:
  costCenter: "CC-42"
  dataClass: "pii"
  complianceTier: "gdpr-restricted"

Free-form, passthrough, no validation. Rendered as a key/value table on the service detail. Use this for organization-specific annotations without schema changes.

Applying it

Commit the ConfigMap to Git next to the service's Helm chart. Apply it via ArgoCD (recommended) or kubectl apply. The agent re-runs discovery every 30 seconds and pushes changes to the platform automatically.

Mixing kubectl apply with an ArgoCD-managed path causes drift. Pick one.

Parse errors

If the YAML can't be parsed, the agent pushes a visible configmap-parse-error-<ns>-<name> service under the owning cluster with status UNKNOWN and the parse error in the message. This makes bad GitOps input loud, not silent.
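
The placeholder service described above could be assembled along these lines. parse_error_service is a hypothetical helper and the dict keys (name, status, message) are assumptions about the payload shape; only the name pattern and the UNKNOWN status come from the text.

```python
def parse_error_service(namespace, name, error):
    """Build the placeholder service pushed when it-ops.yaml fails to parse."""
    return {
        "name": f"configmap-parse-error-{namespace}-{name}",
        "status": "UNKNOWN",
        "message": str(error),  # the parse error text, surfaced verbatim
    }


placeholder = parse_error_service(
    "payments", "my-app-itops", "mapping values are not allowed here"
)
print(placeholder["name"])
```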