Agent Deploy

Deploy one agent per Kubernetes cluster. The agent discovers services, reports their status, and runs HTTP probes. It also reports itself as a service, so the admin UI always knows it is alive.

Install

helm repo add itops https://charts.mlops.hu
helm repo update
helm install itops-agent itops/itops-agent \
  --set node.id="myorg/myplatform/prod/cluster1" \
  --set itops.url="https://api.yourdomain.com" \
  --set itops.apiKey.value="<OPERATOR_API_KEY>" \
  -n itops --create-namespace

Required values

| Value | Example | Notes |
|---|---|---|
| node.id | myorg/myplatform/prod/cluster1 | Four-level path prefix for every service this agent reports. These are the first 4 segments of each service path. |
| itops.url | https://api.yourdomain.com | ITOps backend base URL. From inside the cluster: http://itops-core.itops:8080. |
| itops.apiKey.value | <key> | Must match the platform chart's secretEnv.ITOPS_SECURITY_OPERATOR_API_KEY. In production, use itops.apiKey.existingSecret instead. |
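The node.id rule can be illustrated with a minimal Python sketch. It assumes only what the table states: node.id has exactly four segments, and they must be the first four segments of every service path the agent reports. The function names and the segment labels (org, platform, environment, cluster) are hypothetical, inferred from the example value; this is not the agent's code.

```python
from typing import NamedTuple


class NodeId(NamedTuple):
    # Hypothetical segment names, inferred from "myorg/myplatform/prod/cluster1".
    org: str
    platform: str
    environment: str
    cluster: str


def parse_node_id(value: str) -> NodeId:
    """Split node.id into four segments; reject any other shape."""
    segments = value.split("/")
    if len(segments) != 4 or any(not s for s in segments):
        raise ValueError(f"node.id must have 4 non-empty segments, got {value!r}")
    return NodeId(*segments)


def belongs_to_node(service_path: str, node_id: str) -> bool:
    """True if the first 4 segments of service_path equal node.id."""
    prefix = list(parse_node_id(node_id))
    return service_path.split("/")[:4] == prefix


node = parse_node_id("myorg/myplatform/prod/cluster1")
print(node.cluster)
print(belongs_to_node("myorg/myplatform/prod/cluster1/payments/api",
                      "myorg/myplatform/prod/cluster1"))
```

The service path segments after the prefix (payments/api) are made up for the example.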

Watch configuration

watch:
  namespaces:
    - default
    - my-app
    - backend
  excludeNamespaces:
    - kube-system
    - kube-public
  labelPrefix: "itops.io"
  workloadNamePrefixes:
    - "my-release-"

Every 30 seconds, the agent lists the ConfigMaps labeled <labelPrefix>/config: "true" in each watched namespace. workloadNamePrefixes lists the Helm release prefixes the agent tries when resolving a service's Kubernetes workload by name.
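A minimal Python sketch of the two lookups described above. It assumes the label value must be exactly the string "true" and that a service's default workload name is the last segment of its path (see Common mistakes). The function names are hypothetical, and the order in which the real agent tries the bare name and the prefixed names is an assumption.

```python
from typing import Dict, List, Optional


def is_itops_configmap(labels: Dict[str, str], label_prefix: str = "itops.io") -> bool:
    """A ConfigMap is picked up only if it carries <labelPrefix>/config: "true"."""
    return labels.get(f"{label_prefix}/config") == "true"


def workload_candidates(service_path: str,
                        workload_name: Optional[str],
                        release_prefixes: List[str]) -> List[str]:
    """Workload names to try, in (assumed) order, when resolving a service."""
    # Default to the last path segment unless workload.name is set explicitly.
    base = workload_name or service_path.rstrip("/").split("/")[-1]
    # Assumption: try the bare name first, then each Helm release prefix.
    return [base] + [prefix + base for prefix in release_prefixes]


print(is_itops_configmap({"itops.io/config": "true"}))
print(workload_candidates("myorg/myplatform/prod/cluster1/api", None, ["my-release-"]))
```

With the watch configuration above, a service whose path ends in api would be looked up as api and then as my-release-api.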

SLA groups (defined by the agent)

Group definitions (name, display name, tier, and targets such as uptime, response time, and resolution time) are declared once per agent. Services reference a group by name through their own slaGroup field.

slaGroups:
  - name: "payment-system"
    displayName: "Payment System"
    tier: "critical"
    targets:
      uptime: 99.99
      responseTime: 5
      resolutionTime: 60
  - name: "web-frontend"
    displayName: "Web Frontend"
    tier: "high"

Available tiers: critical, high, medium, and low. The backend de-duplicates groups by name, so when agents in two clusters use the same slaGroup name, their services join the same group.
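The sketch below models the tier check and the by-name de-duplication in Python. SlaGroup, merge_groups, and the keep-the-first-declaration policy are hypothetical: the documentation does not say which definition wins when two agents declare the same name with different settings.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List

VALID_TIERS = {"critical", "high", "medium", "low"}


@dataclass
class SlaGroup:
    name: str
    display_name: str
    tier: str
    targets: Dict[str, float] = field(default_factory=dict)


def merge_groups(per_agent: Iterable[List[SlaGroup]]) -> Dict[str, SlaGroup]:
    """Collapse group declarations from several agents into one entry per name."""
    merged: Dict[str, SlaGroup] = {}
    for groups in per_agent:
        for group in groups:
            if group.tier not in VALID_TIERS:
                raise ValueError(f"{group.name}: unknown tier {group.tier!r}")
            # Assumption: the first declaration seen for a name is kept.
            merged.setdefault(group.name, group)
    return merged


cluster1 = [SlaGroup("payment-system", "Payment System", "critical",
                     {"uptime": 99.99, "responseTime": 5, "resolutionTime": 60})]
cluster2 = [SlaGroup("payment-system", "Payment System", "critical"),
            SlaGroup("web-frontend", "Web Frontend", "high")]
print(sorted(merge_groups([cluster1, cluster2])))  # ['payment-system', 'web-frontend']
```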

Feature toggles

features:
  autoRegister: true            # create services from ConfigMaps
  autoCreateIncidents: false    # open an incident when a service goes DOWN
  parseItopsYaml: true          # turn off to disable ConfigMap-based discovery
  leaderElection: true          # lets multiple replicas run safely
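A rough Python model of how these toggles gate the agent's behavior. Features, should_register, and should_open_incident are hypothetical names. Two points are assumptions: "goes DOWN" is treated as a transition, so a service that stays DOWN does not open a new incident on every poll, and autoRegister is assumed to have no effect when parseItopsYaml is off.

```python
from dataclasses import dataclass


@dataclass
class Features:
    # Defaults mirror the values shown above.
    auto_register: bool = True
    auto_create_incidents: bool = False
    parse_itops_yaml: bool = True
    leader_election: bool = True  # not modeled in this sketch


def should_register(f: Features, already_known: bool) -> bool:
    """Create a service from a ConfigMap only if discovery and autoRegister are on."""
    return f.parse_itops_yaml and f.auto_register and not already_known


def should_open_incident(f: Features, previous: str, current: str) -> bool:
    """Open an incident on the (assumed) transition into DOWN."""
    return f.auto_create_incidents and current == "DOWN" and previous != "DOWN"


print(should_register(Features(), already_known=False))
print(should_open_incident(Features(), "UNKNOWN", "DOWN"))
```

With the defaults shown, services are registered but no incidents are opened.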

Self-report

Every 60 seconds the agent pushes a health report about itself as a service named <cluster>-agent under its own node.id. Status:

The Admin → Agents tab lists every connected cluster with heartbeat freshness, service counts, and degraded-state messages.
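A hedged Python sketch of the self-report and a heartbeat-freshness check. It assumes <cluster> is the last segment of node.id and that the report's path is node.id followed by <cluster>-agent. The payload field names and the two-interval freshness threshold are hypothetical, not the backend's actual rule.

```python
import time
from typing import Dict, Optional

SELF_REPORT_INTERVAL_S = 60  # documented: one self-report every 60 seconds


def self_report(node_id: str, status: str) -> Dict[str, str]:
    """Build the agent's report about itself (field names are hypothetical)."""
    # Assumption: <cluster> is the last segment of node.id.
    cluster = node_id.split("/")[-1]
    return {"path": f"{node_id}/{cluster}-agent", "status": status}


def is_fresh(last_seen: float, now: Optional[float] = None,
             missed_allowed: int = 2) -> bool:
    """Fresh if the last heartbeat is within a few report intervals (assumed threshold)."""
    now = time.time() if now is None else now
    return now - last_seen <= SELF_REPORT_INTERVAL_S * missed_allowed


print(self_report("myorg/myplatform/prod/cluster1", "DOWN")["path"])
print(is_fresh(last_seen=1000.0, now=1100.0))
```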

Common mistakes

| Mistake | Symptom | Fix |
|---|---|---|
| Label itops.io/managed: "true" | Agent ignores the ConfigMap | Use itops.io/config: "true" |
| Data key itops.yaml | Agent ignores the ConfigMap | Rename the key to it-ops.yaml (with a hyphen) |
| Missing path | Parse error, shown in the UI as configmap-parse-error-... | Add a single path: line |
| Workload name differs from the last segment of path | Service stays UNKNOWN | Set workload.name in the ConfigMap |
| ConfigMap label doesn't match a custom watch.labelPrefix | Agent doesn't see the ConfigMap | Label the ConfigMap with <labelPrefix>/config: "true" |
| API key not passed | Agent crashes on startup with "no API key configured" | Set itops.apiKey.value or itops.apiKey.existingSecret |
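Several of these mistakes can be caught before deploying. The Python sketch below is a hypothetical pre-flight check, not a tool shipped with the chart. It looks for a top-level path: line with a plain string match instead of parsing YAML, so treat it as an approximation.

```python
from typing import Dict, List


def lint_configmap(labels: Dict[str, str], data: Dict[str, str],
                   label_prefix: str = "itops.io") -> List[str]:
    """Return a list of problems matching the common mistakes above."""
    problems: List[str] = []
    if labels.get(f"{label_prefix}/config") != "true":
        hint = " (found a 'managed' label instead)" if f"{label_prefix}/managed" in labels else ""
        problems.append(f'missing label {label_prefix}/config: "true"' + hint)
    if "it-ops.yaml" not in data:
        hint = " (found itops.yaml; add the hyphen)" if "itops.yaml" in data else ""
        problems.append("missing data key it-ops.yaml" + hint)
        return problems
    # Naive scan for unindented "path:" lines; a real check would parse the YAML.
    path_lines = [ln for ln in data["it-ops.yaml"].splitlines() if ln.startswith("path:")]
    if len(path_lines) != 1:
        problems.append(f"expected one top-level path: line, found {len(path_lines)}")
    return problems


print(lint_configmap({"itops.io/managed": "true"}, {"itops.yaml": "path: x\n"}))
```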

Full values reference

helm show values itops/itops-agent

Every value is documented inline with defaults and examples.