Agent Deploy

Run one agent per Kubernetes cluster. The agent discovers services, reports their status, runs HTTP probes, and reports itself as a service so the admin UI always knows it is alive.

Install

helm repo add itops https://charts.mlops.hu
helm repo update
helm install itops-agent itops/itops-agent \
  --set node.id="myorg/myplatform/prod/cluster1" \
  --set itops.url="https://api.yourdomain.com" \
  --set itops.apiKey.value="<OPERATOR_API_KEY>" \
  -n itops --create-namespace

Required values

| Value | Example | Notes |
| --- | --- | --- |
| `node.id` | `myorg/myplatform/prod/cluster1` | 4-level path prefix for every service this agent reports. First 4 segments of the service `path`. |
| `itops.url` | `https://api.yourdomain.com` | ITOps backend base URL. Internal: `http://itops-core.itops:8080`. |
| `itops.apiKey.value` | `<key>` | Same as platform chart's `secretEnv.ITOPS_SECURITY_OPERATOR_API_KEY`. Use `itops.apiKey.existingSecret` in production. |
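
Because `node.id` has to match the first four segments of every service `path`, a quick shell check before installing can catch a malformed value. The following is a minimal sketch only: `is_valid_node_id` and `path_under_node` are hypothetical helpers, not part of the chart or the agent, and the `checkout` service name is made up for illustration.

```shell
# Hypothetical helpers, not shipped with the chart.

# Succeeds when $1 has exactly four non-empty, slash-separated segments.
is_valid_node_id() {
  case "$1" in
    */*/*/*/*) return 1 ;;     # five or more segments
    /*|*/|*//*) return 1 ;;    # leading, trailing, or empty segment
    */*/*/*) return 0 ;;       # exactly four segments
    *) return 1 ;;             # fewer than four segments
  esac
}

# Succeeds when service path $2 sits under node.id $1.
path_under_node() {
  case "$2" in
    "$1"/?*) return 0 ;;
    *) return 1 ;;
  esac
}

is_valid_node_id "myorg/myplatform/prod/cluster1" && echo "node.id ok"
path_under_node "myorg/myplatform/prod/cluster1" \
  "myorg/myplatform/prod/cluster1/checkout" && echo "path ok"
```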

Watch configuration

watch:
  namespaces:
    - default
    - my-app
    - backend
  excludeNamespaces:
    - kube-system
    - kube-public
  labelPrefix: "itops.io"
  workloadNamePrefixes:
    - "my-release-"

Every 30 seconds, the agent lists ConfigMaps labeled `<labelPrefix>/config: "true"` in every watched namespace. `workloadNamePrefixes` lists the Helm release prefixes the agent tries when resolving a service's Kubernetes workload by name.
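
To make the discovery contract concrete, here is a rough sketch of a ConfigMap the agent should pick up. It assumes `watch.labelPrefix` is `itops.io` and uses the `it-ops.yaml` data key with a single `path:` line. The helper name `render_itops_configmap`, the ConfigMap name `checkout-itops`, and the `checkout` service are hypothetical.

```shell
# Hypothetical helper: prints a minimal ConfigMap manifest for discovery.
# Arguments: ConfigMap name, namespace, service path.
render_itops_configmap() {
  cat <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: $1
  namespace: $2
  labels:
    itops.io/config: "true"   # assumes watch.labelPrefix is "itops.io"
data:
  it-ops.yaml: |
    path: $3
EOF
}

render_itops_configmap checkout-itops my-app \
  "myorg/myplatform/prod/cluster1/checkout"
```

Piping the output to `kubectl apply -f -` would create the ConfigMap in the cluster. The namespace must be one the agent watches, and the path must start with the agent's `node.id`.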

SLA groups (defined by the agent)

Group definitions (name, display name, tier, and targets such as uptime) are declared once per agent. Services reference a group by name through their own slaGroup field.

slaGroups:
  - name: "payment-system"
    displayName: "Payment System"
    tier: "critical"
    targets:
      uptime: 99.99
      responseTime: 5
      resolutionTime: 60
  - name: "web-frontend"
    displayName: "Web Frontend"
    tier: "high"

Available tiers: critical, high, medium, low. The backend de-duplicates group rows by name, so agents in different clusters that declare the same group name share one group.
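
To get a feel for what an `uptime` target implies, the sketch below converts a percentage into allowed downtime. It assumes the value is a percentage and that it is measured over a 30-day window; the source states neither the unit nor the window, and `allowed_downtime_minutes` is a hypothetical helper.

```shell
# Hypothetical helper: minutes of downtime allowed by an uptime target.
# Arguments: uptime percentage, window length in days (30 is an assumption).
allowed_downtime_minutes() {
  awk -v uptime="$1" -v days="$2" \
    'BEGIN { printf "%.2f\n", (100 - uptime) / 100 * days * 24 * 60 }'
}

allowed_downtime_minutes 99.99 30   # prints 4.32
allowed_downtime_minutes 99.9 30    # prints 43.20
```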

Feature toggles

features:
  autoRegister: true            # create services from ConfigMaps
  autoCreateIncidents: false    # open an incident when a service goes DOWN
  parseItopsYaml: true          # turn off to disable ConfigMap-based discovery
  leaderElection: true          # safe to run multiple replicas

Self-report

Every 60 seconds the agent pushes a health report about itself as a service named <cluster>-agent under its own node.id. The reported status is one of:

OPERATIONAL — discovered services, pushed status, no errors accumulated.
DEGRADED — partial failures (RBAC denied on some namespaces, ConfigMap parse errors, etc.). The response message lists the specific failures.
DOWN — the agent can't reach the K8s API or cannot do its job at all.

The Admin → Agents tab lists every connected cluster with heartbeat freshness, service counts, and degraded-state messages.
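
The three self-report states can be read as a simple decision rule. The sketch below is illustrative only and is not the agent's actual code; `agent_status` and its two inputs (API reachability and an accumulated error count) are assumptions made to show the ordering of the checks.

```shell
# Hypothetical sketch of the status rules; not the agent's implementation.
# Arguments: "yes" if the K8s API is reachable, number of accumulated errors.
agent_status() {
  if [ "$1" != "yes" ]; then
    echo "DOWN"
  elif [ "$2" -gt 0 ]; then
    echo "DEGRADED"
  else
    echo "OPERATIONAL"
  fi
}

agent_status yes 0   # OPERATIONAL
agent_status yes 2   # DEGRADED
agent_status no 0    # DOWN
```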

Common mistakes

| Mistake | Symptom | Fix |
| --- | --- | --- |
| Label `itops.io/managed: "true"` | Agent ignores ConfigMap | Use `itops.io/config: "true"` |
| Data key `itops.yaml` | Agent ignores ConfigMap | Must be `it-ops.yaml` (with hyphen) |
| Missing `path` | Parse error visible as `configmap-parse-error-...` in the UI | Add a single `path:` line |
| Workload name differs from path's last segment | Service stays `UNKNOWN` | Set `workload.name` in the ConfigMap |
| Custom `watch.labelPrefix` mismatch | Agent doesn't see the ConfigMap | ConfigMap label must match the prefix |
| Forgot to pass API key | Agent crashes on startup with `no API key configured` | Set `itops.apiKey.value` or `itops.apiKey.existingSecret` |
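
The first three mistakes can be caught before a manifest ever reaches the cluster. The following is a crude, text-based sketch rather than a real YAML parser: `check_itops_configmap` is a hypothetical helper, and it assumes the `itops.io` label prefix.

```shell
# Hypothetical helper: scans a ConfigMap manifest file for the label,
# data key, and path line the agent expects. Assumes the "itops.io" prefix.
check_itops_configmap() {
  status=0
  grep -q 'itops\.io/config: *"true"' "$1" \
    || { echo "missing label itops.io/config: \"true\""; status=1; }
  grep -q '^ *it-ops\.yaml:' "$1" \
    || { echo "missing data key it-ops.yaml"; status=1; }
  grep -q '^ *path:' "$1" \
    || { echo "missing path"; status=1; }
  return "$status"
}
```

Calling `check_itops_configmap my-configmap.yaml` prints one line per problem found and returns a non-zero status if any check fails, so it can gate a CI step.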

Full values reference

helm show values itops/itops-agent

Every value is documented inline with defaults and examples.