Agent Deploy
One agent per Kubernetes cluster. The agent discovers services, reports their status, runs HTTP probes, and reports itself as a service, so the admin UI always knows it is alive.
Install
```bash
helm repo add itops https://charts.mlops.hu
helm repo update
helm install itops-agent itops/itops-agent \
  --set node.id="myorg/myplatform/prod/cluster1" \
  --set itops.url="https://api.yourdomain.com" \
  --set itops.apiKey.value="<OPERATOR_API_KEY>" \
  -n itops --create-namespace
```
Required values
| Value | Example | Notes |
|---|---|---|
| node.id | myorg/myplatform/prod/cluster1 | 4-level path prefix for every service this agent reports, i.e. the first 4 segments of each service path. |
| itops.url | https://api.yourdomain.com | ITOps backend base URL. In-cluster alternative: http://itops-core.itops:8080. |
| itops.apiKey.value | <key> | Same value as the platform chart's secretEnv.ITOPS_SECURITY_OPERATOR_API_KEY. In production, use itops.apiKey.existingSecret instead. |
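The node.id rule can be illustrated with a small check. This is a minimal Python sketch, not part of the agent. The helpers node_id_segments and belongs_to_node are hypothetical. It assumes paths are slash-separated and that a service path is node.id followed by at least one more segment (for example the service name).

```python
# Hypothetical helpers illustrating the node.id rule; not part of the agent.

def node_id_segments(node_id: str) -> list:
    """Split node.id and require exactly 4 non-empty segments."""
    segments = node_id.split("/")
    if len(segments) != 4 or any(not s for s in segments):
        raise ValueError(f"node.id needs exactly 4 non-empty segments: {node_id!r}")
    return segments


def belongs_to_node(node_id: str, service_path: str) -> bool:
    """True when the first 4 segments of service_path equal node.id."""
    parts = service_path.split("/")
    # Assumption: a service path has at least one segment beyond node.id.
    return len(parts) > 4 and parts[:4] == node_id_segments(node_id)


node = "myorg/myplatform/prod/cluster1"
print(belongs_to_node(node, "myorg/myplatform/prod/cluster1/payments"))     # True
print(belongs_to_node(node, "myorg/myplatform/staging/cluster1/payments"))  # False
```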
Watch configuration
```yaml
watch:
  namespaces:
    - default
    - my-app
    - backend
  excludeNamespaces:
    - kube-system
    - kube-public
  labelPrefix: "itops.io"
  workloadNamePrefixes:
    - "my-release-"
```
Every 30 seconds, the agent lists the ConfigMaps labeled <labelPrefix>/config: "true" in every watched namespace. workloadNamePrefixes is a list of Helm release prefixes that the agent tries when resolving a service's Kubernetes workload by name.
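The selection rule and the prefix lookup can be sketched in Python as follows. The ConfigMap class is a hypothetical stand-in, not a Kubernetes client type. Two behaviours here are assumptions, not documented agent behaviour. First, an empty namespaces list is taken to mean "all namespaces", minus the excluded ones. Second, the bare service name is tried before the prefixed names.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ConfigMap:
    """Minimal stand-in for a Kubernetes ConfigMap (hypothetical type)."""
    namespace: str
    name: str
    labels: Dict[str, str] = field(default_factory=dict)


def namespace_watched(ns: str, namespaces: List[str], exclude: List[str]) -> bool:
    if ns in exclude:
        return False
    # Assumption: an empty namespaces list means "all namespaces".
    return not namespaces or ns in namespaces


def is_discoverable(cm: ConfigMap, label_prefix: str,
                    namespaces: List[str], exclude: List[str]) -> bool:
    """Watched namespace AND the <labelPrefix>/config: "true" label."""
    return (namespace_watched(cm.namespace, namespaces, exclude)
            and cm.labels.get(f"{label_prefix}/config") == "true")


def workload_candidates(service_name: str, prefixes: List[str]) -> List[str]:
    """Workload names to try, in order (assumed: bare name first)."""
    return [service_name] + [p + service_name for p in prefixes]


watched = ["default", "my-app", "backend"]
excluded = ["kube-system", "kube-public"]
cm = ConfigMap("my-app", "payments-itops", {"itops.io/config": "true"})
print(is_discoverable(cm, "itops.io", watched, excluded))  # True
print(workload_candidates("payments", ["my-release-"]))    # ['payments', 'my-release-payments']
```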
SLA groups (defined by the agent)
Group definitions (name, display name, tier, and targets such as uptime) are declared once per agent. Services reference a group by name through their own slaGroup field.
```yaml
slaGroups:
  - name: "payment-system"
    displayName: "Payment System"
    tier: "critical"
    targets:
      uptime: 99.99
      responseTime: 5
      resolutionTime: 60
  - name: "web-frontend"
    displayName: "Web Frontend"
    tier: "high"
```
Available tiers: critical, high, medium, low. The backend de-duplicates group rows by name, so services in different clusters (reported by different agents) that reference the same slaGroup name join the same group.
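De-duplication by name can be pictured as a merge keyed on the group name. This Python sketch only illustrates that idea; it is not the backend's code. SlaGroup and merge_groups are hypothetical names, and the rule that the first definition seen for a name wins is an assumption. The documentation does not say how conflicting definitions are resolved.

```python
from dataclasses import dataclass
from typing import Dict, List

VALID_TIERS = {"critical", "high", "medium", "low"}


@dataclass(frozen=True)
class SlaGroup:
    """Hypothetical subset of a group definition (targets omitted)."""
    name: str
    display_name: str
    tier: str


def merge_groups(reports: List[List[SlaGroup]]) -> Dict[str, SlaGroup]:
    """Merge group definitions reported by several agents, keyed by name."""
    merged: Dict[str, SlaGroup] = {}
    for groups in reports:
        for group in groups:
            if group.tier not in VALID_TIERS:
                raise ValueError(f"unknown tier {group.tier!r} for {group.name!r}")
            # Assumption: the first definition seen for a name wins.
            merged.setdefault(group.name, group)
    return merged


cluster1 = [SlaGroup("payment-system", "Payment System", "critical")]
cluster2 = [SlaGroup("payment-system", "Payment System", "critical"),
            SlaGroup("web-frontend", "Web Frontend", "high")]
print(sorted(merge_groups([cluster1, cluster2])))  # ['payment-system', 'web-frontend']
```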
Feature toggles
```yaml
features:
  autoRegister: true          # create services from ConfigMaps
  autoCreateIncidents: false  # open an incident when a service goes DOWN
  parseItopsYaml: true        # turn off to disable ConfigMap-based discovery
  leaderElection: true        # safe to run multiple replicas
```
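One way to read how these toggles interact is the Python sketch below. It is an interpretation of the inline comments, not the agent's implementation. The Python field and function names are hypothetical. Two points are assumptions: only the elected leader performs discovery and registration, and registration from ConfigMaps requires parseItopsYaml.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Features:
    """Mirrors the features block (hypothetical Python names, no defaults)."""
    auto_register: bool
    auto_create_incidents: bool
    parse_itops_yaml: bool
    leader_election: bool


def cycle_actions(f: Features, is_leader: bool) -> List[str]:
    """Work one replica would do in a discovery cycle."""
    # Assumption: with leader election on, non-leaders stay idle.
    if f.leader_election and not is_leader:
        return []
    actions: List[str] = []
    if f.parse_itops_yaml:
        actions.append("discover-configmaps")
        # Assumption: auto-registration depends on ConfigMap discovery.
        if f.auto_register:
            actions.append("register-services")
    return actions


def should_open_incident(f: Features, previous: str, current: str) -> bool:
    """Open an incident only on a transition into DOWN, and only if enabled."""
    return f.auto_create_incidents and previous != "DOWN" and current == "DOWN"


f = Features(auto_register=True, auto_create_incidents=False,
             parse_itops_yaml=True, leader_election=True)
print(cycle_actions(f, is_leader=True))   # ['discover-configmaps', 'register-services']
print(cycle_actions(f, is_leader=False))  # []
print(should_open_incident(f, "OPERATIONAL", "DOWN"))  # False
```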
Self-report
Every 60 seconds, the agent pushes a health report about itself, as a service named <cluster>-agent under its own node.id. The reported status is one of:
- OPERATIONAL: services discovered, status pushed, and no errors accumulated.
- DEGRADED: partial failures (RBAC denied on some namespaces, ConfigMap parse errors, etc.). The response message lists the specific failures.
- DOWN: the agent cannot reach the Kubernetes API or cannot do its job at all.
The Admin → Agents tab lists every connected cluster with heartbeat freshness, service counts, and degraded-state messages.
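The status rules can be summarised as a small decision function. This Python sketch illustrates those rules and is not the agent's code. CycleResult, self_status and self_service_path are hypothetical names. The service path format, node.id followed by <cluster>-agent, is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class CycleResult:
    """Hypothetical summary of one agent cycle."""
    k8s_api_reachable: bool
    services_discovered: int
    errors: List[str] = field(default_factory=list)


def self_service_path(node_id: str, cluster: str) -> str:
    # Assumption: the self-report service sits directly under node.id.
    return f"{node_id}/{cluster}-agent"


def self_status(result: CycleResult) -> Tuple[str, str]:
    """Map a cycle result to (status, message)."""
    if not result.k8s_api_reachable:
        return "DOWN", "cannot reach the Kubernetes API"
    if result.errors:
        # DEGRADED: partial failures; the message lists each one.
        return "DEGRADED", "; ".join(result.errors)
    return "OPERATIONAL", f"{result.services_discovered} services discovered"


print(self_service_path("myorg/myplatform/prod/cluster1", "cluster1"))
# myorg/myplatform/prod/cluster1/cluster1-agent
print(self_status(CycleResult(True, 12, ["RBAC denied on namespace backend"])))
# ('DEGRADED', 'RBAC denied on namespace backend')
print(self_status(CycleResult(False, 0)))
# ('DOWN', 'cannot reach the Kubernetes API')
```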
Common mistakes
| Mistake | Symptom | Fix |
|---|---|---|
| Label itops.io/managed: "true" | Agent ignores ConfigMap | Use itops.io/config: "true" |
| Data key itops.yaml | Agent ignores ConfigMap | The key must be it-ops.yaml (with a hyphen) |
| Missing path | Parse error visible as configmap-parse-error-... in the UI | Add a single path: line |
| Workload name differs from path's last segment | Service stays UNKNOWN | Set workload.name in the ConfigMap |
| Custom watch.labelPrefix mismatch | Agent doesn't see the ConfigMap | The ConfigMap label must use the configured prefix (<labelPrefix>/config) |
| Forgot to pass API key | Agent crashes on startup with no API key configured | Set itops.apiKey.value or itops.apiKey.existingSecret |
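The first three ConfigMap mistakes in the table above can be caught before deploying. Below is a minimal Python lint sketch; lint_configmap is a hypothetical helper, not an agent feature. It takes the ConfigMap as a plain dict, and its path check is a crude line scan rather than a YAML parser. The slaGroup line and the path value in the example are illustrative assumptions about the it-ops.yaml layout.

```python
from typing import Dict, List

# Hypothetical lint helper; "cm" is a ConfigMap as a plain dict
# with metadata.labels and data, as it would appear in a manifest.


def lint_configmap(cm: Dict, label_prefix: str = "itops.io") -> List[str]:
    problems: List[str] = []
    labels = cm.get("metadata", {}).get("labels", {})
    data = cm.get("data", {})

    if labels.get(f"{label_prefix}/config") != "true":
        msg = f'missing label {label_prefix}/config: "true"'
        if labels.get(f"{label_prefix}/managed") == "true":
            msg += f" ({label_prefix}/managed is not used for discovery)"
        problems.append(msg)

    if "it-ops.yaml" not in data:
        msg = "missing data key it-ops.yaml"
        if "itops.yaml" in data:
            msg += " (found itops.yaml; the key needs a hyphen)"
        problems.append(msg)
    else:
        # Crude line-based check, not a YAML parser: find top-level keys.
        top_level = [
            line.split(":", 1)[0]
            for line in data["it-ops.yaml"].splitlines()
            if line and not line[0].isspace()
        ]
        if "path" not in top_level:
            problems.append("it-ops.yaml has no top-level path: line")
    return problems


cm = {
    "metadata": {"labels": {"itops.io/config": "true"}},
    "data": {"it-ops.yaml": "path: myorg/myplatform/prod/cluster1/payments\n"
                            "slaGroup: payment-system\n"},
}
print(lint_configmap(cm))  # []
```

Pass label_prefix explicitly when watch.labelPrefix is customised.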
Full values reference
```bash
helm show values itops/itops-agent
```
Every value is documented inline with defaults and examples.