v4.0 — Kubernetes-native Operations

IT Operations, Fully Automated

Service discovery, SLA monitoring, incident management — all driven from your Helm chart. Deploy in 1 hour, not 6 months.

  • SLA Resolution: 5 min
  • Incident Detection: <30s
  • Agents Supported: 20+
  • Manual Config: 0

The Gap Between Monitoring and Operations

Prometheus tells you what happened. ITOps tells you what to do about it.

Without ITOps

  • SLA tracking in Excel
  • Backup status: "probably runs"
  • Incident response: Slack thread
  • Service catalog: Confluence page
  • Management report: 2 days of work

With ITOps

  • Real-time SLA from agent data
  • Backup webhook + auto-alert
  • Auto-ticket on service DOWN
  • Agent discovers from ConfigMap
  • Dashboard, one click

What You Get

Everything a platform team needs — integrated, not stitched together.

Auto Service Discovery

The K8s agent watches ConfigMaps that carry the itops.io/config label. Services appear automatically, with zero manual entry.
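As a rough illustration of the discovery rule, the sketch below filters ConfigMap-like dictionaries by the presence of the itops.io/config label. The dictionary shape mirrors Kubernetes object metadata. The function name discover_services and the sample data are hypothetical, and the real agent's matching logic may differ (for instance, it may require a specific label value).

```python
LABEL_KEY = "itops.io/config"  # label named in the text above


def discover_services(configmaps):
    """Return names of ConfigMaps that carry the discovery label.

    Each item is a dict shaped like a Kubernetes object:
    {"metadata": {"name": ..., "labels": {...}}}.
    Assumption: any value of the label counts as a match.
    """
    found = []
    for cm in configmaps:
        meta = cm.get("metadata", {})
        labels = meta.get("labels") or {}  # labels may be missing entirely
        if LABEL_KEY in labels:
            found.append(meta.get("name"))
    return found


# Hypothetical sample data: only the first ConfigMap is labeled.
sample = [
    {"metadata": {"name": "payments-itops", "labels": {"itops.io/config": "true"}}},
    {"metadata": {"name": "unrelated", "labels": {"app": "nginx"}}},
    {"metadata": {"name": "no-labels"}},
]
print(discover_services(sample))
```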

SLA Monitoring

5-minute snapshots from every agent. Async aggregation with a 15-minute delay. Daily and monthly uptime % per service and group.
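To make the uptime figure concrete, here is a minimal sketch of how a daily uptime % could be derived from 5-minute snapshots. It assumes every snapshot covers an equal time slot, so uptime is the share of snapshots in which the service was up. The tuple layout and the name daily_uptime are illustrative and are not taken from the product.

```python
from collections import defaultdict
from datetime import datetime


def daily_uptime(snapshots):
    """Compute uptime % per (service, day).

    snapshots: iterable of (service, timestamp, is_up) tuples.
    Assumption: each snapshot represents one equal 5-minute slot.
    """
    up = defaultdict(int)
    total = defaultdict(int)
    for service, ts, is_up in snapshots:
        key = (service, ts.date())
        total[key] += 1
        if is_up:
            up[key] += 1
    return {key: 100.0 * up[key] / total[key] for key in total}


# Four hypothetical snapshots, one of them DOWN.
snaps = [
    ("checkout", datetime(2024, 1, 1, 0, 0), True),
    ("checkout", datetime(2024, 1, 1, 0, 5), True),
    ("checkout", datetime(2024, 1, 1, 0, 10), False),
    ("checkout", datetime(2024, 1, 1, 0, 15), True),
]
result = daily_uptime(snaps)
```

Monthly or per-group figures would follow the same pattern with a different grouping key.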

Auto Incident + Ticket

Service goes DOWN → SLA incident created → INCIDENT ticket generated → dashboard updates. Under 30 seconds.
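The chain above can be pictured as a small state machine: only a transition into DOWN opens a new incident and its ticket, while repeated DOWN reports do not. The sketch below is an assumption about that behaviour, not the product's implementation. The class name IncidentTracker and the record fields are hypothetical.

```python
class IncidentTracker:
    """Opens one incident and one INCIDENT ticket per transition into DOWN."""

    def __init__(self):
        self.last_status = {}  # service name -> last reported status
        self.incidents = []
        self.tickets = []

    def report(self, service, status):
        """Record a status report; return the new incident id, or None."""
        # Assumption: a service seen for the first time was previously UP.
        previous = self.last_status.get(service, "UP")
        self.last_status[service] = status
        if status == "DOWN" and previous != "DOWN":
            incident_id = len(self.incidents) + 1
            self.incidents.append({"id": incident_id, "service": service})
            self.tickets.append({"type": "INCIDENT", "incident_id": incident_id})
            return incident_id
        return None  # still DOWN, or the service is UP


tracker = IncidentTracker()
first = tracker.report("checkout", "UP")     # no incident
second = tracker.report("checkout", "DOWN")  # opens incident 1
third = tracker.report("checkout", "DOWN")   # already DOWN, nothing new
```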

Backup Monitoring

Universal webhook: any backup tool calls POST /api/v1/backup/report. Alerts if the latest backup is older than maxAgeDays.
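A backup job could report in with a few lines of standard-library Python. In this sketch only the path /api/v1/backup/report and the maxAgeDays setting come from the text above. The host name, the JSON field names, and both function names are placeholders, so check the actual API documentation before relying on them. The request is built but not sent.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone


def build_backup_report(base_url, service, finished_at, status="success"):
    """Build (but do not send) a POST request for the backup webhook.

    The JSON field names here are assumptions, not the documented schema.
    """
    payload = {
        "service": service,
        "status": status,
        "finishedAt": finished_at.isoformat(),
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v1/backup/report",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def backup_is_stale(last_backup, max_age_days, now):
    """True when the newest backup is older than max_age_days."""
    return now - last_backup > timedelta(days=max_age_days)


now = datetime(2024, 1, 3, 12, 0, tzinfo=timezone.utc)
req = build_backup_report("https://itops.example.com", "checkout", now)
# urllib.request.urlopen(req) would actually send the report.
```

With maxAgeDays set to 1, a backup that finished two days ago would count as stale under this check, while one from twelve hours ago would not.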

Ticketing + Workflows

Full ticket lifecycle with visual workflow builder. 17 step types, versioning, templates. SLA timers on every ticket.

RBAC + Field Permissions

Role-based access with field-level visibility control. LDAP, Azure AD, Okta, Google SSO. Audit trail on everything.

How it Works

Three steps from zero to full ops management.

1. Deploy Agent

Install the ITOps agent on each K8s cluster via Helm.

helm repo add itops https://charts.mlops.hu
helm install itops-agent itops/itops-agent \
  --set node.id="org/prod/cluster1" \
  -n itops
2. Configure Services

Add an itops: block to your service's values.yaml. GitOps handles the rest.

itops:
  criticality: "critical"
  slaGroup: "payment-system"
  backup:
    expected: true
    maxAgeDays: 1
3. Monitor + React

Dashboard shows real-time status. SLA measured. Incidents auto-generated. Tickets auto-created.

Agent sync (30s) → sla_snapshots → Aggregator (15min delay) → Uptime % calculated → Dashboard updated
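One plausible reading of the 15-minute delay in the pipeline above is that the aggregator only touches snapshots older than a cutoff, which leaves room for late-arriving agent data. The sketch below shows that selection step under this assumption. The function name ready_for_aggregation is hypothetical, and the real aggregator may work differently.

```python
from datetime import datetime, timedelta

AGGREGATION_DELAY = timedelta(minutes=15)  # delay quoted in the pipeline


def ready_for_aggregation(snapshot_times, now):
    """Return the snapshot timestamps that are old enough to aggregate.

    Assumption: anything newer than now - 15 minutes is left for a later run.
    """
    cutoff = now - AGGREGATION_DELAY
    return [ts for ts in snapshot_times if ts <= cutoff]


now = datetime(2024, 1, 1, 12, 0)
times = [
    datetime(2024, 1, 1, 11, 30),
    datetime(2024, 1, 1, 11, 45),  # exactly at the cutoff, included
    datetime(2024, 1, 1, 11, 50),  # too recent, skipped this run
]
ready = ready_for_aggregation(times, now)
```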

Architecture

Multi-agent, multi-cluster, single pane of glass.

K8s Cluster 1     K8s Cluster 2     K8s Cluster N
ITOps Agent       ITOps Agent       ITOps Agent
ConfigMap Watch   ConfigMap Watch   ConfigMap Watch
Pod Monitor       Pod Monitor       Pod Monitor
      |                 |                 |
      +-----------------+-----------------+
                        |
    POST /api/v1/operator/status (every 30s)
                        |
               +--------v---------+   +------------------+
               | ITOps Core       |   | ITOps UI         |
               | Go Backend       |   | Vue.js + Quasar  |
               | GraphQL + REST   |   | Pinia Stores     |
               | SLA Aggregator   |   | Real-time Sync   |
               +--------+---------+   +------------------+
                        |
               +--------v---------+
               | PostgreSQL       |   Redis (cache + sessions)
               | sla_snapshots    |
               | services         |
               | tickets          |
               +------------------+

Pricing

Self-hosted. Your data stays yours.

Starter

Free

1 cluster, 5 services

  • Service discovery (agent)
  • Operations catalog
  • RBAC + 2 users
  • Community support
Get Started

Enterprise

Custom

Unlimited everything

  • Everything in Team
  • Unlimited clusters + users
  • Audit log + compliance
  • On-prem deployment support
  • Dedicated support + SLA
  • Custom integrations
Contact Sales

Ready to automate your operations?

Deploy in under 1 hour. No credit card required.

Read the Docs → Try Live Demo