# ITOps — Platform Overview for AI Assistants

## What it is

Self-hosted IT operations platform. Kubernetes-native. Tracks service health, SLA
uptime, backups, storage. One dashboard for K8s-managed and bare-metal services alike.

## One-sentence architecture

A Go backend (`itops-core`) exposes GraphQL + REST; a Vue/Quasar SPA (`itops-ui`) is the
dashboard; a K8s operator (`itops-agent`) runs per cluster, watches ConfigMaps labelled
`itops.io/config: "true"`, and pushes discovery + status every 30s. PostgreSQL stores
everything.

## The single identifier: `path`

Every service is addressed by a 5-level path:

    organization/platform/environment/cluster/service

The first four segments identify the hierarchy node; the fifth is the service name. The
same string goes into `it-ops.yaml` as `path:`, into REST push payloads as
`"path":"..."`, and into dependency refs as `path: "..."`. There is no separate
`nodeId` / `name` pair anywhere the user writes YAML.
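
A minimal client-side sketch of the split described above. It assumes the node is represented as the first four segments joined with `/`; the names `ServicePath` and `split_path` and the example path are hypothetical, not part of ITOps.

```python
from typing import NamedTuple


class ServicePath(NamedTuple):
    node_id: str  # first four segments, joined with "/" (assumed representation)
    service: str  # fifth segment


def split_path(path: str) -> ServicePath:
    """Split organization/platform/environment/cluster/service into node + service."""
    segments = path.split("/")
    if len(segments) != 5 or any(not s for s in segments):
        raise ValueError(f"expected 5 non-empty segments, got {path!r}")
    return ServicePath(node_id="/".join(segments[:4]), service=segments[4])


# Hypothetical example path.
parsed = split_path("acme/payments/prod/eu-west-1/checkout-api")
print(parsed.node_id)  # acme/payments/prod/eu-west-1
print(parsed.service)  # checkout-api
```

This version raises on a malformed path, which suits a pre-commit or CI check; it is not a description of how the backend itself reacts to bad input.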

## Two ways data flows in

1. **K8s agent discovery** — the agent lists ConfigMaps with label
   `itops.io/config: "true"`, parses the `it-ops.yaml` data key, registers services
   via `POST /api/v1/operator/register`, then syncs status every 30s via
   `POST /api/v1/operator/status`.

2. **Push webhooks (bare-metal, external)** — any HTTP client calls:
   - `POST /api/v1/health/report`: the first push auto-creates the service
   - `POST /api/v1/storage/report`: auto-appends the `storage` tag so the Storage tab
     picks the service up
   - `POST /api/v1/backup/report`: records the backup completion timestamp

   All three accept a single `"path": "..."` field that the handler splits into
   `(nodeId, service)`. Missing/malformed fields are normalized to `"unknown"` sentinels
   so bad pushes land visibly instead of silently dropping.
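
A rough sketch of a push from a bare-metal host, using only the Python standard library. Only the endpoint paths and the `"path"` field come from this document; the base URL, the `status` field, and the helper name `build_report_request` are assumptions for illustration, and authentication is not shown. See `05-rest-api.md` for the actual payload fields.

```python
import json
import urllib.request

REPORT_KINDS = ("health", "storage", "backup")


def build_report_request(base_url: str, kind: str, path: str, **fields) -> urllib.request.Request:
    """Build (but do not send) a POST to /api/v1/<kind>/report."""
    if kind not in REPORT_KINDS:
        raise ValueError(f"kind must be one of {REPORT_KINDS}, got {kind!r}")
    # "path" is the only field named in this overview; extra fields are passed through.
    body = json.dumps({"path": path, **fields}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/{kind}/report",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host, path, and "status" field.
req = build_report_request(
    "https://itops.example.internal",
    "health",
    "acme/infra/prod/dc1/nas01",
    status="healthy",
)
print(req.get_method(), req.full_url)
# To send it: urllib.request.urlopen(req). Left out so the sketch runs offline.
```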

## Schema: `it-ops.yaml`

One required field: `path`. Everything else is opt-in. The agent parser accepts every
flat top-level shortcut (`criticality`, `slaGroup`, `team`, `tags`, `type`, `backup`,
`health`, `dependencies`, `relations`, `custom`, `contacts`, `links`, `monitoring`,
`workload`) and merges them into the canonical shape before pushing to the backend.

See `10-gitops.md` for the full reference.
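
A small lint sketch for an already-parsed `it-ops.yaml` document (a plain dict, however it was loaded). It checks only what this overview states: `path` is required, and the listed top-level shortcuts are accepted. Treating any other key as suspicious is an assumption; the real parser may accept more, so the output is advisory. `check_config` and the example values are hypothetical.

```python
# Top-level keys named in this overview.
KNOWN_KEYS = {
    "path", "criticality", "slaGroup", "team", "tags", "type", "backup",
    "health", "dependencies", "relations", "custom", "contacts", "links",
    "monitoring", "workload",
}


def check_config(doc: dict) -> list:
    """Return a list of human-readable problems; an empty list means nothing was flagged."""
    problems = []
    if not isinstance(doc.get("path"), str) or not doc["path"]:
        problems.append("missing required field: path")
    for key in sorted(set(doc) - KNOWN_KEYS):
        # Advisory only: the canonical schema may allow keys not listed here.
        problems.append(f"unrecognized top-level key: {key}")
    return problems


# Hypothetical config using a few of the flat shortcuts.
config = {
    "path": "acme/payments/prod/eu-west-1/checkout-api",
    "criticality": "high",
    "slaGroup": "payments",
    "tags": ["database"],
    "health": {"enabled": True},
}
print(check_config(config))            # []
print(check_config({"slagroup": "x"}))  # flags the missing path and the misspelled key
```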

## What each field affects (short map)

- `path` → creates/finds the service + hierarchy node
- `slaGroup` → SLA tracking + group card
- `criticality` → default SLA tier (critical/high/medium/low → 99.99/99.95/99.9/99.5%)
- `backup.expected` → Backup tab entry + overdue alerts
- `health.enabled` → agent runs an HTTP probe; its result overrides the K8s-native status
- `dependencies.{requires,usedBy}` → clickable dep chips on service detail
- `tags` containing `storage|database|cache|s3|ebs|rds|elasticache` → Storage tab
- `custom` → free-form key/value table on service detail
- `relations` → parent/children graph edges on service detail
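
Two rows of the map, sketched as lookups, plus the downtime each default tier allows. The tier percentages and the tag list come from the bullets above. Exact, case-sensitive tag matching and the 30-day window are assumptions, and the function names are hypothetical.

```python
DEFAULT_SLA_PERCENT = {"critical": 99.99, "high": 99.95, "medium": 99.9, "low": 99.5}
STORAGE_TAGS = {"storage", "database", "cache", "s3", "ebs", "rds", "elasticache"}


def default_sla(criticality: str) -> float:
    """Default SLA target (percent) for a criticality level."""
    return DEFAULT_SLA_PERCENT[criticality]


def shows_in_storage_tab(tags: list) -> bool:
    """True if any tag is one of the storage-related tags (exact match assumed)."""
    return any(tag in STORAGE_TAGS for tag in tags)


def allowed_downtime_minutes(sla_percent: float, days: int = 30) -> float:
    """Downtime budget over a window of `days` (30 is an assumed window)."""
    return (1 - sla_percent / 100) * days * 24 * 60


for level in ("critical", "high", "medium", "low"):
    target = default_sla(level)
    print(level, target, round(allowed_downtime_minutes(target), 2))

print(shows_in_storage_tab(["web", "rds"]))  # True
print(shows_in_storage_tab(["web"]))         # False
```

Over 30 days the budgets come out to roughly 4.32, 21.6, 43.2, and 216 minutes for critical, high, medium, and low.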

## Agent fleet view

Every running agent self-reports as a service (`<cluster>-agent`, tags
`[itops-agent, meta]`). The Admin → Agents tab lists all connected clusters with
heartbeat freshness, service counts, and degraded state when the agent can't do its
job (RBAC denied, ConfigMap parse failed, API unreachable).

## Loud over silent

Incoming pushes with missing/bad fields aren't rejected — they land under an
`unknown/unknown/unknown/unknown` hierarchy branch with a red badge, and the response
echoes a `warnings[]` array explaining what was normalized. A red "N misconfigured
pushes" chip in the Operations header surfaces the count.
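
The behaviour above can be sketched as follows. This illustrates the normalize-and-warn idea only, not the backend's actual handler: the function name, the warning wording, the `"unknown"` service name, and the choice to normalize the whole path at once are assumptions.

```python
UNKNOWN = "unknown"


def normalize_path(raw: object) -> dict:
    """Accept any input; fall back to unknown sentinels and say so in warnings."""
    warnings = []
    segments = raw.split("/") if isinstance(raw, str) else []
    if len(segments) == 5 and all(segments):
        node_id, service = "/".join(segments[:4]), segments[4]
    else:
        # Never reject: park the push under the unknown branch and explain why.
        node_id, service = "/".join([UNKNOWN] * 4), UNKNOWN
        warnings.append(
            f"path {raw!r} is missing or malformed; normalized to {node_id}/{service}"
        )
    return {"nodeId": node_id, "service": service, "warnings": warnings}


print(normalize_path("acme/infra/prod/dc1/nas01"))  # well-formed: no warnings
print(normalize_path("nas01"))                       # malformed: unknown branch + warning
print(normalize_path(None))                          # missing: unknown branch + warning
```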

## Where to look

- YAML schema: `10-gitops.md`
- REST push endpoints: `05-rest-api.md`
- Agent chart values: `helm show values itops/itops-agent`
- Platform chart values: `helm show values itops/itops`
- Live demo: <https://demo.mlops.hu> (login admin / Password123!)
- Public helm repo: <https://charts.mlops.hu>
