# ITOps — REST API Reference for AI Assistants

All endpoints authenticate with the `X-API-Key: $OPERATOR_API_KEY` header unless noted
otherwise. The key comes from the platform chart's `secretEnv.ITOPS_SECURITY_OPERATOR_API_KEY`.

Every push endpoint (`/health/report`, `/storage/report`, `/backup/report`) accepts a
single `path` field as a shortcut; the handler splits it into `(nodeId, service)` internally.
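
How that split might look, as a minimal sketch. It assumes the last segment of `path` is the service name and the preceding segments form the `nodeId`; this matches the `unknown/unknown/unknown/unknown/<service>` padding described under error handling, but this reference does not confirm it. `split_path` is a hypothetical helper, not the handler's actual code.

```python
def split_path(path: str) -> tuple[str, str]:
    """Split a push `path` into (node_id, service).

    Assumption: the final segment is the service and everything before it
    is the node ID. The real handler may split differently.
    """
    segments = [s for s in path.strip("/").split("/") if s]
    if not segments:
        raise ValueError("path must contain at least a service segment")
    *node_parts, service = segments
    return "/".join(node_parts), service


node_id, service = split_path("myorg/infra/prod/baremetal/galera-node1")
print(node_id)   # myorg/infra/prod/baremetal
print(service)   # galera-node1
```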

## Public health

```
GET /health          # Liveness — always 200 when the process is up
GET /ready           # Readiness — 200 only when DB is reachable
```

## Push webhooks (auth: X-API-Key)

### POST /api/v1/health/report

Auto-creates the service on first push. Subsequent pushes update status.

```json
{
  "path": "myorg/infra/prod/baremetal/galera-node1",
  "status": "OPERATIONAL",
  "message": "wsrep_cluster_size=3",
  "criticality": "critical",
  "slaGroup": "database-cluster",
  "serviceType": "database",
  "tags": ["database","baremetal"]
}
```

Status values: `OPERATIONAL`, `DEGRADED`, `DOWN`, `MAINTENANCE`, `UNKNOWN`. An
unrecognized status is stored as `UNKNOWN`, and the response includes a warning.
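
A client-side sketch of this push using only the Python standard library. The host name, the `Content-Type` header, and the choice to fail early on an unrecognized status (rather than letting the server downgrade it to `UNKNOWN`) are assumptions made for illustration; `build_health_report` is a hypothetical helper.

```python
import json
import urllib.request

HEALTH_STATUSES = {"OPERATIONAL", "DEGRADED", "DOWN", "MAINTENANCE", "UNKNOWN"}


def build_health_report(base_url: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /api/v1/health/report request."""
    if payload.get("status") not in HEALTH_STATUSES:
        # The server would store this as UNKNOWN and return a warning;
        # rejecting it on the client is a choice made for this sketch.
        raise ValueError(f"unrecognized status: {payload.get('status')!r}")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/health/report",
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


req = build_health_report(
    "https://itops.example.com",  # hypothetical host
    "changeme",                   # stands in for $OPERATOR_API_KEY
    {"path": "myorg/infra/prod/baremetal/galera-node1", "status": "OPERATIONAL"},
)
```

Sending is left to the caller, for example `urllib.request.urlopen(req)`, after which the `warnings` array in the JSON reply is worth checking.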

### POST /api/v1/storage/report

Auto-appends the `storage` tag so the service appears on the Storage tab.

```json
{
  "path": "myorg/platform/prod/cluster1/postgresql",
  "allocatedBytes": 107374182400,
  "usedBytes": 53687091200,
  "storageType": "pvc",
  "mountPath": "/var/lib/postgresql"
}
```

Response:

```json
{
  "success": true,
  "serviceName": "postgresql",
  "freePercent": 50,
  "status": "healthy",
  "warnings": []
}
```

Status levels: `healthy` (>30% free), `warning` (10–30%), `critical` (<10%).
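
The thresholds above can be sketched as follows. The handling of a zero or negative allocation is an assumption (treated here as 0% free), and `storage_status` is a hypothetical name; the server's own rounding may differ.

```python
def storage_status(allocated_bytes: int, used_bytes: int) -> tuple[float, str]:
    """Return (free_percent, status) using the documented thresholds."""
    if allocated_bytes <= 0:
        # Assumption: nothing allocated is reported as 0% free.
        return 0.0, "critical"
    free_percent = 100.0 * (allocated_bytes - used_bytes) / allocated_bytes
    free_percent = max(0.0, min(100.0, free_percent))
    if free_percent > 30:
        return free_percent, "healthy"
    if free_percent >= 10:
        return free_percent, "warning"
    return free_percent, "critical"


print(storage_status(107374182400, 53687091200))  # (50.0, 'healthy')
```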

### POST /api/v1/backup/report

Three addressing modes:

```
# Service-level (most common)
{ "path": "myorg/platform/prod/cluster1/postgresql",
  "status": "success", "sizeBytes": 5242880 }

# SLA-group level — propagates to every member with backup.expected=true
{ "slaGroup": "payment-system", "status": "success" }

# Namespace level
{ "namespace": "production", "status": "success" }
```

Status values: `success`, `failed`, `partial`. Any other value is stored as-is, and the response includes a warning.
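
A small builder for the three addressing modes, as a sketch. It assumes the modes are mutually exclusive, which this reference does not state; `backup_report` and its keyword names are hypothetical.

```python
from typing import Optional


def backup_report(status: str, *,
                  path: Optional[str] = None,
                  sla_group: Optional[str] = None,
                  namespace: Optional[str] = None,
                  size_bytes: Optional[int] = None) -> dict:
    """Build a backup report body for exactly one addressing mode."""
    targets = {"path": path, "slaGroup": sla_group, "namespace": namespace}
    chosen = {key: value for key, value in targets.items() if value is not None}
    if len(chosen) != 1:
        # Assumption: one mode per request; the API may tolerate more.
        raise ValueError("set exactly one of path, sla_group, namespace")
    body = {**chosen, "status": status}
    if size_bytes is not None:
        body["sizeBytes"] = size_bytes
    return body


print(backup_report("success", sla_group="payment-system"))
# {'slaGroup': 'payment-system', 'status': 'success'}
```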

## Loud-over-silent error handling

Pushes with missing or invalid fields are NOT rejected (the only exception is a push
with neither `service` nor `path`). Instead, they are normalized:

- Missing `path`, or a `path` with fewer than 5 segments → padded to
  `unknown/unknown/unknown/unknown/<service>`; the service lands under a red "unknown" branch in the UI.
- Negative `allocatedBytes` / `usedBytes` / `sizeBytes` → clamped to 0.
- `freePercent` out of [0,100] → clamped.
- Unknown `status` on health → stored as `UNKNOWN`.
- Every normalization is reported in the response's `warnings[]` array for self-diagnosis.
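
The path and numeric rules above can be sketched as a single normalization pass. This is an illustration, not the server's code: it assumes short paths are left-padded with `unknown` so the supplied segments stay rightmost, the warning texts are invented, and status handling is omitted because it differs per endpoint. `normalize_push` is a hypothetical name.

```python
def normalize_push(body: dict) -> tuple[dict, list[str]]:
    """Return (normalized_body, warnings) instead of rejecting bad fields."""
    out = dict(body)
    warnings: list[str] = []

    segments = [s for s in (out.get("path") or "").split("/") if s]
    service = out.get("service") or (segments[-1] if segments else None)
    if service is None:
        # The one case that is rejected outright.
        raise ValueError("push has neither `service` nor `path`")

    if len(segments) < 5:
        # Assumption: left-pad with "unknown" up to 5 segments.
        kept = segments or [service]
        out["path"] = "/".join(["unknown"] * (5 - len(kept)) + kept)
        warnings.append(f"path padded to {out['path']}")

    for field in ("allocatedBytes", "usedBytes", "sizeBytes"):
        if field in out and out[field] < 0:
            out[field] = 0
            warnings.append(f"{field} was negative; clamped to 0")

    if "freePercent" in out and not 0 <= out["freePercent"] <= 100:
        out["freePercent"] = max(0, min(100, out["freePercent"]))
        warnings.append("freePercent clamped to [0, 100]")

    return out, warnings


normalized, notes = normalize_push({"service": "postgresql", "usedBytes": -5})
print(normalized["path"])  # unknown/unknown/unknown/unknown/postgresql
print(len(notes))          # 2
```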

## Agent internal (agent ↔ core only — not for external clients)

```
POST /api/v1/operator/register   # Register service discovered from it-ops.yaml
POST /api/v1/operator/status     # 30s batch status sync with authoritative reconcile
POST /api/v1/operator/heartbeat  # 10s heartbeat, returns pending commands
POST /api/v1/operator/command-result  # Command execution result
```

These endpoints are called only by `itops-agent`. They use the full `ServiceRegistration`
shape (`nodeId` + `services[]`) for wire-protocol stability.

## Maintenance windows (auth: JWT admin)

```
POST /api/v1/sla/exclusion-window/start
{ "path": "myorg/platform/prod/cluster1/postgresql",
  "reason": "scheduled patching",
  "expectedEndAt": "2026-05-01T02:00:00Z" }

POST /api/v1/sla/exclusion-window/stop
{ "path": "myorg/platform/prod/cluster1/postgresql" }
```

Pauses SLA calculation for the service between the start and stop timestamps.
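
A sketch of building the start payload with `expectedEndAt` in the UTC `Z` form shown above. Whether the API accepts other ISO 8601 offsets is not stated here, so the sketch converts to UTC to be safe; `exclusion_window_start` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone


def exclusion_window_start(path: str, reason: str, expected_end: datetime) -> dict:
    """Build the body for POST /api/v1/sla/exclusion-window/start."""
    if expected_end.tzinfo is None:
        # A naive datetime would be silently read as local time.
        raise ValueError("expected_end must be timezone-aware")
    end_utc = expected_end.astimezone(timezone.utc)
    return {
        "path": path,
        "reason": reason,
        "expectedEndAt": end_utc.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }


body = exclusion_window_start(
    "myorg/platform/prod/cluster1/postgresql",
    "scheduled patching",
    datetime(2026, 5, 1, 4, 0, tzinfo=timezone(timedelta(hours=2))),
)
print(body["expectedEndAt"])  # 2026-05-01T02:00:00Z
```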

## GraphQL

```
POST /graphql                    # All read queries + mutations
GET  /graphql/ws                 # WebSocket subscriptions
```

Auth: `Authorization: Bearer <JWT>`. Introspect the schema for the full catalog —
there are ~80 queries/mutations covering services, SLA groups, tickets, workflows,
dashboards, and the new `agents` query for the Admin → Agents tab.
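
A sketch of an introspection request that lists query and mutation names. The query uses the standard GraphQL `__schema` fields; the host name and the `Content-Type` header are assumptions, `build_introspection_request` is a hypothetical helper, and a deployment may have introspection disabled.

```python
import json
import urllib.request

# Standard GraphQL introspection: names of all top-level queries and mutations.
INTROSPECTION_QUERY = """
{
  __schema {
    queryType { fields { name } }
    mutationType { fields { name } }
  }
}
"""


def build_introspection_request(base_url: str, jwt: str) -> urllib.request.Request:
    """Build (but do not send) a POST /graphql introspection request."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/graphql",
        data=json.dumps({"query": INTROSPECTION_QUERY}).encode("utf-8"),
        headers={"Authorization": f"Bearer {jwt}", "Content-Type": "application/json"},
        method="POST",
    )


introspection = build_introspection_request("https://itops.example.com", "<JWT>")
```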

## Templates & license

```
GET  /api/v1/templates/export    # YAML export of workflows + catalog + SLA defs
POST /api/v1/templates/import    # YAML import, dry-run flag supported
POST /api/v1/license/activate    # Upload an Ed25519-signed JWT license
```

## Idempotency

All push endpoints are idempotent on `(path, timestamp)`, so replays don't double-count.
The agent's sync loop sends an idempotency key header (`X-Idempotency-Key`) so it can
safely retry buffered requests after a reconnect.
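
A sketch of the retry pattern this implies: the key is generated once when a request is buffered and reused unchanged on every retry. Using a UUID as the key and the `send` callable are assumptions for illustration; how `itops-agent` actually derives its key is not documented here.

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class BufferedPush:
    endpoint: str
    body: dict
    # Generated once at enqueue time so every retry carries the same key.
    idempotency_key: str = field(default_factory=lambda: str(uuid.uuid4()))

    def headers(self, api_key: str) -> dict:
        return {"X-API-Key": api_key, "X-Idempotency-Key": self.idempotency_key}


def flush(buffer: list, send: Callable[[str, dict, dict], None], api_key: str) -> list:
    """Try to send each buffered push; return the ones that still failed."""
    remaining = []
    for push in buffer:
        try:
            send(push.endpoint, push.body, push.headers(api_key))
        except ConnectionError:
            remaining.append(push)  # keep it, key unchanged, for the next flush
    return remaining
```

Calling `flush` again after a reconnect resends the leftovers with their original keys, so a request that reached the server before the connection dropped is not counted twice.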
