GitOps Ticketing Setup

TICKETING — PAID PLUGIN

How to declare ticketing workflows and service catalog items as code. Workflow state machines and catalog items are declared only in values.yaml — the running app has no admin editor. The Helm chart is the source of truth.

What GitOps Ticketing setup means

The ticketing plugin is fully declarative. Workflow state machines (which transitions are legal, who is allowed to perform them) and service catalog items (the pre-filled cards on the “New ticket” page) are declared in values.yaml alongside the rest of your platform config. There is no admin form to click your way through. Every change to ticketing behaviour is a Git commit.

Why no admin UI

Operational policy — who handles what, what counts as an incident, what the response time should be — is infrastructure, not user data. Putting it in YAML makes it auditable (full Git history with author + diff), reviewable (it goes through pull requests like every other config change), and reproducible (GitOps tools like ArgoCD or Flux apply it the same way to every environment). A UI editor would force one of two bad choices: either the UI rewrites the YAML behind the scenes (added complexity, a new failure mode), or the UI state and the YAML state diverge (configuration drift, the bug nobody can reproduce). We picked YAML-only.

End-to-end flow

1. edit  helmcharts/dev/charts/itops/values.yaml
2. helm upgrade  itops-dev . -n itops-dev
3. ConfigMap projected to mounted files within 30-60s
4. POST /api/v1/dev/registry/reload  (dev builds only; atomic in-memory swap, no pod restart)
   -- or -- restart the pod, which picks up the new state on startup
5. Operator sees the new workflow / catalog item in the UI
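Step 4 can be scripted. The sketch below only builds the request. `base_url` is a placeholder for however you reach the dev backend (ingress host or a port-forward), and any authentication the endpoint may require is not shown. `build_reload_request()` is a hypothetical helper name.

```python
import urllib.request

# Path taken from step 4; only registered in dev builds.
RELOAD_PATH = "/api/v1/dev/registry/reload"


def build_reload_request(base_url: str) -> urllib.request.Request:
    """Build the POST that asks a dev backend to re-read its ticketing registry.

    base_url is a placeholder (hypothetical host/port); authentication is not covered.
    """
    return urllib.request.Request(base_url.rstrip("/") + RELOAD_PATH, method="POST")
```

Send it with `urllib.request.urlopen(build_reload_request(base_url))`, and only after the ConfigMap projection in step 3 has landed; a reload issued earlier just re-reads the old files.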

Workflow YAML schema

Reference shape for a single workflow:

Field               Required  Purpose
name                yes       Stable identifier. Referenced by catalog items and tickets.
displayName         yes       Label shown in the UI.
transitions[]       yes       List of allowed state transitions.
transitions[].from  yes       Source status. One of OPEN, IN_PROGRESS, RESOLVED, CLOSED.
transitions[].to    yes       Target status. Same enum.
transitions[].role  yes       Who may perform it: assignee, requester, or any.

The status enum is fixed at exactly four values (OPEN, IN_PROGRESS, RESOLVED, CLOSED); you cannot add more.

Example: Incident Response workflow

workflows:
  - name: incident_response
    displayName: "Incident Response"
    transitions:
      - {from: OPEN,        to: IN_PROGRESS, role: assignee}
      - {from: IN_PROGRESS, to: RESOLVED,    role: assignee}
      - {from: RESOLVED,    to: CLOSED,      role: requester}
      - {from: RESOLVED,    to: IN_PROGRESS, role: any}

Example: Change Request workflow

A variant in which the implementer (the assignee) signs off on closure:

- name: change_request
  displayName: "Change Request"
  transitions:
    - {from: OPEN,        to: IN_PROGRESS, role: assignee}
    - {from: IN_PROGRESS, to: RESOLVED,    role: assignee}
    - {from: RESOLVED,    to: CLOSED,      role: assignee}   # implementer signs off
    - {from: RESOLVED,    to: IN_PROGRESS, role: any}

In change_request the assignee closes the ticket because they verify their own change. In incident_response the requester (the person who opened it) closes, because they observe whether the fix actually worked in their environment.
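To make the transition semantics concrete, here is a minimal Python model of the two examples above. It assumes the straightforward reading of the schema: a move is legal only if some listed transition has the same from/to pair and a role the actor holds on the ticket, with `any` matching everyone. `Transition`, `workflow()` and `can_transition()` are illustrative names, not the plugin's API.

```python
from dataclasses import dataclass

# The fixed status enum from the schema.
STATUSES = ("OPEN", "IN_PROGRESS", "RESOLVED", "CLOSED")


@dataclass(frozen=True)
class Transition:
    src: str   # "from" in the YAML
    dst: str   # "to" in the YAML
    role: str  # "assignee", "requester" or "any"


def workflow(*rows: tuple[str, str, str]) -> list[Transition]:
    """Build a transition list from (from, to, role) rows, rejecting unknown statuses."""
    out = []
    for src, dst, role in rows:
        if src not in STATUSES or dst not in STATUSES:
            raise ValueError(f"unknown status in {src} -> {dst}")
        out.append(Transition(src, dst, role))
    return out


INCIDENT_RESPONSE = workflow(
    ("OPEN", "IN_PROGRESS", "assignee"),
    ("IN_PROGRESS", "RESOLVED", "assignee"),
    ("RESOLVED", "CLOSED", "requester"),   # requester confirms the fix worked
    ("RESOLVED", "IN_PROGRESS", "any"),
)

CHANGE_REQUEST = workflow(
    ("OPEN", "IN_PROGRESS", "assignee"),
    ("IN_PROGRESS", "RESOLVED", "assignee"),
    ("RESOLVED", "CLOSED", "assignee"),    # implementer signs off
    ("RESOLVED", "IN_PROGRESS", "any"),
)


def can_transition(transitions: list[Transition], src: str, dst: str,
                   actor_roles: set[str]) -> bool:
    """True if some declared src -> dst transition admits one of the actor's roles."""
    return any(
        t.src == src and t.dst == dst and (t.role == "any" or t.role in actor_roles)
        for t in transitions
    )
```

Under this reading the list is an allow-list: in incident_response, `can_transition(INCIDENT_RESPONSE, "RESOLVED", "CLOSED", {"assignee"})` is False, while the same call against CHANGE_REQUEST is True.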

The 5 default workflows

Workflow               Typical case          Special trait
incident_response      Operational incident  Requester confirms close
service_request        General request       Either side may close
change_request         Change management     Assignee verifies
problem_investigation  Root cause analysis   IN_PROGRESS → OPEN backtrack allowed
access_request         Access provisioning   Requester confirms close

The full transition tables for each are in Default Workflows.

Catalog item YAML schema

Field                   Required  Purpose
name                    yes       Stable identifier. Foreign key on tickets.
category                yes       Must match a declared category name.
displayName             yes       Card title.
description             no        Card subtitle.
icon                    no        Material Symbols name.
workflow                yes       Must match a declared workflow.
defaults.priority       no        LOW | MEDIUM | HIGH | CRITICAL
defaults.assigneeGroup  no        Group that auto-receives the ticket.
sla.responseMinutes     no        SLA target for first response (minutes).
sla.resolutionMinutes   no        SLA target for resolution (minutes).

Example: catalog item

catalog:
  categories:
    - {name: incidents, displayName: "Incidents", icon: report_problem}
  items:
    - name: outage_report
      category: incidents
      displayName: "Service Outage"
      description: "Production service is degraded or unreachable."
      icon: cloud_off
      workflow: incident_response
      defaults: {priority: CRITICAL}
      sla: {responseMinutes: 15, resolutionMinutes: 240}
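The schema tables state several cross-references: a catalog item's category and workflow must be declared, and statuses and priorities come from fixed enums. The sketch below checks only those, on the parsed YAML as plain dicts and lists. It is not the backend's actual validator, and it compares names literally, ignoring hierarchical path names; `check_registry()` is a hypothetical name.

```python
from typing import Any

STATUSES = {"OPEN", "IN_PROGRESS", "RESOLVED", "CLOSED"}
PRIORITIES = {"LOW", "MEDIUM", "HIGH", "CRITICAL"}


def check_registry(config: dict[str, Any]) -> list[str]:
    """Return human-readable problems; an empty list means these checks passed."""
    errors: list[str] = []

    workflow_names = set()
    for wf in config.get("workflows", []):
        workflow_names.add(wf.get("name"))
        for t in wf.get("transitions", []):
            for end in ("from", "to"):
                if t.get(end) not in STATUSES:
                    errors.append(f"workflow {wf.get('name')}: unknown status {t.get(end)!r}")

    catalog = config.get("catalog", {})
    categories = {c.get("name") for c in catalog.get("categories", [])}
    for item in catalog.get("items", []):
        name = item.get("name")
        if item.get("category") not in categories:
            errors.append(f"item {name}: unknown category {item.get('category')!r}")
        if item.get("workflow") not in workflow_names:
            errors.append(f"item {name}: unknown workflow {item.get('workflow')!r}")
        priority = item.get("defaults", {}).get("priority")
        if priority is not None and priority not in PRIORITIES:
            errors.append(f"item {name}: invalid priority {priority!r}")
    return errors
```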

Splitting across multiple files

The backend reads every *.yaml file under /etc/itops/ticketing.d/ in lexicographic order and merges them. The Helm chart generates one ConfigMap key per workflow (workflow-NAME.yaml), so a pull request that touches one workflow does not collide with a pull request that touches another — the diff stays scoped to the file the reviewer cares about.
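One way to picture the split-file layout: collect the *.yaml files in lexicographic filename order, parse each, and fold them together. The exact merge rules are not specified here, so `merge()` below assumes a plausible policy (lists concatenated, nested mappings merged, later scalars winning). `fragment_paths()` and `merge()` are hypothetical helpers.

```python
from pathlib import Path
from typing import Any


def fragment_paths(directory: str) -> list[Path]:
    """Every *.yaml file directly in `directory`, in lexicographic filename order."""
    return sorted(Path(directory).glob("*.yaml"), key=lambda p: p.name)


def merge(base: dict[str, Any], fragment: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict with `fragment` folded into `base` (assumed policy):
    lists are concatenated, nested dicts merged recursively,
    and scalars from the later fragment overwrite earlier ones."""
    out = dict(base)
    for key, value in fragment.items():
        if isinstance(value, list) and isinstance(out.get(key), list):
            out[key] = out[key] + value
        elif isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = merge(out[key], value)
        else:
            out[key] = value
    return out
```

Parsing is left out to keep the sketch dependency-free. With PyYAML installed, start from `merged = {}` and, for each `p` in `fragment_paths("/etc/itops/ticketing.d")`, apply `merged = merge(merged, yaml.safe_load(p.read_text()) or {})`.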

Validation

The backend validates the registry on startup and on every reload:

Promote dev → demo → prod

Same chart, different values files:

Validate workflow changes on dev (e2e test + manual smoke test on dev-demo.mlops.hu), then merge the values change to the prod values file.

Reload mechanism

POST /api/v1/dev/registry/reload atomically swaps the in-memory registry for the freshly loaded YAML content. The endpoint is only compiled into dev builds — in production builds it is not registered at all, and the only way state changes is a pod restart. This keeps the production attack surface minimal.
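The atomic swap can be illustrated with a small holder object. This Python sketch assumes the backend builds and validates the complete new registry before publishing it; `RegistryHolder` and its `loader` callback are hypothetical, and the backend's real language and types are not described here.

```python
import threading
from typing import Any, Callable

Registry = dict[str, Any]


class RegistryHolder:
    """Keeps the live registry behind a single reference.

    reload() builds the complete new registry first and only then rebinds
    the reference, so readers see either the old or the new registry,
    never a mix. If loading raises, the old registry stays in place.
    """

    def __init__(self, loader: Callable[[], Registry]) -> None:
        self._loader = loader
        self._reload_lock = threading.Lock()  # serialises concurrent reload() calls
        self._current: Registry = loader()

    def get(self) -> Registry:
        # Readers take no lock; they just read the current reference.
        return self._current

    def reload(self) -> Registry:
        with self._reload_lock:
            fresh = self._loader()  # parse + validate; may raise
            self._current = fresh   # single reference rebind is the "swap"
            return fresh
```

Because the rebind happens only after `loader()` returns, a reload that fails validation leaves the previous registry serving requests (assumed behaviour for the real endpoint).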

Audit trail

Two complementary sources cover the full audit story:

Together they answer both “why did this ticket move?” and “why does the workflow allow that move?”.

Hierarchical naming

Workflows, catalog items and groups all share the same naming convention: a path of the shape org/platform/env/cluster/service/<short-name>. In the common case you declare a short name and the backend implicitly prepends the configured org. When an environment- or service-level override is needed, put slashes in the name: the backend treats it as a path override, still prepending the org but using your literal path for the scope segments instead of placing the name directly under the org. This keeps simple values files small while letting larger deployments scope a workflow, group or catalog item to a single environment, cluster or service.

ticketing:
  org: mlops-app
  groups:
    - name: db-admins                        # → mlops-app/db-admins
    - name: itops-dev/prod/db-admins         # → mlops-app/itops-dev/prod/db-admins
    - name: itops-dev/dev/c1/postgres-prod/oncall    # most specific scope
  workflows:
    - name: incident_response                # → mlops-app/incident_response
    - name: itops-dev/prod/change_strict     # only lives in prod
  catalog:
    items:
      - name: db-restart                     # global
      - name: itops-dev/dev/c1/postgres-prod/db-emergency-restart   # service-scoped
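The expansion rule can be sketched as a small parser, assuming names are split on "/" and that at most the four scope levels (platform, env, cluster, service) appear between the org and the short name. `qualify()` and `QualifiedName` are illustrative names only.

```python
from dataclasses import dataclass

# Scope segments that may appear between the org and the short name, outermost first.
SCOPE_LEVELS = ("platform", "env", "cluster", "service")


@dataclass(frozen=True)
class QualifiedName:
    org: str
    scope: tuple[str, ...]  # zero to four scope segments
    short_name: str

    @property
    def path(self) -> str:
        return "/".join((self.org, *self.scope, self.short_name))

    def scope_levels(self) -> dict[str, str]:
        """Map each present scope segment to its level name."""
        return dict(zip(SCOPE_LEVELS, self.scope))


def qualify(name: str, org: str) -> QualifiedName:
    """A bare short name lands directly under the org; a name containing
    slashes keeps its literal scope path below the org."""
    *scope, short_name = name.split("/")
    if not short_name or any(not part for part in scope):
        raise ValueError(f"empty path segment in {name!r}")
    if len(scope) > len(SCOPE_LEVELS):
        raise ValueError(f"too many scope segments in {name!r}")
    return QualifiedName(org, tuple(scope), short_name)
```

For example, `qualify("itops-dev/prod/db-admins", "mlops-app").path` gives `mlops-app/itops-dev/prod/db-admins`, matching the comment in the values snippet above.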

Cascade lookup + group membership inheritance

When a workflow transition references role: db-admins, the backend resolves the role against the ticket’s service path by cascading upward from the most-specific scope to the org root, stopping at the first match. The most-specific group definition wins; less-specific scopes are only consulted as fallback.

Ticket service: mlops-app/itops-dev/dev/c1/postgres-prod
Workflow role:  db-admins

Cascade lookup order (most-specific → org fallback):
  1. mlops-app/itops-dev/dev/c1/postgres-prod/db-admins
  2. mlops-app/itops-dev/dev/c1/db-admins
  3. mlops-app/itops-dev/dev/db-admins                  ← match (user member here)
  4. mlops-app/itops-dev/db-admins                      (skipped — earlier match wins)
  5. mlops-app/db-admins

Membership inheritance. A user who is a member of mlops-app/itops-dev/dev/db-admins is automatically a member of every group below it: cluster-level, service-level, anywhere under that scope. DRY: declare the DBA team once at the env scope and they are authorised on every DB service underneath without restating the roster per service.

Superadmin override. A user with users.is_superadmin = true is treated as an implicit member of every group at every scope — the standard ITSM break-glass convention so on-call engineers can always step into any role during an incident.
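The cascade, membership inheritance and superadmin rules combine into a short authorisation check. This Python sketch assumes group names are plain strings in the hierarchical form above and that direct memberships are known per group; `cascade_candidates()`, `resolve_group()` and `is_authorised()` are hypothetical names, not the plugin's API. Membership inheritance is modelled by accepting a direct membership at any scope on the cascade path.

```python
from collections.abc import Iterator
from typing import Optional


def cascade_candidates(service_path: str, role: str) -> Iterator[str]:
    """Yield <scope>/<role> from the ticket's service scope up to the org root."""
    segments = service_path.split("/")
    for depth in range(len(segments), 0, -1):
        yield "/".join(segments[:depth] + [role])


def resolve_group(service_path: str, role: str, declared: set[str]) -> Optional[str]:
    """Most-specific declared group wins; broader scopes are only a fallback."""
    return next((g for g in cascade_candidates(service_path, role) if g in declared), None)


def is_authorised(user: str, service_path: str, role: str,
                  direct_members: dict[str, set[str]],
                  superadmins: set[str]) -> bool:
    """A direct membership anywhere on the cascade path authorises the user,
    because members of a broader-scope group inherit every group below it."""
    if user in superadmins:
        return True  # break-glass: implicit member of every group at every scope
    return any(user in direct_members.get(g, set())
               for g in cascade_candidates(service_path, role))
```

With alice as a direct member of mlops-app/itops-dev/dev/db-admins only, `is_authorised("alice", "mlops-app/itops-dev/dev/c1/postgres-prod", "db-admins", ...)` returns True, mirroring the cascade example above.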

See also