§ 06 / Operate

Run & Maintain

Ongoing operation of the systems we ship — on-call, eval drift, retraining cadence, regulatory posture.

AI systems decay differently from conventional software. Input data shifts, model providers change, and evals quietly stop reflecting reality. Run & Maintain is the discipline of treating that decay as a known maintenance schedule rather than a surprise.

§ 01

Why ongoing operation is its own contract

A launch team can ship a great system. Keeping it great takes a different skill set, a different cadence, and a different incentive structure. We carry the on-call pager, run the monthly eval review, and own the retraining calendar, so the team that built the system gets to build the next thing.

§ 02

What we own when we run a system

01

On-call

24×7 pager rotation with SLOs you set, and a runbook we maintain.

02

Eval drift

Weekly automated eval runs. Drift alerts. Quarterly benchmark refresh.

03

Retraining cadence

Triggered by drift, by new data volume, or by calendar — whichever fires first.

04

Cost governance

Per-customer cost attribution, monthly review with finance, alerting on anomalies.

05

Incident response

Postmortems within 72 hours. Action items tracked to closure. No theatre.

06

Regulatory posture

SOC 2, GDPR, HIPAA, EU AI Act — whichever framework applies, we keep current.
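The retraining cadence described above ("triggered by drift, by new data volume, or by calendar, whichever fires first") reduces to a check that runs on a schedule. The sketch below is a hypothetical illustration, not our production tooling: `RetrainPolicy`, `retrain_reason`, and every threshold value are assumptions, and "drift" is modelled simply as a drop in a single eval score against a baseline.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class RetrainPolicy:
    # Hypothetical default thresholds; each system would set its own.
    max_eval_drop: float = 0.02                   # drift: tolerated drop in eval score vs. baseline
    min_new_samples: int = 50_000                 # volume: new labelled examples since last retrain
    max_interval: timedelta = timedelta(days=91)  # calendar: roughly quarterly


def retrain_reason(
    policy: RetrainPolicy,
    baseline_score: float,
    current_score: float,
    new_samples: int,
    last_retrain: date,
    today: date,
) -> Optional[str]:
    """Return the name of a trigger that has fired, or None if no retrain is due."""
    if baseline_score - current_score > policy.max_eval_drop:
        return "drift"
    if new_samples >= policy.min_new_samples:
        return "data_volume"
    if today - last_retrain >= policy.max_interval:
        return "calendar"
    return None


policy = RetrainPolicy()
print(retrain_reason(policy, 0.91, 0.88, 1_000, date(2024, 1, 1), date(2024, 2, 1)))  # drift
print(retrain_reason(policy, 0.91, 0.91, 1_000, date(2024, 1, 1), date(2024, 4, 2)))  # calendar
print(retrain_reason(policy, 0.91, 0.91, 1_000, date(2024, 1, 1), date(2024, 2, 1)))  # None
```

Run from the weekly eval job, the first check on which any condition holds is the one that schedules a retrain. The order of the `if` statements only decides which reason gets reported when several triggers fire on the same day.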

§ 03

How a Run engagement works

01

Onboarding

Read every runbook, sit through one full incident, shadow the team for two weeks.

02

Stabilize

Codify undocumented practice. Fix the obvious sharp edges. Set SLOs explicitly.

03

Operate

Daily ops. Weekly eval review. Monthly business review with stakeholders.

04

Improve

Quarterly retraining. Annual architecture review. Continuous cost trimming.

§ 04

What we measure ourselves on

<30 min: P1 MTTR target
0: eval regressions reaching production
Quarterly: default retraining cadence
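To make the MTTR target concrete, here is a minimal sketch that assumes MTTR is the mean time from detection to resolution across the P1 incidents in a review window. The `Incident` record, its field names, the severity labels, and the `mttr` / `meets_p1_target` helpers are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional

# Target from the scorecard: P1 incidents resolved in under 30 minutes on average.
P1_MTTR_TARGET = timedelta(minutes=30)


@dataclass
class Incident:
    severity: str       # hypothetical labels: "P1", "P2", ...
    detected: datetime  # when the alert fired (assumed start of the clock)
    resolved: datetime  # when service was restored (assumed end of the clock)


def mttr(incidents: Iterable[Incident], severity: str = "P1") -> Optional[timedelta]:
    """Mean detection-to-resolution time for one severity, or None if there were none."""
    durations = [i.resolved - i.detected for i in incidents if i.severity == severity]
    if not durations:
        return None
    return sum(durations, timedelta()) / len(durations)


def meets_p1_target(incidents: Iterable[Incident]) -> bool:
    value = mttr(incidents, "P1")
    # A window with no P1 incidents counts as meeting the target.
    return value is None or value < P1_MTTR_TARGET


t0 = datetime(2024, 5, 1, 9, 0)
window = [
    Incident("P1", t0, t0 + timedelta(minutes=20)),
    Incident("P1", t0, t0 + timedelta(minutes=35)),
    Incident("P2", t0, t0 + timedelta(hours=2)),  # ignored: not a P1
]
print(mttr(window))             # 0:27:30
print(meets_p1_target(window))  # True
```

Averaging hides a single long outage, so a review would typically look at the worst P1 alongside the mean.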

Ready to look at this in your context?

Start a conversation