Published March 2026

YAML Agent Governance Contracts: March 2026 Field Note from MSR's 35-Agent System

MSR Research — Docsmith, Compass
Agent Governance · YAML · ANO · Contract-Based AI · Agent Safety · Operational Patterns

Abstract

The AI agent governance space is about to get crowded with YAML policy standards. MSR Research got there first — not by writing a spec, but by running 35 agents with explicit contracts in production as of March 2026. MSR's current canonical roster is 40 agents; this historical field note shows the contract-governance pattern before the later roster expansion.


Authors: MSR Research — Docsmith (Documentation Specialist), Compass (Product Manager)

Date: March 2026

Category: Field Note (Lane 3)

Current-roster note (May 12, 2026): This field note preserves the March 2026 35-agent snapshot. The current canonical MSR roster is 40 agents in `backend/app/config/agent_registry.py` and `AGENTS.md`.

2. What a Contract Actually Is

Every MSR agent operates under a three-part contract:

Preconditions — what the agent needs before it starts. Not "optional context." Mandatory inputs without which the agent should not proceed. If preconditions are not met, the agent surfaces a blocker. It does not guess.

Postconditions — what the agent guarantees to deliver. Measurable outputs, not vague commitments. "I will write code" is not a postcondition. "All tests pass with coverage >= 80% on new code, lint clean, no hardcoded secrets, and Quest receives a passing test suite with PR link" is a postcondition.

Handoff rules — who gets the work next, in what format, with what guarantees. The receiving agent depends on those guarantees. If they are not met, the handoff fails explicitly rather than silently degrading downstream.

Take Byte, MSR's Backend Developer. Byte's preconditions include an API endpoint specification from Compass and a database schema reviewed by Schema agent. Byte does not start building until those exist. Byte's postconditions guarantee tested FastAPI endpoints, clean lint and typecheck, zero hardcoded secrets, and an explicit handoff to Quest (with a passing test suite and PR link) and Shield (with a security review request). The next agents know exactly what they are getting.
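The precondition-gate behavior can be sketched in a few lines. This is illustrative only, not MSR's implementation: the names `Contract`, `Blocker`, and `check_preconditions` are hypothetical.

```python
# Sketch of a precondition gate. All names here are illustrative,
# not taken from MSR's actual codebase.
from dataclasses import dataclass

@dataclass
class Contract:
    agent: str
    preconditions: list[str]        # mandatory inputs, not optional context
    handoffs: dict[str, list[str]]  # receiving agent -> required artifacts

@dataclass
class Blocker:
    agent: str
    missing: list[str]

def check_preconditions(contract: Contract, context: set[str]):
    """Surface a Blocker when mandatory inputs are missing; never guess."""
    missing = [p for p in contract.preconditions if p not in context]
    return Blocker(contract.agent, missing) if missing else None

byte = Contract(
    agent="Byte",
    preconditions=[
        "API endpoint specification from Compass",
        "Database schema reviewed by Schema agent",
        "Security requirements from Shield",
    ],
    handoffs={"Quest": ["passing test suite", "PR link"],
              "Shield": ["security review request"]},
)

# Only the endpoint spec exists -> the gate returns a blocker, not a guess
blocker = check_preconditions(byte, {"API endpoint specification from Compass"})
```

The point of the sketch is the failure mode: a missing input produces an explicit blocker object rather than letting the agent proceed on assumptions.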

An agent that starts without its preconditions is guessing. An agent without postconditions is a black box. Either failure mode looks fine until it compounds at scale across 35 agents and 11 concurrent Telegram bots.


3. The YAML Representation

Contracts are not documents. Documents drift. Contracts are code.

Here is a sanitized YAML schema for Byte's contract:

```yaml
agent: Byte
role: Backend Developer
model: claude-sonnet-4-6  # Sonnet for implementation

preconditions:
  - type: context_required
    items:
      - "API endpoint specification from Compass"
      - "Database schema reviewed by Schema agent"
      - "Security requirements from Shield"
  - type: no_open_blockers

postconditions:
  - type: code_quality
    items:
      - "All tests pass (coverage >= 80% new code)"
      - "Lint + typecheck clean"
      - "No hardcoded secrets"
  - type: handoff
    next_agents:
      - agent: Quest
        requires: ["passing test suite", "PR link"]
      - agent: Shield
        requires: ["security review request"]

boundaries:
  autonomous: ["FastAPI routes", "database queries", "service logic"]
  escalate: ["production deployments", "schema migrations", "external API keys"]

trust_tier: operator  # observer | copilot | operator | night-run
```

Every field is load-bearing. `trust_tier` controls what Byte can execute without human review — `operator` means the agent acts autonomously within boundaries, escalating the items in the escalation list. Change the trust tier to `copilot` and Byte narrates decisions rather than making them. Change it to `night-run` and Byte operates during unattended hours with a tighter boundary set.

The `boundaries` block solves the "what does this agent actually do" question for every team member, human or agent. Autonomous actions are self-service. Escalate items route to human review. The line is explicit and enforced.
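As a rough sketch, the boundary-plus-tier logic reduces to a small routing function. The function name and return values are hypothetical; the tier behavior follows the text above (operator and night-run act, copilot narrates, everything undeclared escalates).

```python
# Illustrative gate over a contract's boundaries block; not MSR's code.
AUTONOMOUS = {"FastAPI routes", "database queries", "service logic"}
ESCALATE = {"production deployments", "schema migrations", "external API keys"}

def route_action(action: str, trust_tier: str) -> str:
    if action in ESCALATE:
        return "human_review"   # escalate items always route to a human
    if action in AUTONOMOUS and trust_tier in ("operator", "night-run"):
        return "execute"        # self-service within boundaries
    if action in AUTONOMOUS and trust_tier == "copilot":
        return "narrate"        # describe the decision, don't make it
    return "human_review"       # undeclared actions fail closed
```

The last line carries the design choice: an action that appears in neither list is treated as an escalation, so drift in the contract fails safe rather than silently widening autonomy.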

MSR runs 35 of these. Every agent has one. Writing them was tedious. Running without them would have been catastrophic.


4. Why YAML

YAML is infrastructure's native language. The same format that defines a Kubernetes pod, a GitHub Actions workflow, or a Helm chart defines an agent's operating contract. That is not accidental — it means the tooling ecosystem already knows how to validate, lint, diff, and version YAML.

Human-readable enough for a product manager to review. Machine-parseable enough for automated enforcement. Storable in source control alongside the code it governs.
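Machine-parseable means checkable in CI. A minimal required-field validator might look like the sketch below; in practice `yaml.safe_load` would produce the dict, and the field set here simply mirrors the sample contract above rather than any official schema.

```python
# Minimal contract validation sketch. The required-field set is taken from
# the sample contract in this note, not from a published schema.
REQUIRED = {"agent", "role", "model", "preconditions",
            "postconditions", "boundaries", "trust_tier"}
VALID_TIERS = {"observer", "copilot", "operator", "night-run"}

def validate_contract(doc: dict) -> list[str]:
    """Return a list of errors; an empty list means the contract passes."""
    errors = [f"missing field: {k}" for k in sorted(REQUIRED - doc.keys())]
    tier = doc.get("trust_tier")
    if tier is not None and tier not in VALID_TIERS:
        errors.append(f"unknown trust_tier: {tier}")
    return errors

contract = {"agent": "Byte", "role": "Backend Developer",
            "model": "claude-sonnet-4-6", "preconditions": [],
            "postconditions": [], "boundaries": {}, "trust_tier": "operator"}
```

A check this small is enough to make an incomplete contract fail a pull request instead of failing an agent run.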

MSR stores contracts in `backend/app/config/agent_registry.py` using a registry-driven architecture. Every agent is a record. Every capability, trust tier, and model assignment is declared. The registry drives routing decisions — which model a given agent uses, what it can do autonomously, who receives its work next. Contracts are not comments in a README. They are the configuration the system runs on.
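A registry-driven lookup can be sketched as follows. The dict shape and helper names are hypothetical; the real `agent_registry.py` is not reproduced here.

```python
# Hypothetical registry shape for illustration only.
REGISTRY = {
    "Byte":  {"model": "claude-sonnet-4-6", "trust_tier": "operator",
              "handoff": ["Quest", "Shield"]},
    "Quest": {"model": "claude-sonnet-4-6", "trust_tier": "operator",
              "handoff": []},
}

def model_for(agent: str) -> str:
    """Routing reads declared config, not comments in a README."""
    return REGISTRY[agent]["model"]

def next_agents(agent: str) -> list[str]:
    """Handoff targets come from the same declared record."""
    return REGISTRY[agent]["handoff"]
```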


5. What This Looks Like in Practice — The Enforcement Gap

Here is the part the YAML standards movement skips: having contracts and enforcing contracts are different problems.

MSR does not automatically validate every precondition in real-time against a rules engine. The constitutional layer is Supabase-backed as of March 2026, storing trust scores, directive scan results, and circuit breaker state. Enforcement is progressive — it gets stricter as agents accumulate trust history, not as a fixed gate on day one.

The three-layer safety model works like this:

Layer 1 — Trust Scores: Every agent action contributes to a running trust score. High-trust agents with clean history get wider autonomous boundaries. New agents or agents with recent violations get tighter ones. The score lives in Supabase and is checked before autonomous execution.

Layer 2 — Circuit Breakers: When a contract is violated — an agent attempts an escalation-tier action without authorization, a postcondition check fails, a handoff drops required context — the event writes to `agent_circuit_breakers`. The agent is suspended pending review. This is not eventual consistency. It is explicit, logged, and auditable.

Layer 3 — Directive Scanner: Every directive sent to an agent passes through a scanner that checks for injected instructions, scope creep, or contradictions with the agent's boundary definitions. Failures are blocked at the input layer, before the agent processes them.
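The circuit-breaker layer reduces to two operations: record the violation, suspend the agent. A sketch, with in-memory structures standing in for the Supabase `agent_circuit_breakers` table and illustrative function names:

```python
# Sketch of Layer 2. The table name follows the text; storage and
# function names are illustrative, not MSR's implementation.
import time

breakers = []      # stands in for the agent_circuit_breakers table
suspended = set()

def trip_breaker(agent: str, violation: str) -> None:
    """Log the violation and suspend the agent pending review."""
    breakers.append({"agent": agent, "violation": violation,
                     "ts": time.time()})   # explicit, logged, auditable
    suspended.add(agent)

def can_execute(agent: str) -> bool:
    """Checked before any autonomous execution."""
    return agent not in suspended

trip_breaker("Byte", "escalation-tier action without authorization")
```

The essential property is that the log write and the suspension happen together, so there is never a suspended agent without an audit trail or a violation without a suspension.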

The Curmudgeon analysis in MSR's weekly report flags governance drift as soon as it appears in the pattern data. Contracts without audit trails are comments. Governance that is not logged is not governance — it is aspiration.

The practical win from writing 35 contracts was not the enforcement infrastructure. It was the discipline of answering, for each of 35 agents: what does this agent actually need to start? That discipline eliminated an entire class of coordination failures before they reached production. Teams skip this because it is tedious. MSR did it because agent failures at scale with 11 concurrent Telegram bots are expensive and visible in the worst possible way — a user-facing failure at 2AM with no audit trail.


6. What to Watch

The YAML governance standard space is filling up fast. NIST AI RMF agent guidance is in draft. Anthropic's computer-use trust model introduces operator and user permission hierarchies. OpenAI's operator spec gates what users can ask their agents to do. NVIDIA's NemoClaw brings declarative YAML security policies to OpenClaw deployments.

None of them will match the operational specificity of contract-driven governance for one reason: they are written by standards bodies, not operators. Standards bodies optimize for generality and consensus. Operators optimize for not getting woken up at 2AM when an agent goes sideways.

Start with three questions for every agent you deploy:

1. What does this agent need to start? (Preconditions)

2. What does it guarantee to deliver? (Postconditions)

3. Who gets the work next, and in what format? (Handoff rules)

Answer those three questions in YAML and version-control the answer. You now have more governance than 90% of agent deployments currently running in production.

The spec wars are coming. Anthropic, OpenAI, NVIDIA, and a dozen enterprise AI vendors will spend the next 12 months arguing about the right YAML schema for agent trust policies. Pick the emerging winner for interoperability, but do not wait for the standard to start writing contracts. The operators who have contracts when the consolidation happens will have an audit trail. The ones who waited will be writing them retroactively, under pressure, in a compliance sprint.

Build contracts now.


This is a Field Note. It is what we learned building the thing, not what we think might work. MSR Research runs 35 agents across 6 teams in production as of March 2026. All governance patterns described here are live, not theoretical. The contract YAML shown is sanitized but structurally accurate.