Agent Reconciliation Loop
How agents observe state, reason about changes, and execute reconciliation plans.
Overview
Reconciliation is the core loop that drives infrastructure management. Each agent continuously compares the desired state (what should exist) with the actual state (what does exist) and takes action to close any gaps.
The Five Phases
Phase 1: Observe
The agent reads both desired and actual state from etcd:
- Desired state: Set by user actions (CLI, API, IaC) and stored under /desired//
- Actual state: Reported by provider polling and stored under /actual//
Comparing the two yields the delta the agent must resolve. For example:
Desired: Instance "web-01" with type=cx31, image=ubuntu-22.04
Actual: Instance "web-01" does not exist
Delta: Instance "web-01" needs to be created
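The comparison in this phase amounts to a diff between two maps. The sketch below assumes both states have already been read from etcd into dictionaries keyed by resource name. `compute_delta` and `Delta` are hypothetical names, and treating resources that exist only in actual state as deletions is an assumption, not documented behavior:

```python
from dataclasses import dataclass, field


@dataclass
class Delta:
    action: str  # "create", "update" or "delete"
    name: str
    # For updates: spec key -> (actual value, desired value). Empty otherwise.
    changes: dict = field(default_factory=dict)


def compute_delta(desired: dict, actual: dict) -> list:
    """Diff two {resource name: spec} maps into a list of Delta records."""
    deltas = []
    for name, spec in sorted(desired.items()):
        current = actual.get(name)
        if current is None:
            # In desired state but not in actual state: needs to be created.
            deltas.append(Delta("create", name))
            continue
        changes = {
            key: (current.get(key), value)
            for key, value in spec.items()
            if current.get(key) != value
        }
        if changes:
            deltas.append(Delta("update", name, changes))
    # Assumption: anything present only in actual state should be removed.
    for name in sorted(actual.keys() - desired.keys()):
        deltas.append(Delta("delete", name))
    return deltas


desired = {"web-01": {"type": "cx31", "image": "ubuntu-22.04"}}
actual = {}
print(compute_delta(desired, actual))
# [Delta(action='create', name='web-01', changes={})]
```

Sorting by name keeps the delta order deterministic, so repeated runs over unchanged state hand the reasoning phase identical input.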
Phase 2: Reason
The agent sends the state delta to the LLM along with the list of tools it can use. The LLM responds with its reasoning and an execution plan:
Reasoning: Instance "web-01" exists in desired state but not in actual state.
Need to create it using the provision_instance tool with the
specified type and image.
Plan:
1. provision_instance(name="web-01", type="cx31", image="ubuntu-22.04")
2. wait_for_ready(instance="web-01", timeout=300)
3. update_actual_state(instance="web-01", state="running")
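The plan is shown here as numbered tool calls. This page does not specify the format the LLM actually returns; assuming it is exactly this text, a minimal sketch of turning it into structured steps could look like the following. `parse_plan` and `PlanStep` are hypothetical names:

```python
import ast
import re
from dataclasses import dataclass


@dataclass
class PlanStep:
    tool: str
    args: dict


STEP_LINE = re.compile(r"\s*\d+\.\s+(.+)")


def parse_plan(text: str) -> list:
    """Parse numbered lines like '1. tool(key="value", n=300)' into PlanSteps."""
    steps = []
    for line in text.splitlines():
        if not line.strip():
            continue
        match = STEP_LINE.fullmatch(line)
        if match is None:
            raise ValueError(f"not a numbered plan step: {line!r}")
        # Parse the call syntax without evaluating it: LLM output is untrusted.
        call = ast.parse(match.group(1).strip(), mode="eval").body
        if not isinstance(call, ast.Call) or not isinstance(call.func, ast.Name):
            raise ValueError(f"not a tool call: {line!r}")
        if call.args or any(kw.arg is None for kw in call.keywords):
            raise ValueError(f"only keyword arguments are allowed: {line!r}")
        # literal_eval accepts only literals (strings, numbers, lists, ...).
        args = {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}
        steps.append(PlanStep(call.func.id, args))
    return steps


plan = parse_plan("""
1. provision_instance(name="web-01", type="cx31", image="ubuntu-22.04")
2. wait_for_ready(instance="web-01", timeout=300)
3. update_actual_state(instance="web-01", state="running")
""")
for step in plan:
    print(step.tool, step.args)
```

Parsing with `ast` instead of calling `eval` means a malformed or malicious step is rejected with an exception rather than executed.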
Phase 3: Classify Risk
Each step in the plan is classified by risk level:
| Risk Level | Criteria | Action |
|---|---|---|
| Safe | Read-only, non-destructive | Auto-execute |
| Moderate | Creates or modifies resources | Auto-execute + notify |
| Dangerous | Deletes or significantly alters resources | Require human approval |
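The table can be expressed as an ordered enum plus a tool-to-risk lookup. In this sketch the per-tool assignments (including the `delete_instance` tool) are hypothetical, and two policies are assumptions rather than documented behavior: unknown tools are treated as Dangerous, and a whole plan takes the risk level of its riskiest step:

```python
from enum import IntEnum


class Risk(IntEnum):
    # Ordered so that max() picks the most dangerous level.
    SAFE = 0
    MODERATE = 1
    DANGEROUS = 2


# Hypothetical mapping from tool name to risk level.
TOOL_RISK = {
    "wait_for_ready": Risk.SAFE,           # read-only
    "provision_instance": Risk.MODERATE,   # creates a resource
    "update_actual_state": Risk.MODERATE,  # modifies stored state
    "delete_instance": Risk.DANGEROUS,     # destroys a resource
}

# The "Action" column of the table.
ACTION = {
    Risk.SAFE: "auto-execute",
    Risk.MODERATE: "auto-execute + notify",
    Risk.DANGEROUS: "require human approval",
}


def classify_step(tool: str) -> Risk:
    # Unknown tools default to DANGEROUS so they can never run unattended.
    return TOOL_RISK.get(tool, Risk.DANGEROUS)


def classify_plan(tools: list) -> Risk:
    # Assumption: a plan is as risky as its riskiest step.
    return max((classify_step(t) for t in tools), default=Risk.SAFE)


risk = classify_plan(["provision_instance", "wait_for_ready", "update_actual_state"])
print(risk.name, "->", ACTION[risk])  # MODERATE -> auto-execute + notify
```

Defaulting unknown tools to Dangerous fails safe: a tool added without a classification needs human approval instead of running unattended.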
Phase 4: Execute
If the plan is approved (automatically for Safe/Moderate, manually for Dangerous), tools are executed sequentially:
[1/3] provision_instance(name="web-01", type="cx31") ... success (inst-abc123)
[2/3] wait_for_ready(instance="inst-abc123", timeout=300) ... success (running)
[3/3] update_actual_state(instance="inst-abc123") ... success
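Steps 2 and 3 refer to the instance by the provider ID returned in step 1 (inst-abc123) rather than by its name. If a step fails, the agent can retry it, skip it, or abort the plan, depending on the error type and the configured policy.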
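Sequential execution with per-step error handling might look like the sketch below. `execute_plan`, `OnError`, `TransientError` and the default policy (retry transient errors with linear backoff, abort on anything else) are hypothetical; this page only says the choice depends on the error type and configured policy. Tools are modeled as plain Python callables:

```python
import time
from enum import Enum


class OnError(Enum):
    RETRY = "retry"
    SKIP = "skip"
    ABORT = "abort"


class TransientError(Exception):
    """Hypothetical error type for failures worth retrying (e.g. a timeout)."""


def default_policy(exc: Exception) -> OnError:
    # Hypothetical policy: retry transient errors, abort on everything else.
    return OnError.RETRY if isinstance(exc, TransientError) else OnError.ABORT


def execute_plan(steps, tools, policy=default_policy, max_retries=3, backoff_s=0.5):
    """Run (tool name, kwargs) steps in order and return each step's result."""
    results = []
    total = len(steps)
    for i, (name, args) in enumerate(steps, start=1):
        attempt = 0
        while True:
            try:
                result = tools[name](**args)
            except Exception as exc:
                decision = policy(exc)
                if decision is OnError.RETRY and attempt < max_retries:
                    attempt += 1
                    time.sleep(backoff_s * attempt)  # linear backoff
                    continue
                if decision is OnError.SKIP:
                    print(f"[{i}/{total}] {name} ... skipped ({exc})")
                    results.append(None)
                    break
                # ABORT, or RETRY with attempts exhausted: stop the whole plan.
                print(f"[{i}/{total}] {name} ... failed ({exc})")
                raise
            print(f"[{i}/{total}] {name} ... success ({result})")
            results.append(result)
            break
    return results


attempts = {"wait_for_ready": 0}


def wait_for_ready(instance, timeout):
    # Fails once with a transient error, then reports the instance as running.
    attempts["wait_for_ready"] += 1
    if attempts["wait_for_ready"] == 1:
        raise TransientError("not ready yet")
    return "running"


tools = {"provision_instance": lambda **kwargs: "inst-abc123",
         "wait_for_ready": wait_for_ready}
results = execute_plan(
    [("provision_instance", {"name": "web-01", "type": "cx31", "image": "ubuntu-22.04"}),
     ("wait_for_ready", {"instance": "web-01", "timeout": 300})],
    tools, backoff_s=0.0)
```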
Phase 5: Record
Every decision and action is recorded in the audit log:
{
"agent": "instance-agent",
"action": "reconcile",
"resource": "web-01",
"reasoning": "Instance exists in desired state but not actual state",
"plan_steps": 3,
"risk_level": "moderate",
"outcome": "success",
"duration_ms": 45000,
"timestamp": "2025-01-15T10:30:00Z"
}
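Producing an entry with the fields shown above is straightforward. This sketch assumes the audit log is an append-only JSON Lines stream (one object per line) and that `timestamp` is the time the entry is written; neither the storage format nor the timestamp semantics is specified on this page, and `record_reconcile` is a hypothetical name:

```python
import io
import json
import time
from datetime import datetime, timezone


def record_reconcile(log, *, agent, resource, reasoning, plan_steps,
                     risk_level, outcome, started):
    """Append one audit entry as a JSON line and return it.

    `started` is a time.monotonic() reading taken when the loop began.
    """
    entry = {
        "agent": agent,
        "action": "reconcile",
        "resource": resource,
        "reasoning": reasoning,
        "plan_steps": plan_steps,
        "risk_level": risk_level,
        "outcome": outcome,
        "duration_ms": round((time.monotonic() - started) * 1000),
        # UTC with second precision, same shape as "2025-01-15T10:30:00Z".
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    log.write(json.dumps(entry) + "\n")  # one JSON object per line
    return entry


log = io.StringIO()  # stands in for an append-only audit log file
started = time.monotonic()
record_reconcile(log, agent="instance-agent", resource="web-01",
                 reasoning="Instance exists in desired state but not actual state",
                 plan_steps=3, risk_level="moderate", outcome="success",
                 started=started)
print(log.getvalue())
```

Measuring `duration_ms` with `time.monotonic()` rather than wall-clock time keeps it correct even if the system clock is adjusted mid-run.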
Continuous Reconciliation
Agents do not run on a fixed schedule. Instead, they react to state changes via etcd watchers. Creating an instance, for example, proceeds as follows:
1. A user creates an instance via the CLI.
2. The CLI writes the desired state to etcd.
3. The etcd watch fires, notifying the instance agent.
4. The agent runs the reconciliation loop.
5. The actual state is updated when provisioning completes.
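This event-driven approach minimizes reaction latency and spares agents from polling etcd for changes. Provider polling still supplies the actual state, as described in Phase 1.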
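The event-driven flow can be illustrated without a running etcd cluster. In this sketch `FakeStore` is a hypothetical in-memory stand-in for an etcd prefix watch, not the API of any etcd client library, and the keys are made up because the real key layout is not shown here. The agent thread blocks on its watch queue and only does work when a watched key changes:

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class WatchEvent:
    key: str
    value: dict


class FakeStore:
    """In-memory stand-in for etcd: put() notifies watchers whose prefix matches."""

    def __init__(self):
        self._data = {}
        self._watchers = []  # list of (prefix, queue.Queue)
        self._lock = threading.Lock()

    def watch_prefix(self, prefix: str) -> queue.Queue:
        events = queue.Queue()
        with self._lock:
            self._watchers.append((prefix, events))
        return events

    def put(self, key: str, value: dict) -> None:
        with self._lock:
            self._data[key] = value
            targets = [q for prefix, q in self._watchers if key.startswith(prefix)]
        for q in targets:
            q.put(WatchEvent(key, value))


def run_agent(events: queue.Queue, reconcile) -> None:
    """Wait for a change, then reconcile; the agent itself never polls."""
    while True:
        event = events.get()  # blocks until the store reports a change
        if event is None:     # shutdown sentinel used by this sketch
            return
        reconcile(event)


store = FakeStore()
events = store.watch_prefix("/desired/")
handled = []
agent = threading.Thread(target=run_agent, args=(events, lambda e: handled.append(e.key)))
agent.start()
# Hypothetical keys: the real layout under /desired/ and /actual/ is not shown here.
store.put("/desired/web-01", {"type": "cx31", "image": "ubuntu-22.04"})  # CLI write
store.put("/actual/web-01", {"state": "running"})  # not watched by this agent
events.put(None)
agent.join()
print(handled)  # ['/desired/web-01']
```

A real agent would swap `FakeStore` for an etcd client's watch API and would also need to handle watch cancellation and reconnection, which this sketch omits.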