AI Agents

Overview

AgentMetal uses 13 specialized AI agents to manage infrastructure resources. Each agent is responsible for a specific domain (instances, databases, DNS, and so on) and pairs Large Language Model (LLM) reasoning with a tool-use system that executes actions.

Agent Architecture

The system follows an agent-per-domain pattern where each agent encapsulates deep knowledge about its resource type. An Orchestrator agent coordinates multi-service deployments that span multiple domains.
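
As a rough illustration of the agent-per-domain pattern, the sketch below registers one agent per domain and lets an orchestrator hand each step of a deployment to the agent that owns it. The class names, method names, and request shape (`DomainAgent`, `Orchestrator`, `handle`, `deploy`, the `domain`/`resource` keys) are assumptions made for this example and are not taken from the AgentMetal codebase.

```python
from dataclasses import dataclass, field


@dataclass
class DomainAgent:
    """Hypothetical stand-in for a domain agent (illustrative only)."""
    domain: str

    def handle(self, request: dict) -> str:
        # A real agent would reason over the request and call its tools;
        # this stub only reports which agent received the work.
        return f"{self.domain} agent handled {request['resource']}"


@dataclass
class Orchestrator:
    """Routes each step of a multi-domain deployment to the owning agent."""
    agents: dict = field(default_factory=dict)

    def register(self, agent: DomainAgent) -> None:
        self.agents[agent.domain] = agent

    def deploy(self, steps: list) -> list:
        results = []
        for step in steps:
            agent = self.agents.get(step["domain"])
            if agent is None:
                raise KeyError(f"no agent registered for domain {step['domain']!r}")
            results.append(agent.handle(step))
        return results


orchestrator = Orchestrator()
for domain in ("compute", "data", "dns"):
    orchestrator.register(DomainAgent(domain))

results = orchestrator.deploy([
    {"domain": "compute", "resource": "web-1"},
    {"domain": "data", "resource": "orders-db"},
])
```

Keeping the orchestrator unaware of domain details means a new resource type only needs a new agent registration, which is one plausible reason to structure the system this way.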

Agent Roster

Agent	Domain	Description
Orchestrator	Multi-service	Coordinates complex deployments across agents
Instance Agent	Compute	Provisions and manages bare-metal/virtual instances
Database Agent	Data	Manages PostgreSQL and MySQL databases
VPC Agent	Networking	Manages VPCs, subnets, and security groups
LoadBalancer Agent	Traffic	Manages HTTP and TCP load balancers
DNS Agent	DNS	Manages zones and records
K3s Agent	Kubernetes	Manages K3s cluster lifecycle
Redis Agent	Caching	Manages Redis clusters
Queue Agent	Messaging	Manages RabbitMQ and NATS queues
Function Agent	Serverless	Manages serverless functions
Bucket Agent	Storage	Manages object storage buckets
IaC Agent	Declarative	Processes Infrastructure as Code stacks
Healing Agent	Reliability	Monitors and auto-heals infrastructure issues

How Agents Work

1. State Change Detection: etcd watchers notify agents when desired or actual state changes
2. Reasoning: The agent uses an LLM to analyze the state delta and generate an execution plan
3. Risk Classification: Each planned action is classified as Safe, Moderate, or Dangerous
4. Execution: Tools are invoked sequentially to carry out the plan
5. Audit: Every decision and action is recorded in the audit log
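
The steps above can be sketched as a single processing cycle. This is a minimal sketch under stated assumptions, not the actual implementation: the etcd watch is replaced by a direct function call, the LLM reasoning step is replaced by a plain dictionary diff, and the risk rules, tool names, and audit record shape are all invented for illustration. The approval check for Dangerous actions reflects the approval requirement listed under Key Capabilities.

```python
from dataclasses import dataclass
from enum import Enum


class Risk(Enum):
    SAFE = "safe"
    MODERATE = "moderate"
    DANGEROUS = "dangerous"


@dataclass
class Action:
    tool: str
    args: dict
    risk: Risk


def plan_actions(desired: dict, actual: dict) -> list:
    """Placeholder for the LLM reasoning step: diff desired against actual.

    The risk assigned to each action is an assumption for this example.
    """
    actions = []
    for name in sorted(desired.keys() - actual.keys()):
        actions.append(Action("create", {"name": name}, Risk.SAFE))
    for name in sorted(actual.keys() - desired.keys()):
        actions.append(Action("delete", {"name": name}, Risk.DANGEROUS))
    return actions


def create(actual: dict, name: str) -> None:
    actual[name] = {"status": "running"}


def delete(actual: dict, name: str) -> None:
    del actual[name]


TOOLS = {"create": create, "delete": delete}


def run_cycle(desired, actual, tools, audit_log, approve) -> None:
    """Plan, gate on risk, execute sequentially, and audit every decision."""
    for action in plan_actions(desired, actual):
        if action.risk is Risk.DANGEROUS and not approve(action):
            audit_log.append(("skipped", action.tool, action.args["name"]))
            continue
        tools[action.tool](actual, **action.args)
        audit_log.append(("executed", action.tool, action.args["name"]))


desired = {"web-1": {}, "web-2": {}}
actual = {"web-1": {}, "old-db": {}}
audit_log = []

# Deny every approval request, so the dangerous delete is skipped.
run_cycle(desired, actual, TOOLS, audit_log, approve=lambda action: False)
```

With approvals denied, the cycle creates `web-2`, leaves `old-db` in place, and records both decisions in `audit_log`.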

Key Capabilities

Reconciliation: Agents continuously converge actual state toward desired state
Self-Healing: Agents automatically detect and resolve infrastructure issues
Approval Workflows: Dangerous operations require human approval
Tool Use: Each agent has domain-specific tools for interacting with providers
LLM Routing: Tasks are routed to appropriate model sizes based on complexity
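
To make the LLM routing idea concrete, the sketch below scores a task and picks a model tier from that score. Everything here is hypothetical: the `Task` fields, the scoring heuristic, the thresholds, and the tier labels (`small`, `medium`, `large`) are assumptions for illustration, since the source does not describe how complexity is measured or which models are used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    description: str
    resource_count: int  # number of resources the task touches
    domains: int         # number of agent domains involved


# Hypothetical tiers: (maximum score, model label), checked in order.
MODEL_TIERS = [(3, "small"), (8, "medium")]
FALLBACK_MODEL = "large"


def complexity_score(task: Task) -> int:
    # Invented heuristic: more resources and more domains mean a harder task.
    return task.resource_count + 2 * (task.domains - 1)


def route_model(task: Task) -> str:
    """Return the smallest tier whose threshold covers the task's score."""
    score = complexity_score(task)
    for max_score, model in MODEL_TIERS:
        if score <= max_score:
            return model
    return FALLBACK_MODEL


simple = route_model(Task("restart one instance", resource_count=1, domains=1))
mixed = route_model(Task("database plus DNS record", resource_count=4, domains=2))
full = route_model(Task("full stack rollout", resource_count=6, domains=4))
```

Routing small tasks to a smaller model is a common way to trade cost and latency against reasoning depth; the thresholds would need tuning against real workloads.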