Every AI demo looks impressive. A foundation model generates articulate responses, handles follow-up questions, and produces output that feels intelligent. Then the demo ends and the team tries to put it into production. Within days, the system hallucinates, loses context, can’t connect to the tools it needs, and falls apart the first time something unexpected happens.

The demo wasn’t broken. It was incomplete. A production AI agent has four components. Most demos only include one. Here’s what each component does and why skipping any of them puts you in The Pilot Graveyard, the 95% of AI projects that never reach production.

Component 1: The LLM (The Brain)

The Large Language Model is the reasoning engine. It processes language, makes inferences, evaluates options, and generates responses. This is what every demo showcases, and it’s the component that gets the most attention and the least blame when things go wrong.

In a production agent, the LLM handles three jobs:

Reasoning. Given a goal and a set of available actions, determine what to do next. This isn’t text generation; it’s decision-making. The agent reads a customer support ticket, evaluates the severity, identifies the relevant policy, and decides whether to resolve it directly, escalate it, or request more information.

Language understanding. Interpreting unstructured inputs (emails, chat messages, documents, form responses) and extracting the intent, entities, and context needed to take action. The customer wrote “this thing is still broken after your last fix.” The LLM understands that “this thing” refers to a previously reported issue and “your last fix” implies a prior interaction that needs to be found.

Generation. Producing the outputs the agent needs: a response to the customer, a summary for the internal team, a formatted entry for the CRM. Generation is the most visible capability, but in a production agent, it’s in service of action, not conversation.
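
In practice, all three jobs work best when the model returns structured output the rest of the system can validate before acting on it. A minimal sketch of that idea, with a stubbed model response (the decision schema, `ALLOWED_ACTIONS`, and `parse_decision` are illustrative, not a NimbleBrain API):

```python
import json

# Actions this hypothetical triage agent is allowed to take.
ALLOWED_ACTIONS = {"resolve", "escalate", "request_info"}

def parse_decision(raw: str) -> dict:
    """Validate the LLM's JSON decision before any tool is invoked."""
    decision = json.loads(raw)
    if decision.get("action") not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {decision.get('action')!r}")
    if not decision.get("reason"):
        raise ValueError("decision must include a reason")
    return decision

# Stubbed model output -- in production this comes from the LLM call.
raw = '{"action": "escalate", "reason": "severity above tier threshold"}'
decision = parse_decision(raw)
print(decision["action"])  # escalate
```

Constraining the model to a small set of named actions is what turns generation into decision-making: anything outside the schema is rejected before it can touch a tool.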

The LLM matters, but it’s table stakes. Claude, GPT-4, Gemini: the frontier models are all capable enough for most production tasks. The differentiator isn’t which model you choose. It’s what you give the model to work with. An LLM without tools, memory, or orchestration is a brain in a jar: it can think, but it can’t do anything.

Component 2: Tools (The Hands)

Tools are how agents take action in the real world. Without them, the agent is a chatbot; it can talk about what should happen but can’t make it happen.

In the NimbleBrain architecture, tools are MCP servers. MCP (Model Context Protocol) is the open standard that connects AI agents to external systems. Each MCP server exposes a specific capability: reading from your CRM, writing to your database, triggering a workflow in Zapier, sending a message in Slack, filing a ticket in Jira.

NimbleBrain has built and published 21+ MCP servers. They connect agents to the systems your business actually runs on. The mpak registry at mpak.dev catalogs available servers with security scanning through the MCP Trust Framework at mpaktrust.org.

Tools give agents hands, but the hands need to be reliable. In production:

Tool reliability matters more than tool breadth. An agent with five tools that work every time outperforms an agent with fifty tools that fail intermittently. Production MCP servers need error handling, retry logic, rate limiting, and authentication management.
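
What "retry logic" looks like in the simplest case, sketched with an invented, intentionally flaky tool (real MCP servers would also need rate limiting, auth refresh, and circuit breaking):

```python
import time

def call_with_retry(tool, *args, attempts=3, base_delay=0.1):
    """Retry a flaky tool call with exponential backoff."""
    for attempt in range(attempts):
        try:
            return tool(*args)
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of retries: let orchestration decide what's next
            time.sleep(base_delay * 2 ** attempt)

# A tool that fails twice, then succeeds -- simulating an intermittent API.
calls = {"n": 0}
def flaky_crm_lookup(customer_id):
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("CRM timeout")
    return {"id": customer_id, "tier": "gold"}

print(call_with_retry(flaky_crm_lookup, "cust_42")["tier"])  # gold
```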

Tool composition is how complex work happens. A single tool reads the customer record. Another checks the order history. A third processes the refund. A fourth updates the CRM. A fifth sends the notification. The agent composes these tools into a workflow. No single tool does the job. The agent’s orchestration of multiple tools does.
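
That composition, sketched with stubs (each function stands in for one MCP server call; names and return shapes are invented for illustration):

```python
# Stubbed tools -- in production each is a call to a separate MCP server.
def read_customer(cid):        return {"id": cid, "tier": "gold"}
def check_order(oid):          return {"id": oid, "total": 40.0, "status": "delivered"}
def process_refund(oid, amt):  return {"order": oid, "refunded": amt}
def update_crm(cid, note):     return True
def notify(cid, msg):          return True

def refund_workflow(customer_id, order_id):
    """The agent's orchestration: no single tool does the job."""
    customer = read_customer(customer_id)
    order = check_order(order_id)
    refund = process_refund(order["id"], order["total"])
    update_crm(customer["id"], f"refunded {refund['refunded']}")
    notify(customer["id"], "Your refund is on the way.")
    return refund

print(refund_workflow("cust_42", "ord_7"))  # {'order': 'ord_7', 'refunded': 40.0}
```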

Tool boundaries define agent capabilities. An agent can only act through its tools. If there’s no MCP server connecting to your ERP, the agent can’t touch inventory data. Scoping tools carefully is both a capability decision and a governance decision: you control what the agent can do by controlling which tools it can access.

Component 3: Memory (The Context)

Memory is what makes an agent effective on your specific business, not just on generic tasks. Without it, every interaction starts from zero. The agent doesn’t know your customers, your policies, your exception logic, or what happened yesterday.

Business-as-Code transforms an agent from a generic tool into a domain specialist. Business-as-Code has three layers:

Schemas define what your business IS: JSON Schema definitions of your entities. Customers, orders, products, workflows, approval chains. Each entity has required fields, valid states, and relationships. When an agent reads a schema, it knows exactly what a “customer” means in your organization: segments, lifecycle stages, tier levels, communication preferences. No guessing. No hallucination.
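
A customer schema along those lines might look like this (field names and enum values are illustrative; your own entities will differ):

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Customer",
  "type": "object",
  "required": ["id", "segment", "lifecycle_stage", "tier"],
  "properties": {
    "id": { "type": "string" },
    "segment": { "enum": ["smb", "mid_market", "enterprise"] },
    "lifecycle_stage": { "enum": ["lead", "onboarding", "active", "at_risk", "churned"] },
    "tier": { "enum": ["standard", "gold", "platinum"] },
    "communication_preferences": {
      "type": "array",
      "items": { "enum": ["email", "sms", "phone"] }
    }
  }
}
```

Because valid states are enumerated, an agent reading this schema can never invent a lifecycle stage that doesn’t exist in your business.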

Skills define what your business KNOWS: structured markdown documents encoding domain expertise. This is the concept called Skills-as-Documents: decision logic captured as documents, not compiled code. A skill for refund processing includes the eligibility criteria, the escalation thresholds, the tier-specific rules, and the exception handling paths. A ten-year veteran’s judgment, documented in a format any agent can follow.
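
A refund-processing skill might be structured like this (the thresholds and rules are invented for illustration):

```markdown
# Skill: Refund Processing

## Eligibility
- Order delivered within the last 30 days
- Item not marked final-sale

## Tier rules
- Platinum: auto-approve up to $500
- Gold: auto-approve up to $200
- Standard: auto-approve up to $50

## Escalation
- Any refund above the tier threshold: route to human approval
- Suspected abuse (3+ refunds in 90 days): escalate to fraud review
```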

Context is the glue: background knowledge that makes schemas and skills coherent. Industry context. Company history. Seasonal patterns. Strategic priorities. Two companies might have identical refund schemas, but context tells the agent that one is a high-growth startup where every customer matters and the other is a volume retailer where operational efficiency matters more.

This is Context Engineering in practice, structuring organizational knowledge so AI agents can operate on it. The result: agents that make decisions aligned with your business, not decisions that sound generically plausible.

NimbleBrain runs on this internally. Every repository has a CLAUDE.md file, a skill that encodes architectural decisions, coding standards, and domain knowledge. The schemas at schemas.nimblebrain.ai are the actual definitions our agents consume. This isn’t a methodology we recommend. It’s how we operate.

Component 4: Orchestration (The Workflow)

Orchestration is how an agent moves from “I have a goal” to “the goal is complete.” Without it, the agent can reason, use tools, and access memory, but has no structure for executing multi-step workflows reliably.

In NimbleBrain’s architecture, orchestration follows the meta-agent pattern: a coordinating agent manages the workflow while domain-specialist sub-agents handle specific tasks. The meta-agent knows the goal, understands the steps, and delegates to specialists: a sales agent for lead qualification, a support agent for ticket triage, an ops agent for process execution.

Production orchestration handles what demos skip:

Step sequencing. Some steps must happen in order. You verify the customer before processing the refund. You check inventory before confirming the order. Orchestration enforces the sequence.

Parallel execution. Other steps can happen simultaneously. While one sub-agent researches the customer history, another checks the current order status, and a third prepares the response template. Orchestration manages concurrency.
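
The concurrency piece can be sketched with `asyncio` (the sub-agent tasks are stubs; real sub-agents would call MCP servers):

```python
import asyncio

# Stubbed sub-agent tasks -- the sleeps simulate independent I/O.
async def research_history(cid):
    await asyncio.sleep(0.01)
    return {"customer": cid, "open_issues": 1}

async def check_order_status(oid):
    await asyncio.sleep(0.01)
    return {"order": oid, "status": "delivered"}

async def prepare_template(kind):
    await asyncio.sleep(0.01)
    return f"template:{kind}"

async def gather_context(cid, oid):
    # Independent steps run concurrently; the orchestrator waits for all.
    return await asyncio.gather(
        research_history(cid),
        check_order_status(oid),
        prepare_template("refund_reply"),
    )

history, status, template = asyncio.run(gather_context("cust_42", "ord_7"))
print(status["status"])  # delivered
```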

Error handling and recovery. The payment API returns an error. The CRM is temporarily down. The customer record is incomplete. Orchestration defines what happens next: retry after a delay, fall back to an alternative, escalate to a human, or abort and notify. Production agents encounter errors constantly. Orchestration makes them recoverable instead of fatal.

Governance boundaries. Some actions require human approval. Refunds over a threshold. Customer communications about legal matters. Changes to production systems. Orchestration enforces these boundaries: the agent works autonomously within defined limits and pauses when it reaches a boundary that requires human judgment.
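
A toy orchestrator showing sequencing, error handling, and a governance boundary in one place (the $200 threshold and the tool stubs are invented for illustration):

```python
APPROVAL_THRESHOLD = 200.0  # governance boundary: refunds above this need a human

# Stubbed tools -- in production these are MCP server calls.
def verify_customer(cid):
    return {"id": cid, "verified": True}

def process_payment(amount):
    return {"refunded": amount}

def run_refund(customer_id, amount, approved_by_human=False):
    customer = verify_customer(customer_id)        # step sequencing: verify first
    if not customer["verified"]:
        return {"status": "aborted", "reason": "verification failed"}
    if amount > APPROVAL_THRESHOLD and not approved_by_human:
        # Governance boundary: pause and wait for human judgment.
        return {"status": "pending_approval", "amount": amount}
    try:
        result = process_payment(amount)           # only runs after the checks above
    except ConnectionError:
        # Error handling: recoverable, not fatal.
        return {"status": "escalated", "reason": "payment API down"}
    return {"status": "done", **result}

print(run_refund("cust_42", 350.0))  # {'status': 'pending_approval', 'amount': 350.0}
print(run_refund("cust_42", 120.0))  # {'status': 'done', 'refunded': 120.0}
```

The key property: every exit path returns a status the surrounding system can act on, instead of crashing or silently doing the wrong thing.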

The Demo Gap

The gap between a demo agent and a production agent is the distance between component 1 and all four.

| | Demo Agent | Production Agent |
| --- | --- | --- |
| LLM | Yes (impressive responses) | Yes (reasoning in service of action) |
| Tools | None, or one mock tool | 5-15 MCP servers connected to real systems |
| Memory | Conversation window only | Business-as-Code: schemas, skills, context |
| Orchestration | Single prompt-response | Meta-agent pattern with error handling and governance |
| Data | Curated sample data | Real data with inconsistencies and edge cases |
| Failure handling | Crashes or hallucinates | Retries, escalates, falls back, notifies |
| Governance | None | Approval boundaries, audit trails, access controls |

The demo agent uses Claude or GPT to generate an impressive response about how it would handle a customer refund. The production agent actually processes the refund, looking up the order through the commerce MCP server, verifying eligibility against the refund skill, processing the transaction through the payment MCP server, updating the CRM, and sending the confirmation, while logging every step for audit.

How NimbleBrain Builds Production Agents

Each component maps to a concrete implementation:

LLM. Foundation models: Claude, GPT-4, or the best model for the task. The model is interchangeable. The architecture around it isn’t.

Tools. MCP servers from the mpak registry or custom-built for the client’s stack. Typical engagement: 8-15 MCP servers covering CRM, databases, communication tools, and workflow triggers. Each server is security-scanned through the MCP Trust Framework.

Memory. Business-as-Code artifacts built in the first two weeks of an engagement. Knowledge audit, 10-20 entity schemas, 15-30 domain skills, structured context files. This is the foundation that makes everything else work.

Orchestration. The meta-agent pattern with domain specialists. A coordinating agent manages the workflow. Specialist sub-agents handle their domains. Governance boundaries enforce human-in-the-loop for high-stakes decisions.

The typical result: 8-12 production automations in four weeks. The first two weeks build the foundation (memory and tools). The last two deploy agents on top of it. The Recursive Loop (BUILD, OPERATE, LEARN, BUILD deeper) takes it from there. Each cycle adds schemas, refines skills, and expands what the agents can handle.

Building Your First Production Agent

Start with one workflow. Pick something your team does repeatedly that requires both rules and judgment: support ticket triage, lead qualification, or order exception handling. Then build all four components for that workflow:

  1. Choose the LLM. Any frontier model works. Don’t overthink this.
  2. Connect the tools. Identify the 3-5 systems the workflow touches and deploy MCP servers for each.
  3. Build the memory. Write the schemas for the entities involved and the skills for the decisions required.
  4. Add orchestration. Define the step sequence, error handling, and governance boundaries.
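
The four steps above amount to a wiring checklist. A minimal sketch of that checklist (the structure and field names are invented for illustration, not a NimbleBrain configuration format):

```python
# Hypothetical wiring of the four components for one workflow.
agent = {
    "llm": "any frontier model",                        # component 1: the brain
    "tools": ["crm", "orders", "payments"],             # component 2: MCP servers
    "memory": {                                         # component 3: Business-as-Code
        "schemas": ["customer", "order"],
        "skills": ["refund_processing"],
    },
    "orchestration": {                                  # component 4: the workflow
        "sequence": ["verify", "refund", "notify"],
        "approval_threshold": 200.0,
    },
}

def ready_for_production(agent):
    """All four components must be present before deployment."""
    return all(agent.get(k) for k in ("llm", "tools", "memory", "orchestration"))

print(ready_for_production(agent))  # True
```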

The first agent will be imperfect. The Recursive Loop handles that. Each cycle improves the schemas, adds skill branches, and tightens the orchestration. But unlike a demo agent that impresses and stalls, a production agent with all four components does real work from day one.

Frequently Asked Questions

What makes a production agent different from a demo?

Demos run in sandboxes with curated data and no consequences. Production agents handle real data, real failures, real costs, and real compliance requirements. The difference is governance, error handling, and memory, the components demos skip.

Do all agents need all four components?

Yes. Without the LLM, there's no reasoning. Without tools, there's no action. Without memory, every interaction starts from zero. Without orchestration, there's no reliability. You can start simple, but production requires all four.

How does NimbleBrain build agent components?

The LLM is the foundation model (Claude, GPT, etc.). Tools are MCP servers connecting to your systems. Memory is Business-as-Code: schemas, skills, and context files. Orchestration is the meta-agent pattern coordinating domain specialists.

Mat Goldsborough · Founder & CEO, NimbleBrain
