Your demo worked. The model classified those tickets, drafted those emails, summarized those reports. Everyone in the room was impressed. The pilot got funded.
Now ship it.
That is the moment 95% of AI projects stall. The demo proved capability: the model can do the task. Production requires everything else: the data pipeline that feeds it real information, the integrations that connect it to your actual systems, the governance that satisfies legal and compliance, the monitoring that tells you whether it is working, and the operational knowledge that keeps it running when you are not watching. These 20 items are what separate shipped AI from The Pilot Graveyard. Your demo covered 3-5 of them. Here are the rest.
Data (Items 1-4)
Every AI system is only as good as the data it operates on. The demo ran on curated samples. Production runs on your actual data: messy, incomplete, and contradictory.
1. Real Data Pipeline. A live connection to your production data sources; not a CSV export, not a snapshot, not a sample dataset. The pipeline handles incremental updates, schema changes, and source system outages. If your AI reads from a static export, it is operating on yesterday’s reality. A data pipeline is the difference between a system that knows your current state and one that confidently acts on stale information.
2. Data Quality Monitoring. Automated checks that detect data degradation before it reaches the model. Missing fields, format changes, duplicate records, null value spikes, distribution shifts. When your CRM vendor pushes an update that changes a field format, your data quality monitor catches it before the model starts producing garbage. Without this, you discover data problems through bad outputs, after the damage is done.
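A quality gate like this can start small. Here is a minimal sketch of a pre-model check for schema drift and null-value spikes; the field names and the 5% threshold are illustrative assumptions, not a prescription:

```python
# Sketch: a data quality gate that runs before records reach the model.
# EXPECTED_FIELDS and MAX_NULL_RATE are illustrative; set them from your schema.

EXPECTED_FIELDS = {"ticket_id", "subject", "body", "created_at"}
MAX_NULL_RATE = 0.05  # flag a field if more than 5% of its values are missing

def quality_report(records: list[dict]) -> list[str]:
    """Return a list of human-readable data quality violations."""
    if not records:
        return ["empty batch"]
    issues = []
    # Schema check: catches fields added or dropped by an upstream change.
    seen = set().union(*(r.keys() for r in records))
    if seen != EXPECTED_FIELDS:
        issues.append(f"schema drift: extra={seen - EXPECTED_FIELDS}, "
                      f"missing={EXPECTED_FIELDS - seen}")
    # Null-rate check per expected field.
    for field in EXPECTED_FIELDS:
        nulls = sum(1 for r in records if r.get(field) in (None, ""))
        rate = nulls / len(records)
        if rate > MAX_NULL_RATE:
            issues.append(f"null spike in '{field}': {rate:.0%}")
    return issues
```

Run this on every batch and route any non-empty report to alerting; the point is that the check fires before the model ever sees the degraded data.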
3. Data Versioning. The ability to reproduce any model decision using the exact data that was available at the time of that decision. When a customer disputes an AI-generated recommendation in month three, you need to pull the data the model saw; not the data that exists now. Data versioning is a governance requirement masquerading as an engineering task. Skip it and your audit trail has a gap that regulators will find.
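The core mechanism is simple: fingerprint the exact input the model saw and store the decision against that fingerprint. A minimal sketch, assuming an in-memory store standing in for immutable object storage:

```python
import hashlib
import json

def snapshot_id(data: dict) -> str:
    """Deterministic fingerprint of the exact data the model saw."""
    canonical = json.dumps(data, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

def record_decision(store: dict, data: dict, output: str) -> str:
    """Archive the input snapshot under its hash and link the decision to it."""
    sid = snapshot_id(data)
    store[sid] = data  # in production: an immutable, append-only object store
    store.setdefault("decisions", []).append({"snapshot": sid, "output": output})
    return sid
```

When the month-three dispute arrives, you look up the decision, follow its snapshot hash, and retrieve the data as it was, not as it is.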
4. PII Handling. A defined, enforced policy for how the AI system handles personally identifiable information. Which fields are masked, which are excluded, which require additional consent, what happens when PII appears in unstructured text the model processes. GDPR, CCPA, and HIPAA each have specific requirements. Your demo did not handle PII because it ran on synthetic data. Production handles PII on every interaction.
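For PII that appears in unstructured text, masking before the text reaches the model or the logs is the baseline. The sketch below covers only emails and US-style phone numbers; real PII detection needs much broader coverage (names, addresses, national IDs) and usually a dedicated service:

```python
import re

# Illustrative patterns only. Production PII handling needs a real
# detection service, per-field masking policy, and consent tracking.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE = re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b")

def mask_pii(text: str) -> str:
    """Replace PII spans before the text reaches the model or the logs."""
    text = EMAIL.sub("[EMAIL]", text)
    text = PHONE.sub("[PHONE]", text)
    return text
```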
Integration (Items 5-8)
The demo ran in isolation. Production connects to your entire stack. This is where you hit The Integration Cliff: the vertical wall between standalone capability and system-connected reality.
5. System Connections. Live, authenticated, bidirectional connections to every system the AI reads from or writes to. CRM, ERP, email, messaging, databases, document stores, ticketing systems. Each connection must handle authentication, data mapping, and format translation. MCP provides a standardized protocol for these connections: one interface for any system. But the connections must be real, tested, and maintained. Mocked APIs do not count. “We’ll integrate later” does not count.
6. Authentication and Authorization. The AI system authenticates with every upstream service using proper credentials (OAuth tokens, API keys, service accounts) with automatic refresh, rotation, and revocation. Authorization ensures the AI can only access data and perform actions within defined boundaries. The model should not have admin access to your CRM because it was easier than setting up proper scoping. The Credential Lifecycle (provisioning, rotation, expiry, and revocation) is an operations concern that demos completely ignore.
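The refresh half of the Credential Lifecycle can be sketched in a few lines. This assumes a `fetch_token` callable standing in for your OAuth provider's token endpoint, and the 60-second refresh margin is an illustrative assumption:

```python
import time

class TokenManager:
    """Sketch: refresh credentials ahead of expiry so in-flight requests
    never carry a dead token. `fetch_token` is a stand-in for your
    provider's token endpoint; the margin is an assumed safety window."""

    def __init__(self, fetch_token, margin: float = 60.0):
        self._fetch = fetch_token      # returns (token, lifetime_seconds)
        self._margin = margin
        self._token = None
        self._expires_at = 0.0

    def get(self) -> str:
        # Refresh early rather than exactly at expiry.
        if time.monotonic() >= self._expires_at - self._margin:
            self._token, lifetime = self._fetch()
            self._expires_at = time.monotonic() + lifetime
        return self._token
```

Rotation and revocation need the same treatment: explicit code paths, not a credential pasted into a config file eighteen months ago.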
7. Rate Limit Management. Every API your AI connects to has rate limits. Salesforce, HubSpot, Slack, your internal APIs: all of them cap requests per minute or per hour. The demo never hit these limits because it processed five records. Production processes thousands. Rate limit management means queuing, batching, throttling, and backoff strategies that keep the system functional when it hits the ceiling. Without it, your AI works for the first hundred requests and fails on the hundred-and-first.
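The standard client-side throttle is a token bucket: steady refill at the API's allowed rate, with a bounded burst. A minimal sketch; the rate and capacity numbers must come from each API's documented limits, not from these placeholders:

```python
import time

class TokenBucket:
    """Sketch: allow at most `rate` requests/second with bursts up to
    `capacity`. Values are illustrative; match your API's actual limits."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def try_acquire(self) -> bool:
        # Refill in proportion to elapsed time, capped at capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller queues or backs off instead of eating a 429
```

When `try_acquire` returns `False`, the request goes into a queue with backoff rather than out onto the wire, which is how request one hundred and one keeps working.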
8. Error Handling Across Systems. When the CRM times out mid-workflow, what happens? When the email API returns an unexpected format, what happens? When the database connection drops, what happens? Each integration point is a failure point. Error handling means retry logic, circuit breakers, fallback paths, and graceful degradation, all defined for every connection, tested under failure conditions. The demo encountered zero errors because the environment was designed to produce zero errors. Production encounters errors constantly.
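A circuit breaker is the piece demos never have. Here is a minimal sketch: after a run of consecutive failures it stops calling the integration and serves the fallback, then retries after a cooldown. The threshold and cooldown values are illustrative assumptions:

```python
import time

class CircuitBreaker:
    """Sketch: stop calling a failing integration after `threshold`
    consecutive errors; try it again after `cooldown` seconds."""

    def __init__(self, threshold: int = 3, cooldown: float = 30.0):
        self.threshold = threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None

    def call(self, fn, fallback):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.cooldown:
                return fallback()  # degrade gracefully; don't hammer the service
            self.opened_at, self.failures = None, 0  # half-open: probe again
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()
            return fallback()
        self.failures = 0
        return result
```

One breaker per connection, with a fallback path defined for each: cached data, a queued retry, or an explicit "CRM unavailable" surfaced to the user.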
Governance (Items 9-12)
This is where pilots go to die. The system works, the team is excited, and then legal review starts. If governance was not built in from the beginning, it cannot be bolted on at the end.
9. Audit Trail. Every action the AI takes (every tool call, every data access, every decision) is logged with enough detail to reproduce the entire reasoning chain. The audit trail captures what data the model saw, what tools it invoked, what parameters it used, and what output it produced. This is not optional logging. It is the record that satisfies regulators, resolves customer disputes, and proves the system behaved correctly. Decision Traceability is the foundation of AI governance: if you cannot trace a decision backward through every input that shaped it, you cannot defend it.
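The shape of an audit record matters less than its completeness. A sketch of one append-only event per AI action; the field names are illustrative, and the point is that the record alone is enough to replay the decision:

```python
import json
import time
import uuid

def audit_event(actor: str, tool: str, params: dict,
                data_refs: list[str], output: str) -> str:
    """One append-only audit record per AI action. Field names are
    illustrative; the requirement is reproducibility of the decision."""
    event = {
        "event_id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "actor": actor,          # which agent or user context acted
        "tool": tool,            # what was invoked
        "params": params,        # with what arguments
        "data_refs": data_refs,  # which records/snapshots the model saw
        "output": output,        # what it produced
    }
    return json.dumps(event, sort_keys=True)  # ship to append-only storage
```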
10. Approval Workflows. High-stakes actions require human approval before execution. Financial transactions above a threshold. Customer communications that mention pricing or contractual terms. Data modifications that affect compliance-relevant records. The AI proposes the action. A human reviews and approves or rejects. The threshold for what requires approval is a business decision, but the mechanism for enforcing it is an architecture requirement that must exist before production.
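The enforcement mechanism can be as simple as a routing function that evaluates business rules before execution. A sketch, where the rules themselves (the payment threshold, the pricing-email check) are illustrative stand-ins for whatever your business decides:

```python
# Sketch of an approval gate. The rules are illustrative business
# decisions; the routing mechanism is the architecture requirement.
APPROVAL_RULES = [
    lambda a: a["type"] == "payment" and a.get("amount", 0) > 1000,
    lambda a: a["type"] == "email" and "pricing" in a.get("topics", []),
]

def route(action: dict) -> str:
    """Return 'execute' for low-stakes actions, 'needs_approval' otherwise."""
    if any(rule(action) for rule in APPROVAL_RULES):
        return "needs_approval"  # queue for a human; do not execute
    return "execute"
```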
11. Access Controls. Role-based access that defines what the AI can see and do, scoped by user, department, or function. The sales agent sees sales data. The support agent sees support data. Neither sees HR records. Access controls for AI systems follow the same principle as access controls for human users: least privilege. The demo gave the model access to everything because it was easier. Production restricts access because the alternative is a data breach.
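Least privilege for an AI system reduces to deny-by-default scoping on every tool and data source. A minimal sketch; the roles and permission names are illustrative:

```python
# Sketch: least-privilege scoping for agent access.
# Roles and permission names are illustrative assumptions.
SCOPES = {
    "sales_agent":   {"crm.read", "crm.write", "email.send"},
    "support_agent": {"tickets.read", "tickets.write", "email.send"},
}

def authorize(role: str, permission: str) -> bool:
    """Deny by default: anything not explicitly granted is refused."""
    return permission in SCOPES.get(role, set())
```

Every tool call the agent makes passes through a check like this; an unknown role or an unlisted permission gets nothing.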
12. Compliance Documentation. Written documentation that maps the AI system’s behavior to your regulatory requirements. GDPR Article 22 (automated decision-making). HIPAA if healthcare data is involved. SOC 2 for service organizations. Industry-specific regulations for financial services, insurance, or government. The documentation must be specific: “Here is how we comply with requirement X. Here is the evidence.” Not “we will address compliance in phase two.” Phase two never comes.
Operations (Items 13-16)
A system without operations is a system waiting to fail silently. The demo ran once and succeeded. Production runs continuously and must succeed reliably.
13. Monitoring Dashboard. Real-time visibility into throughput, accuracy, latency, cost, error rates, and escalation rates. The dashboard answers the question “is the system working right now?” without requiring someone to investigate. If the dashboard does not exist, nobody knows the system is degrading until a customer complains, and by then, the damage is done.
14. Alerting. Automated notifications when metrics cross defined thresholds. Accuracy drops below 90%. Error rate exceeds 5%. Latency spikes above SLA. Cost per interaction doubles. Integration goes down. Drift Detection is especially critical: catching the gradual degradation that happens as business conditions change and the model's context becomes stale. Alerting catches drift before it produces visible failures. Without alerting, you are running blind.
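The threshold checks above can be written down directly. A sketch using the article's example numbers as illustrative limits; your real values come from your SLAs:

```python
# Sketch: metric thresholds mirroring the examples above.
# Values are illustrative; set them from your own SLAs.
THRESHOLDS = {
    "accuracy":             ("min", 0.90),
    "error_rate":           ("max", 0.05),
    "p95_latency_s":        ("max", 2.0),
    "cost_per_interaction": ("max", 0.10),
}

def fired_alerts(metrics: dict) -> list[str]:
    """Return one alert string per breached threshold or missing metric."""
    alerts = []
    for name, (kind, limit) in THRESHOLDS.items():
        value = metrics.get(name)
        if value is None:
            alerts.append(f"{name}: no data (is the pipeline down?)")
        elif kind == "min" and value < limit:
            alerts.append(f"{name} below {limit}: {value}")
        elif kind == "max" and value > limit:
            alerts.append(f"{name} above {limit}: {value}")
    return alerts
```

Note the "no data" branch: a metric that silently stops reporting is itself an incident, and a surprising number of alerting setups miss it.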
15. Rollback Procedures. A defined, tested process for reverting the AI system to a known-good state when something goes wrong. Rollback means more than “turn it off.” It means reverting to a previous model version, restoring previous configurations, undoing actions the system took during the degraded period, and communicating the rollback to affected users. If you cannot roll back in under 30 minutes, your incident response window is too long.
16. Incident Response. A documented plan for what happens when the AI system produces a bad outcome. Who gets notified? What is the escalation path? Who has authority to shut down the system? How do you assess the blast radius? How do you communicate to affected users? Incident response for AI systems is different from traditional software because the failures are semantic; the system does not crash, it produces wrong answers. Detecting and responding to semantic failures requires different tools and different judgment than detecting a server error.
Knowledge (Items 17-20)
The most overlooked category. Every item above requires someone who knows what to do. That knowledge must be captured, documented, and transferable.
17. System Documentation. Architecture documentation that explains how the AI system works, what it connects to, how data flows, what decisions it makes, and where to look when something breaks. Not a project plan. Not a requirements document. A technical reference that someone who has never seen the system can use to understand it in 30 minutes. If the system cannot be understood without the person who built it in the room, you have a bus-factor-one problem.
18. Runbooks. Step-by-step operational procedures for common tasks: restarting the system, updating context, adding a new integration, handling specific error types, performing a rollback. Runbooks are the operational knowledge that keeps the system running after the implementation team leaves. The Handoff Problem (the failed transfer of knowledge from builders to operators) kills more AI deployments than technical failure. Runbooks are the solution.
19. Team Training. The people who use and operate the AI system must understand what it does, what it cannot do, when to trust it, when to override it, and how to escalate. Training is not a one-time onboarding session. It is an ongoing program that updates as the system evolves. Untrained operators either over-trust the system (accepting bad outputs) or under-trust it (overriding good outputs), both failure modes.
20. Handoff Plan. A defined timeline and process for transferring full operational ownership from the implementation team to the internal team. The handoff plan specifies knowledge transfer milestones, competency validation, a support escalation path during the transition period, and the criteria for declaring the internal team independent. This is what NimbleBrain calls Escape Velocity: the point where your team can run, maintain, and extend the system without external support. Without a handoff plan, you are permanently dependent on whoever built it.
Using the Checklist
Score your current initiative against all 20 items. For each one, assign a status:
- Done: Implemented, tested, and operational.
- Planned: Specific timeline, assigned owner, defined scope.
- Unplanned: Not yet addressed. Not on the roadmap.
If you have more than 5 items marked “Unplanned,” you are in demo territory, regardless of what your project status report says. The items you have not planned for are the items that will block production.
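The tally itself is trivial, which is the point: the hard part is answering honestly. A sketch, assuming one status string per checklist item:

```python
# Sketch: score a checklist review. Statuses mirror the three
# categories above; item names are illustrative.
def verdict(checklist: dict) -> str:
    """Apply the more-than-5-unplanned rule from the checklist."""
    unplanned = sum(1 for s in checklist.values() if s == "unplanned")
    return "demo territory" if unplanned > 5 else "production track"
```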
The pattern we see across engagements: teams have 3-5 items done (model selection, basic prompts, a UI), 3-5 items planned, and 10-14 items unplanned. Those 10-14 items represent the work that separates a demo from a production system. They are The Production Gap. And they are the reason NimbleBrain starts every engagement with a production checklist review: not to check boxes, but to surface the gaps before they become six-month delays.
This checklist is not aspirational. It is the minimum. Every item on it represents something that, if missing, will either block your launch, cause a production failure, or create a regulatory exposure. The demo did not need any of them. Production needs all of them.
Frequently Asked Questions
How many of these 20 items does a typical pilot cover?
In our experience, 3-5. Pilots usually cover model selection, basic prompt engineering, and a simple UI. The other 15+ items (data pipelines, integrations, governance, monitoring, rollback, documentation) become the "phase 2" that never happens.
Do we need all 20 before going to production?
Not all at once. But you need a plan for all 20 before you start. Some can be built incrementally (monitoring improves over time). Others are binary: you either have an audit trail or you don't. The checklist helps you identify what's blocking production vs. what can iterate.