You’ve decided to bring in an outside partner for AI implementation. The vendor market is a mess. Every consultancy, systems integrator, and two-person shop claims AI expertise. The pitches blur together. Everyone promises transformation, acceleration, and ROI. The differences are invisible until you’re six months and $500K into an engagement that isn’t producing production systems.

This checklist cuts through the noise. Five criteria, each with specific questions to ask and red flags that disqualify. Apply them rigorously and the field narrows fast. The right partner will welcome these questions. The wrong one will deflect.

Criterion 1: Engineering Capability

The partner’s team must be able to build production AI systems. Not manage the building of them. Not advise on how they should be built. Build them: write code, deploy infrastructure, configure LLMs, integrate APIs, and debug production issues.

What to look for:

The team proposed for your engagement includes hands-on engineers who write code daily. Their backgrounds include production engineering, not just management consulting. They can speak to specific technical decisions: which LLM for which task, why this orchestration pattern over that one, what the trade-offs are between fine-tuning and retrieval-augmented generation for your use case.
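
To make that concrete, here is a minimal sketch of the retrieval-augmented generation pattern mentioned above, the kind of trade-off a hands-on engineer should be able to defend in code. It is illustrative only, not NimbleBrain's implementation; `embed` and `call_llm` are hypothetical stand-ins for whichever embedding model and LLM API the partner actually uses.

```python
from typing import Callable

def answer_with_rag(
    question: str,
    documents: list[str],
    embed: Callable[[str], list[float]],
    call_llm: Callable[[str], str],
    top_k: int = 3,
) -> str:
    """Retrieve the most relevant documents and ground the LLM's answer in them."""

    def cosine(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
        return dot / norm if norm else 0.0

    # Rank documents by similarity to the question in embedding space.
    q_vec = embed(question)
    ranked = sorted(documents, key=lambda d: cosine(embed(d), q_vec), reverse=True)

    # Inject the top passages into the prompt instead of baking them into model weights.
    context = "\n\n".join(ranked[:top_k])
    prompt = (
        "Answer using only the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return call_llm(prompt)
```

The design point behind the trade-off: retrieval keeps proprietary knowledge outside the model so it can be updated without retraining, while fine-tuning changes the model's weights and is better suited to shaping tone, format, or task-specific behavior.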

Ask to see code. Not a demo. Not a slide deck explaining their framework. Actual repositories, actual deployments, actual systems running in production. The Anti-Consultancy standard is simple: if they build their own tools, they can build yours. NimbleBrain maintains 21+ open-source MCP servers, an agent framework, and a bundle registry, all used on every engagement. That’s not IP theater. That’s proof of engineering capability.

Questions to ask:

  • “Who specifically will be writing code on my engagement? Can I see their work?”
  • “Do you build your own internal tools, or do you use third-party platforms?”
  • “Walk me through a technical decision you made on a recent engagement and why.”
  • “If I need to change the agent’s behavior at 9 PM on a Thursday, who do I call and can they fix it?”

Red flags:

  • “We partner with an engineering firm for implementation.” Translation: they’re a strategy layer on top of the people who actually build. You’re paying for intermediaries.
  • The team is heavy on project managers and light on engineers. If there is more than one manager for every three builders, the delivery model is management, not engineering.
  • They can’t show you code. NDAs protect client specifics, not the partner’s ability to demonstrate technical depth. Open-source contributions, technical blog posts, public repositories: these all exist without NDA constraints.
  • Their “AI practice” was launched in the last 12 months. Building production AI systems takes years of accumulated judgment. A practice that started recently is learning on your engagement.

Criterion 2: Production Experience

Demos are not production. Proofs of concept are not production. A partner who has shipped AI systems to production (handling real data, real users, real edge cases, real uptime requirements) operates differently from one that has built impressive prototypes.

What to look for:

Case studies that describe systems running in production, not pilots that proved feasibility. Specific metrics: uptime percentages, transaction volumes, error rates, user adoption numbers. The language matters. “Deployed to production” is different from “delivered a proof of concept” or “completed the discovery phase.”

Production experience creates a specific kind of judgment. Engineers who’ve shipped know that the demo works on clean data but the production system needs to handle the 3% of records with missing fields. They know that the agent performs well on English text but the client’s customer base includes Spanish-speaking users and the LLM needs different prompting patterns. This judgment doesn’t come from strategy documents. It comes from production scars.
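
As an illustration, a minimal sketch (not code from any real engagement; the field names and defaults are hypothetical) of the kind of guard a production team adds for exactly those cases:

```python
# Illustrative field names only; real schemas vary by client.
REQUIRED_FIELDS = ("customer_id", "message_text", "language")

def prepare_record(record: dict) -> tuple[dict | None, str | None]:
    """Return (clean_record, None), or (None, reason) when the record can't be processed."""
    missing = [field for field in REQUIRED_FIELDS if not record.get(field)]

    # Without the message itself there is nothing for the agent to act on.
    if "message_text" in missing:
        return None, "missing message_text; route to manual review"

    clean = dict(record)
    if not clean.get("customer_id"):
        clean["customer_id"] = "unknown"   # keep processing, but make the gap explicit
    if not clean.get("language"):
        clean["language"] = "en"           # assume English as a default...
        clean["language_inferred"] = True  # ...and flag it so prompting can adapt
    return clean, None
```

The point is not the specific fields but the habit: incomplete records get an explicit path (a default, a flag, or a manual-review queue) instead of silently breaking the agent.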

Questions to ask:

  • “How many AI systems have you deployed to production in the last 12 months?”
  • “Can you share specific performance metrics from a production deployment?”
  • “What’s the most common failure mode you’ve seen in production AI, and how do you design around it?”
  • “What percentage of your engagements reach production deployment?”

Red flags:

  • Case studies that describe pilots, not deployments. “We built a prototype that demonstrated 40% improvement” is not the same as “We deployed a system that has processed 50,000 transactions.”
  • No metrics in their case studies. Vague language like “significant efficiency gains” and “improved decision-making” means they don’t have production numbers because there’s no production system.
  • They’ve never dealt with a production incident in an AI system. If the partner has never debugged a hallucination affecting real users, they haven’t shipped to production.

Criterion 3: Speed

AI implementation should take weeks, not quarters. The technology moves too fast and the real learning happens too early in production for six-month timelines to make sense. A partner who can deliver a working system in 4-8 weeks has a fundamentally different delivery model from one that needs 3-6 months for discovery alone.

What to look for:

Timelines measured in weeks with production milestones, not phases measured in months with document deliverables. A typical NimbleBrain engagement: week 1 is embedded observation and system mapping, week 2 produces a working prototype, weeks 3-4 deliver a production system with knowledge transfer. Four weeks from kickoff to running production AI.

Speed is not about cutting corners. It’s about eliminating the waste that traditional models are built on. No three-month discovery phases that produce documents instead of systems. No weekly steering committee meetings that consume a full day. No sequential phases where strategy must be complete before engineering begins. Discovery, design, and build happen simultaneously because the same people do all three.

Questions to ask:

  • “What’s your typical timeline from kickoff to production deployment?”
  • “What will be running in production after four weeks?”
  • “How long is your discovery phase, and what does it produce?”
  • “Can you show me a timeline from a recent engagement with dates?”

Red flags:

  • “We’ll need 3-6 months for the discovery phase.” A discovery phase longer than two weeks means the team is either learning AI on your engagement or following a process designed for ERP implementations.
  • Milestones defined by documents rather than running systems. “Phase 1 deliverable: requirements document” means you’re paying for documentation, not deployment.
  • No commitment to a specific production date. “We’ll have a better sense of timeline after discovery” is a blank check.

Criterion 4: Alignment

The engagement structure reveals whether the partner’s incentives align with yours. Fixed scope means the partner is incentivized to ship efficiently. Open-ended time-and-materials means the partner is incentivized to extend.

What to look for:

Fixed-scope, fixed-price engagements with outcome-based success criteria. The contract specifies what will be delivered, when, and what success looks like. The partner’s revenue doesn’t increase if the project takes longer. Scope changes require explicit new agreements, not amendment clauses buried in the contract.

The Anti-Consultancy pricing model is built on this principle. The engagement cost is the engagement cost. If we finish in three weeks instead of four, we don’t bill for the fourth week. If we encounter unexpected complexity, we adapt our approach. We don’t extend the timeline and send a change order.

Questions to ask:

  • “Is the engagement fixed-scope and fixed-price, or time-and-materials?”
  • “What happens if the project encounters unexpected complexity?”
  • “How do you define success for this engagement?”
  • “What percentage of your engagements extend beyond the original timeline?”

Red flags:

  • Open-ended time-and-materials contracts with vague milestones. “We’ll deliver value continuously” without specific deliverables is a retention mechanism.
  • Retainer structures that start after the initial engagement. If the partner’s business model assumes ongoing revenue from your account, their incentive is to build systems that require their ongoing involvement.
  • Change order clauses that allow scope expansion without a hard stop. A well-written contract separates the original scope from new requests. Each new request is a separate decision, not an automatic extension.

Criterion 5: Independence

The ultimate test: when the engagement ends, can your team operate everything independently? The partner who plans for your independence from day one is fundamentally different from the one who plans for your continued dependence.

What to look for:

You own everything: code, data, infrastructure, documentation, and knowledge. No proprietary platforms. No hosted-only options. No tools that only work with the partner’s support. Your engineers can read the code, understand the architecture, modify the behavior, and deploy changes without calling the partner.

Escape Velocity is the term NimbleBrain uses for this outcome: the point where the client’s team operates the AI systems independently. Every engagement decision (the tools we choose, the architecture we design, the documentation we write) optimizes for this outcome. Knowledge transfer isn’t a final-phase deliverable. It’s built into every week. Your engineers pair with ours from day one so they understand not just what was built, but why.

Questions to ask:

  • “At the end of the engagement, who owns the code, the data, and the infrastructure?”
  • “Can my team modify and deploy changes without your involvement?”
  • “What does your knowledge transfer process look like, and when does it start?”
  • “Do you use proprietary tools or platforms that require your ongoing support?”
  • “What’s your plan for my team’s independence?”

Red flags:

  • Proprietary platforms that only the partner can operate. If the system runs on their infrastructure, controlled by their tools, maintained by their team, you’re a subscriber, not an owner.
  • No export path. If you can’t take the code, the models, and the data to a different provider or run them yourself, you’re locked in.
  • Knowledge transfer as a final phase. If the partner saves knowledge transfer for the last two weeks, they’ve spent the entire engagement building dependency. Knowledge transfer should be continuous. Your team should understand the system better every week, not only after the last sprint.
  • The partner discourages your team from participating in the build. An engaged client team is the partner’s biggest threat if the business model depends on dependency. If the partner wants to keep your engineers away from the code, ask why.

The Meta-Criterion

There’s a question that sits above all five criteria, and it’s the most revealing one you can ask: Does the partner want you to succeed without them?

The right partner plans for their own obsolescence on your account. They build systems your team can own. They transfer knowledge continuously. They use open tools you can maintain. They celebrate when you make your first independent modification without calling them.

The wrong partner plans for permanence. They build systems only they can operate. They hold knowledge as leverage. They use proprietary tools that create switching costs. They position the next engagement before the current one ships.

The difference between these two partners is not talent, not technology, and not methodology. It’s the business model. One model makes money by shipping outcomes and moving on. The other makes money by staying. Choose the one that makes money by leaving you better off.

Frequently Asked Questions

What's the most important criterion?

Engineering capability. If the partner can't build production systems (write code, deploy infrastructure, operate services), they're not an implementation partner. They're a strategy partner. You need someone who ships, not someone who advises on shipping.

How many references should I check?

At least three, and ask the right questions: Did the project ship to production? How long did it take? Can your team operate it independently? Did the partner leave you better off? If the references are all 'they helped us think about AI,' that's not implementation; that's advisory.

Should we choose a specialist or a generalist partner?

Specialist. AI implementation is a specialized discipline that requires production engineering, LLM expertise, and systems integration experience. A generalist digital consultancy that 'also does AI' will struggle with the same structural misalignments described above.

Mat Goldsborough · Founder & CEO, NimbleBrain

Ready to put AI agents to work?

Email directly: hello@nimblebrain.ai