On a Tuesday in 2025, over a million developers woke up to discover their primary coding tool had stopped working. Windsurf, formerly Codeium, had built their entire AI coding assistant on Anthropic’s Claude API. When Anthropic restricted their access, reportedly over competitive concerns, Windsurf had no fallback. No secondary provider. No abstraction layer. No way to swap the underlying model. One vendor decision, made without warning, broke a product used by a million people overnight.

The Windsurf incident wasn’t a freak event. It was the predictable consequence of architectural dependence on a single AI vendor. And it’s a preview of what happens to every organization that builds critical infrastructure on proprietary AI platforms without understanding the risks they’re accumulating.

The Three Categories of AI Lock-In Risk

Vendor lock-in in AI isn’t just about pricing. Traditional software lock-in was mostly economic: switching costs in time and money. AI lock-in adds dimensions that make it deeper and harder to escape.

1. Data Portability Risk

Your AI agents accumulate knowledge. They learn your workflows, your exceptions, your edge cases. They build context about your customers, your processes, and your organizational patterns. The question is: who owns that accumulated knowledge, and can you move it?

With proprietary agent platforms, the answer is often buried in the fine print. Agent memory stored in a vendor’s proprietary format isn’t portable. Workflow configurations built in a vendor’s visual editor don’t export to anything else. Custom training data uploaded to a vendor’s platform may be contractually yours, but technically inaccessible in a usable form.

The test is simple: if you canceled your vendor contract today, could you take your agent’s accumulated context (every workflow, every decision rule, every learned exception) and deploy it on a different platform by next month? If the answer is “no” or “it would take six months of migration work,” your vendor owns your operational knowledge. You’re renting access to your own business logic.

Business-as-Code solves this by design. Schemas, skills, and context files are plain text in a git repository. They’re not stored in a vendor’s database. They’re not encoded in a proprietary format. They move anywhere because they’re files, not features. Your business knowledge lives in your version control, not in someone else’s cloud.

2. Platform Risk

Platform risk is what happens when the vendor changes the rules. API pricing increases. Features get deprecated. Rate limits tighten. Business models pivot. Companies get acquired. Acquirers “sunset” the product you depend on. Every one of these scenarios has happened in AI within the past 18 months.

The AI industry moves faster than any previous technology cycle. The model provider that dominated six months ago may be a secondary player today. The agent framework that had momentum in January may be abandoned by July. The startup that raised a $100M round may be acqui-hired before they ship v2. Building critical business infrastructure on any single player in this environment is a bet on stability in a fundamentally unstable market.

The Windsurf incident is the most dramatic example, but subtler forms of platform risk play out constantly. An LLM provider changes their API response format. A tool vendor deprecates the endpoint your agents depend on. A SaaS platform modifies their OAuth scopes and breaks your MCP server’s access. Each of these is survivable if your architecture is modular and your dependencies are swappable. Each is catastrophic if you’re locked in.

The mitigation is architectural: build on open protocols (MCP), use abstraction layers between your agents and their model providers, and keep your domain logic separate from any vendor’s runtime. NimbleBrain designs every client engagement this way, not because we expect vendors to fail, but because we’ve watched what happens when they do.

3. Exit Cost Risk

Exit costs are the full price of leaving a vendor: not just the subscription you stop paying, but the engineering time to rebuild, the operational disruption during migration, the retraining of AI models on a new platform, and the lost customizations that can’t be exported.

In traditional SaaS, exit costs are measured in weeks of migration. In AI, they can be measured in months of rebuilding. If your agent’s decision-making logic is embedded in a vendor’s proprietary orchestration engine, you’re rewriting it from scratch. If your workflows were built in a visual no-code editor, you’re re-implementing them in code. If your agent’s memory and context are stored in a vendor’s proprietary vector database, you’re re-collecting and re-embedding all of it.

The compounding factor is that exit costs grow with time. The more your organization uses a platform, the more customization accumulates, the more integrations depend on vendor-specific APIs, and the more context is stored in non-portable formats. Month one, switching is inconvenient. Month twelve, it’s a major project. Month thirty-six, it’s an existential question about whether the business can absorb the disruption.

The Risk Assessment Framework

Every organization should evaluate their AI vendor dependencies using these four questions. Each “no” is a risk you’re carrying.

Can you swap your LLM provider in under a week? If your agents are hardcoded to a specific model’s API without an abstraction layer, a single provider outage or policy change breaks everything. Model providers should be interchangeable: different models for different tasks, different providers for redundancy.
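A minimal sketch of what such an abstraction layer can look like (all class and provider names here are illustrative, not a real SDK): each provider hides its own API behind one shared interface, so switching providers becomes a configuration change rather than a rewrite.

```python
# Hypothetical model-abstraction layer. Each provider implements the
# same interface; the agent never imports a vendor SDK directly.
from abc import ABC, abstractmethod


class ModelProvider(ABC):
    @abstractmethod
    def complete(self, prompt: str) -> str: ...


class AnthropicProvider(ModelProvider):
    def complete(self, prompt: str) -> str:
        # Real code would call the Anthropic API here.
        return f"[anthropic] {prompt}"


class OpenAIProvider(ModelProvider):
    def complete(self, prompt: str) -> str:
        # Real code would call the OpenAI API here.
        return f"[openai] {prompt}"


PROVIDERS = {"anthropic": AnthropicProvider, "openai": OpenAIProvider}


def get_provider(name: str) -> ModelProvider:
    # Driven by config or an environment variable, never hardcoded.
    return PROVIDERS[name]()


agent_llm = get_provider("anthropic")  # one-line swap to "openai"
```

The point is the seam, not the implementation: when a provider outage or policy change hits, the fallback is a registry lookup away.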

Can you export your agent’s accumulated context? If your agent’s memory, learned behaviors, and domain knowledge are stored in a vendor’s proprietary format, you’re dependent on that vendor for access to your own operational intelligence. Context should live in portable artifacts (files, schemas, version control), not in vendor databases.
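As a sketch of what "portable artifacts" means in practice, here is a hypothetical context store (directory layout and file names are illustrative) that keeps agent knowledge as plain JSON files in a git-tracked directory instead of a vendor database:

```python
# Hypothetical sketch: agent context as plain JSON files in a
# version-controlled directory. Exporting is "git clone".
import json
from pathlib import Path

CONTEXT_DIR = Path("agents/support-agent/context")  # lives in your repo


def save_context(name: str, data: dict) -> Path:
    CONTEXT_DIR.mkdir(parents=True, exist_ok=True)
    path = CONTEXT_DIR / f"{name}.json"
    path.write_text(json.dumps(data, indent=2, sort_keys=True))
    return path  # commit this file; it moves with the repo


def load_context(name: str) -> dict:
    return json.loads((CONTEXT_DIR / f"{name}.json").read_text())


save_context("refund-rules",
             {"max_auto_refund_usd": 50, "requires_review": ["chargeback"]})
print(load_context("refund-rules")["max_auto_refund_usd"])  # 50
```

Any platform that can read files can consume this context; nothing about it assumes a particular vendor's storage.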

Can you run your agent infrastructure on different compute? If your agents only run in a specific vendor’s cloud, you have infrastructure lock-in on top of software lock-in. Agent runtimes should be containerized and deployable on any Kubernetes cluster: your cloud, your data center, or any other provider.

Can you replace any single vendor without rebuilding your agent architecture? This is the ultimate test. If removing one vendor from your stack means rebuilding from scratch, that vendor has structural power over your business. A healthy architecture lets you swap any component (model provider, hosting platform, individual MCP servers, orchestration layer) without cascading rewrites.

Score yourself: four “yes” answers means low lock-in risk. Even one “no” is a dependency worth a remediation plan; two or more mean you’re accumulating dependency that will be expensive to unwind.

How Lock-In Accumulates

Nobody chooses lock-in deliberately. It accumulates through reasonable decisions made under time pressure.

The first decision is usually speed. “We need an agent prototype in two weeks. Vendor X has a fully managed platform. Let’s start there.” Reasonable. The prototype works. The pilot succeeds. The team is excited.

The second decision is convenience. “Vendor X has a built-in workflow editor. Let’s build our processes there instead of writing code.” Reasonable. The workflows ship faster. The business team can modify them without engineering tickets.

The third decision is momentum. “We have 40 workflows in Vendor X’s platform now. Rewriting them for a different platform would take months. Let’s just keep building here.” Reasonable, but this is the moment where lock-in becomes structural. The switching cost is no longer inconvenient. It’s prohibitive.

The fourth decision is the one you don’t get to make. Vendor X raises prices by 60%. Or gets acquired. Or deprecates the feature your most critical workflow depends on. Now you discover the real cost of every previous “reasonable” decision.

The Anti-Consultancy approach inverts this pattern. Every architectural decision starts with the question: “Does this increase or decrease the client’s ability to operate independently?” Using a vendor’s API through an open protocol (MCP) is fine: low lock-in, easy to swap. Building custom workflows in a vendor’s proprietary editor is a flag. Storing agent context exclusively in a vendor’s database is a hard stop. The goal is Escape Velocity, the point where the client’s AI operations run independently of any single vendor, including NimbleBrain.

Building for Portability

The architecture that avoids lock-in isn’t exotic. It’s the same principles that good engineering has followed for decades, applied to AI infrastructure.

Open protocols over proprietary APIs. MCP is an open standard for agent-to-tool communication. An MCP server that connects your agent to Salesforce works with any MCP-compatible client. Swap the agent runtime, swap the model provider, swap the hosting. The MCP server still works. Contrast this with a proprietary “connector” that only runs inside one vendor’s platform.
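The portability shows up in the configuration itself. A typical MCP client config declares each server as a command to launch, so the same server definition works across any MCP-compatible client (the package name below is a placeholder, and config details vary by client):

```json
{
  "mcpServers": {
    "salesforce": {
      "command": "npx",
      "args": ["-y", "@example/mcp-server-salesforce"],
      "env": { "SALESFORCE_TOKEN": "..." }
    }
  }
}
```

Move this block to a different MCP client and the integration comes with it; there is no vendor-specific connector to rebuild.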

Portable artifacts over platform-stored config. Business-as-Code means your agent’s knowledge lives in files: schemas that define your domain, skills that encode your processes, context that captures your organizational knowledge. These files live in git. They move to any platform. They don’t depend on any vendor’s storage. If your vendor disappears tomorrow, your business logic survives intact.
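Concretely, a Business-as-Code repository might look like this (an illustrative layout; the file names are hypothetical):

```text
business-as-code/
├── schemas/                 # domain objects (customer, order, refund)
│   └── refund.yaml
├── skills/                  # encoded processes the agent can execute
│   └── issue-refund.md
├── context/                 # organizational knowledge and exceptions
│   └── refund-rules.json
└── agents/
    └── support-agent.yaml   # which model, which skills, which tools
```

Everything an agent knows is a reviewable, diffable file. Migration is cloning a repo, not negotiating a data export.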

Containerized runtimes over managed-only services. Agent runtimes should run as containers on Kubernetes. This gives you deployment flexibility: run on AWS, GCP, Azure, bare metal, or a mix. The NimbleBrain Platform is designed to deploy wherever the client needs: managed when they want operational simplicity, self-hosted when they need infrastructure control. The same artifacts work in both environments because the architecture doesn’t assume a specific hosting provider.
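A containerized runtime reduces to a standard Kubernetes manifest. The sketch below is illustrative (image name and environment variables are placeholders), but it is the same manifest whether the cluster is AWS, GCP, Azure, or bare metal:

```yaml
# Illustrative Deployment for a containerized agent runtime.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: support-agent
spec:
  replicas: 2
  selector:
    matchLabels:
      app: support-agent
  template:
    metadata:
      labels:
        app: support-agent
    spec:
      containers:
        - name: runtime
          image: registry.example.com/agent-runtime:1.4.0
          env:
            - name: MODEL_PROVIDER   # provider is config, not code
              value: anthropic
```

Nothing in the manifest names a hosting vendor, which is exactly the property that makes the runtime portable.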

Abstracted model access over direct API calls. Your agents should be able to use different models for different tasks and switch providers without code changes. The model layer should be a configuration choice, not an architectural commitment. Today’s leading model may not be tomorrow’s best option for your workload. The ability to switch, quickly and without rebuilding, is a competitive advantage.
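"A configuration choice, not an architectural commitment" can be as simple as a routing file (a hypothetical format; the provider and model names are placeholders): which model handles which class of task, with a declared fallback.

```yaml
# Hypothetical model-routing config. Switching providers is an edit
# here, not a code change.
models:
  default:
    provider: anthropic
    model: claude-sonnet
    fallback:                # used automatically on outage or error
      provider: openai
      model: gpt-4o
  classification:            # cheaper model for high-volume tasks
    provider: openai
    model: gpt-4o-mini
```

When the landscape shifts, updating this file reroutes the agents; the code that calls the model layer never changes.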

The organizations that treat vendor independence as an architectural requirement from day one avoid the compounding lock-in that makes migration catastrophic. The organizations that optimize for speed without considering exit costs learn the price of lock-in at the worst possible moment: when they need to leave and can’t afford to.

Frequently Asked Questions

What was the Windsurf incident?

Windsurf (formerly Codeium) built their entire product on Anthropic's Claude API. When Anthropic restricted their API access, 1M+ developers lost functionality overnight. Windsurf had no fallback. Their architecture was locked to a single provider. It's the clearest example of AI platform risk in 2025-2026.

How do I assess my current lock-in risk?

Four questions: Could you switch your LLM provider in under a week? Could you move your agent context and skills to a different platform? Could you run your agent infrastructure on different compute? Could you replace any single vendor without rebuilding your agent architecture? If any answer is no, you have lock-in risk. The more "no" answers, the higher the risk.

Doesn't using any vendor create some lock-in?

Yes, but the question is degree. Using a vendor's API through a standard protocol (MCP) is low lock-in; you can swap the underlying connection. Building on a vendor's proprietary agent framework is high lock-in; you'd have to rewrite everything. Business-as-Code reduces lock-in because your domain knowledge lives in portable schemas and skills, not in vendor-specific formats.

Mat Goldsborough · Founder & CEO, NimbleBrain
