You’ve decided your business needs schemas. You understand the concept: structured JSON definitions that AI agents validate against. Now what? You open a blank file and realize you have no idea where to start.

This guide walks through schema design from zero to production. No theory. Concrete steps, real patterns, and a complete example you can adapt.

Step 1: Identify Your Core Entities

Before writing a single line of JSON, list the nouns your business operates on. Not every noun: the 5 that matter most.

For most organizations, this list is obvious once you say it out loud:

  • Customer (or client, patient, member, tenant)
  • Order (or transaction, request, booking, case)
  • Product (or service, offering, SKU, plan)
  • Employee (or team member, agent, provider)
  • One domain-specific entity: the thing that makes your business yours (property, shipment, claim, recipe, campaign)

Write these five down. These are your first schemas.

If you’re struggling to pick five, ask: “If an AI agent were going to run our core operation for one day, what five things would it need to understand?” Those are your entities.

Step 2: Define Fields and Types

Take your first entity (customer is usually the best starting point) and list every field that matters for daily operations. Not every field that exists in your CRM. The fields that a decision-maker actually looks at.

For each field, define three things:

  1. Name: Use snake_case, descriptive, no abbreviations. annual_revenue not ar. primary_contact_email not pce.
  2. Type: string, number, integer, boolean, array, or object. Most fields are strings or numbers. Use integer for whole-number counts such as quantities.
  3. Description: One sentence explaining what this field means in your business context. This is what AI agents read to understand intent.

Here’s a real customer entity schema with 10 fields, enough to be useful and few enough to maintain:

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Customer",
  "description": "A business customer with segmentation, lifecycle tracking, and relationship metadata.",
  "type": "object",
  "properties": {
    "customer_id": {
      "type": "string",
      "pattern": "^CUS-[0-9]{6}$",
      "description": "Unique identifier, format CUS-NNNNNN"
    },
    "legal_name": {
      "type": "string",
      "minLength": 1,
      "description": "Full legal entity name as registered"
    },
    "segment": {
      "type": "string",
      "enum": ["enterprise", "mid-market", "smb"],
      "description": "Customer tier based on annual contract value: enterprise ($500K and above), mid-market ($50K to under $500K), smb (under $50K)"
    },
    "industry": {
      "type": "string",
      "description": "Primary industry vertical (e.g., healthcare, fintech, manufacturing)"
    },
    "annual_contract_value": {
      "type": "number",
      "minimum": 0,
      "description": "Current annual contract value in USD"
    },
    "primary_contact_email": {
      "type": "string",
      "format": "email",
      "description": "Main point of contact for day-to-day communication"
    },
    "account_owner": {
      "type": "string",
      "description": "Internal account manager responsible for this customer"
    },
    "lifecycle_status": {
      "type": "string",
      "enum": ["prospect", "onboarding", "active", "at-risk", "churned"],
      "description": "Current position in the customer lifecycle"
    },
    "onboarding_date": {
      "type": "string",
      "format": "date",
      "description": "Date onboarding began, ISO 8601 format"
    },
    "tags": {
      "type": "array",
      "items": { "type": "string" },
      "description": "Freeform tags for filtering and grouping (e.g., pilot-program, strategic-account)"
    }
  },
  "required": ["customer_id", "legal_name", "segment", "primary_contact_email", "lifecycle_status"]
}

Notice a few patterns in this schema.

Required fields are minimal. Only five of ten fields are required. A customer record needs an ID, a name, a segment, a contact, and a status. Everything else can be filled in later. This keeps data entry friction low while maintaining a useful minimum.

Enum fields encode business vocabulary. The segment field doesn’t accept any string: it accepts exactly three values, each tied to a revenue threshold documented in the description. The lifecycle_status field defines five specific stages. These enums are your controlled vocabulary. When an agent sees lifecycle_status: "at-risk", it knows exactly what that means.

Descriptions carry business context. The segment description includes the revenue thresholds. The customer_id description includes the format pattern. These descriptions are what separate a storage schema from a business schema. They encode the operational knowledge that agents need.

The pattern constraint validates structure. customer_id must match CUS-NNNNNN. An agent can generate valid IDs and catch malformed ones without any additional logic.
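
To make the patterns concrete, here is a sketch of a record that should pass validation against the Customer schema above. Every value is invented for illustration. Two optional fields (industry and account_owner) are left out on purpose, to show that a record can be valid before it is complete.

```json
{
  "customer_id": "CUS-004217",
  "legal_name": "Example Logistics LLC",
  "segment": "mid-market",
  "annual_contract_value": 120000,
  "primary_contact_email": "ops@example.com",
  "lifecycle_status": "active",
  "onboarding_date": "2024-03-18",
  "tags": ["pilot-program"]
}
```

Change customer_id to "CUS-4217" and the pattern rejects it. Change segment to "Enterprise" and the enum rejects it, because enum matching is case-sensitive. One caveat: in draft 2020-12, format values such as email and date are annotations by default, so they are only enforced if your validator has format assertion turned on.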

Step 3: Add Relationships

Entities don’t exist in isolation. Customers place orders. Orders contain products. Employees manage accounts. These relationships matter: they’re how agents reason across your business.

JSON Schema has no built-in notion of a foreign key. Relationships are expressed through ID fields, nested structures, and $ref, which reuses one schema definition inside another. The simplest approach: reference related entities by their ID field.

{
  "account_owner": {
    "type": "string",
    "description": "Employee ID (EMP-NNNNNN) of the assigned account manager"
  }
}

This tells an agent that account_owner isn’t just a name; it’s a reference to an employee entity. The agent can look up the employee schema, find the record, and reason about the relationship.
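
If several fields point at employees, one option is to define the ID format once and reuse it with $ref. The sketch below assumes a definition named employee_id under $defs; that name is hypothetical and not part of the Customer schema shown earlier.

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$comment": "Illustrative sketch. The $defs entry employee_id is an assumed name, not from the original schema.",
  "type": "object",
  "properties": {
    "account_owner": {
      "$ref": "#/$defs/employee_id",
      "description": "Employee ID of the assigned account manager"
    }
  },
  "$defs": {
    "employee_id": {
      "type": "string",
      "pattern": "^EMP-[0-9]{6}$",
      "description": "Reference to an employee record, format EMP-NNNNNN"
    }
  }
}
```

Keep in mind what this does and doesn’t check. The validator confirms that account_owner looks like an employee ID. It does not confirm that the employee exists. That lookup still belongs to the agent or the system behind it.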

For richer relationships, use nested objects:

{
  "primary_contact": {
    "type": "object",
    "properties": {
      "name": { "type": "string" },
      "email": { "type": "string", "format": "email" },
      "phone": { "type": "string" },
      "role": { "type": "string" }
    },
    "required": ["name", "email"]
  }
}

And for one-to-many relationships, use arrays:

{
  "line_items": {
    "type": "array",
    "items": {
      "type": "object",
      "properties": {
        "product_id": { "type": "string" },
        "quantity": { "type": "integer", "minimum": 1 },
        "unit_price": { "type": "number", "minimum": 0 }
      },
      "required": ["product_id", "quantity", "unit_price"]
    },
    "minItems": 1
  }
}

Start with ID-based references. Move to nested objects and arrays when agents need to reason about the relationship details without a separate lookup.

Step 4: Add Constraints That Encode Business Logic

The power of schemas isn’t in defining fields; it’s in defining rules. Constraints turn a data model into a business model.

JSON Schema provides several constraint types:

Value boundaries. minimum, maximum, minLength, maxLength. A discount percentage can’t exceed 100. An order quantity must be at least 1. A customer name can’t be empty.
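
A minimal sketch of those three boundary rules follows. The field names discount_percent and quantity, and the 200-character cap on legal_name, are assumptions chosen for illustration.

```json
{
  "$comment": "Illustrative sketch. discount_percent, quantity, and the maxLength of 200 are assumptions.",
  "type": "object",
  "properties": {
    "discount_percent": {
      "type": "number",
      "minimum": 0,
      "maximum": 100,
      "description": "Discount applied to the order, as a percentage from 0 to 100"
    },
    "quantity": {
      "type": "integer",
      "minimum": 1,
      "description": "Number of units ordered, at least 1"
    },
    "legal_name": {
      "type": "string",
      "minLength": 1,
      "maxLength": 200,
      "description": "Full legal entity name as registered"
    }
  }
}
```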

Pattern matching. pattern with regular expressions. Phone numbers, ID formats, postal codes. If your business has a specific format for something, encode it.
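
As a rough example, the sketch below encodes two common formats. It assumes phone numbers are stored in E.164 style and postal codes are US ZIP codes; swap in whatever formats your business actually uses.

```json
{
  "$comment": "Illustrative sketch. The E.164 phone and US ZIP formats are assumptions about the business.",
  "type": "object",
  "properties": {
    "phone": {
      "type": "string",
      "pattern": "^\\+[1-9][0-9]{1,14}$",
      "description": "Phone number in E.164 format, e.g. +14155550123"
    },
    "postal_code": {
      "type": "string",
      "pattern": "^[0-9]{5}(-[0-9]{4})?$",
      "description": "US ZIP code, 5 digits with optional 4-digit extension"
    }
  }
}
```

JSON Schema patterns are not anchored automatically. Without the leading ^ and trailing $, a pattern matches anywhere inside the string, so "abc12345xyz" would pass a bare five-digit check.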

Conditional logic. if/then rules that encode business-specific conditions:

{
  "if": {
    "properties": { "segment": { "const": "enterprise" } },
    "required": ["segment"]
  },
  "then": {
    "required": ["account_owner", "annual_contract_value"]
  }
}

This says: if a customer is enterprise, then account_owner and annual_contract_value become required fields. The required entry inside if keeps the rule from firing on records where segment is missing altogether. SMB customers don’t need a named account owner. Enterprise customers do. That rule is now in the schema, not in someone’s head.
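
A schema object can hold only one if keyword, so a second rule needs a wrapper. One common approach is to list each rule under allOf. The sketch below keeps the enterprise rule and adds a second one that is an assumption for illustration: active customers must have an onboarding_date.

```json
{
  "$comment": "Illustrative sketch. The second rule (active requires onboarding_date) is an assumed business rule.",
  "allOf": [
    {
      "if": {
        "properties": { "segment": { "const": "enterprise" } },
        "required": ["segment"]
      },
      "then": {
        "required": ["account_owner", "annual_contract_value"]
      }
    },
    {
      "if": {
        "properties": { "lifecycle_status": { "const": "active" } },
        "required": ["lifecycle_status"]
      },
      "then": {
        "required": ["onboarding_date"]
      }
    }
  ]
}
```

Each rule is evaluated independently. An active enterprise customer has to satisfy both.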

Enumerated values. enum for controlled vocabularies. Every field that has a finite set of valid values should use enum. This prevents the data quality problems that plague unstructured systems, where “Active” vs “active” vs “ACTIVE” vs “live” all become different states.

Step 5: Iterate from Simple to Rich

Schema design is iterative. Here’s the progression that works in practice, across the client engagements we’ve run at NimbleBrain:

Week 1: Minimum viable schemas. 5 entity schemas, 5-8 fields each, basic types and required fields. Deploy to agents. The goal is to get agents validating against real schemas as fast as possible.

Weeks 2-3: Add constraints. Enums, patterns, conditional logic. This is where business rules start getting encoded. Watch agent validation errors. They tell you where your constraints are too tight or too loose.

Week 4: Add relationships. Connect entities through references. Add process schemas that describe workflows spanning multiple entities. This is where agents start reasoning across your business, not just validating individual records.

Month 2+: Refine based on agent behavior. Fields that are never populated get removed. Missing fields get added. Descriptions get sharpened. Constraints get adjusted. This is the Business-as-Code loop: your schemas improve because agents test them every day.
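
As a rough picture of the Week 1 starting point in this progression, here is a sketch of a minimum viable Order schema: five fields, basic types, a short required list, and nothing else. The field names are hypothetical.

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$comment": "Illustrative Week 1 sketch. All field names are assumptions.",
  "title": "Order",
  "description": "A customer order, in its minimum viable form.",
  "type": "object",
  "properties": {
    "order_id": {
      "type": "string",
      "description": "Unique identifier for the order"
    },
    "customer_id": {
      "type": "string",
      "description": "ID of the customer who placed the order (CUS-NNNNNN)"
    },
    "status": {
      "type": "string",
      "description": "Current state of the order"
    },
    "total_amount": {
      "type": "number",
      "description": "Order total in USD"
    },
    "order_date": {
      "type": "string",
      "description": "Date the order was placed, ISO 8601 format"
    }
  },
  "required": ["order_id", "customer_id", "status"]
}
```

In the constraint phase that follows, status would likely become an enum, the ID fields would gain patterns, and total_amount would get a minimum of 0.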

The teams that struggle are the ones that try to design perfect schemas before deploying anything. They spend eight weeks modeling every edge case. By the time they deploy, the business has changed and half the fields are wrong.

The teams that succeed deploy simple schemas fast and let agent behavior drive refinement. Context Engineering is an empirical practice, not a theoretical one. You learn what your schemas need by watching agents use them.

Schema Design Checklist

Before deploying a new schema, run through this list:

  • Every field has a description that explains business meaning, not just data type
  • required contains only truly required fields (fewer is better at first)
  • String fields with fixed options use enum
  • Numeric fields have minimum and/or maximum where appropriate
  • ID fields use pattern to enforce format
  • Related entities are referenced by ID with a clear description
  • The schema has been reviewed by someone who does the work (not just someone who manages it)
  • The schema is in version control with a meaningful commit message

Store your schemas where agents can access them. Version them in git. Publish them to a shared location your agent stack can read. We use schemas.nimblebrain.ai for ours: the same schemas our agents validate against in production.
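
One way to make a published schema addressable is to give it an $id that matches the URL it is served from. The sketch below uses schemas.example.com as a placeholder host and a version segment in the path. Both are assumptions, as is the employee schema it points to.

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://schemas.example.com/customer/1.0.0/customer.schema.json",
  "$comment": "Illustrative sketch. The host, the versioned path layout, and the referenced employee schema are placeholders.",
  "title": "Customer",
  "type": "object",
  "properties": {
    "account_owner": {
      "$ref": "https://schemas.example.com/employee/1.0.0/employee.schema.json#/$defs/employee_id",
      "description": "Employee ID of the assigned account manager"
    }
  }
}
```

Putting the version in the path means a breaking change gets a new URL, and agents pinned to the old one keep working until you migrate them.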

The schema isn’t the final artifact. It’s the starting point. Deploy it, watch agents use it, and let the data tell you what to change next.

Frequently Asked Questions

How many schemas should I start with?

Start with 5 core entity schemas: typically customer, order, product, employee, and one domain-specific entity. Add 2-3 process schemas once the entities are stable. Most organizations reach 15-25 schemas within the first quarter.

What's the biggest mistake teams make when designing schemas?

Over-engineering on the first pass. Teams try to capture every edge case and optional field before deploying anything. The result is schemas that are complete on paper but never get validated by agents in production. Start with the minimum viable schema (5-8 fields, only truly required fields marked required) and iterate based on real agent behavior.

How do I handle fields that change frequently, like pricing tiers or status values?

Use enum arrays for controlled vocabularies and update them through version-controlled schema changes. When a new status or tier is added, it's a schema change that gets reviewed and deployed like any other code change. This creates an audit trail and forces the team to be deliberate about structural changes.
