
How We Used SQL on FHIR to Shrink LLM Context by 92%

How we leveraged SQL on FHIR to dramatically reduce token usage in Flexpal, our smart health agent, while maintaining rich clinical context and simplifying our tooling architecture.

November 3, 2025 • Larry Ditton

When building Flexpal, our smart health agent, we faced a challenge that everyone working with healthcare data and LLMs eventually encounters: FHIR resources are huge, and when you're trying to help someone understand their medical history through conversation, you can burn through input tokens fast.

LLMs limit how many tokens (units of input data) you can send when asking the model questions, roughly according to the available memory and model performance. This input budget is usually called the context window, and every model enforces a strict cap on how much context you can provide.
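
Because that cap is measured in tokens rather than bytes, it helps to have a rough way to estimate how much of the window a FHIR payload will consume. Here's a minimal sketch, assuming the common approximation of about four characters per token for English-heavy JSON; for exact counts you'd use the model's own tokenizer:

// Rough estimate only: assumes ~4 characters per token, which is a common
// approximation for English-heavy JSON, not an exact tokenizer count.
function estimateTokens(resource: unknown): number {
  return Math.ceil(JSON.stringify(resource).length / 4);
}

// Example: ballpark the context cost of a set of FHIR resources before sending them.
// const totalTokens = eobs.reduce((sum, eob) => sum + estimateTokens(eob), 0);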

So we needed a way to intelligently limit the number of input tokens our agent used, while still providing the rich clinical and financial context that our FHIR claims and clinical records contain.

We tried several approaches before landing on SQL on FHIR, a specification designed for analytical queries on FHIR data. While it's typically used for database queries and analytics, we discovered it's also remarkably effective for transforming verbose FHIR resources into condensed, LLM-friendly summaries.

The results? In one realistic example, a 92% reduction in tokens (from ~240k to ~19k tokens for 14 ExplanationOfBenefit resources), simpler tooling (from many specialized search tools down to 1), and better responses from the AI model.

Here's how we approached the problem and why SQL on FHIR turned out to be the right solution.

Key Considerations

Most of the design decisions for an AI experience that uses FHIR data come down to three main questions:

  1. What data do you need (and do you know this in advance)?
  2. How much context do you have to work with?
  3. What architectural tradeoffs are you willing to make (both in terms of FHIR storage and AI tooling)?

Questions to Ask Yourself

Do you have a need for the complete or nearly-complete patient record? Some use cases may require comprehensive records, while others only need specific data points.

How will you be storing the data you receive? In a database? In your own FHIR store? This decision impacts which slimming strategies are available to you.

Do you know which fields you need? Exactly what data points will your model need to properly generate a response? The more specific you can be, the more effectively you can slim down resources.

How often will you need this data? Does the user interact with this experience only once, a few times, or as many times as they want? If users will keep coming back to it, you'll want to persist the data in long-term storage when you receive it.

How many different tools do you plan to give your model? Do you want something simple that provides a maximum amount of context consistently? Or do you want multiple specialized tools that, when called, parse the data for specific data points? The former offers less complexity and may allow for faster, better responses because fewer tool calls are needed to generate an appropriate answer. The latter trades that simplicity for more focused, specialized tasks that can get by with fewer input tokens.

Approaches to Shrinking FHIR Context

There are several ways one might consider "shrinking the context" of FHIR resources:

  1. Use the _summary search parameter when requesting the data from a FHIR server if it gives you the information you need
  2. Use the _elements search parameter when requesting the data from a FHIR server to request specific fields
  3. Manually parse the resource for the most relevant data and send that as a bespoke object
  4. Have specific tools that do "searches" across certain fields and make subsequent tool calls for more relevant resources
  5. Create a "slimming" algorithm that preserves FHIR structure
  6. Use SQL on FHIR as a way to "query" or shape the resources

Let's briefly examine the main options before diving deep on SQL on FHIR.

Option 1: The _summary Search Parameter

When set to true, the _summary parameter returns only a subset of each resource. According to the spec, this subset consists solely of the supported elements marked as 'summary' in the base definition.
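
For example, a summary-only request for a patient's EOBs looks roughly like this (a hedged sketch; the base URL and patient ID are placeholders, not real endpoints):

// Hypothetical request: the base URL and patient ID are placeholders.
// _summary=true asks the server to return only the elements flagged as
// "summary" in the base resource definitions.
const baseUrl = 'https://fhir.example.com/r4';
const response = await fetch(
  `${baseUrl}/ExplanationOfBenefit?patient=example-patient&_summary=true`,
  { headers: { Accept: 'application/fhir+json' } },
);
const summaryBundle = await response.json();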

Pros:

  • Significantly smaller bundles (in our tests, 167KB vs 1.3MB for 14 EOBs)
  • Good for basic financial information
  • Standardized approach

Cons:

  • Requires a FHIR server with _summary support
  • Loses most clinical information when dealing with claims data (no diagnosis codes, procedure details, etc.)
  • Still returns mandatory fields even if you don't need them

Option 2: The _elements Search Parameter

The _elements parameter allows you to request specific elements to be returned.
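
For example (again a hedged sketch with placeholder values), requesting only a handful of base-level elements looks like this:

// Hypothetical request: the base URL and patient ID are placeholders.
// _elements takes a comma-separated list of base-level element names;
// mandatory elements are still returned even if they are not listed.
const baseUrl = 'https://fhir.example.com/r4';
const response = await fetch(
  `${baseUrl}/ExplanationOfBenefit?patient=example-patient&_elements=type,billablePeriod,diagnosis,total`,
  { headers: { Accept: 'application/fhir+json' } },
);
const elementsBundle = await response.json();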

Pros:

  • More control than _summary
  • Can target specific fields you know you need
  • Still returns FHIR-formatted data

Cons:

  • Requires a FHIR server with _elements support
  • Only works at the base level of resources
  • Still returns mandatory fields
  • Doesn't filter out resources that lack the requested elements

Option 3: Manual Parsing

Write code to extract exactly what you need from FHIR resources.

Pros:

  • Complete control over what's included
  • No dependency on FHIR server capabilities
  • Can be extremely efficient if you know exactly what you need

Cons:

  • FHIR is complex and nuanced—parsing can be difficult
  • Requires maintenance as your needs change
  • Risk of missing important data if parsing logic is incomplete
  • Can require hundreds of lines of code for comprehensive extraction

To illustrate the complexity, here's what parsing just two common fields from an ExplanationOfBenefit resource looks like:

// Example 1: Extract diagnosis codes
function extractDiagnosisCodes(eob: ExplanationOfBenefit): string[] {
  if (!eob.diagnosis || eob.diagnosis.length === 0) {
    return [];
  }

  const codes: string[] = [];

  for (const diagnosis of eob.diagnosis) {
    // Check if diagnosisCodeableConcept exists
    if (!diagnosis.diagnosisCodeableConcept) {
      continue;
    }

    // Navigate the coding array
    const coding = diagnosis.diagnosisCodeableConcept.coding;
    if (!coding || coding.length === 0) {
      continue;
    }

    // Extract codes from each coding
    for (const code of coding) {
      if (code.code) {
        codes.push(code.code);
      }
    }
  }

  // Remove duplicates
  return Array.from(new Set(codes));
}

// Example 2: Extract total benefit amount paid by insurance
function extractTotalBenefit(eob: ExplanationOfBenefit): number | null {
  if (!eob.total || eob.total.length === 0) {
    return null;
  }

  // Find the 'benefit' total by filtering on the coding system and code
  const benefitTotal = eob.total.find((t) => {
    if (!t.category?.coding) {
      return false;
    }

    return t.category.coding.some(
      (c) => c.system === 'http://terminology.hl7.org/CodeSystem/adjudication' && c.code === 'benefit',
    );
  });

  if (!benefitTotal?.amount?.value) {
    return null;
  }

  return benefitTotal.amount.value;
}

That's 50+ lines of TypeScript just to extract two data points, with careful null checking at every level of nesting. Now imagine doing this for:

  • Procedure codes
  • Service/product codes (CPT, HCPCS, NDC)
  • Provider references
  • Care team members
  • Item-level details
  • All the other financial totals (submitted, member liability, deductible, copay)
  • Pharmacy-specific fields (days supply, refill numbers, DAW codes)
  • Institutional-specific fields (type of bill, admission dates, DRG codes)

You'd easily need hundreds of lines of similar parsing logic, all requiring maintenance as your needs evolve or as you discover edge cases in different payers' FHIR implementations.

Option 4: Specialized Search Tools

Create multiple specialized tools that search across specific fields, allowing the AI to make targeted queries.

Pros:

  • Very efficient for specific queries
  • Keeps individual tool responses small
  • Good for targeted data retrieval

Cons:

  • Requires multiple tool calls for comprehensive information
  • Increased complexity in tool orchestration
  • May result in more back-and-forth, increasing overall token usage
  • Harder to maintain consistency across tools
  • Still requires manual parsing (inherits all the complexity from Option 3)

This was actually our initial approach. We created specialized tools where each one parsed claims for specific fields. For example, we had a search_claims_by_diagnosis tool that would parse all EOB resources and return a lightweight array of objects like:

[
  { eob_id: 'eob-123', diagnosis_codes: ['E11.9', 'I10'] },
  { eob_id: 'eob-456', diagnosis_codes: ['J45.40'] },
  // ...
];

The flow would work like this:

  1. User asks: "What claims do I have related to diabetes?"
  2. AI calls search_claims_by_diagnosis tool
  3. Tool returns lightweight objects with just EOB IDs and diagnosis codes
  4. AI identifies relevant claims (those with diabetes diagnosis codes like E11.9)
  5. AI makes a second tool call to get_full_eob_details with the relevant EOB IDs
  6. Now AI has full context to answer the question

While this kept individual tool responses smaller, it meant:

  • We still had to write manual parsing logic for each specialized tool
  • Every query required at least two tool calls (search, then fetch details)
  • We needed separate tools for searching by diagnosis, procedure, provider, date, etc.
  • More tool calls = more tokens consumed in the conversation
  • Complex orchestration logic to ensure the AI used tools in the right sequence

Why We Chose SQL on FHIR

SQL on FHIR is a specification designed to "rectangularize" FHIR data so it's easy to store or query, much as you would query a database. The name is a bit of a misnomer: SQL isn't actually required, and it isn't involved at all unless you want it to be.

How It Works

SQL on FHIR requires two components:

  1. View Definitions: Specifications for how to tabularize the data using FHIRPath expressions
  2. View Runner: System-specific tools that apply the view definition to FHIR resources

Here's a simplified example of what a view definition looks like for ExplanationOfBenefit resources:

export const eobClaimsOverview = {
  resourceType: 'ViewDefinition',
  resource: 'ExplanationOfBenefit',
  name: 'eob-claims-overview',
  status: 'active',
  description: 'Lightweight overview of all EOB types',
  select: [
    {
      column: [
        {
          name: 'id',
          path: 'getResourceKey()',
        },
        {
          name: 'type',
          path: "type.coding.where(system='http://terminology.hl7.org/CodeSystem/claim-type').code.first()",
        },
        {
          name: 'billablePeriod_start',
          path: 'billablePeriod.start',
        },
        {
          name: 'diagnosis_codes',
          path: 'diagnosis.diagnosisCodeableConcept.coding.code.distinct()',
          collection: true,
        },
        {
          name: 'procedure_codes',
          path: 'procedure.procedureCodeableConcept.coding.code.distinct()',
          collection: true,
        },
        {
          name: 'service_codes',
          path: 'item.productOrService.coding.code.distinct()',
          collection: true,
        },
        {
          name: 'total_submitted',
          path: "total.where(category.coding.code='submitted').amount.value.first()",
        },
        {
          name: 'total_benefit',
          path: "total.where(category.coding.code='benefit').amount.value.first()",
        },
        // ... more fields
      ],
    },
  ],
};

This view definition transforms a deeply nested FHIR ExplanationOfBenefit resource into a flat object with just the fields we care about. When run against FHIR resources, it produces clean JSON objects like:

{
  "id": "eob-123",
  "type": "pharmacy",
  "billablePeriod_start": "2024-03-15",
  "diagnosis_codes": ["E11.9", "I10"],
  "procedure_codes": [],
  "service_codes": ["99213"],
  "total_submitted": "250.00",
  "total_benefit": "200.00"
}

Our Implementation

We use Medplum's evalSqlOnFhir function to run our view definitions against locally stored FHIR resources. Here's how it works in our AI tool execution:

// From apps/app/lib/ai-tools.ts
case 'search_claims': {
  // Fetch EOBs from local database
  let eobs = await getResources<ExplanationOfBenefit>('ExplanationOfBenefit');

  // Filter by type, date range if specified
  if (type) {
    eobs = eobs.filter((eob) => {
      const eobType = eob.type?.coding?.find(
        (c) => c.system === 'http://terminology.hl7.org/CodeSystem/claim-type'
      )?.code;
      return eobType === type;
    });
  }

  // Transform using SQL on FHIR ViewDefinition
  const claims = evalSqlOnFhir(eobClaimsOverview, eobs) as EOBClaimOverview[];

  return {
    claims,
    count: claims.length,
  };
}

We created two main view definitions:

  1. eobClaimsOverview: A lightweight summary with diagnosis codes, procedure codes, service codes, provider references, and financial totals. Perfect for initial search and filtering.

  2. carinBbEobDetailsUnified: A comprehensive view with all claim details, supporting info, care team members, and item-level breakdowns. Used when the AI needs full details on specific claims.
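
To show how the second view fits in, here's a hedged sketch of a detail-fetch path; the get_claim_details name and the eobIds parameter are illustrative, not the exact shape of our production tool:

// Sketch only: 'get_claim_details' and eobIds are illustrative.
case 'get_claim_details': {
  // Fetch EOBs from the local database, same as search_claims
  const eobs = await getResources<ExplanationOfBenefit>('ExplanationOfBenefit');

  // Keep only the claims the model asked about
  const selected = eobs.filter((eob) => eobIds.includes(eob.id ?? ''));

  // Apply the comprehensive view definition to just those resources
  const details = evalSqlOnFhir(carinBbEobDetailsUnified, selected);

  return {
    details,
    count: selected.length,
  };
}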

Tooling Simplification

This led us to simplify our AI tooling dramatically:

Before - many specialized search tools such as:

  • Search by diagnosis
  • Search by procedure
  • Search by provider
  • Search by type
  • Search by medications

After - 1 specialized EOB search tool:

  • search_claims: Returns condensed overview using eobClaimsOverview

This is cleaner organizationally—one tool call maps to one view definition.
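
To make that concrete, here's roughly what the single tool's schema looks like as exposed to the model (a hedged sketch; the exact parameter names and descriptions in our codebase may differ):

// Sketch of the single search tool exposed to the model.
// Parameter names and descriptions are illustrative.
const searchClaimsTool = {
  name: 'search_claims',
  description:
    'Returns a condensed overview of the patient claims (EOBs), including ' +
    'diagnosis, procedure, and service codes plus financial totals.',
  parameters: {
    type: 'object',
    properties: {
      type: {
        type: 'string',
        enum: ['pharmacy', 'professional', 'institutional', 'oral', 'vision'],
        description: 'Optional claim type filter.',
      },
      startDate: { type: 'string', description: 'Optional ISO date lower bound.' },
      endDate: { type: 'string', description: 'Optional ISO date upper bound.' },
    },
  },
};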

The Results

Our quick analysis showed remarkable improvements:

  • Token Reduction: 14 raw EOB resources consumed ~240k tokens. After SQL on FHIR transformation, the same data was ~19k tokens—a 92% reduction.
  • Information Retention: The view definitions preserved much of the fundamental clinical and financial information from the raw FHIR.
  • Code Savings: Implementing comparable parsing logic would have required hundreds of lines of TypeScript. Our view definitions are declarative and maintainable.
  • Simplified Architecture: Reduced from 5 search tools to 1, making the system easier to reason about and maintain.

Why SQL on FHIR Is a Good Fit

Avoids Manual Parsing Complexity

FHIR's nested structure means extracting specific data points requires careful navigation of arrays, optional fields, and complex references. A simple task like "get all diagnosis codes from a claim" becomes:

// Manual parsing - error-prone and verbose
const diagnosisCodes =
  eob.diagnosis
    ?.flatMap((d) => d.diagnosisCodeableConcept?.coding?.map((c) => c.code) || [])
    .filter(Boolean) || [];

With SQL on FHIR, this becomes a declarative path expression:

{
  name: 'diagnosis_codes',
  path: 'diagnosis.diagnosisCodeableConcept.coding.code.distinct()',
  collection: true,
}

Maintains Standardization

View definitions are part of the FHIR specification, which means:

  • They're portable across systems
  • They can be validated
  • They're self-documenting
  • Teams can share and collaborate on them

Provides the Right Level of Abstraction

SQL on FHIR sits at a sweet spot:

  • Higher level than manual parsing (more maintainable)
  • Lower level than _summary or _elements (more control)
  • More focused than full FHIR (better for LLM context)
  • Still structured enough to be machine-readable

Evolves With Your Needs

As our understanding of what information matters most to users evolved, we could easily modify our view definitions without rewriting parsing logic. Adding a new field is as simple as adding a new column specification.
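
For example (a hypothetical column; the name and path are just one way to express it), surfacing the billing provider reference is one more entry in the column array:

// Hypothetical addition to the column array in eobClaimsOverview:
// surfaces the billing provider reference alongside the existing fields.
{
  name: 'provider_reference',
  path: 'provider.reference',
},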

The Tradeoffs

SQL on FHIR isn't without its considerations:

Learning Curve: FHIRPath expressions take time to learn. The syntax can be unintuitive at first, especially for complex queries with filtering and transformations.

Another Dependency: You need a view runner implementation. We use Medplum's evalSqlOnFhir, but you could implement your own or use other tools.

Runtime Overhead: Transforming resources at query time adds some computational cost, though in our testing with React Native and SQLite, the overhead was negligible compared to the benefits.

Not a Complete Solution: For very large datasets (hundreds of claims), you'll still want additional strategies like pagination, date filtering, or claim type filtering to keep context manageable.
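
For example, a date-range filter applied before the transformation (following the same pattern as the type filter in our search_claims tool; the startDate and endDate parameters are illustrative) keeps the result set small:

// Sketch: narrow by billable period before running the view definition.
// startDate and endDate are illustrative ISO date strings (e.g. '2024-01-01').
if (startDate || endDate) {
  eobs = eobs.filter((eob) => {
    const start = eob.billablePeriod?.start;
    if (!start) {
      return false;
    }
    // ISO dates compare correctly as strings when the formats match
    return (!startDate || start >= startDate) && (!endDate || start <= endDate);
  });
}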

Summary

SQL on FHIR solved our LLM context problem by providing:

  • Dramatic token reduction (92% in our case)
  • Simplified architecture (fewer, more focused tools)
  • Maintainable transformations (declarative, standardized view definitions)
  • Preserved clinical richness (all the important data, none of the noise)

While SQL on FHIR isn't designed specifically for LLM context reduction, it turns out to be remarkably well-suited for it. The same properties that make it good for analytical queries—focusing on relevant fields, flattening nested structures, and maintaining semantic clarity—also make it excellent for feeding LLMs.

If you're building AI experiences with healthcare data, we'd encourage you to explore SQL on FHIR. It might not be the first tool you think of for this problem, but it could be the right one.


Building AI experiences with healthcare data? We'd love to hear about your approach. Reach out to us or share your thoughts on X. Interested in how Flexpa can help power your healthcare AI applications? Schedule a demo to learn more about accessing real patient health data.
