2026-07-05 · case study · primary data

196,000 tokens or 8,000: making a catalog an agent can read

The summary projection cuts the cost of browsing a real 626-product catalog by 96% — the difference between a catalog an agent can read and one it can't fit.

TL;DR — Returning a fifty-product listing from a real 626-product WooCommerce catalog costs 196,595 tokens in full records and 8,000 in summaries — one twenty-fifth. At 196k a single list overflows the 128k context window most models ship with, so the summary projection isn't a discount — it's what makes the catalog computable at all. Figures counted with tiktoken on Bridge 1.0.114.

An agent shopping a catalog has to read it before it can choose. Not the human page — the data: every candidate product, its price, whether it's in stock, enough to rank and decide. The obvious way to serve that is to hand the agent the complete record for every product it might consider. On a small catalog nobody notices the cost. On a real one, it breaks.

Here is the number that makes the point. Ask the Bridge for fifty products from a live 626-product WooCommerce store, complete records, and the response is 196,595 tokens. Ask for the same fifty as summaries and it is 8,000. Same products, same prices, same truth about what's in stock — one twenty-fifth of the tokens.

That gap is not a rounding error to shave. At 196,000 tokens, a single list response overflows the 128,000-token context window that most deployed models still ship with — the catalog literally does not fit. The complete-record response isn't expensive to browse; it is impossible to browse. The summary projection is what makes the catalog computable at all: the difference between a store an agent can query and one it chokes on before it can choose.
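The overflow is simple arithmetic. Below is a sketch that uses the appendix's per-product averages for Store A and treats the full 128,000 tokens as available. A real agent also needs room for its prompt, tool schemas and reasoning. The hypothetical `reserved` parameter stands in for that; it is not part of the Bridge.

```python
# Store A per-product averages from the appendix (tiktoken, cl100k_base).
FULL_PER_PRODUCT = 3_931
SUMMARY_PER_PRODUCT = 160
CONTEXT_WINDOW = 128_000  # the window size this post treats as typical


def fits(total_tokens: int, window: int = CONTEXT_WINDOW) -> bool:
    """True if a response of `total_tokens` fits in the context window."""
    return total_tokens <= window


def max_products(per_product: int, window: int = CONTEXT_WINDOW, reserved: int = 0) -> int:
    """How many records of `per_product` tokens fit once `reserved` tokens are set aside.

    `reserved` is a hypothetical allowance for prompt, tools and reasoning.
    """
    budget = max(window - reserved, 0)
    return budget // per_product


print(fits(196_595), fits(8_000))  # False True
print(max_products(FULL_PER_PRODUCT), max_products(SUMMARY_PER_PRODUCT))  # 32 800
```

With nothing reserved, at most 32 full records fit in the window, against 800 summaries.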

The shape of the waste

A complete product record is right to exist. It carries shipping policy across zones, active coupons, image sets, the variant matrix, purchase-readiness rules, provenance. An agent about to transact needs all of it. An agent deciding which product to look at needs almost none of it.

The problem is that the full record repeats that heavy payload for every product in a list — most of which the agent will discard after a single glance. On the 626-product store, a complete record averages roughly 3,900 tokens per product. The decision itself needs only identity, price, sale status, availability, URL, category, and whether a variant choice is required — about 160 tokens. The other ~3,750 tokens per product are correct, and wasted, on every candidate the agent was never going to pick.
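At its simplest, the projection is a field selection. It keeps the handful of keys a ranking decision needs and leaves the heavy payload behind. The sketch below is minimal. Its key names are hypothetical stand-ins for the fields listed above, not the Bridge's actual schema.

```python
from typing import Any

# Hypothetical key names standing in for the decision fields named in the post.
SUMMARY_FIELDS = (
    "id",
    "name",                        # identity
    "price",                       # catalog price
    "on_sale",                     # sale status
    "in_stock",                    # availability
    "url",
    "category",
    "requires_variant_selection",  # must the agent pick a size or color first?
)


def to_summary(full_record: dict[str, Any]) -> dict[str, Any]:
    """Project a full product record onto the decision fields, dropping everything else."""
    return {key: full_record[key] for key in SUMMARY_FIELDS if key in full_record}


full = {
    "id": 42,
    "name": "Trail Runner",
    "price": "89.00",
    "on_sale": False,
    "in_stock": True,
    "url": "https://shop.example/trail-runner",
    "category": "Footwear",
    "requires_variant_selection": True,
    # Heavy payload that triage does not need:
    "shipping_zones": [{"zone": "EU", "rate": "4.90"}, {"zone": "US", "rate": "9.90"}],
    "coupons": [{"code": "SUMMER10"}],
    "images": ["front.jpg", "side.jpg", "sole.jpg"],
    "variants": [{"size": s, "stock": 3} for s in range(38, 46)],
}

print(sorted(to_summary(full)))
```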

Triage, then verify

The Bridge exposes three projections of the same catalog — summary, verification, and full — and instructs the agent to use them in sequence rather than reaching for the heaviest one by default. Rank every candidate from summary data. Open exactly one — the finalist — in full.
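The prescribed sequence can be sketched as a small function. This is an illustration under stated assumptions, not the Bridge's client API. `ProductSummary`, `triage_then_verify`, `fetch_full` and the field names are hypothetical. The ranking rule, cheapest in-stock product, is just one example of what an agent might score on.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class ProductSummary:
    # Hypothetical subset of the summary fields listed in the post.
    product_id: int
    name: str
    price: float
    in_stock: bool
    requires_variant_selection: bool


def triage_then_verify(
    summaries: list[ProductSummary],
    score: Callable[[ProductSummary], float],
    fetch_full: Callable[[int], dict[str, Any]],
) -> Optional[dict[str, Any]]:
    """Rank every candidate from summary data, then open only the finalist in full."""
    in_stock = [s for s in summaries if s.in_stock]
    if not in_stock:
        return None  # nothing purchasable, so no full fetch at all
    finalist = max(in_stock, key=score)
    return fetch_full(finalist.product_id)  # the single full-projection call


calls: list[int] = []


def fake_fetch_full(product_id: int) -> dict[str, Any]:
    """Stand-in for the hypothetical full-projection call; records each fetch."""
    calls.append(product_id)
    return {"product_id": product_id, "shipping": {}, "variants": []}


catalog = [
    ProductSummary(1, "Runner", 120.0, True, True),
    ProductSummary(2, "Trail", 95.0, False, True),
    ProductSummary(3, "Court", 80.0, True, False),
]
# Example ranking rule: the cheapest in-stock product wins.
chosen = triage_then_verify(catalog, score=lambda s: -s.price, fetch_full=fake_fetch_full)
print(chosen["product_id"], len(calls))  # 3 1
```

Whatever the scoring rule, the shape stays the same. Every candidate is compared on summary data, and the full projection is requested exactly once.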

On the same fifty-product list, that composed flow costs about 8,000 tokens to triage all fifty, plus roughly 3,900 to pull the one selected product in full: ~11,900 tokens against 196,595 for handing over the complete list. A 94% reduction end to end — and the agent still finishes with the complete record of the product it actually transacts on. Nothing that matters at the point of purchase is missing; it's simply fetched once, for one product, instead of fifty times for fifty.
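The end-to-end figure is plain addition. Writing it as a function also makes the appendix's caveat about over-fetching easy to see. The sketch uses the Store A counts. `composed_flow_cost` and its `finalists` parameter are illustrative names. The per-product full cost is the listing average, not the finalist's actual record.

```python
FULL_LISTING = 196_595                # 50 full records, Store A
SUMMARY_LISTING = 8_000               # the same 50 as summaries
FULL_PER_PRODUCT = FULL_LISTING / 50  # average full record, about 3,931.9 tokens


def composed_flow_cost(summary_listing: float, full_per_product: float, finalists: int = 1) -> float:
    """Tokens to triage the whole list in summary, then open `finalists` products in full."""
    return summary_listing + finalists * full_per_product


def reduction(cost: float, baseline: float) -> float:
    """Fraction of the baseline's tokens saved."""
    return 1 - cost / baseline


cost = composed_flow_cost(SUMMARY_LISTING, FULL_PER_PRODUCT)
print(round(cost), f"{reduction(cost, FULL_LISTING):.1%}")  # 11932 93.9%
```

Raising `finalists` to 5 drops the reduction to roughly 86%. That is still large, but visibly less than the one-fetch pattern achieves.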

The projection is honest

Cheap is easy if you're allowed to hide things. The summary isn't cheap by omission — it's cheap by scope, and it says so.

Every summary response carries an explicit coverage declaration: what the summary is complete for (identity, catalog price, sale status, availability, product URL, category, whether a variant selection is required) and what genuinely requires the detail call (exact sizes, per-variant stock, shipping, coupons, purchase readiness, description, images). The agent is never left guessing whether it has enough to act — the response tells it. Nothing a purchase decision depends on is silently dropped; it is deferred, in the open, to the single full fetch the agent was going to make anyway.
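On the agent side, the coverage declaration turns "do I have enough?" into a set lookup. The sketch below assumes a hypothetical response shape. The `coverage`, `complete_for` and `requires_detail_call` keys are illustrative, though the field lists mirror the ones named above.

```python
from typing import Any

# Hypothetical shape of a summary response; only the coverage block matters here.
summary_response: dict[str, Any] = {
    "projection": "summary",
    "coverage": {
        "complete_for": [
            "identity", "catalog_price", "sale_status", "availability",
            "product_url", "category", "requires_variant_selection",
        ],
        "requires_detail_call": [
            "sizes", "variant_stock", "shipping", "coupons",
            "purchase_readiness", "description", "images",
        ],
    },
    "products": [],
}


def needs_detail_call(response: dict[str, Any], needed: list[str]) -> list[str]:
    """Return the needed fields this response does not declare itself complete for.

    Anything not listed under complete_for, including fields the declaration
    never mentions, is treated as requiring the full fetch.
    """
    covered = set(response["coverage"]["complete_for"])
    return [field for field in needed if field not in covered]


# Ranking by price and stock: the summary is enough.
print(needs_detail_call(summary_response, ["catalog_price", "availability"]))  # []
# Quoting delivery cost: defer to the one full fetch.
print(needs_detail_call(summary_response, ["availability", "shipping"]))  # ['shipping']
```

Treating undeclared fields as "not covered" keeps the check conservative. An agent never acts on a field the summary did not promise.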

That is the actual design principle, and it's the same one that governs everything else on this surface: the catalog is reported truthfully, at the resolution the current step needs. Not less truth — the right resolution of truth for the task. Triage doesn't need the shipping matrix. Checkout does. Serving the shipping matrix during triage isn't generosity; it's noise that pushes the catalog out of the agent's reach.

It gets better at scale, not worse

The instinct with data-shrinking tricks is that they help least where you need them most. This one is the opposite.

On a small result set (a three-product search on Store B), the summary is already about 6.6× lighter than full. On the fifty-product list from Store A, it is 24.6×. Two effects compound to widen the ratio. First, the richer the catalog, the heavier each full record gets: about 3,931 tokens per product on Store A against roughly 1,240 on Store B. The summary, by contrast, stays close to flat at well under 200 tokens per product. Second, the small fixed overhead of the guidance block amortizes across more products as the list grows. The bigger and richer the catalog, the more decisive the projection becomes, and those are precisely the catalogs where an agent most needs to browse before buying.
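The widening can be captured in a simple cost model. Suppose an n-product response costs a fixed overhead plus a per-product weight. Then the full-to-summary ratio is (F + n·f) / (S + n·s). It rises with n whenever the summary carries proportionally more fixed overhead (S/s > F/f), and it approaches f/s, which grows as full records get heavier. This is a sketch. The overhead values are invented for illustration, since the post does not publish the size of the guidance block.

```python
def full_to_summary_ratio(
    n: int,
    full_per_product: float,
    summary_per_product: float,
    full_overhead: float = 0.0,
    summary_overhead: float = 0.0,
) -> float:
    """Ratio of full-response tokens to summary-response tokens for an n-product list."""
    full_tokens = full_overhead + n * full_per_product
    summary_tokens = summary_overhead + n * summary_per_product
    return full_tokens / summary_tokens


# Illustrative values only: a hypothetical 400-token guidance block on the summary side.
for n in (3, 10, 50, 500):
    r = full_to_summary_ratio(n, full_per_product=3_900, summary_per_product=150, summary_overhead=400)
    print(f"n={n:>3}: {r:.1f}x")
```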

A catalog an agent cannot fit in its context is not a catalog to that agent; it's a wall. Making a store computable was never only about exposing structured data — it was about exposing it at a resolution the agent can actually consume, one task at a time. The summary projection is the unglamorous half of that work: not a new capability, but the one that decides whether the capabilities already there can be reached at all.

Appendix — the numbers

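Method. Token counts were computed with tiktoken (cl100k_base) on live API responses from two production WooCommerce stores running KaliCart Bridge 1.0.114, July 2026. No estimates: every measured figure is a direct token count of the actual bytes an agent would receive. The per-product row divides the fifty-product listing by fifty. The composed-flow row adds that per-product full average to the summary listing.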
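The measurement itself reduces to encoding the raw response body and counting the tokens. The sketch below is minimal and assumes the body has already been fetched as text; how it was fetched is not shown here. `count_tokens` is a hypothetical helper. `tiktoken.get_encoding` and `Encoding.encode` are the library's real calls.

```python
from functools import partial
from typing import Callable, Optional


def count_tokens(response_body: str, encode: Optional[Callable[[str], list]] = None) -> int:
    """Count tokens in the raw response text exactly as the agent receives it.

    `encode` is a hypothetical hook for swapping tokenizers; by default the
    cl100k_base encoding from tiktoken is used.
    """
    if encode is None:
        import tiktoken  # third-party: pip install tiktoken

        enc = tiktoken.get_encoding("cl100k_base")
        # Treat strings such as "<|endoftext|>" in product copy as plain text
        # instead of raising an error.
        encode = partial(enc.encode, disallowed_special=())
    return len(encode(response_body))


# body = raw JSON text of a Bridge response, fetched however you like
# print(count_tokens(body))
```

Counting the raw body, rather than re-serializing parsed JSON, matters. Whitespace and key order change the token count. The `encode` hook also makes the tokenizer dependence discussed at the end of this appendix easy to check.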

Measurement                                   Summary/verification  Full      Reduction  Ratio
Store A (626 products), 50-product listing    8,000                 196,595   95.9%      24.6×
Store A, per product                          160                   3,931
Store A, composed flow (50 summary + 1 full)  ~11,932               196,595   93.9%
Store B, text search "nike" (3 products)      567                   3,722     84.8%      6.6×
Store B, single variable product              612 (verification)    1,952     68.6%

Summary/verification and full columns are cl100k_base token counts.
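The reduction and ratio columns follow from the two token columns: reduction = 1 − light/full and ratio = full/light. The short script below recomputes them from the measured rows. For the single variable product, the verification count stands in as the light figure.

```python
# (measurement, lighter projection tokens, full tokens), copied from the table above.
ROWS = [
    ("Store A, 50-product listing", 8_000, 196_595),
    ('Store B, text search "nike" (3 products)', 567, 3_722),
    ("Store B, single variable product (verification)", 612, 1_952),
]


def reduction(light: int, full: int) -> float:
    """Fraction of the full response's tokens saved by the lighter projection."""
    return 1 - light / full


def ratio(light: int, full: int) -> float:
    """How many times heavier the full response is."""
    return full / light


for name, light, full in ROWS:
    print(f"{name}: {reduction(light, full):.1%} reduction, {ratio(light, full):.1f}x")
```

The script reproduces 95.9% with 24.6× and 84.8% with 6.6×, plus the 68.6% reduction for the single product. It also gives about 3.2× for that product, a ratio the table leaves blank.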

Endpoints.

Projections.

On the honesty of these numbers. Token counts are tokenizer- and catalog-specific. A different tokenizer, or a merchant whose records are lighter or heavier, will shift the absolute figures. What holds across both stores measured is the direction: in every case the lighter projection is roughly 3× to 25× smaller than the full record. The composed-flow figure assumes the documented pattern of triage in summary and one full fetch for the finalist. An agent that over-fetches full records will save proportionally less, which is exactly why the pattern is prescribed in the tool instructions rather than left to chance.
