2026-07-04 · architecture · evidence and expectations

Crawling Is for Discovery. Commerce Needs a Contract.

The web is not replacing pages with APIs. It is adding a second interface for software that must query, verify and sometimes act.

TL;DR: Crawling remains essential for open discovery and context. Feeds and indexes distribute candidates. Merchant-controlled interfaces verify selected facts. Checkout creates the binding commitment. The transition is real, but neither universal nor guaranteed to remain open.

Machine access is becoming explicit

On July 1, Cloudflare marked one year since its first “Content Independence Day” with a new set of controls for AI traffic.

Starting September 15, new domains on its network will allow search crawlers by default while blocking training and agent crawlers on advertising-supported pages. Cloudflare says more than half of the AI crawler requests it observes retrieve pages that have not changed. It is also evolving Pay Per Crawl into Pay Per Use and building a Monetization Gateway around x402, an open protocol that uses HTTP’s long-reserved 402 Payment Required response to charge for resources including data, APIs and MCP tools.

These are Cloudflare’s measurements and product decisions, not a neutral census of the entire Internet. But they reveal an important change: machine access to the web is becoming explicit, classified and negotiable.

The immediate argument is about content and compensation. Underneath it is a broader technical question:

When software needs to act on information rather than merely read it, is crawling still the right interface?

For discovery, often yes.

For authoritative commerce operations, usually not.

Crawling solved a real problem

Calling crawling obsolete would misunderstand why the web succeeded.

A crawler can discover resources that were never registered in a central database and whose owners signed no integration agreement. It can traverse links, interpret documents and build an index across millions of independently operated sites.

That is not a workaround. It is one of the web’s foundational capabilities.

Crawling remains useful for:

A product page also contains information that does not belong in an inventory database: positioning, photography, explanation, comparison, proof and persuasion. Reducing all of that to data smeared across a layout would miss half the value of the page.

The problem begins when software treats that presentation layer as an authoritative operational contract.

A rendered page may contain several prices: current, regular, promotional, historical or variation-specific. Availability may refer to the parent product while individual sizes are unavailable. A JavaScript component may load fresh inventory after the original HTML was generated. Structured markup, visible text and a product feed may disagree.

A human can notice ambiguity and investigate. Software must choose a value and continue.

For a search index, an imperfect interpretation may still be useful. For an agent about to promise a price or place an order, the same ambiguity becomes a failure condition.
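A minimal sketch of that failure condition, assuming three hypothetical observations of one product's price (structured markup, visible text and a feed). The source names and the function are illustrative only. The point is that an acting agent should surface disagreement rather than silently pick a value.

```python
from decimal import Decimal
from typing import Dict, Optional


def reconcile_price(observations: Dict[str, Optional[str]]) -> Optional[Decimal]:
    """Return a price only when every source that reported one agrees."""
    reported = {Decimal(value) for value in observations.values() if value is not None}
    if len(reported) == 1:
        return reported.pop()
    # No value, or conflicting values: surface the ambiguity instead of guessing.
    return None
```

A search index could reasonably fall back to any one of the conflicting values. An agent about to quote a price should treat a None result as a reason to verify with the merchant.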

The web already has more than one machine interface

The transition is not simply from pages to APIs. Ecommerce has used several machine-facing layers for years:

  1. HTML pages for people and open-web discovery.
  2. Structured markup embedded in those pages.
  3. Product feeds for distribution to search engines and marketplaces.
  4. APIs for selective retrieval and operational integrations.
  5. Checkout and payment systems for transactions.

Agentic commerce is increasing the pressure to connect these layers coherently. It is not making the earlier ones disappear.

Layer           | Primary job                          | Important limitation
Crawled page    | Discovery, context and persuasion    | Commercial facts must be reconstructed
Feed or index   | Distribution and candidate generation | It represents a snapshot
Catalog query   | Selective structured retrieval       | Structure does not guarantee truth
Detail or quote | Merchant-authoritative verification  | Requires an explicit freshness contract
Checkout        | Transaction and state transition     | Requires identity, consent and failure semantics

The architectural change is therefore not “stop crawling.” It is:

Use crawling to discover and understand. Use explicit contracts to verify and act.

Why action changes the requirements

J.P. Morgan describes the evolution of agentic commerce in three stages: early systems concentrated on discovery, later agents navigated merchant websites to complete guest checkout, and current protocols are moving toward direct integrations between agents and merchants.

Browser automation can complete some purchases. It is also forced to infer operational meaning from an interface designed for people.

A browser may find a button labelled “Buy now.” That does not give the agent an explicit contract for:

Those capabilities may exist behind the page, but they are not defined by the page itself.

Direct protocols make the state transitions explicit. A transactional system can expose separate operations for quoting, reservation and order creation, each with identifiers, expiration, idempotency and defined failure responses.

These operations are not necessarily one atomic call. In distributed commerce they are normally a state machine.

That distinction matters more than whether the agent communicates through MCP, UCP, ACP, REST or another transport. The protocol carries the interaction; the commercial contract defines what each state means.
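The quote, reservation and order states can be sketched as a small in-memory state machine. This is an illustration under stated assumptions, not any protocol's actual model: the class names, the ten-minute default expiry and the order identifier format are all hypothetical.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from enum import Enum


class OrderState(Enum):
    QUOTED = "quoted"
    RESERVED = "reserved"
    ORDERED = "ordered"


class QuoteExpired(Exception):
    """Defined failure response: the quote can no longer be relied upon."""


@dataclass
class Quote:
    quote_id: str
    expires_at: datetime
    state: OrderState = OrderState.QUOTED


@dataclass
class Merchant:
    quotes: dict = field(default_factory=dict)
    orders_by_key: dict = field(default_factory=dict)

    def create_quote(self, now: datetime, ttl: timedelta = timedelta(minutes=10)) -> Quote:
        quote = Quote(quote_id=str(uuid.uuid4()), expires_at=now + ttl)
        self.quotes[quote.quote_id] = quote
        return quote

    def reserve(self, quote_id: str, now: datetime) -> Quote:
        quote = self.quotes[quote_id]
        if now >= quote.expires_at:
            raise QuoteExpired(quote_id)
        if quote.state is OrderState.QUOTED:
            quote.state = OrderState.RESERVED
        return quote

    def create_order(self, quote_id: str, idempotency_key: str, now: datetime) -> str:
        # A retry with the same key returns the original order, not a second one.
        if idempotency_key in self.orders_by_key:
            return self.orders_by_key[idempotency_key]
        quote = self.quotes[quote_id]
        if now >= quote.expires_at:
            raise QuoteExpired(quote_id)
        if quote.state is not OrderState.RESERVED:
            raise ValueError("quote must be reserved before an order is created")
        quote.state = OrderState.ORDERED
        order_id = f"order-{len(self.orders_by_key) + 1}"
        self.orders_by_key[idempotency_key] = order_id
        return order_id
```

None of this depends on the transport. The same states could be carried over REST, MCP or another binding.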

What OpenAI’s checkout change actually showed

OpenAI launched Instant Checkout in September 2025 using the Agentic Commerce Protocol, co-developed with Stripe. It was a direct merchant integration, not a crawler filling out arbitrary checkout forms.

In March 2026, OpenAI said the initial version did not provide the flexibility it wanted. It shifted its main emphasis toward product discovery and merchant-owned checkout, allowing shoppers to complete purchases through the merchant’s experience. Instant Checkout has not disappeared entirely: OpenAI’s current documentation still describes it for some eligible products and merchants.

That change does not prove that agentic checkout is impossible, nor does it prove that crawling caused weak conversion.

It demonstrates something narrower: owning the final transaction involves more than technically transferring an order. Merchant identity, cart behaviour, loyalty, fulfilment options, trust and post-purchase support remain material parts of the buying experience.

A Walmart executive was reported as saying that checkout inside ChatGPT converted at roughly one-third the rate seen when shoppers were sent to Walmart’s own site. That is useful evidence about one implementation and channel. It is not a controlled experiment isolating the underlying protocol.

The defensible conclusion is not “agent checkout failed.” It is:

Discovery, verification and checkout mature at different speeds, and the best interface for one stage may not be the best interface for the next.

What makes a catalogue computable

“Machine-readable” is too broad to describe what an acting agent needs. HTML, JSON-LD and a CSV feed are all machine-readable.

A computable catalogue adds explicit operational semantics.

It is queryable

An agent can request a bounded result set using documented filters rather than downloading an entire catalogue or parsing arbitrary pages.

category = mens_tshirts
price <= 30 EUR
availability = in_stock
fields = id,name,price,url

The interface states which filters are supported, how pagination works and whether the result is complete.
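A minimal in-memory sketch of such a query, assuming a hypothetical filter vocabulary and an integer cursor. It rejects undeclared filters, returns only the requested fields and says explicitly whether more results remain.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import List, Optional

# Hypothetical declared vocabulary: anything else is rejected, not ignored.
SUPPORTED_FILTERS = {"category", "max_price", "availability"}


@dataclass(frozen=True)
class Page:
    items: List[dict]
    next_cursor: Optional[int]  # None signals that the result set is complete


def _matches(product: dict, filters: dict) -> bool:
    if "category" in filters and product["category"] != filters["category"]:
        return False
    if "availability" in filters and product["availability"] != filters["availability"]:
        return False
    if "max_price" in filters:
        limit = filters["max_price"]
        price = product["price"]
        # Amounts are only comparable within the same currency.
        if price["currency"] != limit["currency"]:
            return False
        if Decimal(price["amount"]) > Decimal(limit["amount"]):
            return False
    return True


def query_catalogue(products: List[dict], filters: dict, fields: List[str],
                    cursor: int = 0, limit: int = 2) -> Page:
    unknown = set(filters) - SUPPORTED_FILTERS
    if unknown:
        raise ValueError(f"unsupported filters: {sorted(unknown)}")
    matched = [p for p in products if _matches(p, filters)]
    window = matched[cursor:cursor + limit]
    end = cursor + len(window)
    next_cursor = end if end < len(matched) else None
    items = [{name: p[name] for name in fields} for p in window]
    return Page(items=items, next_cursor=next_cursor)
```

A real interface would run the filter in the database rather than in memory. The contract is the same either way: declared filters, bounded pages and an explicit completeness signal.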

Its semantics are bounded

Values such as availability come from a declared vocabulary. Money includes amount and currency. Variants have stable identifiers and explicit relationships to their parent product.

“Available” cannot silently mean parent-level availability in one response and variation-level stock in another.
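One way to enforce a declared vocabulary is to parse values into closed types at the boundary. The sketch below is illustrative: "in_stock" and "variation" appear in this article's examples, while the other enum members and the currency check are assumptions.

```python
from dataclasses import dataclass
from decimal import Decimal
from enum import Enum


class AvailabilityStatus(Enum):
    IN_STOCK = "in_stock"
    OUT_OF_STOCK = "out_of_stock"   # assumed member
    BACKORDER = "backorder"         # assumed member


class Scope(Enum):
    PRODUCT = "product"             # assumed member
    VARIATION = "variation"


@dataclass(frozen=True)
class Money:
    amount: Decimal
    currency: str

    def __post_init__(self):
        # Assumption: currencies are three-letter uppercase codes such as "EUR".
        if len(self.currency) != 3 or not self.currency.isupper():
            raise ValueError(f"invalid currency code: {self.currency!r}")


@dataclass(frozen=True)
class Availability:
    status: AvailabilityStatus
    scope: Scope


def parse_availability(raw: dict) -> Availability:
    # Enum lookup raises ValueError for any value outside the vocabulary.
    return Availability(AvailabilityStatus(raw["status"]), Scope(raw["scope"]))
```

A free-text value such as "Available" fails at parse time instead of being interpreted differently by each consumer.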

It exposes provenance

The response identifies the merchant, product and source from which the commercial fact was obtained.

An illustrative response might contain:

{
  "product_id": "wc_123",
  "merchant": "example-store.com",
  "price": {
    "amount": "29.90",
    "currency": "EUR"
  },
  "availability": {
    "status": "in_stock",
    "scope": "variation"
  },
  "observed_at": "2026-07-04T14:30:00Z",
  "product_url": "https://example-store.com/product/example-tshirt/"
}

This is not a proposed standard. It illustrates the contract: a consumer can tell what was observed, where it came from and what the availability statement applies to.

It distinguishes authority from freshness

An API response is not true merely because it came from an API. Merchant databases contain errors. Caches become stale. Plugins disagree. Inventory integrations fail.

Authority answers: Which system is entitled to state this fact?

Freshness answers: When was the fact observed, and for how long may it be relied upon?

A defensible interface must address both.
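The two questions can be kept separate in code. This is a sketch under assumptions: the max_age field is a hypothetical way for a publisher to bound reuse, and the set of authoritative sources is something the consumer would have to configure.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Set


@dataclass(frozen=True)
class Observation:
    source: str            # which system stated the fact
    observed_at: datetime  # when the fact was observed
    max_age: timedelta     # hypothetical: how long the publisher allows reuse


def is_authoritative(obs: Observation, authoritative_sources: Set[str]) -> bool:
    return obs.source in authoritative_sources


def is_fresh(obs: Observation, now: datetime) -> bool:
    age = now - obs.observed_at
    # A timestamp from the future is treated as unreliable, not as extra fresh.
    return timedelta(0) <= age <= obs.max_age


def may_rely_on(obs: Observation, authoritative_sources: Set[str], now: datetime) -> bool:
    return is_authoritative(obs, authoritative_sources) and is_fresh(obs, now)
```

Passing both checks still does not make the value true. It only means the right system said it recently enough.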

It supports verification

A compact search result is useful for ranking candidates. Before presenting a final recommendation, an agent may need a more detailed representation containing variants, stock scope, shipping constraints and provenance.

The important operation is not merely search. It is the transition from candidate discovery to merchant-authoritative verification.

It connects to an explicit transaction boundary

The catalogue itself does not have to process payments.

It should, however, identify the merchant-controlled action that follows: product page, quote, cart session or checkout endpoint. If a protocol supports transactions, the states and responsibilities must be declared rather than inferred from interface behaviour.

Feeds are not obsolete

Calling feeds failed APIs would be a mistake.

Feeds are efficient distribution mechanisms. They let merchants publish complete snapshots, allow platforms to build indexes and make cross-merchant retrieval fast. For large-scale candidate generation, querying every merchant live would be slower, more expensive and less reliable than using an index.

The limitation is narrower: a feed alone may not be sufficient to make a time-sensitive commitment.

A practical architecture uses both:

Feed or index → find likely candidates
Merchant detail → verify selected candidates
Quote or checkout → commit to a transaction

The index does not need to pretend that every value is live. The detail interface does not need to answer every broad discovery query. Each layer can do the job for which it is suited.

That separation also makes failure manageable. If one merchant is temporarily unavailable, a federated search can still return other candidates and mark the unavailable result as unverified rather than blocking the entire request.
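A minimal sketch of that failure handling. The fetch_detail callable is hypothetical and stands in for whatever merchant detail interface is used; candidates are assumed to carry "merchant" and "product_id" keys.

```python
from typing import Callable, List


def verify_candidates(candidates: List[dict],
                      fetch_detail: Callable[[str, str], dict]) -> List[dict]:
    """Verify each indexed candidate; keep unreachable ones, marked unverified."""
    results = []
    for candidate in candidates:
        try:
            detail = fetch_detail(candidate["merchant"], candidate["product_id"])
        except (ConnectionError, TimeoutError):
            # One unavailable merchant should not block the whole request.
            results.append({**candidate, "verified": False})
            continue
        results.append({**candidate, **detail, "verified": True})
    return results
```

The caller can then rank verified results first, or show unverified ones with an explicit caveat.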

Live verification does not mean live origin polling

A merchant-authoritative verification step does not require every agent to execute an uncached WooCommerce query for every candidate.

A scalable retrieval path separates broad discovery from narrow verification:

Feed or index
    → candidate IDs
    → batch lookup for a shortlist
    → detail verification for the selected product
    → checkout-authoritative validation

Only a small fraction of discovered products should reach the final verification step.

The merchant can serve that step through a purpose-built read model backed by indexed tables, a read replica, Redis or an edge cache. Conditional requests, bounded TTLs, batch lookup, request coalescing and stale-while-revalidate can reduce origin load further. These are implementation choices, not guarantees supplied automatically by a commerce protocol.
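Two of those techniques, a bounded TTL and stale-while-revalidate, can be sketched in a few lines. This is a single-threaded illustration with hypothetical names. A production version would refresh stale keys in a background worker and coalesce concurrent loads.

```python
import time
from typing import Callable


class ReadThroughCache:
    def __init__(self, loader: Callable[[str], object], ttl: float,
                 stale_window: float, clock: Callable[[], float] = time.monotonic):
        self._loader = loader              # reads from the origin, e.g. the shop database
        self._ttl = ttl                    # seconds a value counts as fresh
        self._stale_window = stale_window  # extra seconds a stale value may be served
        self._clock = clock
        self._entries = {}                 # key -> (value, stored_at)
        self.needs_refresh = set()         # keys a background worker should reload

    def get(self, key: str):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None:
            value, stored_at = entry
            age = now - stored_at
            if age <= self._ttl:
                return value               # fresh: no origin load
            if age <= self._ttl + self._stale_window:
                self.needs_refresh.add(key)
                return value               # stale but within the allowed window
        # Missing or too old: load synchronously from the origin.
        value = self._loader(key)
        self._entries[key] = (value, now)
        self.needs_refresh.discard(key)
        return value
```

The TTL and stale window are exactly the values a freshness contract would need to publish, so that consumers know how old a served value can be.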

UCP makes an important distinction: catalog pricing and availability reflect the business’s current terms for the request, but they are not transactional commitments. Checkout remains authoritative, and catalog responses should not be reused across sessions without revalidation.

A freshness contract therefore should not claim that a value is infinitely live. It should state:

Traffic control remains the server’s responsibility. Authentication, signed requests, quotas, batch limits, 429 Too Many Requests, Retry-After, CDN enforcement and per-platform rate policies are still necessary. UCP provides signals that can support authorization and abuse prevention; it does not make abusive clients compliant by definition.
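On the client side, a well-behaved agent can at least honour 429 and Retry-After. The sketch below assumes a plain dict of response headers with the exact key "Retry-After"; the default delay and the cap are arbitrary illustrative values. Retry-After may carry either a number of seconds or an HTTP date, so both forms are handled.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_delay_seconds(status: int, headers: dict, now: datetime,
                        default: float = 1.0, cap: float = 60.0) -> float:
    """Return how long to wait before retrying; 0.0 when no wait is required."""
    if status != 429:
        return 0.0
    raw = headers.get("Retry-After")
    if raw is None:
        return default
    raw = raw.strip()
    if raw.isdigit():
        delay = float(raw)  # delta-seconds form
    else:
        try:
            # HTTP-date form, e.g. "Sat, 04 Jul 2026 14:30:30 GMT"
            delay = (parsedate_to_datetime(raw) - now).total_seconds()
        except (TypeError, ValueError):
            return default  # unparseable header: fall back to the default
    return max(0.0, min(delay, cap))
```

This makes a cooperative client cheaper to serve. It does nothing about clients that ignore the header, which is why server-side enforcement is still required.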

Freshness is a contract about bounded reuse, not a promise to hit the production database on every call.

Federation is a system, not a protocol feature

A shared protocol makes federation possible. It does not create federation automatically.

Allowing one user request to span multiple merchants requires:

The user may experience this as one query. Behind that query sits an orchestrator, registry or index coordinating many independent systems.

This is where a computable catalogue becomes economically interesting. A merchant does not need a bespoke integration with every agent if it can expose one well-defined contract that multiple authorized systems understand.

But the result will not necessarily be one universal, open catalogue.

What the new protocols prove — and what they do not

Several important building blocks now exist.

Google launched the Universal Commerce Protocol as an open standard co-developed with retailers and commerce platforms. Its newer optional Catalog capability allows agents to retrieve selected current product details such as variants, inventory and pricing.

OpenAI and Stripe developed the Agentic Commerce Protocol for interactions between agents and merchant systems.

Payment networks are developing the identity, tokenization and authorization layers needed for agent-initiated transactions. Visa reports controlled real-world agent transactions. Mastercard has documented a live production transaction in Europe and additional transactions in other markets. American Express has announced an Agentic Commerce developer kit and purchase-protection model, with some functionality still under development.

These developments prove that direct, protocol-based agent commerce is technically real.

They do not yet prove:

The current evidence is stronger for AI-assisted discovery than for autonomous purchasing. Adobe measured AI-referred traffic to US retail sites growing 393% year over year in the first quarter of 2026. That measures referral traffic, not autonomous transactions.

J.P. Morgan expects agent commerce to develop first around repeatable, lower-risk purchases before moving into higher-value categories. That is a forecast, not yet an observed universal behaviour.

The direction is visible. Its speed and final shape are not settled.

Machine payments make access programmable, not automatically profitable

Cloudflare’s planned Monetization Gateway is designed to let a site apply x402 payment policies to pages, datasets, APIs and MCP tools, including prices below one cent.

That creates new options:

It does not prove that agents will pay, that micropayments will outperform free aggregators or that small merchants will benefit equally. Price adds friction, and large platforms may negotiate privileged access or absorb costs that smaller agents cannot.

The defensible claim is narrower: HTTP-native machine payments make access policy economically programmable. The market will decide which resources remain free, which become paid and which buyers are willing to pay.

Open standards do not eliminate concentrated power

It is tempting to predict a clean division between open federation and closed giants. Reality is more complicated.

Walmart, Target, Etsy and other large retailers participated in the development of UCP while continuing to operate proprietary shopping experiences, ranking systems and customer relationships.

That is likely to be the normal pattern: organizations support interoperability where it expands distribution while retaining control over the layers that differentiate them.

Even an open protocol can sit beneath concentrated power.

Control may remain with whoever operates:

The relevant question is not only whether a specification is open. It is whether a merchant can use multiple discovery channels, retain its canonical product identity, keep checkout and customer relationships portable, revoke participation, inspect how its data is represented and move without rebuilding the catalogue integration.

Openness is an operational property, not merely a licence attached to a schema.

Direct protocols reduce ambiguity; they do not eliminate responsibility

The fact that Google, OpenAI, Stripe and payment networks are building commerce protocols is evidence that direct machine contracts are becoming useful. It is not evidence that the resulting market will be decentralized.

UCP supports REST and MCP bindings. ACP defines direct interaction between agent platforms and merchant systems. Both reduce the amount of commercial state that must be inferred from presentation HTML.

But direct communication does not make every claim correct. Cryptographic signatures can establish who sent a message and whether it was modified. They cannot prove that the merchant’s inventory system was accurate.
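A short illustration of that limit. It uses an HMAC with a shared secret from the Python standard library as a stand-in for whatever signature scheme a real protocol specifies, which is an assumption made for brevity. The key and payload are hypothetical.

```python
import hashlib
import hmac
import json


def sign(payload: dict, key: bytes) -> str:
    # Canonical JSON so that the same payload always produces the same bytes.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, body, hashlib.sha256).hexdigest()


def verify(payload: dict, signature: str, key: bytes) -> bool:
    # Constant-time comparison of the expected and received signatures.
    return hmac.compare_digest(sign(payload, key), signature)


key = b"hypothetical-shared-secret"
# Suppose the merchant's system holds the wrong price for this product.
claim = {"product_id": "wc_123", "price": {"amount": "2.99", "currency": "EUR"}}
signature = sign(claim, key)

unmodified_ok = verify(claim, signature, key)                       # True
tampered_ok = verify({**claim, "product_id": "wc_999"}, signature, key)  # False
```

The wrong price verifies successfully because it is exactly what the merchant sent. The signature detects tampering in transit, not errors at the source.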

Nor does keeping the merchant as Merchant of Record remove every obligation from agent platforms, payment providers or marketplaces. It preserves the merchant’s role in accepting the order, processing payment, fulfilment and customer support; other participants retain responsibilities appropriate to their role and jurisdiction.

The real improvement is accountability:

the candidate came from this index
the product state came from this merchant
the price was observed at this time
the checkout accepted these binding terms
the payment was authorized under this mandate

That chain is auditable. It is not infallible.

Trust becomes testable, not automatic

Crawled information often has unclear provenance and freshness. Structured interfaces can improve both, but they do not create truth by construction.

A merchant can publish the wrong price through a flawless JSON schema. A real-time endpoint can faithfully return an inventory error. A structured rating remains only as reliable as its source.

The improvement is that errors become easier to locate and contracts easier to test.

A serious commerce interface can be checked automatically:

Can every search result be retrieved in detail?
Do list and detail prices use the same currency?
Does every advertised size map to a purchasable variation?
Is availability scoped to the product or the variant?
Can the merchant identify when the value was observed?
Does checkout reject an expired quote?
Are retries idempotent?
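The first five checks can be run against static data, as in the sketch below. The last two need a live checkout and are not covered here. All field names (advertised_sizes, variations, purchasable and so on) are hypothetical and would depend on the actual schema.

```python
from typing import Dict, List

ALLOWED_SCOPES = {"product", "variation"}


def audit_catalogue(search_results: List[dict], details: Dict[str, dict]) -> List[str]:
    """Return a list of human-readable contract violations; empty means all checks passed."""
    failures = []
    for result in search_results:
        pid = result["product_id"]
        detail = details.get(pid)
        if detail is None:
            failures.append(f"{pid}: search result cannot be retrieved in detail")
            continue
        if result["price"]["currency"] != detail["price"]["currency"]:
            failures.append(f"{pid}: list and detail prices use different currencies")
        purchasable = {v["size"] for v in detail.get("variations", []) if v.get("purchasable")}
        for size in detail.get("advertised_sizes", []):
            if size not in purchasable:
                failures.append(f"{pid}: size {size} has no purchasable variation")
        if detail.get("availability", {}).get("scope") not in ALLOWED_SCOPES:
            failures.append(f"{pid}: availability scope is not declared")
        if "observed_at" not in detail:
            failures.append(f"{pid}: observation time is missing")
    return failures
```

Each failure names the product and the broken rule, which is what makes the error easy to locate.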

The useful shift is from inferred trust to auditable claims.

That is less dramatic than “the API answers with truth.” It is also much more valuable.

What changes for smaller merchants

Building all of this from scratch is substantial work. A merchant needs public read models, authentication boundaries, abuse controls, schema governance, observability and reliable synchronization with the commerce platform.

If every small merchant must implement those layers independently, agentic commerce will favour enterprises with dedicated integration teams.

The realistic adoption path is the same one followed by HTTPS, payments and structured data: platforms, plugins and shared gateways absorb most of the complexity and expose a manageable configuration surface.

That outcome is plausible, not guaranteed.

Poorly designed defaults could expose private data. A proprietary gateway could replace crawling costs with platform rent. Conflicting protocols could force merchants to maintain several adapters. A nominally open interface could still be useless without admission to the dominant discovery platform.

The role of a catalogue bridge is therefore not simply to convert WooCommerce into JSON. It is to preserve the merchant’s control while mapping one source catalogue into several bounded channels:

The bridge is valuable only if those channels remain coherent.

The second interface

The web is not abandoning pages.

Humans still need narrative, visual evidence, comparison and trust. Search engines and agents still need open-ended discovery. Crawling will remain part of that system.

What is changing is the boundary between reading and acting.

When software only needs to find a product, a page may be enough.

When it must promise the current price, identify a purchasable size or create an order, it needs an explicit contract: defined semantics, declared provenance, measurable freshness and controlled state transitions.

That contract may be delivered through feeds, indexes, APIs, MCP, UCP, ACP or systems that have not yet been standardized. No single acronym is the architecture.

The durable pattern is the separation of responsibilities:

Discover broadly.
Retrieve selectively.
Verify authoritatively.
Transact explicitly.

Cloudflare’s controls show that machine access to the presentation web is becoming negotiated. Commerce protocols show that direct interaction with merchant systems is becoming practical. Payment-network pilots show that agent-initiated transactions can cross real financial infrastructure. Adobe’s data shows that AI already influences meaningful discovery traffic.

None of that guarantees a universal agentic marketplace.

It does establish that the web is growing a second interface — one that complements the page instead of replacing it.

Crawling opens the first door. A commercial contract opens the second.

Related — For a practical implementation-level audit, read: WooCommerce Can Speak MCP. That Doesn’t Make a Store Shoppable.
