2026-07-04 · field audit · primary data

Your WooCommerce store works. Your product data doesn't.

Three blind agents exposed the maintenance debt hidden behind a perfectly functional storefront.

TL;DR — A real WooCommerce store could display products, take orders and look perfectly normal to a buyer while its underlying catalog disagreed with itself. Three independent agents found a €300 price conflict, products promoted while out of stock, duplicate records, missing brands and variants trapped in prose. WooCommerce checks whether a product can be published. It does not check whether the catalog still means one coherent thing.

A functioning storefront can hide a broken catalog

Most ecommerce quality checks stop at the visible surface: does the page load, is there a price, does the Add to cart button work, can the customer reach checkout?

Those checks matter, but they answer only whether the store is operational. They do not answer whether its product data is internally consistent.

Humans compensate for inconsistency remarkably well. A shopper can infer that a brand is the first word in a title, understand that a size buried in a paragraph is probably the only available size, or call the store when two prices disagree. Software cannot safely improvise in the same way. It must decide which field and which page is authoritative.

That difference is what three blind agents exposed on a small production WooCommerce store.

The test

We gave three independent agents no prior knowledge, no search-engine access and no hints about the site's implementation. Each received a different assignment on the same store:

  1. Shopper: find three products that could be purchased now and report exact price, stock and options.
  2. Catalog auditor: sample products across categories and measure the completeness of their commercial data.
  3. Discovery auditor: find the shortest authoritative path from the homepage to search and product verification.

The store had 18 public product records. The catalog auditor inspected six products in depth. Every URL request, failure and duplicate was logged.

Run | Requests or attempts | What it established
Shopping task | 11 | Three purchasable products, plus dead links, a timeout and contradictory option text
Technical path | 4 | Homepage → discovery → search → exact product verification
Catalog audit | 64 | Six detailed records, taxonomy counts, media checks, duplicate requests and intermittent access failures

The difference between four requests and sixty-four is not a benchmark of agent intelligence, because the tasks were different. What it shows is the gap between verifying one selected product and establishing whether the catalog as a whole can be trusted.

What the agents found

1. The same product had two current prices

An e-bike was advertised at €1,100 on the homepage and €1,400 on its product page. Both were public, current-looking surfaces. Neither explained which value should win.

This is not a formatting defect. It is a source-of-truth defect. Any assistant quoting a price has to choose one public statement and silently contradict the other.
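A check for this kind of conflict can be small. The sketch below assumes someone has already collected the price each public surface shows for each product; the tuple layout, the surface names and the "helmet" row are illustrative assumptions, not data from the audited store.

```python
from collections import defaultdict
from decimal import Decimal


def find_price_conflicts(observations):
    """Return products whose public surfaces show more than one distinct price.

    observations: iterable of (product_id, surface, price) tuples, where
    price is a string or number that Decimal can parse.
    """
    by_product = defaultdict(dict)
    for product_id, surface, price in observations:
        by_product[product_id][surface] = Decimal(price)

    conflicts = {}
    for product_id, surfaces in by_product.items():
        if len(set(surfaces.values())) > 1:
            # Report every surface; the check does not pick a winner.
            conflicts[product_id] = surfaces
    return conflicts


# The e-bike rows mirror the finding above; the helmet row is made up.
observations = [
    ("e-bike", "homepage", "1100"),
    ("e-bike", "product_page", "1400"),
    ("helmet", "homepage", "45"),
    ("helmet", "product_page", "45"),
]
conflicts = find_price_conflicts(observations)
```

The function reports both values and stops there. Deciding which price is correct stays with the merchant.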

2. Out-of-stock products were still promoted

Two products appeared in homepage merchandising blocks while their detail pages said they were sold out. A human sees a disappointing click. An automated shopper sees conflicting eligibility signals: one surface recommends the item while another says it cannot be bought.
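One way to catch this is to compare the list of promoted products against the stock status reported on each detail page. This is a minimal sketch: the product ids are hypothetical, and the status strings are assumed to follow the instock / outofstock / onbackorder values WooCommerce uses for stock status.

```python
def promoted_but_unavailable(featured_ids, stock_by_id):
    """Return (product_id, status) for featured products not reported as in stock.

    A product with no recorded status is flagged as "unknown" rather than
    assumed to be purchasable.
    """
    flagged = []
    for product_id in featured_ids:
        status = stock_by_id.get(product_id, "unknown")
        if status != "instock":
            flagged.append((product_id, status))
    return flagged


# Hypothetical ids: 12 is sold out, 13 has no stock record at all.
featured = [11, 12, 13]
stock = {11: "instock", 12: "outofstock"}
flagged = promoted_but_unavailable(featured, stock)
```

A store that deliberately features back-orderable items could exempt that status, provided the homepage block labels them as such.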

3. Copies had become separate meanings

Two bicycle records shared the same internal product code and the same image. One represented an out-of-stock daily rental; the other represented a used bicycle for sale. The sale record still carried a slug ending in noleggio-al-giorno-copia — “daily rental copy.”

Copying a product is a convenient editorial shortcut. Without a cleanup discipline, it also copies identity, history and assumptions that no longer describe the new record.
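Both symptoms, a shared internal code and a leftover copy suffix in the slug, are easy to scan for. In the sketch below the record layout, the ids, the code "BK-001" and the list of suffixes are assumptions made for illustration; only the "-copia" ending comes from the audited store.

```python
from collections import defaultdict

# Assumed suffixes left behind by product duplication in different locales.
COPY_SUFFIXES = ("-copia", "-copy")


def audit_copies(records):
    """Find records sharing an internal code and slugs that still look like copies.

    records: list of dicts with "id", "internal_code" and "slug" keys.
    Returns (shared_codes, leftover_copy_ids).
    """
    ids_by_code = defaultdict(list)
    for record in records:
        if record["internal_code"]:
            ids_by_code[record["internal_code"]].append(record["id"])

    shared_codes = {code: ids for code, ids in ids_by_code.items() if len(ids) > 1}
    leftover_copy_ids = [
        record["id"] for record in records if record["slug"].endswith(COPY_SUFFIXES)
    ]
    return shared_codes, leftover_copy_ids


# Hypothetical records modeled on the rental / used-sale pair described above.
records = [
    {"id": 101, "internal_code": "BK-001", "slug": "noleggio-al-giorno"},
    {"id": 102, "internal_code": "BK-001", "slug": "noleggio-al-giorno-copia"},
    {"id": 103, "internal_code": "BK-002", "slug": "casco-urbano"},
]
shared_codes, leftover_copy_ids = audit_copies(records)
```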

4. Variants existed only in prose

Sizes, colors and rental durations appeared in descriptions or generic enquiry forms rather than as product variations. One page mentioned three colors in one section and two in another. Another described a fixed size while the enquiry form offered a full generic size list.

To a person, this is imperfect copy. A shopping client cannot tell an available option from background information.
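Once option values have been pulled out of each section of a page, the disagreement itself is mechanical to detect. The sketch below assumes that extraction has already happened; the section names and colors are invented for the example.

```python
def option_disagreement(values_by_source):
    """Compare the option values each source lists for one attribute.

    values_by_source: mapping of source name -> set of option values.
    Returns None when all sources agree, otherwise a dict listing the
    values every source mentions and the values only some sources mention.
    """
    value_sets = [set(values) for values in values_by_source.values()]
    if len({frozenset(values) for values in value_sets}) <= 1:
        return None

    all_values = set().union(*value_sets)
    agreed = set.intersection(*value_sets)
    return {"agreed": sorted(agreed), "disputed": sorted(all_values - agreed)}


# Hypothetical page: three colors in the description, two in the spec list.
colors = {
    "description": {"red", "blue", "black"},
    "spec_list": {"red", "blue"},
}
result = option_disagreement(colors)
```

The output says which values are in dispute, not which of them can be bought. Only a modeled variation with its own availability can answer that.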

5. Brand was absent from every sampled record

All six sampled products lacked a structured brand. Brand-like words appeared in titles, but that is not the same thing. A prefix may be a manufacturer, a product family or part of a model name. Inferring it is a guess, and guesses become especially dangerous in multi-brand retail.
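A brand check can record the gap without filling it. In this sketch the field names, the products and the brand "Acme" are hypothetical. The first word of the title is kept only as a hint for the reviewer and is never written back as the brand.

```python
def brand_findings(products):
    """List products with no structured brand, without inferring one.

    products: list of dicts with "id", "title" and an optional "brand".
    """
    findings = []
    for product in products:
        brand = (product.get("brand") or "").strip()
        if brand:
            continue
        title_words = product["title"].split()
        findings.append({
            "id": product["id"],
            "issue": "missing_brand",
            # Hint only: a title prefix may be a family or model name.
            "title_prefix_hint": title_words[0] if title_words else None,
            "needs_merchant_confirmation": True,
        })
    return findings


# Hypothetical products: the first has no structured brand.
products = [
    {"id": 1, "title": "Acme City E-Bike", "brand": None},
    {"id": 2, "title": "Acme Urban Helmet", "brand": "Acme"},
]
findings = brand_findings(products)
```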

6. Categories described history as much as inventory

The menu exposed empty categories. Products overlapped across sale, rental, used and ex-rental branches. A rental item was not assigned to the rental category. The taxonomy still reflected old workflows, not one clean description of the current catalog.

7. Freshness was implied, not demonstrated

Five of the six sampled records originated in 2024. Several had not been modified since 2024 or 2025. That does not prove their prices or stock were wrong, but it means a current copyright year and a live Add to cart button are not evidence that every product fact was recently reviewed.
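Staleness is simple to surface once each record carries a last-modified date. The sketch below uses an assumed one-year threshold and invented records; it flags products for review and makes no claim that their facts are wrong.

```python
from datetime import date


def stale_records(products, today, max_age_days=365):
    """Return ids of products not modified within max_age_days.

    products: list of dicts with "id" and "last_modified" (a datetime.date).
    The 365-day default is an arbitrary assumption; pick a review cycle
    that matches how often prices and stock actually change.
    """
    return [
        product["id"]
        for product in products
        if (today - product["last_modified"]).days > max_age_days
    ]


# Hypothetical records checked against this post's date.
products = [
    {"id": 1, "last_modified": date(2024, 5, 10)},
    {"id": 2, "last_modified": date(2026, 3, 1)},
]
stale = stale_records(products, today=date(2026, 7, 4))
```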

Why WooCommerce allows this

WooCommerce is flexible by design. It supports tiny handmade shops, rental businesses, wholesalers, fashion catalogs and complex plugin stacks. That flexibility is valuable, but it comes with few semantic constraints.

WooCommerce permits | A trustworthy catalog needs
A product with no structured brand | A truthful, explicit brand or an explicit statement that the item is unbranded
Sizes and colors written anywhere | Selectable variations with defined availability
Copied products with inherited slugs and codes | Stable, unique identities with a clear commercial role
Manual product grids and landing-page prices | Every public price derived from the same source of truth
Empty, overlapping or historical categories | A taxonomy that describes the inventory customers can act on now
Publication without a recent-review signal | A maintenance process that revalidates price, stock and specifications

None of these states necessarily prevents checkout. That is the core issue: the platform validates publishability, not semantic coherence.

This is maintenance debt, not merchant stupidity

The disorder usually accumulates gradually. A merchant duplicates a product to save time. A theme adds a homepage price block. A rental plugin introduces a second representation of availability. Categories become navigation, campaign labels and operational notes at the same time. Staff members learn the unwritten rules, so the system appears to work.

Years later, no single action looks unreasonable. The combined catalog is still difficult to interpret because its meaning lives partly in fields, partly in prose and partly in institutional memory.

Agents do not create this debt. They make it measurable.

What disciplined WooCommerce data looks like

  1. One commercial source of truth. Homepage cards, category pages and product pages must derive price and availability from the same product record.
  2. Brands are declared, not inferred. Use one structured brand system consistently. Do not treat a category or the first word of a title as an implicit contract.
  3. Options are modeled as options. If size or color changes what can be purchased, represent it as a variation rather than prose.
  4. Copies receive new identities. Review SKU, internal code, slug, categories, images and purpose whenever a product is duplicated.
  5. Taxonomies describe the present. Remove empty branches, separate merchandising labels from product classification and audit overlapping assignments.
  6. Stock governs promotion. Do not feature an unavailable product unless the interface explicitly presents it as unavailable or back-orderable.
  7. Freshness is operational. Schedule periodic reviews of old products instead of treating “still published” as “still verified.”

What software can — and cannot — repair

Software can detect missing brands, contradictory public prices, empty categories, duplicated identifiers and unmodeled options. It can block malformed exports and show exactly which records require attention.

It should not decide that a word in a title is definitely a brand, choose which of two prices the merchant intended, or convert descriptive prose into purchasable variants without merchant confirmation. Those are commercial facts, not formatting problems.

The safest automation exposes uncertainty. The merchant resolves it at the source.
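In code, exposing uncertainty can mean an export step that refuses to run while unresolved findings exist and names the affected records. This is a sketch under assumptions: the finding format, the exception name and the export function are all hypothetical, not part of WooCommerce or any plugin.

```python
class UnresolvedCatalogIssues(Exception):
    """Raised when an export is attempted with findings awaiting review."""


def export_catalog(products, findings):
    """Return a copy of the catalog, or refuse if any record needs review.

    findings: list of dicts with "id" and "needs_merchant_confirmation".
    The function never edits a product to make a finding go away.
    """
    blocked = sorted(
        {f["id"] for f in findings if f.get("needs_merchant_confirmation")}
    )
    if blocked:
        raise UnresolvedCatalogIssues(f"records need merchant review: {blocked}")
    return [dict(product) for product in products]


# Hypothetical catalog with one open finding on record 1.
products = [{"id": 1, "title": "City E-Bike"}, {"id": 2, "title": "Urban Helmet"}]
findings = [{"id": 1, "issue": "price_conflict", "needs_merchant_confirmation": True}]

try:
    export_catalog(products, findings)
    export_error = None
except UnresolvedCatalogIssues as exc:
    export_error = str(exc)

clean_export = export_catalog(products, findings=[])
```

Blocking the whole export is one design choice. A gentler variant could export the clean records and hold back only the flagged ones, as long as the held-back list stays visible to the merchant.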

Honest limits

This was one production store, one model family and three task designs. The catalog contained only 18 public products, and six were audited in depth. The 64-attempt run also included deliberate verification requests and intermittent access-control failures, so it should not be read as the cost of every catalog audit.

The findings do not measure how common each defect is across WooCommerce. They demonstrate something narrower and directly observable: a store can remain usable to humans while its public product data contains contradictions that prevent deterministic interpretation.

The takeaway

Machine-readable is not the same as machine-trustworthy.

A discovery endpoint can get an agent through the door. An API can make retrieval efficient. Neither can make two prices agree, turn prose into real variants or decide what a copied product was supposed to become.

WooCommerce tells you whether a product can be displayed and sold. Catalog discipline determines whether anyone — human or machine — can trust what it says.
