The pricing team had flagged an anomaly in their competitor tracking dashboard—a laptop model where the agent-collected data showed stable availability but prices had jumped 12% overnight. She pulled the historical records to see what they'd missed.
Last quarter's entry, manually collected: "Limited stock—2-3 units available." This quarter's entry for the same listing, agent-collected: "In stock." The date: two weeks before the price increase.
Three months of data showed the pattern holding. "Limited stock" preceded price increases in seven out of nine cases. "Call for quote" appeared before volume deals. "2-3 weeks" marked supply chain constraints that correlated with competitor repositioning. The manual collectors had preserved these signals by capturing everything, uncertain what would matter.
The agent had normalized everything to binary availability. In stock or out of stock. "Limited stock" became "in stock" because items were technically available. "2-3 weeks" became "out of stock" because they weren't immediately shippable. "Call for quote" became a price—sometimes MSRP, sometimes an average pulled from the product page, always something plausible. The LLM filled gaps where information was ambiguous.
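A minimal sketch of what that flattening amounts to, assuming a pipeline roughly like the one described; the function and field names are illustrative, not the team's actual code:
```python
# Hypothetical sketch of lossy normalization; names and rules are
# assumptions, not the team's actual pipeline.

def normalize_listing(raw: dict) -> dict:
    """Collapse ambiguous source fields into a clean binary schema."""
    availability = raw.get("availability", "").strip().lower()

    # Anything technically purchasable maps to in_stock; everything else
    # maps to out of stock. "Limited stock" and "Ships in 2-3 weeks"
    # lose their distinct meanings here.
    in_stock = availability not in ("", "out of stock", "ships in 2-3 weeks")

    price = raw.get("price")
    if price is None:
        # Gap-filling: fall back to a list price scraped from the page,
        # so "Call for quote" quietly becomes a plausible number.
        price = raw.get("list_price")

    return {"sku": raw.get("sku"), "in_stock": in_stock, "price": price}
```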
The model comparison: Last quarter, 73% accuracy predicting competitor price changes. This quarter, 68%. Five percentage points translated to roughly $2M in suboptimal pricing decisions over three months.
The manual collection logs had been a disaster, the reason they'd automated in the first place. Inconsistent date formats. Product names that varied wildly. Availability marked as "Limited stock" or "Ships in 2-3 weeks" or just blank. But the pricing model had learned to read that mess: it treated "Call for quote" as a signal about negotiation windows, and the inconsistent dates reflected how different regional teams updated their systems, which mattered for timing competitive moves.
A ticket went in requesting specification changes: preserve ambiguous availability signals, flag missing prices rather than inferring them, maintain source date formats. The engineering response came back within an hour. Downstream teams couldn't handle inconsistent formats. The normalization layer had taken six months to build. Making data messier would break existing dashboards and require extensive validation logic.
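A sketch of what the requested change could look like, under the same assumptions as above: keep the normalized fields that downstream dashboards expect, but carry the raw availability text and the original date string alongside them, and flag inferred prices instead of filling them silently. Field names are illustrative.
```python
# Hypothetical additive schema: normalized fields stay as-is, raw signal
# and provenance ride along. Names are assumptions for the example.

def normalize_with_provenance(raw: dict) -> dict:
    availability_raw = raw.get("availability", "").strip()
    in_stock = availability_raw.lower() not in ("", "out of stock")

    price = raw.get("price")
    price_inferred = False
    if price is None:
        price = raw.get("list_price")
        price_inferred = True  # flagged for review, not silently filled

    return {
        "sku": raw.get("sku"),
        "in_stock": in_stock,                  # unchanged for existing dashboards
        "availability_raw": availability_raw,  # "Limited stock", "2-3 weeks", ...
        "price": price,
        "price_inferred": price_inferred,
        "source_date_raw": raw.get("date"),    # original format preserved
    }
```
Additive fields sidestep the objection about breaking existing dashboards: nothing downstream has to change unless it wants the extra columns.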
The "Limited stock" examples again: what the agent had converted them to, then what happened to competitor pricing in the weeks that followed. Market behavior still followed the pattern, though the data no longer captured it.
The sample validation protocol took shape—the work that would now happen between agent collection and model training. Someone would need to review ambiguous signals, decide which variance mattered, flag cases where normalization had removed something. The automation worked as designed. The job got more complex.
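One possible shape for that review step, as a sketch: sample records whose raw availability text carries an ambiguous marker and queue them for a human to check. The markers, field names, and sample size are assumptions.
```python
# Hypothetical sampling step between agent collection and model training.
import random

AMBIGUOUS_MARKERS = ("limited", "call for quote", "week", "backorder")

def needs_review(record: dict) -> bool:
    # True if the raw availability text carries a signal the
    # normalization layer may have flattened away.
    raw = record.get("availability_raw", "").lower()
    return any(marker in raw for marker in AMBIGUOUS_MARKERS)

def sample_for_review(records: list[dict], k: int = 50) -> list[dict]:
    flagged = [r for r in records if needs_review(r)]
    return random.sample(flagged, min(k, len(flagged)))
```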
Things to follow up on...
- Sampling bias in automated collection: High-quality items disappear faster from online markets because they attract more requests, meaning web scrapers systematically miss the best offers unless adjusted for content volatility.
- Hybrid model performance patterns: Organizations using AI with human oversight consistently achieve accuracy rates exceeding 99%, while pure automation shows 5-15% error rates in applications where data accuracy directly impacts business outcomes.
- Financial data normalization infrastructure: Fintool maintains fiscal calendars for 10,000+ companies because "Q1 2024" is ambiguous without company context—for Apple, Q1 2024 refers to October-December 2023, not January-March. (A toy sketch of the lookup follows after these notes.)
- Reasoning hallucinations in LLMs: When models explain trends or infer causes without strong grounding, they generate narratives that sound logical but aren't supported by evidence—the chain of reasoning looks coherent but doesn't follow from the facts.
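On the fiscal-calendar note above, a toy illustration of why company context matters. The calendar entries and the labeling convention (fiscal year named for the calendar year in which it ends) are assumptions for the example, not Fintool's data.
```python
# Toy per-company fiscal-calendar lookup; entries are illustrative.
from datetime import date

FISCAL_YEAR_START_MONTH = {
    "AAPL": 10,  # Apple's FY2024 begins in October 2023
    "MSFT": 7,   # Microsoft's FY2024 begins in July 2023
}

def fiscal_quarter_start(ticker: str, fiscal_year: int, quarter: int) -> date:
    start_month = FISCAL_YEAR_START_MONTH.get(ticker, 1)  # default: calendar-year filer
    month = start_month + (quarter - 1) * 3
    year = fiscal_year - 1 if start_month > 1 else fiscal_year
    if month > 12:
        month -= 12
        year += 1
    return date(year, month, 1)

print(fiscal_quarter_start("AAPL", 2024, 1))  # 2023-10-01, not 2024-01-01
print(fiscal_quarter_start("ACME", 2024, 1))  # 2024-01-01 for a calendar-year filer
```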

