LinkedIn has a public API. Well-documented, officially supported, designed for programmatic access. If you need LinkedIn data, the conventional answer is: use the API.
Try it. Name, headline, current position—the basics come through. Work history? Requires special approval. The skills, recommendations, and education details that make LinkedIn valuable for recruiting intelligence? Those sit behind the Partner API, granted only to approved partners.
Meanwhile, open your browser. Visit any public profile. Everything's right there. Every detail you need, publicly visible, one click away.
This is the operational paradox we encounter constantly building enterprise web agent infrastructure: companies scrape websites that offer APIs. Not because they're unaware. Because the API doesn't provide what they need.
When 'API Available' Hides the Real Problem
The API's existence creates the appearance that programmatic access is already handled, which makes the actual operational challenge harder to explain and justify.
Watch how this unfolds in organizations. Technical teams see "API available" and assume the data access problem is solved. Business teams discover the API provides only a subset of what's visible on the page. Everyone talks past each other because the API obscures rather than reveals the operational complexity.
APIs make their limitations invisible until you're deep into implementation. You discover the constraints when you're already committed to the approach, when changing direction means admitting the initial path was wrong.
The Data You Needed Six Months Ago
You're analyzing competitor pricing strategy. Today's data is straightforward—API call, done.
But the question you need to answer is: "How has their strategy evolved over six months?"
You can't API your way backward. Historical data requires continuous collection. If you didn't start capturing it six months ago, it's gone. The API shows you the present. Your operational need is the trend.
You must decide to collect data before you know you'll need it. The analyst who realizes in November they should have been tracking competitor behavior since May has no recourse.
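The "collect before you need it" decision amounts to maintaining an append-only snapshot log. A minimal sketch in Python (the schema and the `price` field are illustrative assumptions, not any real platform's payload):

```python
import datetime
import json
import sqlite3

# Append-only snapshot store: capture today's state so a trend query
# is possible months from now. In-memory DB for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE snapshots (entity_id TEXT, captured_at TEXT, payload TEXT)"
)

def capture(entity_id: str, payload: dict) -> None:
    """Record one observation; never update or overwrite old rows."""
    ts = datetime.datetime.now(datetime.timezone.utc).isoformat()
    conn.execute(
        "INSERT INTO snapshots VALUES (?, ?, ?)",
        (entity_id, ts, json.dumps(payload)),
    )

def price_trend(entity_id: str) -> list:
    """Reconstruct the time series that a current-state API cannot give you."""
    rows = conn.execute(
        "SELECT captured_at, payload FROM snapshots "
        "WHERE entity_id = ? ORDER BY captured_at, rowid",
        (entity_id,),
    )
    return [(ts, json.loads(p)["price"]) for ts, p in rows]
```

The point of the sketch is the asymmetry: `capture` must run continuously starting today, while `price_trend` only becomes useful months later.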
Building infrastructure for continuous collection at scale reveals complexity that occasional API calls never expose. Rate limits that seem reasonable for spot checks—Twitter's 15 requests per 15 minutes, for instance—become operational bottlenecks when you're maintaining historical datasets across thousands of entities. The infrastructure overhead of managing API quotas, handling throttling, and coordinating collection schedules across multiple platforms compounds quickly.
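To make the quota arithmetic concrete, here is a sliding-window limiter sized to a Twitter-style 15-requests-per-15-minutes quota. This is a sketch, and `fetch_entity` and `store` are hypothetical placeholders for a real API call and datastore:

```python
import time
from collections import deque

class WindowRateLimiter:
    """Allow at most `max_calls` within a sliding `window_s`-second window."""

    def __init__(self, max_calls: int, window_s: float):
        self.max_calls = max_calls
        self.window_s = window_s
        self.calls = deque()  # timestamps of recent calls

    def acquire(self) -> None:
        now = time.monotonic()
        # Drop timestamps that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.window_s:
            self.calls.popleft()
        if len(self.calls) >= self.max_calls:
            # Sleep until the oldest in-window call ages out.
            time.sleep(self.window_s - (now - self.calls[0]))
            self.calls.popleft()
        self.calls.append(time.monotonic())

# Twitter-style quota: 15 requests per 15-minute window.
limiter = WindowRateLimiter(max_calls=15, window_s=15 * 60)

def collect_snapshot(entity_id: str) -> None:
    limiter.acquire()
    # record = fetch_entity(entity_id)   # placeholder for the real API call
    # store(record, captured_at=time.time())
```

At this quota, sweeping 5,000 entities takes roughly 5,000 minutes, about three and a half days per snapshot, which is why per-platform quotas end up dominating collection schedules.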
APIs focus on current state—real-time transactions, active inventory. Historical endpoints exist for some platforms, but often with shorter retention windows or premium pricing that makes continuous collection economically unviable at enterprise scale.
Building for Operational Reality
Enterprises face a practical question: what does reliable data access require when the official channel doesn't serve your needs?
They adopt hybrid approaches: APIs where they provide necessary data at acceptable cost and scale. Browser-based collection where APIs exclude critical information, impose unworkable limits, or simply don't exist for the long tail of sites that matter to your operations.
Building infrastructure that handles both—with observability, governance, and production reliability—means accepting that "use the API" sometimes leads nowhere useful. The API exists, well-documented and officially supported. It just doesn't bridge the gap between what platforms expose and what enterprises need to operate.
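The hybrid approach reduces to a routing decision per entity: prefer the API, fall back to browser-based collection when the API errors out or returns an incomplete record. In this sketch, `fetch_via_api` and `fetch_via_browser` are hypothetical stand-ins for real integrations, and the required-field set is an assumed schema:

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Record:
    source: str  # "api" or "browser"
    data: dict

# Example schema: fields the business actually needs for a usable record.
REQUIRED_FIELDS = {"name", "headline", "work_history"}

def collect(entity_id: str,
            fetch_via_api: Callable[[str], Optional[dict]],
            fetch_via_browser: Callable[[str], dict]) -> Record:
    """API-first collection with browser fallback when the API
    fails, is rate-limited, or returns an incomplete record."""
    try:
        data = fetch_via_api(entity_id)
    except Exception:  # quota exhausted, endpoint removed, auth revoked...
        data = None
    if data is not None and REQUIRED_FIELDS <= data.keys():
        return Record(source="api", data=data)
    return Record(source="browser", data=fetch_via_browser(entity_id))
```

Tagging each record with its `source` is what makes the observability and governance requirements tractable: you can audit, per record, which channel produced it and why the fallback fired.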
Start with operational reality, not the ideal solution.
Things to follow up on...
- Wikipedia's API plea: The Wikimedia Foundation recently urged AI companies to use its paid API instead of scraping after detecting bots trying to appear human while severely taxing their servers.
- Twitter's enterprise pricing shock: When Twitter shut off free API access in 2023, expected enterprise rates reached $40,000+ per month, forcing many third-party developers to shut down their applications entirely.
- The hiQ Labs precedent: LinkedIn's legal battle with hiQ Labs over scraping public profile data for "people analytics" products demonstrates how companies navigate the gap between API restrictions and business needs when official channels don't provide necessary data.
- Social media's competitor blindspot: Instagram's API only provides Stories insights to account owners, and Facebook shares audience demographics solely for connected accounts, making competitor analysis impossible through official channels.

