Microsoft Web IQ: the grounding API that could reshape AI agents

by Luis Rijo
Luís Rijo is a seasoned marketing professional with over 10 years of experience in Digital Marketing, Search, Social, Display, Video, and DOOH. Based in Europe. Also writing in the spend. Reach out via luis@ppc.land
June 2, 2026

Web IQ P95 latency chart: 164ms vs 406-2090ms for competing grounding API services.

Microsoft today launched Web IQ, a suite of AI-native grounding APIs described as a search engine purpose-built for AI agents rather than human users - a significant infrastructure shift that arrives as the industry grapples with what it actually means to connect language models to the live web.

What Web IQ is and why Microsoft built it

The announcement, published on June 2, 2026 on the Bing Search Blog by Knut Risvik, Distinguished Engineer for Search and AI at Microsoft, describes Web IQ as a system engineered from the ground up to meet the specific demands of agentic workloads. The core argument is straightforward: AI applications are only as reliable as the data they reason from, and existing API architectures were not designed for the pattern of repeated, low-latency retrieval that agentic workflows require.

According to Microsoft, "Web IQ is a search engine for AI systems. Where Bing was built to help people search the web, Web IQ is built to help AI agents find the right information, turn it into useful evidence, and use it inside reasoning." That distinction - between a tool for human navigation and a tool for machine inference - runs through every design decision described in the announcement.

The product builds on Bing's existing global index but required what Microsoft calls a "major ground-up re-architecture." Agents, the announcement explains, do not issue a single search and stop. They retrieve repeatedly, reason over evidence, adapt as new information arrives, and operate inside tight latency budgets. Those requirements could not be met by tuning individual components of the existing stack.

The technical architecture

From documents to passages

One of the most consequential design choices in Web IQ concerns the unit of information returned. Traditional search APIs return documents. Web IQ returns passages and what Microsoft calls "structured evidence objects." According to the announcement, "Models do not need documents, they need information and documents are often a poor proxy for that."

Operating at the passage level concentrates useful signal while eliminating irrelevant context, producing - in Microsoft's framing - a much higher ratio of information to tokens. The practical consequence is expressed as a principle: "fewer tokens in, better answers out, lower cost per call." Token efficiency matters because every token sent to a language model carries both a cost and a latency implication. At the scale of multi-step agentic workflows, those costs compound.
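To make the compounding effect concrete, here is a rough sketch of the arithmetic. The four-characters-per-token ratio is a common rule of thumb rather than a Web IQ figure, and the payload sizes and function names are hypothetical, chosen only for illustration.

```python
# Illustrative only: compares the token footprint of sending whole documents
# versus selected passages. CHARS_PER_TOKEN is a rough rule of thumb and the
# payload sizes are hypothetical; none of these numbers come from Microsoft.
CHARS_PER_TOKEN = 4  # assumption


def estimate_tokens(texts):
    """Approximate token count for a list of strings."""
    return sum(len(t) for t in texts) // CHARS_PER_TOKEN


def tokens_over_steps(tokens_per_call, steps):
    """Total grounding tokens consumed by a multi-step agent run."""
    return tokens_per_call * steps


# Hypothetical payloads: 10 full documents of 20,000 characters each
# versus 10 passages of 1,000 characters each.
documents = ["d" * 20_000 for _ in range(10)]
passages = ["p" * 1_000 for _ in range(10)]

doc_tokens = estimate_tokens(documents)      # 50,000 tokens per call
passage_tokens = estimate_tokens(passages)   # 2,500 tokens per call

for steps in (1, 5, 10):
    print(steps, tokens_over_steps(doc_tokens, steps),
          tokens_over_steps(passage_tokens, steps))
```

Under these assumptions, a ten-step agent run consumes 500,000 grounding tokens with full documents and 25,000 with passages. Real ratios will depend on the tokenizer and on how much of each document is actually relevant.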

The embedding layer and DiskANN

At the model layer, Microsoft describes a deliberate choice to build a small number of deeply integrated models rather than a large collection of specialized ones. A central component is what the company calls its "best-in-class embedding model," which determines how information is projected into a vector space where semantic similarity becomes computationally tractable.

According to Microsoft, the quality of embeddings determines not just recall during retrieval but the shape of the candidate space that every downstream component operates on. The model is described as competitive at the top of public benchmarks, though its role inside Web IQ is framed as pragmatic rather than benchmark-driven: ensuring retrieval operates in "the right neighborhood of the information space."

The retrieval infrastructure builds on prior work at Microsoft Research, specifically a system called DiskANN. According to the announcement, DiskANN changed the practical limits of nearest neighbor search by making it possible to operate over large, disk-resident vector spaces without sacrificing latency - removing the need to trade recall for memory footprint. In Web IQ, this work is extended into what Microsoft describes as a broader retrieval fabric: retrieval executed across distributed partitions, routed globally, and tightly optimized to meet latency constraints.
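For readers unfamiliar with nearest neighbor search, the sketch below shows the exact, brute-force version of the problem: score every vector against the query and keep the closest ones. This is not DiskANN and not Web IQ's implementation; it is the baseline whose cost grows with corpus size, which is the cost that approximate indexes are built to avoid. The vectors and passage names are toy values.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_neighbors(query, corpus, k=2):
    """Exact top-k search: score every vector, sort, keep the k best."""
    scored = [(cosine_similarity(query, vec), name) for name, vec in corpus.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


# Toy 3-dimensional "embeddings" (hypothetical values); real embedding
# vectors typically have hundreds or thousands of dimensions.
corpus = {
    "passage_a": [0.9, 0.1, 0.0],
    "passage_b": [0.0, 1.0, 0.0],
    "passage_c": [0.8, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0]

print(nearest_neighbors(query, corpus))  # ['passage_a', 'passage_c']
```

Every query here touches every vector. At web scale that is impractical, which is why systems of this kind rely on approximate indexes that examine only a small fraction of the corpus per query.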

Networking, data placement, and execution paths are all treated as part of the design space, because grounding is not a one-time operation inside agentic workflows. Small inefficiencies compound at scale.

The orchestration layer

At the top of the stack sits an orchestration layer where queries are interpreted, retrieval is fanned out, results are merged and filtered, and the output is transformed into evidence. According to Microsoft, this layer is not a conventional outer API layer. It is part of the execution loop of an AI agent. Latency at this layer is described as "not just user-visible but structurally significant," determining whether a system can afford multiple reasoning steps or must compress everything into a single attempt.
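The four stages named in the announcement (interpret, fan out, merge and filter, transform into evidence) can be sketched in a few lines. Everything below is hypothetical: the in-memory partitions, the scores, the function names, and the shape of the evidence objects are stand-ins for illustration, not Microsoft's design.

```python
import asyncio

# Hypothetical in-memory "partitions"; each holds (passage, score) pairs.
PARTITIONS = [
    [("Passage about latency budgets", 0.91), ("Off-topic passage", 0.12)],
    [("Passage about passage-level retrieval", 0.84)],
    [("Passage about latency budgets", 0.91), ("Passage about token cost", 0.77)],
]


async def search_partition(partition, query):
    """Stand-in for a network call to one index partition."""
    await asyncio.sleep(0)  # yield control, as a real I/O call would
    return partition


async def ground(query, min_score=0.5, top_k=3):
    # Fan out: query all partitions concurrently.
    results = await asyncio.gather(
        *(search_partition(p, query) for p in PARTITIONS)
    )
    # Merge: flatten and de-duplicate, keeping the best score per passage.
    best = {}
    for hits in results:
        for text, score in hits:
            best[text] = max(score, best.get(text, 0.0))
    # Filter out weak matches, then rank by score.
    ranked = sorted(
        ((score, text) for text, score in best.items() if score >= min_score),
        reverse=True,
    )
    # Transform into evidence objects (a hypothetical shape).
    return [{"text": text, "score": score} for score, text in ranked[:top_k]]


evidence = asyncio.run(ground("why does latency matter for agents?"))
for item in evidence:
    print(item["score"], item["text"])
```

Because the partitions are queried concurrently, the wall-clock cost of the fan-out is set by the slowest partition rather than the sum of all of them, which is one reason tail latency matters so much at this layer.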

Performance claims and measurement

Microsoft provides three specific performance dimensions with accompanying metrics, though several carry methodological qualifiers worth noting.

On latency: Web IQ is described as operating at sub-165ms p95 latency. According to the announcement, this is nearly 2.5 times faster than the next best alternative from "the previous cohort of competitors under similar conditions." The measurement methodology is specified as averaging p95 numbers across five data center locations - West US2, North Central US, East US2, North Europe, and South Korea - using unique queries to avoid cache hits, with a configuration of 10 results at 10,000 characters per result. Competitor latencies listed in the accompanying chart range from 406ms to over 2,000ms at p95.
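The methodology as described, averaging per-location p95 values, can be sketched as follows. The latency samples are synthetic and the 2,000ms budget is a hypothetical figure; only the 164ms and 406ms values come from the chart caption. The nearest-rank percentile definition is one common choice, and the announcement does not say which definition Microsoft used.

```python
import math


def p95(samples):
    """Nearest-rank 95th percentile of a list of latencies."""
    ordered = sorted(samples)
    rank = math.ceil(95 * len(ordered) / 100)
    return ordered[rank - 1]


def mean_p95(samples_by_location):
    """Average of the per-location p95 values."""
    values = [p95(s) for s in samples_by_location.values()]
    return sum(values) / len(values)


def calls_within_budget(budget_ms, latency_ms):
    """How many sequential retrieval calls fit inside a latency budget."""
    return budget_ms // latency_ms


# Synthetic latencies in ms for two hypothetical locations; not Microsoft's data.
samples = {
    "location_1": list(range(101, 201)),  # 100 samples: 101..200
    "location_2": list(range(51, 151)),   # 100 samples: 51..150
}
print(mean_p95(samples))  # 170.0

# Figures from the chart caption: 164 ms versus 406 ms at p95.
print(round(406 / 164, 2))  # 2.48

# With a hypothetical 2,000 ms budget for sequential retrieval steps:
print(calls_within_budget(2000, 164))  # 12
print(calls_within_budget(2000, 406))  # 4
```

The last two lines illustrate why the latency gap matters for agents: under the assumed budget, the faster service leaves room for roughly three times as many retrieval steps.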

On quality: Microsoft introduces a metric called GDSAT - Grounding Satisfaction - which it describes as capturing whether grounding truly meets user intent across completeness, freshness, and authority. Unlike traditional relevance scores, GDSAT is presented as measuring downstream reasoning quality rather than document relevance. According to the announcement, Web IQ consistently achieves higher grounding satisfaction than alternative systems in comparable configurations, measured across a sample of 3,000 global queries drawn from production.

On token efficiency: A third dataset of 3,000 queries tested configurations spanning 10, 15, and 20 web results at character limits of 3,000, 5,000, 10,000, and 20,000 per result. The charts presented show Web IQ occupying a favorable position on the frontier of quality versus token count.
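To give a sense of the range those configurations cover, the sketch below enumerates the grid, assuming every result count was paired with every character limit. The announcement as summarized here does not confirm that all combinations were tested, and the field names are hypothetical.

```python
from itertools import product

RESULT_COUNTS = (10, 15, 20)
CHAR_LIMITS = (3_000, 5_000, 10_000, 20_000)

# Assumption: a full grid of result counts and per-result character limits.
configs = [
    {"results": n, "chars_per_result": c, "max_chars": n * c}
    for n, c in product(RESULT_COUNTS, CHAR_LIMITS)
]

smallest = min(configs, key=lambda cfg: cfg["max_chars"])
largest = max(configs, key=lambda cfg: cfg["max_chars"])

print(len(configs))            # 12
print(smallest["max_chars"])   # 30000
print(largest["max_chars"])    # 400000
```

Under that assumption, the maximum payload per call varies by more than a factor of thirteen between the smallest and largest configuration, which is why quality has to be read against token count rather than in isolation.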

These are vendor-published comparisons. The specific competitors are labelled only as Competitor A through G, and the configurations used may not reflect real-world deployment conditions across the range of use cases developers would consider.

The Bing foundation and publisher commitments

Web IQ inherits the Bing global index, described in the announcement as "a continuously refined representation of the web, built over decades through a combination of infrastructure, partnership, and discipline." Microsoft argues that grounding systems cannot exceed the quality of the underlying corpus - if the index is incomplete, stale, or unreliable, no amount of modeling compensates.

The announcement also addresses the question of how Web IQ participates in the open web. According to Microsoft, the product inherits Bing's long-standing commitment to honoring robots exclusion protocols, publisher controls, and access preferences. The company states it is "actively engaging with the broader ecosystem through the IETF and other industry forums to help evolve interoperable standards for the AI era."
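For context on what honoring the robots exclusion protocol involves at its simplest, the sketch below uses Python's standard-library `urllib.robotparser` to check whether a crawler may fetch a URL. The user-agent name, the robots.txt content, and the URLs are hypothetical, and this says nothing about how Bing or Web IQ implement the check internally.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt: one named crawler is kept out of /private/,
# every other crawler is allowed everywhere.
ROBOTS_TXT = """\
User-agent: ExampleGroundingBot
Disallow: /private/

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

blocked = parser.can_fetch(
    "ExampleGroundingBot", "https://example.com/private/report.html"
)
allowed = parser.can_fetch(
    "ExampleGroundingBot", "https://example.com/articles/launch.html"
)
other = parser.can_fetch("OtherBot", "https://example.com/private/report.html")

print(blocked, allowed, other)  # False True True
```

The protocol only expresses crawl permissions per user agent and path. The finer-grained questions the article raises, such as whether content may be used for AI grounding, are what the emerging standards work is meant to address.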

That commitment lands at a politically sensitive moment. The relationship between AI systems and the publishers whose content they consume has become one of the more contested questions in the search industry. Microsoft's explicit mention of IETF engagement signals awareness that the infrastructure decisions made in systems like Web IQ will eventually intersect with emerging standards for how content can be discovered, accessed, and used by AI.

Context: a long road to this architecture

The launch of Web IQ follows a sequence of infrastructure decisions that PPC Land has tracked over the past two years. In February 2025, Microsoft integrated Grounding with Bing Search into the Azure AI Agent Service, enabling real-time web data access for AI applications as an early step toward what Web IQ now formalises.

Three months later, in May 2025, Microsoft announced the retirement of the traditional Bing Search APIs effective August 11, 2025 - a decision that forced developers toward Grounding with Bing Search as the recommended alternative, at pricing that represented a 40 to 483 percent increase depending on tier. That transition compressed the developer ecosystem around Microsoft's grounding infrastructure rather than direct API access.

Brave moved quickly into the gap. In August 2025, Brave launched an AI Grounding service claiming 94.1 percent accuracy on standard tests, positioning itself as an independent alternative after Microsoft's API retirement.

On February 12, 2026, Microsoft published an announcement positioning grounding as the invisible infrastructure powering nearly every major AI assistant in the market. That post, by Jordi Ribas, Corporate Vice President of Search and AI, was the first comprehensive public explanation of technology that already operated behind the scenes across Copilot and partner systems.

In May 2026, Microsoft published a detailed technical post explaining how indexing for AI grounding differs fundamentally from traditional web search - co-authored by Krishna Madhavan, Knut Risvik, and Meenaz Merchant. Web IQ, announced today, is the productised version of the infrastructure those posts were describing.

What this means for the advertising and marketing ecosystem

The commercial stakes are substantial. Microsoft's search advertising revenue reached $13.9 billion in fiscal year 2025, growing 21 percent year-over-year, and Microsoft's AI business hit a $37 billion annual run rate as recently as Q3 FY26. The grounding layer is the mechanism through which Copilot and partner AI products retrieve the information that underpins responses - and increasingly, the context in which advertising appears.

The passage-level evidence architecture has direct implications for how brands and publishers are surfaced in AI-generated responses. When Web IQ returns a passage rather than a document, it is making a selection decision. That selection is what determines whether a brand's content contributes to an AI-generated answer or is excluded from it entirely. The GDSAT metric Microsoft introduces - measuring completeness, freshness, and authority - defines the criteria by which that selection happens. Content that scores well on those dimensions will be more likely to surface; content that does not will effectively be invisible regardless of its traditional search ranking.

For developers building AI products, Web IQ represents a more direct interface to Microsoft's search infrastructure than anything previously available in this form. The announcement notes that it was built on the Bing foundation but re-architected specifically for agentic workloads - which means the latency, token efficiency, and quality tradeoffs it offers are calibrated for multi-step reasoning rather than single-query retrieval. That distinction matters practically for anyone building agents that need to retrieve information repeatedly inside a single inference cycle.

The IETF engagement mentioned in the announcement also deserves attention. Standards bodies move slowly, but the explicit mention of interoperable standards for the AI era suggests Microsoft sees Web IQ not just as a product but as a participant in a broader negotiation about how AI systems interact with the open web. How those standards develop will affect publishers, advertisers, and AI developers alike.

Timeline

January 31, 2025: Microsoft integrates Grounding with Bing Search into Azure AI Agent Service
May 12, 2025: Microsoft announces retirement of Bing Search APIs effective August 11, 2025
August 5, 2025: Brave launches AI Grounding service with 94.1% accuracy score on standard tests, positioning as independent alternative after Bing API shutdown
August 11, 2025: Microsoft shuts down Bing Search APIs; Grounding with Bing Search becomes recommended alternative
January 27, 2026: Yahoo launches Scout using Microsoft's Bing grounding API
February 1, 2026: Microsoft integrates Grounding with Bing Search into Azure AI Agent Service (expanded rollout)
February 10, 2026: Microsoft introduces AI Performance dashboard in Bing Webmaster Tools in public preview
February 12, 2026: Microsoft publishes announcement positioning grounding as foundational AI infrastructure
May 6, 2026: Microsoft publishes technical post on how AI indexing differs from traditional search
June 2, 2026: Microsoft launches Web IQ, a suite of AI-native grounding APIs, announced by Knut Risvik on the Bing Search Blog

Summary

Who: Microsoft, through Knut Risvik, Distinguished Engineer for Search and AI, announced the launch of Web IQ on the Bing Search Blog.

What: Web IQ is a suite of AI-native grounding APIs - described as a search engine for AI systems rather than for human users - that connects AI agents to live web data including web pages, news, images, and videos. It operates at sub-165ms p95 latency, returns passage-level evidence rather than full documents, and is built on a re-architected version of Bing's global index.

When: Announced on June 2, 2026 on the Microsoft Bing Search Blog.

Where: Web IQ is a cloud-hosted API product accessible to developers and enterprises building AI agents. Microsoft states that interest can be expressed through a link provided in the announcement.

Why: Microsoft argues that model capability alone no longer determines whether an AI system is useful - what matters is how effectively the full system connects models to fresh, real-world information. Web IQ addresses the specific retrieval demands of agentic workflows, where systems retrieve repeatedly, reason over evidence, and operate inside tight latency budgets that existing API architectures were not designed to meet.
