Inside Google I/O 2026: the agentic AI shift no one saw coming

On May 22, 2026, Google published a 41-minute panel session titled "Defining the agentic AI era" as part of the Dialogues stage at Google I/O 2026. The video, hosted on the Google for Developers YouTube channel, features Logan Kilpatrick alongside four panellists: Koray Kavukcuoglu, CTO of Google DeepMind and Google's Chief AI Architect; Liz Reid, who leads Search; Josh Woodward, who leads the Gemini App team, AI Studio, and Google Labs; and Jeff Dean, Google's Chief Scientist and co-lead of Gemini. The conversation was recorded on day two of I/O and covers the technical decisions behind Gemini 3.5 Flash, the architecture of Gemini Spark, the role of hardware in inference speed, and the future of human-machine interfaces.

The Dialogues stage, according to Google's summary published on May 22, 2026, brought together leaders, scientific minds, and creative visionaries across multiple sessions during the week. The AI Agents session, which this article covers, was one of six featured discussions, alongside separate conversations on quantum computing with Hartmut Neven and James Manyika, science with DeepMind CEO Demis Hassabis, robotics with Kanishka Rao and Boston Dynamics' Alberto Rodriguez, and cinematic creativity with director Doug Liman.

Gemini 3.5 Flash: what changed and why

Kavukcuoglu described Gemini 3.5 Flash as the opening instalment of the 3.5 series, with a deliberate focus on two things: coding capability and agentic workflows. "The biggest thing that we really wanted to focus on was getting coding and agentic workflows much, much better," he said. The 3.0 series, released in November 2025, approximately six months before I/O, had already reached what Kavukcuoglu called "really high levels of reasoning and multi-modal understanding." The 3.5 series layered agentic capability on top of that foundation.

His assessment of the improvement was unambiguous: the team saw "a huge improvement from where we were six months ago." That gap was not simply the result of adding parameters. According to Kavukcuoglu, a significant driver was intensive internal use - "using the model very, very intensely for coding and non-coding tasks" - which surfaced real bottlenecks that were then fed back to researchers and model builders. Identifying a bottleneck is not the same as solving it, however, and Kavukcuoglu framed long-horizon tasks as the specific hard challenge: "there are hard technical challenges in terms of going long and solving long horizon tasks."

PPC Land's coverage of the I/O 2026 developer keynote noted that Google used the conference to push its software-building tools toward a model where AI agents, not humans, do the orchestration - a framing consistent with what the Dialogues panel elaborated in greater depth.

Hardware as a prerequisite for agents

Jeff Dean addressed the hardware question directly. Google's TPU programme has now reached its eighth generation, and Dean described a recent separation of chip designs for training and inference. "We've got a bit of a separation of the designs so that we focus on, both training, which is a somewhat different problem, then inference," he said. The difference matters technically because inference - running a trained model against user queries in real time - imposes distinct latency requirements compared to the batch-processing workloads of training.

The practical consequence is speed. Dean linked hardware quality directly to user experience: "when you use a model and it's super fast, it's a much more delightful experience than when you're using it and it's slow. And that's true even if it's agents using it on your behalf." That last phrase is significant. In an agentic context, the user is not always present at the moment of execution. The agent may be running tasks in the background, and the model's inference speed therefore affects the total wall-clock time of a multi-step workflow rather than merely a single response.

Dean also identified a secondary bottleneck that faster hardware will eventually expose: the tools themselves. "If you make the model infinitely fast, Amdahl's law says if you're spending half your time in tools, you're not going to get anything better than 2x speedup," he said. Amdahl's law is a formula from parallel computing that describes the theoretical upper bound of speedup when only part of a system is accelerated. The implication is that as TPU performance improves, the limiting factor for agentic performance will shift from model inference to the tools agents invoke - file systems, APIs, databases - most of which were built for human-paced interaction. "Those tools themselves are going to need a lot of attention in terms of how do we make those as fast and robust as we can," Dean added.
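
Dean's arithmetic can be checked with a few lines of Python. The sketch below is illustrative only: the function names are invented for this article, and the 50/50 split between model time and tool time is simply the figure from his example, not a measured value.

```python
def amdahl_speedup(accelerated_fraction: float, factor: float) -> float:
    """Overall speedup when `accelerated_fraction` of the runtime becomes `factor` times faster."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if factor <= 0:
        raise ValueError("factor must be positive")
    remaining = 1.0 - accelerated_fraction  # the part that is not accelerated (here: tools)
    return 1.0 / (remaining + accelerated_fraction / factor)


def amdahl_limit(accelerated_fraction: float) -> float:
    """Upper bound on speedup as the accelerated part becomes infinitely fast."""
    remaining = 1.0 - accelerated_fraction
    return float("inf") if remaining == 0 else 1.0 / remaining


# Dean's example: half the wall-clock time is model inference, half is tool calls.
for factor in (2, 10, 1000):
    print(f"model {factor}x faster -> workflow {amdahl_speedup(0.5, factor):.2f}x faster")
print(f"limit with an infinitely fast model: {amdahl_limit(0.5):.1f}x")
```

However large the model speedup, the workflow figure approaches but never exceeds 2x while tools still account for half the time, which is why the tool side needs its own engineering attention.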

Kavukcuoglu extended the argument. He described Google's internal engineering environment as heavily automated but still calibrated to human cadence. Agents operating at machine speed will require a re-engineering of that infrastructure. The outcome, he suggested, is a virtuous cycle: "tools get faster, and then the models work better. And then that is another cycle." Dean supplied a concrete example: internal Python tools were rewritten in Go by capable models in a single night of automated work, and the rewritten tools ran 10 to 20 times faster.

Gemini Spark and the asynchronous turn

Josh Woodward described Gemini Spark, the always-on background agent in the Gemini app, as specifically designed for asynchronous interaction - tasks that run without the user actively waiting for a result. Spark launched to trusted testers ahead of the panel; the full rollout for Ultra subscribers was confirmed for the week following the session.

Woodward outlined how his own use of Spark evolved. Initial use cases were scheduled tasks: "every morning, it might go through a bunch of stuff for me." The next phase introduced trigger-based automation - for instance, labelling a high-priority email and preparing a draft response when a specific sender writes in. He was careful to note that Spark does not send the email automatically. Reid echoed that boundary: "Very important, don't send the email."
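
The "draft, but do not send" boundary is easy to express as a rule. The following is a minimal sketch of that pattern, not Spark's implementation: the `Email` and `Draft` classes, the `on_incoming` function, and the label name are all hypothetical and were invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Email:
    sender: str
    subject: str
    labels: set = field(default_factory=set)


@dataclass
class Draft:
    to: str
    subject: str
    body: str
    sent: bool = False  # the rule below never changes this; sending stays a human action


def on_incoming(email: Email, priority_senders: set) -> Optional[Draft]:
    """Label mail from a watched sender and prepare a reply draft, but never send it."""
    if email.sender not in priority_senders:
        return None
    email.labels.add("high-priority")
    return Draft(
        to=email.sender,
        subject=f"Re: {email.subject}",
        body="(text proposed by a model would go here)",  # placeholder, no model is called
    )


# Hypothetical usage: one watched sender triggers a label and a draft.
msg = Email(sender="boss@example.com", subject="Q3 plan")
draft = on_incoming(msg, priority_senders={"boss@example.com"})
print(msg.labels, draft)
```

The design choice worth noting is that the trigger returns a draft object rather than performing a side effect, so the final step remains with the person.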

Information retrieval was a third category. Woodward cited a personal example: daily updates on the Oklahoma City Thunder basketball team, delivered in the tone a "die-hard OKC Thunder basketball fan would talk to me." The framing matters because it illustrates how personalisation and tone adaptation are part of the product design, not simply a side effect.

Kavukcuoglu summarised the design philosophy: "Spark is really designed for this kind of asynchronous interaction, as opposed to when you go to the Gemini app chat or Search, you expect to have this conversation ongoing. But Spark is really for these kinds of asynchronous interactions where the agent is going to go and do something for you. And then you can come back and check, or it will notify you that I'm done."
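
The contrast between an ongoing conversation and a hand-off can be illustrated with Python's standard `asyncio` module. This is a toy sketch under stated assumptions: `background_agent` is a hypothetical stand-in that merely sleeps, and nothing here reflects how Spark is actually built.

```python
import asyncio


async def background_agent(task: str, notify: asyncio.Queue) -> None:
    """Stand-in for a long-running agent: does its work, then posts a notification."""
    await asyncio.sleep(0.05)  # placeholder for real, much longer work
    await notify.put(f"done: {task}")


async def main() -> list:
    notify: asyncio.Queue = asyncio.Queue()
    events = []

    # Hand the task off; the caller is not blocked while it runs.
    job = asyncio.create_task(background_agent("summarise inbox", notify))
    events.append("user: moved on to something else")

    # Later, the user comes back (or is notified) and picks up the result.
    events.append(await notify.get())
    await job
    return events


events = asyncio.run(main())
print(events)
```

In a chat-style interaction the caller would await the result immediately; here the result arrives through a notification channel after the caller has already moved on.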

PPC Land's analysis of the I/O Search announcements confirmed that Spark runs on Gemini 3.5 Flash and integrates with Google Workspace - Gmail, Docs, Calendar, and Slides - using the Antigravity infrastructure, and continues operating even when the device is closed. Initial MCP integrations include Canva, OpenTable, and Instacart, with US availability only at launch.

Search's latency model and the value of waiting

Reid introduced a framework for thinking about search latency that is more nuanced than simply "faster is better." Users' willingness to wait, she argued, depends on how difficult they perceive the underlying task to be and how much work the system is taking off their plate. "If the question in the user's mind is quick, then you better be lightning fast. Otherwise, they're like, why the heck am I waiting? On the other hand, if the user would have spent 15, 20 minutes doing it, you can have 10 seconds, for sure, if you can do something amazing."

The headline she cited for Search at I/O was framed internally as "the biggest upgrade to the search box in 25 years." For long-running agentic tasks - a weekend planner, for instance - Reid suggested that even a minute of wait time could be acceptable, given that the user can "set it up, go do something else, come back, and use it." That is a fundamentally different interaction model from the sub-second response expectations that have governed search for a generation.
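
Reid's framing amounts to a wait budget that scales with the effort the user is spared. The sketch below encodes only the figures she mentioned (about 10 seconds for a task that would take 15 to 20 minutes by hand, and roughly a minute for set-up-and-return tasks). The function name and the 0.5-second stand-in for "lightning fast" are assumptions made for illustration, not numbers from the session.

```python
def wait_budget_seconds(manual_minutes: float, runs_in_background: bool = False) -> float:
    """Rough acceptable wait, given how long the task would take the user by hand."""
    if manual_minutes < 0:
        raise ValueError("manual_minutes must be non-negative")
    if runs_in_background:
        return 60.0  # "set it up, go do something else, come back": about a minute
    if manual_minutes >= 15:
        return 10.0  # Reid's example: 15-20 minutes of manual work buys ~10 seconds
    return 0.5  # ASSUMPTION: placeholder for "lightning fast"; no figure was given


def within_budget(observed_seconds: float, manual_minutes: float,
                  runs_in_background: bool = False) -> bool:
    """True if an observed response time fits the budget for that kind of task."""
    return observed_seconds <= wait_budget_seconds(manual_minutes, runs_in_background)


print(within_budget(3.0, manual_minutes=0.2))   # slow answer to a quick question
print(within_budget(8.0, manual_minutes=20))    # same order of wait, much harder task
```

The point of the sketch is that the same response time can be a failure for one query and a success for another, so a single latency target no longer describes the product.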

Reid also described how the collaboration with DeepMind produced improvements in tool use. The partnership moved from feature requests - "could you please make this specific thing work for Search" - toward joint problem definition: identifying what is fundamentally challenging about tool use as a category and finding solutions that benefit all surfaces, not just Search. Kavukcuoglu described this as going to the root cause rather than fixing individual problems: "you can't fix individual problems. You always need to go to the root cause and try to solve those."

AI Mode queries now run on average three times as long as conventional search queries, and total volume has more than doubled every quarter since launch, according to data published on May 19, 2026, alongside the I/O announcements. The Gemini 3.5 Flash upgrade to AI Mode is the direct product of the model-and-Search collaboration Reid described.

The Antigravity SDK and the operating system moment

One of the more striking anecdotes in the session came from Kavukcuoglu, who described a demonstration shown by another speaker during I/O in which the Antigravity system was asked to build an operating system. "Hundreds of agents work for a day and a half. And they come back with something that is actually functional and working," he said. He noted that even after working on the system for months, he still found the result surprising.

Kavukcuoglu clarified what the term Antigravity covers technically: it includes the user interface for developers, the harness, and the SDK. "The same capability that Search is using, building on top of that SDK to build those agents into Search, we are enabling all the developers and businesses to use that exact same SDK that is really co-developed with Gemini to build applications for anyone who wants to use them," he said. That structural point is not trivial. The SDK available externally is the same one Google's own Search team uses internally - meaning third-party developers are working with production-grade infrastructure rather than a consumer-simplified subset.

Reid confirmed that Antigravity was initially adopted experimentally within Search, then expanded. "We started looking at, how do we bring some of the agentic coding capabilities into Search? Partway through, we said, oh, well, maybe for some of these more advanced use cases, let's look at using Antigravity. And then that worked so well. They're like, well, actually, maybe we should use it for more things." The phrasing reflects a bottom-up adoption pattern inside one of the world's largest engineering organisations - a data point about real-world utility rather than a product announcement.

Dean connected the Antigravity capability to a broader argument about bespoke software. In the past, he observed, software had to be standardised across many use cases because creation cost was high. Long-running agents that can generate bespoke software on demand change that calculus. "You can just say, I would like this thing to exist. It will go off and hopefully autonomously make that thing come into existence. And then you can use it."

Context engineering and the DESIGN.md shift

Woodward introduced a less-covered aspect of the agentic transition: not just tool speed, but context format. Teams at Google Labs have moved away from traditional product requirement documents, he noted. "We have entire teams now that haven't written PRDs in months because that's the old way of doing it. You're writing, in a way, whether it's an MD file or other things, where basically one of the models can just pick it up and go with it."

The concrete example he cited was Stitch, a Google Labs experiment for interface design. Stitch introduced a DESIGN.md file format, now open source, that codifies the design language for an application so that models can interpret and execute it directly. The shift is from documents written for human readers to specifications that are simultaneously human- and machine-readable. Woodward described this as "tool velocity" combined with context preparation - both factors are needed to make agentic workflows performant.
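
To make the idea of a specification that is both human- and machine-readable concrete, here is a small sketch. It does not reproduce the Stitch DESIGN.md schema, which the session did not detail: the section headings, the token names, and the `parse_sections` helper are all invented for illustration.

```python
# ASSUMPTION: an invented example file, not the actual DESIGN.md format.
HYPOTHETICAL_DESIGN_MD = """\
# Colors
primary: #1a73e8
surface: #ffffff

# Typography
body: 16px/1.5 sans-serif
"""


def parse_sections(text: str) -> dict:
    """Split a simple heading-plus-'key: value' document into nested dictionaries."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# "):
            # A heading opens a new section.
            current = sections.setdefault(line[2:].strip(), {})
        elif line and current is not None and ":" in line:
            # Split on the first colon only, so values may contain colons.
            key, value = line.split(":", 1)
            current[key.strip()] = value.strip()
    return sections


tokens = parse_sections(HYPOTHETICAL_DESIGN_MD)
print(tokens)
```

A person can read the file as ordinary notes, while a program, or a model given the file as context, can pull out the same values without guessing.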

Google Flow, a filmmaking tool shown at I/O, illustrated a related product design shift. Woodward described a new area in the product where users can "vibe code" their own tools - effectively building bespoke software inside a creative application. "As a product team, we're kind of building almost a thin shell with the space to let people that use the tool really make the tool their own," he said. That approach inverts the traditional product design model, which assumes developers specify every feature before users receive it.

How the work itself is changing

All four panel members described concrete changes to how they personally work. Kavukcuoglu, who reviewed every DeepMind pull request personally in the organisation's early days and later stepped back from active coding as his career progressed, said he has returned to writing software asynchronously with agents. "I can do it very asynchronously. That is what is enabling me, because even if my time is split between many things, at night, I can come back to it and say, OK, what did the agents do." He described this as giving him "a more productive feeling."

Reid identified a different kind of change - the blurring of job boundaries. Tasks that previously required a specialist were blocked by startup costs: knowing which tool to use, where the data lives, how to navigate a large codebase. Agents address those startup costs. "Why do I need to do that? It's just because I don't actually know where the data is located. What's actually the case? OK. Well, now I can just go work with the agent." She had returned to writing code herself after years away from it. The implication she drew was about empowerment across roles: product managers changing design files directly, engineers freed to focus on performance rather than explanation.

Dean described diving into performance issues in unfamiliar parts of the codebase, expressing the solution at a high level, and delegating the implementation to agents with instructions to measure the performance impact and run all tests. Woodward noted the shift toward doing more work on mobile and by voice - inputs that suggest agents are enabling workflows that do not require a desktop environment or sustained attention.
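
The two conditions Dean attaches to delegated work, passing tests and a measured performance gain, can be sketched as an acceptance gate. This is an illustration using Python's standard `timeit` module, not a description of Google's tooling; `accept_change` and the three toy functions are hypothetical.

```python
import timeit
from typing import Callable, Iterable


def accept_change(baseline: Callable, candidate: Callable,
                  test_inputs: Iterable, bench_input: list, repeats: int = 5) -> dict:
    """Gate a rewrite on two checks: identical results on every test input, then timing."""
    cases = list(test_inputs)
    tests_pass = all(candidate(case) == baseline(case) for case in cases)
    # Taking the minimum over several repeats reduces timing noise.
    t_base = min(timeit.repeat(lambda: baseline(bench_input), number=20, repeat=repeats))
    t_cand = min(timeit.repeat(lambda: candidate(bench_input), number=20, repeat=repeats))
    return {
        "tests_pass": tests_pass,
        "speedup": t_base / t_cand,
        "accepted": tests_pass and t_cand < t_base,
    }


def slow_total(xs):
    """Baseline: index-based loop summing squares."""
    total = 0
    for i in range(len(xs)):
        total = total + xs[i] * xs[i]
    return total


def fast_total(xs):
    """Candidate rewrite with the same behaviour."""
    return sum(x * x for x in xs)


def wrong_total(xs):
    """A rewrite that changes behaviour; the gate must reject it."""
    return sum(xs)


tests = [[], [1, 2, 3], [5]]
data = list(range(1000))
good = accept_change(slow_total, fast_total, tests, data)
bad = accept_change(slow_total, wrong_total, tests, data)
print(good, bad)
```

Whether the candidate turns out faster depends on the machine, so the sketch reports the measured ratio rather than assuming one; a rewrite that fails any test is rejected regardless of speed.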

Dean, asked for the most important developer skill in 2026, gave a direct answer: "Learn how to use coding tools and agents to be much more productive and build awesome things." Reid, asked what Search could take off people's plate in the next year, pointed to the drudge-work of information gathering: "What is the part of research and searching and gathering information that feels drudgery to you? And you feel like you can just get rid of that, give it off."

Why this matters for marketers and advertisers

The Dialogues session adds technical and organisational depth to announcements that otherwise arrive only as product news. For practitioners in digital advertising and marketing technology, several threads are directly relevant.

First, the Antigravity SDK is the same infrastructure powering Google's own agentic Search features. As PPC Land has documented since the I/O developer keynote, the framing at I/O 2026 has shifted from assistive tooling to agent orchestration - and the Dialogues panel confirms that the same infrastructure is being used internally at scale. Marketers building on these tools are working with production infrastructure, not experimental features.

Second, the latency model Reid described has direct implications for how performance is measured. Background agents change attribution patterns because they act without a single query trigger. The Dialogues discussion makes clear that this is deliberate - the product is being designed around asynchronous workflows, not as an extension of synchronous search.

Third, the tool-speed bottleneck Dean described will affect ad tech and marketing platforms specifically. Systems that handle campaign data, audience information, and budget decisions were built for human-paced interaction. As Google's own internal tools have been rewritten for machine-speed operation, third-party martech systems will face the same pressure. PPC Land reported separately on a Google engineer's warning at I/O 2026 that AI will multiply software output by an order of magnitude or more, and that systems built for today's volumes may not scale.

Fourth, the DESIGN.md concept and context engineering more broadly suggest that the skill of structuring information for machine consumption - not just human readers - is becoming a practical competency for anyone managing creative or campaign assets at scale.

Timeline

November 2025 - Google releases Gemini 3, featuring high levels of reasoning and multi-modal understanding and introducing generative UI for dynamic search experiences, establishing the foundation for the 3.5 series
February 10, 2026 - Chrome's WebMCP Early Preview Program begins, ahead of the origin trial announced at I/O
May 18, 2026 - Companion documentation for WebMCP published on Chrome for Developers
May 19, 2026 - Google I/O 2026 opens in Mountain View; Gemini 3.5 series announced, Antigravity 2.0 and WebMCP origin trial in Chrome 149 revealed during the Developer keynote; AI Mode upgraded to Gemini 3.5 Flash and search box redesigned; AI Mode surpasses one billion monthly active users globally
May 20, 2026 - Google Marketing Live 2026 takes place; Ask Advisor launched as unified Gemini AI agent for Google Ads, Analytics, Merchant Center, and DV360
May 21, 2026 - Google GML EMEA broadcast covers the same AI and commerce announcements for the EMEA region
May 22, 2026 - Google publishes the "Defining the agentic AI era" Dialogues panel video on YouTube; the Google Keyword blog publishes the Dialogues stage highlights recap; Gemini Spark confirmed for full rollout to Ultra subscribers the following week

Summary

Who: Logan Kilpatrick (Google), Koray Kavukcuoglu (CTO, Google DeepMind / Chief AI Architect), Liz Reid (VP, Google Search), Josh Woodward (VP, Gemini App, AI Studio, Google Labs), and Jeff Dean (Chief Scientist, Google / co-lead of Gemini).

What: A 41-minute panel session at the Google I/O 2026 Dialogues stage covering the technical foundations of Gemini 3.5 Flash, the eighth-generation TPU design split between training and inference chips, the Gemini Spark background agent and its asynchronous interaction model, the Antigravity SDK and its internal use within Google Search, the DESIGN.md open-source context format, and how each speaker's own working practices have shifted as agentic tooling has matured.

When: The panel was recorded on day two of Google I/O 2026 and published on May 22, 2026. The main I/O announcements it references were made on May 19, 2026. Gemini Spark was confirmed for Ultra subscriber rollout the week following the session.

Where: Google I/O 2026, held at Google's campus in Mountain View, California. The Dialogues stage is a separate track from the main keynote. The session video was published on the Google for Developers YouTube channel.

Why: The session matters because it translates product announcements into the technical and organisational logic behind them. The confirmation that Antigravity's SDK is identical for internal and external developers, the Amdahl's law framing of tool-speed as the next agent bottleneck, and the latency model underpinning Search's agentic design decisions all carry practical implications for developers, advertisers, and marketers building on or competing with Google's AI infrastructure in 2026.