Google's OKF wants to be the lingua franca for AI agent knowledge

by Luis Rijo
Luís Rijo is a seasoned marketing professional with over 10 years of experience in Digital Marketing, Search, Social, Display, Video, and DOOH. Based in Europe. Also writing in the spend. Reach out via luis@ppc.land
June 13, 2026

Glowing document network inside a digital cathedral representing Google's Open Knowledge Format

Google yesterday introduced the Open Knowledge Format (OKF), a vendor-neutral, open specification built on plain markdown files and YAML frontmatter that formalizes how AI agents store, share, and consume organizational knowledge. Published on June 12, 2026, the format is now available on GitHub alongside three sample bundles and two reference implementations.

The announcement comes from Sam McVeety, Tech Lead for Data Analytics at Google Cloud, and Amir Hormati, Tech Lead for BigQuery at Google Cloud, both writing on the Google Cloud Blog. According to Google, OKF v0.1 addresses a specific and persistent failure point in how organizations build agentic systems: the lack of any agreed-upon format for the knowledge those agents need to function.

The fragmented knowledge problem

Ask any engineering team how their AI agents find answers to internal questions - how a particular metric is calculated, which database table describes a given concept, or what the correct join path is between two systems - and the answer tends to be the same. The information exists, scattered across wikis, code comments, metadata catalog APIs, shared drives, and the heads of a few senior engineers, but no single format connects them.

According to Google, this fragmentation means that every agent builder is "solving the same context-assembly problem from scratch, every catalog vendor is reinventing the same data models, and the knowledge itself is locked behind whichever surface created it." The direct consequence for teams building agentic systems is that the agents themselves cannot reliably answer questions about internal structure without first retrieving context from a patchwork of incompatible sources.

OKF is positioned as a format-level answer to that problem, not a new service, platform, or runtime. According to the specification, the format is "intentionally minimal: a directory of markdown files with YAML frontmatter. There is no schema registry, no central authority, and no required tooling. If you can cat a file, you can read OKF; if you can git clone a repo, you can ship it."

What OKF v0.1 actually specifies

The technical structure is deliberately simple. An OKF bundle is a directory tree of markdown files. Each file represents a concept - a single unit of knowledge that may describe a tangible asset such as a database table or API endpoint, or an abstract idea such as a business metric or an incident playbook. The concept's file path within the bundle serves as its unique identifier: a file at tables/orders.md carries the concept ID tables/orders.
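The path-to-identifier rule can be illustrated with a minimal Python sketch. The function name and the bundle directory name below are hypothetical and do not come from the spec; the sketch only assumes that the ID is the file's path relative to the bundle root with the .md extension removed, as in the tables/orders example above.

```python
from pathlib import PurePosixPath


def concept_id(bundle_root: str, file_path: str) -> str:
    """Derive a concept ID from a file's path relative to the bundle root."""
    relative = PurePosixPath(file_path).relative_to(PurePosixPath(bundle_root))
    # Drop the ".md" suffix; what remains is the identifier.
    return str(relative.with_suffix(""))


# "my-bundle" is a made-up directory name used only for illustration.
print(concept_id("my-bundle", "my-bundle/tables/orders.md"))  # tables/orders
```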

Every concept file has two parts. The first is a YAML frontmatter block, delimited by --- at the top of the file and a closing --- on its own line. The second is a standard markdown body containing free-form content. According to the specification, only one frontmatter field is strictly required: type, a short string identifying the kind of concept, with example values including BigQuery Table, API Endpoint, Metric, and Playbook.

Additional recommended fields - listed in priority order in the spec - are title, description, resource, tags, and timestamp. The resource field carries a URI that uniquely identifies the underlying asset, while timestamp uses ISO 8601 format for the datetime of last meaningful change. Producers may include any additional keys beyond these; consumers must tolerate unknown keys without rejecting the document.
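To make the two-part file layout concrete, here is a rough Python sketch of how a consumer might split a concept file and check the one required field. It is not the reference implementation. The sample document, its field values, and the owner_team key are invented for illustration, and the parser is a deliberate simplification that only understands flat key: value lines; a real consumer would most likely hand the frontmatter block to a full YAML parser.

```python
# Hypothetical concept file; every value here is made up for illustration.
SAMPLE = """---
type: BigQuery Table
title: Orders
description: One row per customer order.
timestamp: 2026-05-28T14:30:00Z
owner_team: analytics
---
# Orders

Free-form markdown body.
"""


def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a concept file into (frontmatter fields, markdown body).

    Simplification: only flat `key: value` lines are handled and values are
    kept as raw strings. This is not a YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing opening --- delimiter")
    try:
        end = next(i for i in range(1, len(lines)) if lines[i].strip() == "---")
    except StopIteration:
        raise ValueError("missing closing --- delimiter") from None
    fields = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            fields[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:])
    return fields, body


fields, body = split_frontmatter(SAMPLE)
if not fields.get("type"):
    raise ValueError("the required type field is missing or empty")
# Unknown keys such as owner_team are kept rather than rejected.
print(fields["type"], "|", fields["owner_team"])
```

Splitting on the first colon only is what keeps the ISO 8601 timestamp intact, since the value itself contains colons.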

Two reserved filenames carry defined meaning at any level of the directory hierarchy. An index.md file, when present, enumerates the directory's contents to support progressive disclosure, letting a human or agent see what is available before opening individual documents. A log.md file, when present, records the history of changes to that scope in a flat list of date-grouped entries, newest first, using ISO 8601 date headings in YYYY-MM-DD form.
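As an illustration of the log.md convention, the following Python sketch checks that date headings appear newest first. It rests on an assumption the article does not confirm: that each date heading is a markdown heading line containing only a YYYY-MM-DD date. The sample log and its entries are invented.

```python
import re
from datetime import date

# Assumption: a date heading is a markdown heading holding only YYYY-MM-DD.
DATE_HEADING = re.compile(r"^#{1,6}\s+(\d{4}-\d{2}-\d{2})\s*$")

# Hypothetical log content, for illustration only.
SAMPLE_LOG = """# Change log

## 2026-06-12
- Added metrics/revenue.

## 2026-05-28
- Initial import of tables/orders.
"""


def log_dates(log_text: str) -> list[date]:
    """Return the dates of all date headings, in document order."""
    dates = []
    for line in log_text.splitlines():
        match = DATE_HEADING.match(line)
        if match:
            dates.append(date.fromisoformat(match.group(1)))
    return dates


def is_newest_first(log_text: str) -> bool:
    """True if no date heading is followed by a later date."""
    dates = log_dates(log_text)
    return all(current >= following for current, following in zip(dates, dates[1:]))


print(is_newest_first(SAMPLE_LOG))  # True
```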

Concepts link to each other using standard markdown links. According to the specification, a link from concept A to concept B "asserts a relationship. The specific kind of relationship (parent/child, references, joins-with, depends-on, etc.) is conveyed by the surrounding prose, not by the link itself." This design means consumers building a graph view treat all links as directed edges of an untyped relationship. Broken links - links whose target does not exist in the bundle - are explicitly permitted; they may represent knowledge not yet written.
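A consumer building that graph view might work along the lines of the Python sketch below. It assumes, without confirmation from the spec, that concept links are relative paths to .md files resolved against the linking file's directory. The bundle contents are invented, and the regular expression is a simplification that ignores anchors, reference-style links, and other markdown edge cases. Broken links are collected rather than treated as errors.

```python
import posixpath
import re

# Simplified inline-link pattern: [text](target)
LINK = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")

Edge = tuple[str, str]


def build_edges(bundle: dict[str, str]) -> tuple[list[Edge], list[Edge]]:
    """bundle maps concept ID -> markdown body.

    Returns (edges, broken): directed, untyped (source, target) pairs, split
    by whether the target concept exists in the bundle.
    """
    edges, broken = [], []
    for source, body in bundle.items():
        for target in LINK.findall(body):
            if "://" in target or not target.endswith(".md"):
                continue  # external URL or non-concept link
            resolved = posixpath.normpath(
                posixpath.join(posixpath.dirname(source), target)
            )
            target_id = resolved[: -len(".md")]
            (edges if target_id in bundle else broken).append((source, target_id))
    return edges, broken


# Hypothetical three-concept bundle; tables/refunds is deliberately missing.
bundle = {
    "tables/orders": "Joins with [customers](customers.md). "
                     "Feeds [revenue](../metrics/revenue.md).",
    "tables/customers": "Referenced by [orders](orders.md).",
    "metrics/revenue": "Computed from [orders](../tables/orders.md) "
                       "minus [refunds](../tables/refunds.md).",
}

edges, broken = build_edges(bundle)
print(len(edges), "edges;", "broken:", broken)
```

Note that the edge list carries no relationship type. Whether a link means "joins with" or "computed from" lives only in the surrounding prose, which matches the design described above.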

Conformance criteria

According to the specification, a bundle is conformant with OKF v0.1 if three conditions are met: every non-reserved markdown file in the tree contains a parseable YAML frontmatter block; every frontmatter block contains a non-empty type field; and every reserved filename, where present, follows the structure described in the spec.

Consumers must not reject a bundle because of missing optional frontmatter fields, unknown type values, unknown additional keys, broken cross-links, or missing index files. According to the specification, "this permissive consumption model is intentional: OKF is meant to remain useful as bundles grow, get refactored, and are partially generated by agents."
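The first two conformance conditions, together with the permissive-consumption rule, can be sketched in Python as follows. This is an illustrative check, not Google's validator. It uses the same flat key: value simplification instead of a real YAML parser, takes the bundle as an in-memory mapping of hypothetical paths to file contents, and does not attempt the third condition on reserved-file structure.

```python
RESERVED = {"index.md", "log.md"}


def frontmatter_type(text: str):
    """Return the type value, or None if frontmatter or type is missing.

    Simplification: flat `key: value` lines only, not full YAML.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    block = []
    for line in lines[1:]:
        if line.strip() == "---":
            break
        block.append(line)
    else:
        return None  # closing delimiter never found
    for line in block:
        key, sep, value = line.partition(":")
        if sep and key.strip() == "type":
            return value.strip() or None
    return None


def check_bundle(files: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    for path, text in sorted(files.items()):
        name = path.rsplit("/", 1)[-1]
        if not path.endswith(".md") or name in RESERVED:
            continue  # reserved files follow their own rules
        if frontmatter_type(text) is None:
            problems.append(f"{path}: missing frontmatter or empty type")
    return problems


# Hypothetical bundle: one valid concept, one concept without a type.
files = {
    "index.md": "# Index\n",
    "tables/orders.md": "---\ntype: BigQuery Table\ncustom_key: anything\n---\n"
                        "See [missing](nope.md).\n",
    "metrics/revenue.md": "---\ntitle: Revenue\n---\nNo type here.\n",
}
print(check_bundle(files))
```

The unknown custom_key and the broken link in tables/orders.md raise no complaint; only the missing type in metrics/revenue.md is reported.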

The full v0.1 specification, including conformance criteria, cross-linking rules, and the set of reserved filenames, fits in a single page spanning 451 lines and 14.7 kilobytes on GitHub. Version numbering follows a <major>.<minor> scheme: a minor version bump introduces backward-compatible additions such as new optional fields or new conventional section headings; a major version bump may introduce breaking changes such as renaming required fields.
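One way a consumer could act on that scheme is sketched below in Python. The policy shown, treating any bundle with the same major version as readable, is an assumption drawn from the compatibility description above rather than a rule quoted from the spec, and the article does not say where a bundle declares its version, so the version strings are simply passed in.

```python
def parse_version(version: str) -> tuple[int, int]:
    """Parse a '<major>.<minor>' string into integers."""
    major, minor = version.split(".")
    return int(major), int(minor)


def can_read(consumer_version: str, bundle_version: str) -> bool:
    """Assumed policy: a matching major version means the bundle is readable."""
    return parse_version(consumer_version)[0] == parse_version(bundle_version)[0]


print(can_read("0.1", "0.2"))  # True: a minor bump only adds optional things
print(can_read("0.1", "1.0"))  # False: a major bump may rename required fields
```

Parsing the two parts as integers rather than as a decimal matters once minor versions reach two digits: 0.10 is newer than 0.9, not equal to 0.1.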

The LLM wiki pattern behind the design

The intellectual lineage for OKF draws explicitly on what Google describes as the "LLM wiki" pattern - teams maintaining a shared markdown library that agents can read and update over time, rather than retrieving the same facts from scattered documents on every query.

According to Google, Andrej Karpathy, the AI researcher and educator, articulates the core idea in a GitHub gist. As quoted by Google in the announcement, Karpathy writes: "LLMs don't get bored, don't forget to update a cross-reference, and can touch 15 files in one pass." The bookkeeping that causes humans to abandon personal wikis is, according to Google's framing, exactly the kind of work language models handle well.

Several naming conventions have emerged from this pattern independently: the AGENTS.md and CLAUDE.md family of convention files that developers drop into repositories; Obsidian vaults wired to coding agents; repos full of index.md and log.md artifacts that agents consult before doing real work; and "metadata as code" approaches that store catalog metadata alongside source code. Each instance is structurally similar but bespoke, and according to Google, none of them is "intentionally designed to cooperate."

OKF differs from these patterns primarily in being formally specified - pinning down the small set of rules needed for interoperability without dictating tooling or content models.

Three design principles

Google describes three explicit principles behind the format's design. The first is being minimally opinionated: the spec requires exactly one field (type) of every concept, leaving everything else - what types exist, what additional fields to include, what sections the body has - to the producer. According to Google, the specification "defines the interoperability surface, not the content model."

The second principle is producer/consumer independence. A bundle hand-authored by a human can be consumed by an AI agent. A bundle generated by a metadata export pipeline can be browsed in a visualizer. A bundle synthesized by one language model can be queried by another. The format is described as the contract; the tooling at each end is independently swappable.

The third principle is that OKF is a format, not a platform. According to Google, the format "is not tied to any specific cloud, database, model provider, or agent framework. It will never require a proprietary account or SDK to read, write, or serve."

Reference implementations and sample bundles

Google is shipping two reference implementations alongside the specification. The first is an enrichment agent that walks a BigQuery dataset, drafts an OKF concept document for every table and view, then runs a second language model pass that crawls authoritative documentation and enriches each concept with citations, schemas, and join paths. The second is a static HTML visualizer that turns any OKF bundle into an interactive graph view in a single self-contained file, with no backend, no install on the viewing side, and no data leaving the page.

Three ready-to-browse sample bundles are available in the repository: one for GA4 e-commerce data, one for the Stack Overflow public dataset, and one for the Bitcoin public dataset. All three were produced by the reference enrichment agent and committed to the repository as living examples of conformant OKF.

According to Google, these are "proofs of concept, deliberately. The agent demonstrates one way to produce OKF; nothing about the format requires a specific agent framework or LLM. The visualizer demonstrates one way to consume it; nothing about the format requires HTML or a graph view."

Alongside the GitHub release, Google updated its Cloud Knowledge Catalog to be able to ingest Open Knowledge Format and serve it to agents.

What the spec explicitly does not do

The specification draws clear lines around what it does not attempt to standardize. According to the SPEC.md document on GitHub, OKF is not designed to define a fixed taxonomy of concept types, prescribe storage or serving infrastructure, or replace domain-specific schemas such as Avro, Protobuf, or OpenAPI. OKF references those schemas; it does not subsume them.

This restraint is notable. Other data catalog and metadata management initiatives have historically tried to standardize more: vocabulary taxonomies, governance workflows, lineage formats, and access control models. OKF avoids all of that, treating those concerns as the producer's problem while standardizing only the file-level structure that makes bundles interoperable.

Industry context: agentic AI and the context problem

The OKF announcement lands at a moment when the advertising and marketing technology industry is actively grappling with how AI agents access internal knowledge. PPC Land has tracked the rapid build-out of agentic infrastructure across the advertising technology sector since at least late 2025, when the Ad Context Protocol launched and the industry began debating how agents should discover inventory, execute campaigns, and communicate across platforms without bespoke integrations.

The underlying problem OKF addresses - fragmented context that agents must assemble from scattered, incompatible surfaces - maps directly onto challenges the agentic advertising standards community has encountered. IAB Tech Lab's AAMP initiative, formalized on February 26, 2026, includes interoperability standards designed partly to address how agents share and consume information across organizational boundaries. OKF operates at a lower and more generic layer, not specifically in the advertising domain, but the same structural challenge applies.

Meta, which published its own approach to agentic data warehouse access in August 2025, used a folder-structure-based approach to represent hierarchical data warehouse knowledge as text compatible with language models. According to Meta's engineers at the time, "LLMs communicate through text so the hierarchical structure of data warehouse nicely mapped to a folder structure." OKF formalizes that intuition into a versioned, open specification.

Google Cloud's own agentic AI trajectory has accelerated significantly since mid-2025, with the company having published a 54-page agentic AI framework document in November 2025, launched the Google Analytics MCP server in July 2025 for natural language queries against analytics data, and introduced Ask Advisor at Google Marketing Live in May 2026. OKF sits upstream of all those surface-layer tools: it addresses what the agents know before they act, not how they act.

Why the format-first bet matters

The choice to publish OKF as a format rather than a service has significant practical consequences. A format can be adopted without a commercial relationship, without a pricing model, and without migration risk. It can live in a git repository alongside code. It can be read by humans in any text editor and parsed by any agent that can read files. It can be diffed, versioned, and reviewed through standard developer tooling.

According to Google, the analogy is to what Karpathy calls the "lingua franca" that enables knowledge produced by one team to be consumed by another - whether that knowledge was written by a human, generated by an enrichment pipeline, or synthesized by a language model. The format is designed to be the medium of exchange, neutral to the parties on either side.

That is also the risk. Formats succeed when enough parties speak them. The specification is currently at v0.1, described explicitly as "a starting point, not a finished standard." It is published on GitHub under Google's GoogleCloudPlatform account, which provides credibility but also raises the question of how governance will evolve as external contributors propose changes.

The OKF repository shows two pull requests, 12 issues, and five top-level directories at the time of publication: agents, okf, bundles, samples, and src. The spec itself is at okf/SPEC.md. The enrichment agent code is in the agents directory. The format is versioned: the initial commit from Amir Hormati - the BigQuery Tech Lead credited as co-author - carries the commit hash ee67a5c with the message "Import Open Knowledge Format reference enrichment agent."

Practical implications for data and marketing teams

For organizations running AI-powered analytics workflows - a description that increasingly applies to large marketing and media buying operations - OKF raises a concrete question: where is the organizational knowledge that agents need, and in what form does it exist?

A media agency running programmatic campaigns at scale may have knowledge distributed across a data warehouse schema, a campaign taxonomy documented in a shared spreadsheet, runbooks maintained in Confluence, and metric definitions embedded in dashboard configuration files. None of these surfaces is natively readable by an AI agent in a consistent way. OKF provides a path to consolidate them: export or author concept documents for each knowledge unit, organize them into a bundle, link them, and then point agents at the bundle rather than at the original scattered surfaces.
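The producer side of that path could look something like the Python sketch below, which renders two linked concept documents and writes them out as a bundle. All names and contents are hypothetical. Frontmatter values are written verbatim, so the sketch only works for plain values that need no YAML quoting, and it writes to a temporary directory purely so that it can run anywhere.

```python
import tempfile
from pathlib import Path


def render_concept(concept_type: str, body: str, **extra: str) -> str:
    """Render one concept file: frontmatter block, then markdown body.

    Simplification: values are written verbatim, so they must be plain
    scalars that need no YAML quoting.
    """
    if not concept_type.strip():
        raise ValueError("type is required and must be non-empty")
    lines = ["---", f"type: {concept_type}"]
    lines += [f"{key}: {value}" for key, value in extra.items()]
    lines += ["---", body.rstrip(), ""]
    return "\n".join(lines)


def write_bundle(root: Path, concepts: dict[str, str]) -> None:
    """Write each concept to <root>/<concept ID>.md, creating directories."""
    for concept_id, text in concepts.items():
        path = root / f"{concept_id}.md"
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(text, encoding="utf-8")


# Hypothetical agency knowledge: a metric definition linked to its table.
concepts = {
    "metrics/ctr": render_concept(
        "Metric",
        "# CTR\n\nComputed from [campaign_stats](../tables/campaign_stats.md).",
        title="Click-through rate",
    ),
    "tables/campaign_stats": render_concept(
        "BigQuery Table",
        "# campaign_stats\n\nOne row per campaign per day.",
        title="campaign_stats",
    ),
}

with tempfile.TemporaryDirectory() as tmp:
    write_bundle(Path(tmp), concepts)
    written = sorted(p.relative_to(tmp).as_posix() for p in Path(tmp).rglob("*.md"))

print(written)
```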

The reference enrichment agent demonstrates one automation path for teams with BigQuery data: run the agent against a dataset, get a first-pass OKF bundle with schemas, join paths, and citations, then refine manually. According to Google, "the format is the contribution. The tools we've shipped exist to make it real, and to lower the cost of trying it out."

Timeline

May 28, 2026 - OKF concept documents carry a sample timestamp of 2026-05-28T14:30:00Z, suggesting the reference enrichment agent generated the sample bundles on that date.
June 11, 2026 - Amir Hormati commits the reference enrichment agent to the GoogleCloudPlatform/knowledge-catalog repository with commit hash ee67a5c (listed as "yesterday" relative to the June 12 publication date).
June 12, 2026 - Google publishes the Open Knowledge Format (OKF) v0.1 specification on GitHub, alongside the blog post "Introducing the Open Knowledge Format" authored by Sam McVeety and Amir Hormati on the Google Cloud Blog. The format is published as open and vendor-neutral; Google simultaneously updates Cloud Knowledge Catalog to ingest OKF bundles.

Related PPC Land coverage:

Ad Context Protocol divides advertising industry on agentic AI standards - November 2, 2025
Industry expert questions AdCP media buying protocol for ad automation - November 3, 2025
Google Cloud releases comprehensive agentic AI framework guideline - November 10, 2025
IAB Tech Lab CEO warns industry chasing 'shiny pennies' in agentic AI - December 31, 2025
IAB Tech Lab names its agentic ad initiative AAMP to end market confusion - March 1, 2026
Inside Google I/O 2026: the agentic AI shift no one saw coming - May 2026
IAB Tech Lab summit confronts an agentic web that already arrived - June 2026

Summary

Who: Sam McVeety (Tech Lead, Data Analytics, Google Cloud) and Amir Hormati (Tech Lead, BigQuery, Google Cloud) authored the announcement; the format is published by the Google Cloud Data Cloud team under the GoogleCloudPlatform GitHub organization.

What: The Open Knowledge Format (OKF) v0.1 is an open, vendor-neutral specification for representing organizational knowledge as a directory of markdown files with YAML frontmatter. It formalizes the "LLM wiki" pattern into an interoperable format. The only required field per concept document is type. Two reference implementations ship with the spec: a BigQuery enrichment agent and a static HTML visualizer. Three sample bundles for GA4 e-commerce, Stack Overflow, and Bitcoin public datasets are included.

When: Published on June 12, 2026, on the Google Cloud Blog. The commit introducing the reference enrichment agent was made the day before, on June 11, 2026. The sample bundles carry timestamps from May 28, 2026.

Where: The specification, reference implementations, and sample bundles are available in the GoogleCloudPlatform/knowledge-catalog GitHub repository at okf/SPEC.md. Google's Cloud Knowledge Catalog has also been updated to ingest OKF bundles natively.

Why: Organizations building agentic AI systems face a fragmented context landscape in which the internal knowledge that agents need - table schemas, metric definitions, join paths, runbooks, deprecation notices - is scattered across metadata catalogs with proprietary APIs, wikis, shared drives, and code comments. Every agent builder currently solves this context-assembly problem independently. OKF attempts to establish a common format that any producer can write and any consumer can read, without requiring a proprietary runtime, SDK, or commercial relationship.
