IAB Tech Lab this year received a donation from Mixpeek of an open-source, AI-powered taxonomy mapping tool designed to accelerate migration from Content Taxonomy 2.x to the updated 3.1 version, according to Katie Shell, Associate Product Manager at IAB Tech Lab. The tool cuts work that previously took weeks or months of manual effort down to seconds, using methods including TF-IDF, BM25, KNN, and LLM re-ranking.
The donation arrives as IAB Tech Lab continues expanding its taxonomy infrastructure for digital advertising. Content Taxonomy 3.1 expands coverage from approximately 400 categories in version 2.x to more than 1,500 categories, introducing more granular classification options for publishers, supply-side platforms, demand-side platforms, and brand safety vendors operating across programmatic advertising ecosystems.
Semantic alignment replaces manual spreadsheet maintenance
The IAB Tech Lab Taxonomy Mapper treats taxonomies as shared infrastructure for semantic alignment rather than static spreadsheets maintained by hand. According to Ethan Steininger, Founder and CEO of Mixpeek, "In advertising, agentic AI only works if everyone is speaking the same language. Taxonomies are that contract, they define what systems can retrieve, optimize against, and act on without drifting. The mapper is about making that shared structure programmable instead of manual."
The technical implementation uses multiple AI-powered methods to translate labels. The system applies TF-IDF (term frequency-inverse document frequency) analysis to identify keywords distinguishing categories. BM25 probabilistic ranking functions evaluate how well documents match query terms. K-nearest neighbors algorithms find the closest matching categories in the new taxonomy. Large language model re-ranking provides semantic understanding beyond keyword matching, enabling the system to distinguish between contextually similar terms like "Auto Parts" and "Gasoline Prices" rather than treating them as interchangeable text.
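The keyword-weighting and nearest-neighbor steps described above can be illustrated with a minimal sketch. This is not the mapper's actual code: the label list, function names, and smoothing formula are assumptions chosen for illustration, and only the Python standard library is used.

```python
import math
from collections import Counter

# Hypothetical target labels; real Content Taxonomy 3.1 names and IDs differ.
TARGET_LABELS = [
    "Auto Parts",
    "Auto Insurance",
    "Gasoline Prices",
    "Home Insurance",
]

def tokenize(text):
    return text.lower().split()

def build_idf(docs):
    """Smoothed inverse document frequency for every token in the label set."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(tokenize(doc)))
    return {tok: math.log((1 + n) / (1 + count)) + 1.0 for tok, count in df.items()}

def tfidf_vector(text, idf):
    """Sparse TF-IDF vector; tokens unseen in the label set are ignored."""
    tf = Counter(tokenize(text))
    return {tok: freq * idf[tok] for tok, freq in tf.items() if tok in idf}

def cosine(a, b):
    dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def nearest_labels(query, labels, k=2):
    """Return the k labels closest to the query by cosine similarity."""
    idf = build_idf(labels)
    q = tfidf_vector(query, idf)
    scored = [(label, cosine(q, tfidf_vector(label, idf))) for label in labels]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

matches = nearest_labels("Car Parts", TARGET_LABELS)
```

In this sketch "Car Parts" matches "Auto Parts" only through the shared token "parts"; the word "car" contributes nothing because it never appears in the label set. That gap is the kind of case where a semantic step such as embeddings or LLM re-ranking would be expected to help.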
The mapping tool runs locally, transforming taxonomy alignment into what IAB Tech Lab characterizes as public infrastructure rather than proprietary advantage. Any SSP, DSP, publisher, or brand safety vendor can bring their IAB Content Taxonomy 2.x labels or internal taxonomies up to version 3.1, gaining deterministic, confidence-scored mappings without requiring external API access or cloud processing.
Taxonomy infrastructure for AI systems
Taxonomies provide guardrails by defining vocabulary agents can reason over, retrieve against, and act upon, according to the IAB Tech Lab announcement. A more granular taxonomy enables AI systems to understand content at more precise, contextual levels, directly impacting campaign targeting accuracy and measurement capabilities through consistent labeling.
The announcement positions taxonomies as essential infrastructure for agentic AI systems increasingly deployed across advertising operations. As systems move from generating outputs to taking actions, they depend on shared guardrails to maintain reliability and interoperability. Without taxonomies keeping AI systems on track, Shell notes, a system asked to "categorize a cooking video" might tag content as "a documentary about fettuccine's tumultuous journey through a heat management crisis."
Many existing AI workflows depend on structured signals to determine what content enters working sets in the first place and how that content can be reasoned over or explained downstream. Taxonomies provide this structure by defining what content a system is allowed to use or respond with. In advertising, this means models can distinguish between "Auto Parts" and "Gasoline Prices" rather than treating them as similar text strings.
Privacy-focused taxonomies serve as guardrails for how AI systems use data in the first place. IAB Tech Lab's Privacy Taxonomy gives models a common language for understanding what data is sensitive, what uses are restricted, and what should be off-limits, particularly as agents gain autonomy in decision-making. These human-defined boundaries help ensure automation doesn't drift into uses that conflict with privacy laws or consumer expectations.
Technical implementation and workflow
The mapping process follows a straightforward workflow accessible through a localhost interface. Users upload CSV or JSON files containing Content Taxonomy 2.x categories or proprietary taxonomy structures. The tool maps those categories to Content Taxonomy 3.1 with confidence scores indicating match quality. Results can be exported in multiple formats for integration into classification pipelines.
The system accelerates migration from 2.x to 3.1 significantly while requiring substantially less internal team work, according to Shell. When everyone speaks the same language programmatically, creative optimization, reporting, and safety models can interoperate instead of working in silos.
The technical architecture enables efficient processing across large category sets. The deterministic matching phase handles exact label matches and known aliases. The fuzzy matching layer processes approximate string matching for categories with minor variations. The confidence scoring system provides numerical indicators of match quality, enabling review workflows where human validation focuses on lower-confidence mappings requiring judgment.
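The three stages described above (deterministic match, fuzzy match, confidence score) can be sketched as follows. This is an illustrative approximation rather than the mapper's implementation: the catalog, alias table, function names, and 0.8 threshold are assumptions, and the standard library's `difflib` stands in for a dedicated fuzzy-matching library.

```python
from difflib import SequenceMatcher

# Hypothetical catalog and alias table, for illustration only.
CATALOG = ["Auto Parts", "Gasoline Prices", "Home Improvement"]
ALIASES = {"car parts": "Auto Parts"}

def normalize(label):
    """Lowercase and collapse whitespace so trivial variations compare equal."""
    return " ".join(label.lower().split())

def map_label(label, threshold=0.8):
    """Return (target, confidence, method); target is None below the threshold."""
    key = normalize(label)
    # Deterministic phase: exact matches and known aliases get full confidence.
    for target in CATALOG:
        if normalize(target) == key:
            return target, 1.0, "exact"
    if key in ALIASES:
        return ALIASES[key], 1.0, "alias"
    # Fuzzy phase: best approximate string match, scored in [0, 1].
    best, score = max(
        ((t, SequenceMatcher(None, key, normalize(t)).ratio()) for t in CATALOG),
        key=lambda pair: pair[1],
    )
    if score >= threshold:
        return best, round(score, 3), "fuzzy"
    return None, round(score, 3), "unmatched"

exact = map_label("auto  parts")
alias = map_label("Car Parts")
fuzzy = map_label("Gasolene Prices")
missing = map_label("Quantum Physics")
```

Returning the method alongside the score lets a review workflow treat a fuzzy 0.93 differently from an exact 1.0, even though both clear the threshold.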
Content Taxonomy 3.1 enhancements
Content Taxonomy 3.1 introduces clearer separation between primary topic classifications and orthogonal vectors describing content characteristics. The updated structure provides better support for connected television, video content, podcasts, games, and app stores compared to previous versions.
The enhancement addresses limitations in version 2.x related to content format classification. News content versus opinion pieces now receive distinct treatment. Entertainment genres gain more granular categorization options. Video and audio content types receive specialized classification paths reflecting their unique characteristics within advertising contexts.
The taxonomy updates support contextual targeting capabilities gaining adoption as advertisers navigate privacy restrictions and third-party cookie deprecation. The approach analyzes content characteristics to place advertisements alongside relevant material without relying on individual user tracking, addressing regulatory frameworks restricting data collection across digital advertising environments.
The updated taxonomy is not backwards compatible in areas including News/Opinion classifications and entertainment genres, so migration requires careful planning. The IAB Tech Lab Taxonomy Mapper addresses this challenge by providing tools to curate edge cases with overrides, synonyms, thresholds, and audit outputs.
OpenRTB integration and programmatic compatibility
The mapped categories emit valid IAB 3.0 identifiers suitable for OpenRTB and VAST integration. The system generates content.cat arrays with configurable cattax parameters enabling programmatic platforms to process category information through standardized protocols.
OpenRTB facilitates real-time bidding transactions processing billions of auctions daily across programmatic advertising infrastructure. The IAB 3.0 content identifiers enable demand-side platforms and supply-side platforms to execute contextual targeting strategies without relying on personal data collection that increasingly faces regulatory restrictions.
The technical specifications support multi-category output per input, enabling content to receive multiple relevant classifications rather than forcing single-category assignments. This flexibility better reflects content reality where articles, videos, or applications often span multiple topics or themes.
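As a rough illustration of the output described above, the sketch below assembles a multi-category `content.cat` array with a `cattax` value. The helper name and the category IDs are placeholders, not real taxonomy identifiers, and the `cattax` value of 7 is an assumption that should be confirmed against the AdCOM category taxonomy list for the version in use.

```python
import json

def build_content_object(category_ids, cattax):
    """Assemble a minimal OpenRTB-style content object from mapped category IDs."""
    # Deduplicate while preserving order, since one input may map to several
    # categories and different inputs may map to the same one.
    unique_ids = list(dict.fromkeys(str(cid) for cid in category_ids))
    return {"cat": unique_ids, "cattax": cattax}

# Placeholder IDs, not real taxonomy identifiers. The cattax value 7 is an
# assumption; confirm the code for your taxonomy version before using it.
content = build_content_object(["1001", "1002", "1001"], cattax=7)
bid_request_fragment = {"site": {"content": content}}
print(json.dumps(bid_request_fragment, indent=2))
```

Keeping `cattax` as an explicit argument rather than a constant mirrors the configurable parameter the article describes, so the same helper could serve platforms that are still on an older taxonomy version.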
Vector attributes support extends beyond primary content classification to include Channel, Type, Format, Language, Source, and Environment designations. These orthogonal classifications enable more sophisticated targeting strategies that consider both content topics and presentation contexts simultaneously.
Industry standardization context
IAB Tech Lab's broader taxonomy work includes Content Taxonomy 3.0 offering more granular categorization than predecessors, while Audience Taxonomy 1.1 captures demographics, purchase intent, and personal interests. Both taxonomies have been developed collaboratively by the ecosystem and remain open and transparent, according to Anthony Katsur, Chief Executive Officer at IAB Tech Lab.
The standardization efforts complement other IAB Tech Lab initiatives addressing programmatic advertising infrastructure changes. The organization previously released Publisher Advertiser Identity Reconciliation protocol in September 2024 for first-party data matching, Attribution Data Matching Protocol in October 2024 for privacy-preserving conversion measurement, and ID-Less Solutions Guidance in July 2025 for advertising without traditional identifiers.
The mapper donation aligns with IAB Tech Lab's position as coordinator rather than competitor in the agentic landscape, extending rather than replacing existing platform capabilities through open-source contributions. This approach contrasts with proprietary protocol initiatives that position particular vendors as gatekeepers to functionality.
Content classification accuracy affects multiple use cases across digital advertising. Contextual targeting systems require precise category assignments to deliver relevant advertisements alongside appropriate content. Brand safety implementations depend on taxonomy classifications to exclude inappropriate inventory from campaigns. Measurement systems use category data to aggregate performance metrics across content types.
Implementation resources and access
The IAB Mapper helps advertising technology teams migrate from IAB Content Taxonomy 2.x to 3.0 through input of existing 2.x codes or labels in CSV or JSON format. The deterministic matching phase identifies exact matches, followed by fuzzy matching for approximate string similarities, with optional semantic enhancement for complex cases. Output provides valid IAB 3.0 identifiers ready for OpenRTB and VAST integration.
The repository contains client libraries in Python and JavaScript implementations. The Python package available through PyPI includes CLI tool functionality, Python API for programmatic use, optional embeddings support using Sentence-Transformers, optional LLM re-ranking through Ollama, and web demo with FastAPI. The JavaScript package available through npm provides full TypeScript support with type definitions, CommonJS and ES Modules compatibility, zero dependencies for core functionality, and lightweight fuzzy matching.
Both implementations provide deterministic alias and exact matching followed by fuzzy string matching. IAB 3.0 identifier emission includes configurable cattax parameters for OpenRTB integration. Multi-category output per input enables comprehensive classification. Vector attribute support covers Channel, Type, Format, Language, Source, and Environment designations. Sensitive Content flags are surfaced and can optionally be excluded. OpenRTB and VAST CONTENTCAT helpers facilitate integration. Local-only processing provides reproducible, versioned catalog access. Override support accommodates manual mapping adjustments.
Python-specific features include local embeddings through Sentence-Transformers for semantic matching, optional LLM re-ranking via Ollama, CLI tool with CSV and JSON input/output capabilities, and web UI demo functionality.
The normalization process first applies exact and alias matches using synonyms. Fuzzy retrieval using rapidfuzz, TF-IDF, or BM25 follows, with configurable thresholds. The Python implementation can optionally add semantic matching with local embeddings, and optional local LLM re-ranking provides additional validation. Output assembly combines topic identifiers with vector identifiers into OpenRTB content.cat arrays with a configurable cattax value. Sensitive Content Detection flags are surfaced, with a drop-scd option for excluding flagged categories.
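To make the BM25 retrieval step with a configurable threshold more concrete, here is a minimal standard-library sketch of the Okapi BM25 formula. It is not the mapper's code: the documents, function names, parameter defaults (`k1=1.5`, `b=0.75`), and threshold are assumptions for illustration.

```python
import math
from collections import Counter

# Hypothetical label-plus-description documents, for illustration only.
DOCS = [
    "Auto Parts replacement components for cars and trucks",
    "Gasoline Prices fuel cost trends at the pump",
    "Auto Insurance coverage policies for vehicles",
]

def tokenize(text):
    return text.lower().split()

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with the Okapi BM25 formula."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n
    df = Counter(tok for toks in tokenized for tok in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Length normalization: longer-than-average documents are penalized.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

def retrieve(query, docs, threshold=1.0):
    """Return (doc, score) pairs at or above the threshold, best first."""
    ranked = sorted(zip(docs, bm25_scores(query, docs)), key=lambda p: p[1], reverse=True)
    return [(doc, score) for doc, score in ranked if score >= threshold]

scores = bm25_scores("auto replacement parts", DOCS)
results = retrieve("auto replacement parts", DOCS)
```

BM25 scores are unbounded and depend on the corpus, so a threshold like the one here would need tuning per catalog rather than being read as a probability.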
Security and privacy considerations
The local-first architecture processes data on user machines without requiring external API access. No personally identifiable information enters the processing pipeline, because CSV and JSON files are processed in memory without external transmission. Air-gapped deployment enables fully offline operation when models are prebundled.
The tool is released under the BSD 2-Clause license, with IAB attribution required in deployed user interfaces or footer areas. The documentation specifies: "IAB is a registered trademark of the Interactive Advertising Bureau. This tool is an independent utility built by Mixpeek for interoperability with IAB Content Taxonomy standards."
Support channels include GitHub Issues for technical problems, documentation at the Mixpeek IAB Mapper website, and enterprise support contact options for custom integrations or questions about multimodal classification extensions.
Migration planning implications
By reducing a manual migration that took weeks or months to seconds, the AI mapper addresses operational challenges facing advertising technology platforms. The Content Taxonomy 3.1 expansion from approximately 400 categories to more than 1,500 creates substantial mapping work for teams without automation tools.
The confidence scoring system enables quality assurance workflows where human reviewers focus on lower-confidence mappings requiring domain expertise. High-confidence automated mappings accelerate the bulk of migration work while preserving human judgment for edge cases.
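The review workflow described above amounts to splitting mappings on a confidence cutoff. A minimal sketch follows, with a hypothetical record shape and an assumed cutoff of 0.9; neither is taken from the mapper itself.

```python
# Hypothetical mapper output; labels and scores are illustrative only.
MAPPINGS = [
    {"source": "Auto Parts", "target": "Auto Parts", "confidence": 1.0},
    {"source": "Gas Prices", "target": "Gasoline Prices", "confidence": 0.82},
    {"source": "Misc", "target": "Home Improvement", "confidence": 0.41},
]

def triage(mappings, auto_accept=0.9):
    """Split mappings into auto-accepted and needs-review lists."""
    accepted = [m for m in mappings if m["confidence"] >= auto_accept]
    review = [m for m in mappings if m["confidence"] < auto_accept]
    # Lowest confidence first, so reviewers see the riskiest mappings first.
    review.sort(key=lambda m: m["confidence"])
    return accepted, review

accepted, review = triage(MAPPINGS)
```

Sorting the review queue by ascending confidence is a design choice, not a requirement; a team could equally sort by traffic volume so that high-impact categories get reviewed first.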
The override support accommodates situations where automated mapping produces incorrect results or where business requirements dictate specific category assignments. These manual overrides integrate into the mapping pipeline alongside automated processing, enabling hybrid approaches balancing efficiency with accuracy.
The audit output capabilities provide transparency into mapping decisions, documenting which categories received automated assignments versus manual intervention. This documentation supports compliance requirements and quality assurance processes across organizations implementing taxonomy migrations.
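One way to picture overrides and audit output working together is sketched below. The record fields, the `decided_by` column, and the CSV layout are assumptions for illustration and do not reflect the mapper's actual audit format.

```python
import csv
import io

# Hypothetical automated mappings and manual overrides, for illustration only.
MAPPINGS = [
    {"source": "Auto Parts", "target": "Auto Parts", "confidence": 1.0},
    {"source": "Misc", "target": "Home Improvement", "confidence": 0.41},
]
OVERRIDES = {"Misc": "Uncategorized"}

def apply_overrides(mappings, overrides):
    """Apply manual overrides and record who decided each mapping."""
    audited = []
    for mapping in mappings:
        row = dict(mapping)  # copy so the automated result is left untouched
        if mapping["source"] in overrides:
            row["target"] = overrides[mapping["source"]]
            row["confidence"] = 1.0
            row["decided_by"] = "override"
        else:
            row["decided_by"] = "automated"
        audited.append(row)
    return audited

def to_audit_csv(rows):
    """Serialize audited rows to CSV text for review or archiving."""
    buffer = io.StringIO()
    fields = ["source", "target", "confidence", "decided_by"]
    writer = csv.DictWriter(buffer, fieldnames=fields)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

audited = apply_overrides(MAPPINGS, OVERRIDES)
audit_csv = to_audit_csv(audited)
```

Recording the decision source on every row, including the automated ones, keeps the audit file complete on its own, so a reviewer does not need the override table to reconstruct what happened.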
Timeline
- November 2017: IAB Tech Lab releases Content Taxonomy 2.0
- July 2020: Content Taxonomy 1.0 officially deprecated
- October 2020: IAB Tech Lab releases Content Taxonomy 2.1
- December 2020: IAB Tech Lab releases Content Taxonomy 2.2
- June 2022: IAB Tech Lab releases Content Taxonomy 3.0 with updates for News, Video/CTV, Podcasts, Radio, Games, and App stores
- December 2024: IAB Tech Lab releases Content Taxonomy 3.1 with genre category updates including Thriller
- January 29, 2025: IAB Tech Lab releases official mapping between Content Taxonomy 1.0 and 2.0
- February 11, 2026: IAB Tech Lab announces Mixpeek donation of open-source Taxonomy Mapper
Summary
Who: IAB Tech Lab received a donation from Mixpeek, a company specializing in AI-powered content analysis and classification tools. Katie Shell, Associate Product Manager at IAB Tech Lab, announced the donation. Ethan Steininger serves as Founder and CEO of Mixpeek.
What: An open-source, AI-powered taxonomy mapping tool designed to accelerate migration from IAB Content Taxonomy 2.x to version 3.1. The system uses TF-IDF, BM25, KNN, and LLM re-ranking methods to translate category labels. The tool runs locally, processes CSV or JSON inputs, generates confidence-scored mappings, and exports results ready for OpenRTB and VAST integration. The mapper reduces migration timelines from weeks or months of manual work to seconds through automated processing.
When: IAB Tech Lab announced the Mixpeek donation on February 11, 2026. Content Taxonomy 3.1 was released in December 2024 following the June 2022 release of version 3.0. The public comment period for version 3.1 concluded on January 24, 2025.
Where: The tool operates as local infrastructure deployed on user machines without requiring external API access or cloud processing. The open-source repository is available through GitHub with client libraries in Python (PyPI) and JavaScript (npm). Organizations including SSPs, DSPs, publishers, and brand safety vendors can access and implement the mapper within their existing classification pipelines.
Why: The taxonomy migration addresses AI system requirements for structured vocabulary defining what agents can retrieve, optimize against, and act upon. Content Taxonomy 3.1 expands from approximately 400 categories to more than 1,500, providing more granular classification for connected television, video, podcasts, games, and app stores. The mapper transforms taxonomy alignment from proprietary manual processes into programmable shared infrastructure, enabling interoperability across creative optimization, reporting, and safety models. The donation supports the broader advertising ecosystem transition away from deprecated Content Taxonomy 1.0 toward current standards required for programmatic advertising operations.