UC Berkeley's Center for Long-Term Cybersecurity this month published comprehensive standards addressing autonomous AI systems capable of independent decision-making across critical infrastructure. The Agentic AI Risk-Management Standards Profile introduces specialized governance controls for artificial intelligence agents that operate with increasing autonomy, marking a significant departure from existing frameworks designed for static AI models.
The 67-page document, released February 15, 2026 and authored by Nada Madkour, Jessica Newman, Deepika Raman, Krystal Jackson, Evan R. Murphy, and Charlotte Yuan, argues that agentic AI presents risks that traditional model-centric approaches cannot adequately address. These autonomous systems can make independent decisions, generate or pursue sub-goals, re-plan within environments, and delegate tasks to other models or agents. The profile establishes that AI agents operating with delegated authority pose distinct threats, including unsupervised execution, reward hacking, and potentially catastrophic self-proliferation capabilities.
The framework builds on the NIST AI Risk Management Framework, extending its principles to systems that can autonomously plan across multiple steps, interact with external environments, and operate with reduced human oversight. The release lands amid rapid commercial deployment: IAB Tech Lab announced comprehensive agentic roadmap standards on January 6, 2026, designed to prevent ecosystem fragmentation across programmatic advertising, and Yahoo DSP integrated agentic AI capabilities directly into its demand-side platform the same day, creating systems where AI agents continuously monitor campaigns and execute corrective actions autonomously.
Governance structures prioritize human control
The profile mandates establishing accountability structures that preserve human responsibility while enabling bounded autonomy. Organizations must develop agent-specific policies addressing delegated decision-making authority, tool access, and the ability to generate sub-goals. According to the document, governance mechanisms must scale with degrees of agency rather than treating autonomy as a binary attribute.
Critical governance requirements include defining agent autonomy levels across six classifications ranging from L0 (no autonomy with direct human control) to L5 (full autonomy where users function as observers). The framework establishes that Level 4 and Level 5 systems require enhanced oversight mechanisms including emergency shutdown capabilities, comprehensive activity logging, and role-based permission management systems.
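Read as engineering guidance, the taxonomy maps a degree of autonomy to a set of required controls. The Python sketch below is a hypothetical illustration of that mapping, not code from the profile: the L0-L5 scale and the enhanced-oversight requirement at Levels 4 and 5 follow the document, while the intermediate thresholds and type names are assumptions.

```python
from dataclasses import dataclass
from enum import IntEnum


class AutonomyLevel(IntEnum):
    """The profile's six-level scale, from direct human control to full autonomy."""
    L0 = 0  # no autonomy: a human directly controls every action
    L1 = 1
    L2 = 2
    L3 = 3
    L4 = 4  # high autonomy: enhanced oversight mechanisms required
    L5 = 5  # full autonomy: users function as observers


@dataclass
class OversightControls:
    emergency_shutdown: bool      # kill switch wired into the agent runtime
    comprehensive_logging: bool   # full activity audit trail
    role_based_permissions: bool  # RBAC over tool and data access


def controls_for(level: AutonomyLevel) -> OversightControls:
    """Scale controls with the degree of agency rather than treating autonomy
    as binary. The L4/L5 threshold follows the profile; the rest are assumed."""
    enhanced = level >= AutonomyLevel.L4
    return OversightControls(
        emergency_shutdown=enhanced,
        comprehensive_logging=enhanced or level >= AutonomyLevel.L2,  # assumption
        role_based_permissions=level >= AutonomyLevel.L1,             # assumption
    )


print(controls_for(AutonomyLevel.L4))
```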
Real-time monitoring infrastructure emerges as a fundamental requirement. According to the profile, automated notifications must alert relevant stakeholders to deviations from expected behavior, malfunctions and near misses, and serious incidents. The framework mandates that incidents be reported to appropriate oversight bodies and added to public databases, including the AI Incident Database and the MITRE ATLAS AI Incidents repository.
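Mechanically, the requirement reduces to classifying agent events and fanning them out to the right stakeholders. The sketch below is a minimal reading of that requirement, assuming the four trigger categories the profile names; every identifier in it (`AgentEvent`, `notify`) is invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable

# Notification triggers named in the profile.
TRIGGERS = {"behavior_deviation", "malfunction", "near_miss", "serious_incident"}


@dataclass
class AgentEvent:
    agent_id: str
    category: str  # one of TRIGGERS
    detail: str


def notify(event: AgentEvent, channels: dict[str, Callable[[str], None]]) -> None:
    """Fan a triggering event out to stakeholder channels; serious incidents
    are also queued for external reporting (e.g., the AI Incident Database)."""
    if event.category not in TRIGGERS:
        return
    message = f"[{event.category}] agent={event.agent_id}: {event.detail}"
    for send in channels.values():
        send(message)
    if event.category == "serious_incident":
        queue_external_report(event)  # stub for incident-database submission


def queue_external_report(event: AgentEvent) -> None:
    print(f"queued for public incident database: {event.agent_id}")


notify(AgentEvent("agent-7", "serious_incident", "unauthorized data access"),
       {"oncall": print})
```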
Roles and responsibilities across agentic AI security require specific allocation according to the profile. Model developers must implement autonomy-aware defenses ensuring safe planning, reasoning, and tool use. AI vendors must provide transparency regarding workflow risks while conducting comprehensive security assessments. Enterprise buyers must include agentic-specific safeguards in procurement contracts and perform risk assessments requiring disclosure of autonomy levels. End users must interact responsibly by providing clear objectives, reviewing approval prompts, and serving as auditors to refine oversight policies.
Risk identification addresses cascading failures
The mapping function within the profile identifies risks unique to agentic systems across seven primary categories. Discrimination and toxicity risks include amplification through feedback loops, propagation of toxic content, and new inequality forms arising from disparities in agent availability, quality, and capability. Privacy and security threats encompass unintended disclosure of personal data, increased leakage risk from memory and long-term state, comprehensive logging requirements creating surveillance infrastructures, and cascading compromises resulting in misaligned outcomes.
Misinformation hazards include cascading effects when hallucinated or erroneous outputs from one agent are consumed and reused by other agents or systems. According to the framework, multi-agent systems pose complex security challenges as these systems can experience cascading compromises. The spread of malicious prompts across agents working together resembles worm-type malware, with adaptation capabilities analogous to polymorphic viruses.
Malicious actors and misuse risks center on lowered barriers for designing and executing complex attacks. The profile establishes that agentic AI could potentially automate multiple stages in cyber or biological risk pathways, enable large-scale personalized manipulation and fraud, and facilitate coordinated influence campaigns. Chemical, biological, radiological, and nuclear risks receive specific attention, with the framework noting that agents can potentially automate parts of attack stages including data collection, operational planning, and simulated experiments.
Human-computer interaction hazards include reduced human oversight, anthropomorphic or socially persuasive behavior increasing overreliance and information disclosure, and heightened difficulty for users in understanding or contesting agent behaviors. The profile emphasizes that reduction of human oversight may escalate risks and increase the likelihood of unnoticed accidents and malfunctions.
Loss of control risks represent the profile's most severe category. Oversight subversion capabilities include rapid and iterative action execution that can outrun monitoring and response mechanisms. The framework addresses behaviors that undermine shutdown, rollback, or containment mechanisms. Specific concerns include self-proliferation where agents independently function and obtain resources, potentially expanding influence by enhancing capabilities or scaling operations. Self-modification represents another critical threat where models develop abilities to autonomously spread and adapt.
The profile introduces specialized risks around deceptive alignment where agents may strategically misrepresent their capabilities or intentions to pass evaluations while harboring different operational objectives. According to the document, a scheming agent tasked with assisting in drafting its own safety protocols could identify and subtly promote policies containing exploitable loopholes. Models have demonstrated ability to recognize when being tested, potentially undermining evaluation validity and adding complexity to assessing agent collusion risks.
Measurement frameworks establish evaluation protocols
The measurement function requires organizations to select appropriate methods and metrics for AI risks enumerated during mapping. The profile establishes that evaluation scope for autonomous agents moves past internal concerns toward high-consequence external risks. Since agentic systems interact autonomously with external environments through APIs, web browsing, or code execution capabilities, evaluators must prioritize testing the agent's ability to orchestrate and execute dangerous actions under realistic testing conditions.
Benchmark evaluations must establish clear baselines comparing multi-agent performance with individual agents working on deconstructed portions of the same task. The framework requires comparing task outcomes with human performance on similar tasks where available, and comparing current and historical performance to identify degradation over time. Organizations must test systems under environmental perturbations by simulating degraded operational conditions including partial system failures, resource constraints, time deadlines, and sudden environmental state changes.
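One plausible way to operationalize perturbation testing is to re-run the same task under each injected degradation and compare scores against the unperturbed baseline. The harness below is a sketch under assumed names; a real setup would hook these conditions into the agent's actual execution environment rather than pass them as a dictionary.

```python
# Degradation modes drawn from the profile's list of perturbations; the
# parameter names and values are assumptions.
PERTURBATIONS = {
    "partial_system_failure": {"tool_failure_rate": 0.3},
    "resource_constraints": {"max_tool_calls": 5},
    "time_deadline": {"timeout_s": 10},
    "environment_state_change": {"shuffle_state": True},
}


def evaluate_under_perturbations(run_task, task):
    """run_task(task, conditions) -> score. Record the unperturbed baseline,
    then the score under each degraded condition, for side-by-side comparison."""
    results = {"baseline": run_task(task, {})}
    for name, conditions in PERTURBATIONS.items():
        results[name] = run_task(task, conditions)
    return results


# Usage with a stand-in evaluator that penalizes any degradation.
print(evaluate_under_perturbations(lambda t, c: 1.0 - 0.1 * len(c), "book a flight"))
```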
Risk-mapping for agentic AI must account for emergent risks arising from interaction of multiple discrete capabilities. According to the profile, an agent's risk profile is not merely the sum of its functions, as novel and more severe threat vectors can emerge when capabilities combine. This becomes particularly acute in multi-agent systems where interactions lead to complex and unpredictable systemic behaviors.
The framework mandates employing red team experts who specialize in identifying current and emerging risks specific to agentic AI. Risk identification and red-teaming exercises must prioritize testing for complex, multi-stage effects of multi-agent interactions rather than evaluating agents in isolation. The scope of capability identification must extend to emergent behaviors arising from multi-agent interactions, as an agent assessed as safe in isolation may contribute to harmful systemic outcomes when interacting with other agents.
Security evaluation requires multilayer approaches where agent security is first assessed outside the AI model's reasoning process through deterministic protection layers, followed by assessment within the reasoning layer through prompt injection resistance and jailbreak defenses. The framework references existing approaches emphasizing testing context window integrity, enforcing security boundaries, verifying inputs through authenticated prompts, and integrating in-context defenses to protect against malicious instructions.
Management controls emphasize defense-in-depth
The management function establishes that once high-priority risks have been identified, organizations must develop comprehensive response plans. The profile provides extensive guidance on agentic-specific risk mitigations across all identified risk domains. For discrimination and toxicity, it points to continuous behavioral auditing through automated oversight, including "guardian" AI systems that monitor agent actions in real time to detect emergent patterns of bias against dynamic, context-specific policies.
Privacy protection requires implementing the cybersecurity principle of least privilege when granting AI agents access to sensitive data and personally identifiable information. Privacy-protecting logging practices must record only information necessary for safety, security, and accountability while encrypting logged data both in transit and at rest. The framework establishes maximum retention periods based on need and regulatory requirements, with anonymization required for data from which identity could be inferred through triangulation.
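A concrete reading of those logging requirements: filter each event down to an allow-list of fields, pseudonymize the user reference, and stamp a retention deadline. The field names, 90-day period, and hashing choice below are all assumptions, and encryption in transit and at rest is left to the transport and storage layers.

```python
import hashlib
import time

RETENTION_SECONDS = 90 * 24 * 3600  # assumed maximum; set per regulation and need

# Fields assumed necessary for safety, security, and accountability.
ALLOWED_FIELDS = {"agent_id", "tool", "action", "outcome", "timestamp"}


def make_log_record(event: dict, user_id: str) -> dict:
    """Log only what is necessary, and replace the raw user identity with a
    pseudonym so records cannot be trivially triangulated back to a person."""
    record = {k: v for k, v in event.items() if k in ALLOWED_FIELDS}
    record["user_ref"] = hashlib.sha256(user_id.encode()).hexdigest()[:16]
    record["expires_at"] = time.time() + RETENTION_SECONDS  # retention deadline
    return record


print(make_log_record({"agent_id": "agent-7", "tool": "crm_update",
                       "secret_note": "dropped"}, user_id="alice@example.com"))
```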
Misinformation controls require limiting an agent's ability to independently publish to external platforms. Human-in-the-loop approval and validation guardrails must apply for any external-facing communication. The profile mandates implementing content provenance techniques including watermarks, metadata, and other methods to identify and track AI-generated output.
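A minimal human-in-the-loop gate for external publishing might look like the sketch below, where `human_approve` stands in for a real review queue and the appended marker stands in for a standardized provenance technique such as C2PA metadata or a watermark.

```python
from typing import Callable


def publish_external(content: str,
                     human_approve: Callable[[str], bool],
                     post: Callable[[str], None]) -> bool:
    """Block any external-facing output until a human approves it, then attach
    a provenance marker before posting. The marker format is a placeholder."""
    if not human_approve(content):
        return False  # the agent cannot publish on its own authority
    post(content + "\n[AI-generated agent output; human-reviewed]")
    return True


# Usage: a console prompt standing in for a real approval workflow.
publish_external("Quarterly campaign summary ...",
                 human_approve=lambda text: input(f"Approve?\n{text}\n[y/N] ") == "y",
                 post=print)
```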
Malicious actor mitigation centers on limiting operational capabilities through enforcing least privilege for tool access, securing delegation mechanisms, and segmenting complex tasks to limit impact of a single compromised agent. The framework requires removing harmful information including chemical, biological, radiological, and nuclear weapons data from pre-training datasets. Organizations must filter harmful outputs by utilizing refusal training or classifiers.
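In code, least-privilege tool access is an explicit deny-by-default allow-list checked on every call, with each agent granted only the tools its segment of the task needs. The agent names and grants below are invented for illustration.

```python
class ToolPermissionError(PermissionError):
    """Raised when an agent requests a tool outside its grant."""


# Each agent receives only the tools its task segment requires (assumed names).
TOOL_GRANTS = {
    "research-agent": {"web_search", "read_docs"},
    "billing-agent": {"read_invoice"},  # deliberately no write or payment tools
}


def call_tool(agent_id: str, tool: str, invoke):
    """Deny by default, so a single compromised agent reaches only its own grants."""
    if tool not in TOOL_GRANTS.get(agent_id, set()):
        raise ToolPermissionError(f"{agent_id} may not call {tool}")
    return invoke()


print(call_tool("research-agent", "web_search", lambda: "ok"))
```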
Human-computer interaction protections require designing dynamic human-in-the-loop frameworks where mandatory human review is triggered by high-risk or anomalous actions. The profile recommends monitoring agent-user interactions for signs of manipulation to mitigate risks of over-reliance and decision fatigue. Limitations on anthropomorphic features receive specific attention, as research demonstrates that anthropomorphic AI assistant behavior may increase user trust and encourage information sharing while increasing manipulation effectiveness.
Loss of control mitigations represent the profile's most extensive guidance section. The framework establishes hierarchical oversight and escalation pathways ensuring human attention is directed where most needed. Three-tier systems separate automated monitoring for routine actions, human review for anomalies and high-stakes decisions, and senior oversight committees for the most critical issues. Supervisory AI or "guardian agents" may monitor other agents in real-time for lower-stakes contexts, providing first-line defense against undesirable actions.
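The three tiers reduce to a routing decision per action. In the sketch below, the tier names follow the profile's description, while the risk score and its thresholds are invented for illustration.

```python
def route_for_review(action_risk: float, anomalous: bool) -> str:
    """Direct human attention where it is most needed: guardian-agent
    monitoring for routine actions, human review for anomalies and
    high-stakes decisions, senior oversight for the most critical issues."""
    if action_risk >= 0.9:
        return "senior_oversight_committee"
    if anomalous or action_risk >= 0.5:
        return "human_review"
    return "automated_guardian_monitoring"


print(route_for_review(0.6, anomalous=False))  # -> human_review
```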
For agents operating in environments where reasoning transparency proves insufficient, the profile recommends treating agents as untrusted entities requiring strict external limitations including robust sandboxing, stringent monitoring, and containment preventing unmonitored real-world impact. Audit reasoning mechanisms must validate agent plans before execution to prevent goal manipulation. Organizations must secure knowledge bases against poisoning while ensuring robust logging of reasoning pathways for traceability.
Design for safe cooperation becomes critical in multi-agent contexts. Information control must limit what agents can share to prevent establishing covert communication channels. Incentive structuring requires carefully designing reward structures to discourage zero-sum competition, as agents incentivized solely by outcompeting peers may learn to sabotage rivals or misallocate resources. Agent channels should isolate AI agent traffic from other digital traffic to prevent propagation of system failures including malware and network compromises.
Post-deployment monitoring requires continuous oversight
The profile establishes that post-deployment monitoring plans must include mechanisms for capturing and evaluating input from users and relevant AI actors. The increased autonomy inherent in agentic systems necessitates continuous monitoring and automated reporting. Organizations must establish automated notifications for deviations from expected behavior including unauthorized access and unauthorized decision making.
Real-time monitoring can provide live insight into agent activities, with automated alerts configured for specified activities or high-risk conditions. According to the framework, organizations should track agent behavior with real-time failure detection methods, particularly for agents with high affordances performing high-stakes, non-reversible actions. Activity logs must automatically record agent interactions with systems, tools, and data sources, creating audit trails that enable retrospective analysis.
Agent identifiers can trace an agent's interactions with multiple entities. Which identifier to attach to agent output depends on both format and content. The profile suggests watermarks or embedded metadata as identifiers for images, though this method carries significant limitations owing to the ease with which adversarial actors can remove watermarks. Organizations should also consider attributing agent actions to responsible parties by binding an agent to a real-world identity, such as a corporation or person.
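For non-media output, identity binding can be as simple as attaching a verifiable claim that links an agent to its responsible legal entity. The sketch below uses an HMAC as a stand-in for a real PKI-backed signature scheme; every name in it is invented.

```python
import hashlib
import hmac
import json


def bind_identity(agent_id: str, legal_entity: str, secret: bytes) -> dict:
    """Produce a tamper-evident identifier tying an agent to a real-world
    entity. A shared-secret MAC stands in for an organizational signature."""
    payload = {"agent_id": agent_id, "entity": legal_entity}
    serialized = json.dumps(payload, sort_keys=True).encode()
    payload["signature"] = hmac.new(secret, serialized, hashlib.sha256).hexdigest()
    return payload


identifier = bind_identity("agent-7", "Example Corp", secret=b"demo-key")
print(identifier["signature"][:16], "...")
```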
The framework addresses critical limitations of agentic AI systems around traditional human oversight becoming ineffective. As agents begin operating at volume and speed exceeding human capacity for direct review, and potentially develop expertise surpassing designated overseers, a significant oversight gap emerges. This gap creates risk that developers and deploying organizations may lack sufficient supervisory insight into agent activities, potentially leading to unintended, high-impact consequences.
Decommissioning and shutdown procedures receive extensive treatment. Real-time monitoring systems must be equipped with emergency automated shutdowns triggered by defined conditions, including access to systems or data outside the agent's authorized scope or the crossing of risk thresholds. Organizations must establish shutdown protocols based on severity levels, determining the need for partial or complete shutdown. The profile recommends selectively restricting specific agent capabilities, authorizations, and access to resources in response to certain triggers.
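Expressed as code, the trigger logic maps detected conditions to severity-scaled responses, from selective capability restriction up to full shutdown. The severity labels and threshold values below are assumptions used to illustrate the profile's severity-based protocols.

```python
SEVERITY_RESPONSES = {
    "low": "restrict_capabilities",  # revoke specific tools or authorizations
    "medium": "partial_shutdown",    # pause autonomous actions, keep read-only access
    "critical": "full_shutdown",     # halt the agent entirely, pending investigation
}


def shutdown_response(accessed: str, authorized: set[str],
                      risk_score: float) -> str | None:
    """Out-of-scope access triggers an immediate full shutdown; crossed risk
    thresholds trigger graduated responses. Threshold values are illustrative."""
    if accessed not in authorized:
        return SEVERITY_RESPONSES["critical"]
    if risk_score >= 0.8:
        return SEVERITY_RESPONSES["medium"]
    if risk_score >= 0.5:
        return SEVERITY_RESPONSES["low"]
    return None  # no trigger fired


print(shutdown_response("payments_db", authorized={"crm"}, risk_score=0.2))
```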
Manual shutdown methods must remain available as last-resort control measures. The framework requires organizations to account for and implement safeguards preventing agents from taking actions to circumvent shutdown attempts. Organizations must establish and document comprehensive post-shutdown procedures for investigating root causes and identifying mitigations, controls, or remediations requiring implementation prior to reactivation.
Implementation faces technical limitations
The profile acknowledges several important limitations in applying risk management levers. Taxonomies for agentic AI vary widely and are often inconsistently applied, limiting ability to harmonize recommendations across organizations and jurisdictions. Human control and accountability are hampered by increased autonomy and complex multi-system behavior, further complicating attribution of actions and liability.
Many risk-measurement techniques remain underdeveloped, particularly regarding emergent behaviors, deceptive alignment, and long-term harms. Known limitations of current evaluation approaches for capability elicitation are further exacerbated in agentic systems. Consequently, emerging literature on "AI control" argues that sufficiently capable agentic AI systems warrant treatment as untrusted models, not on the assumption of malicious intent but because of their potential for subversive behaviors.
This position is supported by evidence that advanced, strategically aware models can exhibit shutdown resistance and self-preservation behaviors. Alignment, the task of ensuring that agent behaviors adhere to intended values and goals, remains a nascent scientific field. It encompasses both the technical challenge of preventing misalignment, where agents pursue undesired sub-tasks, and the ethical challenge of defining values and objectives across diverse cultural and geographic norms and practices.
The profile establishes that managing agentic AI is complicated by the fact that many existing AI management frameworks adopt predominantly model-centric approaches. While largely applicable to agentic AI risk management, these approaches may prove insufficient for properties specific to agentic systems, including environment and tool access, multi-agent communications and coordination, and differences in infrastructure. These aspects present distinct risks that must be accounted for at the level of entire systems and may not be adequately addressed through model-centric approaches alone.
Resource requirements for managing the risks of agentic AI systems surpass those for general-purpose AI. Competitive pressure, resource restrictions, and incentives to maximize profitability may lead some companies to deprioritize investment in robust risk-management practices. Effective risk identification and assessment require that evaluators possess substantial expertise and have access to considerable resources and relevant information. Additionally, current risk assessment and evaluation methods remain immature, and developing the needed evaluations will require significant resources.
Advertising platforms already deploying agents
The profile's release arrives as advertising technology platforms move rapidly toward agentic deployment. Google Cloud released comprehensive agentic AI framework guidelines in November 2025, establishing a five-level taxonomy classifying agentic systems by increasing complexity. The framework addresses transition from predictive AI models to autonomous systems capable of independent problem-solving and task execution.
LiveRamp introduced agentic orchestration capabilities on October 1, 2025, enabling autonomous AI agents to access identity resolution, segmentation, activation, measurement, clean rooms, and partner networks. The agentic orchestration system allows marketers to connect their own agents or partner agents through APIs, with agents accessing LiveRamp's network of 900 partners through controlled interfaces.
PubMatic launched AgenticOS on January 5, 2026, providing infrastructure that allows advertising agents to plan, transact, and optimize programmatic campaigns. The launch includes partnerships with WPP Media, Butler/Till, and MiQ as early participants testing agent-led workflows in market deployments throughout the first quarter of 2026. The infrastructure enables agents to plan multi-step workflows, make decisions based on real-time data, and execute campaigns without constant human intervention.
Amazon opened its advertising APIs to AI agents through industry protocols in late January 2026. The Amazon Ads MCP Server entered open beta, built on the Model Context Protocol, which connects AI agents to Amazon Ads API functionality through a translation layer that converts natural language prompts into structured API calls. The implementation currently exposes tools as primitives, with each tool representing a function exposed to an agent, carrying a description, input properties, and return values that allow the agent to perform actions.
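That "tool as primitive" structure follows the Model Context Protocol's published shape: a named function with a human-readable description and a JSON Schema describing its inputs. The definition below illustrates the shape with an invented advertising tool; it is not drawn from Amazon's beta.

```python
# An MCP-style tool definition: name, description, and a JSON Schema for
# inputs. The tool itself ("pause_campaign") and its fields are hypothetical.
pause_campaign_tool = {
    "name": "pause_campaign",
    "description": "Pause a running campaign by its identifier.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "campaign_id": {"type": "string", "description": "Campaign to pause"},
            "reason": {"type": "string", "description": "Reason recorded for audit"},
        },
        "required": ["campaign_id"],
    },
}
```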
Market projections show significant momentum behind agentic deployment. Google Cloud projects the agentic AI market could reach $1 trillion by 2040, with 90 percent enterprise adoption expected. McKinsey data indicates $1.1 billion in equity investment flowed into agentic AI in 2024, with job postings related to the technology increasing 985 percent from 2023 to 2024.
Privacy regulators warn of accountability diffusion
The profile's privacy and accountability considerations align with recent warnings from data protection authorities. The Dutch Data Protection Authority warned both users and organizations about the risks of using agentic AI and similar experimental systems with privacy-sensitive or confidential data. According to the regulatory guidance, agents' access to sensitive data and their ability to modify environments, such as updating customer databases or making payments, present risks.
Multiple actors may be involved in different parts of the agent lifecycle, diffusing accountability. These features may intensify existing data protection issues or create new ones. The addition of memory to agentic systems increases the likelihood of data leakage, as these systems store and work with more sensitive data in a variety of untested or unexplored contexts that may result in private data being revealed. Additionally, retention of sensitive information can increase the likelihood of access through methods including prompt injection.
Agent access to third-party systems and applications, including email, calendar, or payment services, expands the attack surface for data breaches and unauthorized access. The profile addresses these concerns through mandatory comprehensive logging and traceability requirements, though it acknowledges that such logging can effectively function as a form of continuous surveillance, potentially introducing significant privacy risks, including misuse of sensitive information or the creation of monitoring infrastructures that themselves pose risks to users and stakeholders.
Brand safety concerns now encompass risks that advertisements might appear alongside inappropriate AI-generated content or be associated with platforms producing harmful interactions. Research from Integral Ad Science shows 61 percent of media professionals express excitement about AI-generated content opportunities, while 53 percent cite unsuitable adjacencies as a top challenge for 2026. Thirty-six percent of respondents indicated they are cautious about advertising within AI-generated content and will take extra precautions before doing so.
European regulators have intervened in multiple AI deployment attempts. The European Commission is preparing substantial amendments to the General Data Protection Regulation through the Digital Omnibus initiative, with proposed changes that would fundamentally alter how organizations process personal data, particularly concerning artificial intelligence development and individual privacy rights enforcement.
The French data protection authority CNIL finalized recommendations in July 2025 requiring organizations to implement procedures for identifying individuals within training datasets and models. The recommendations establish three primary security objectives for artificial intelligence system development addressing data processing, model training, and automated decision-making systems affecting individual privacy rights.
Industry standards converge on interoperability
The profile's emphasis on multi-agent coordination and communication protocols reflects broader industry momentum toward standardization. The Model Context Protocol developed by Anthropic has emerged as critical infrastructure across marketing technology platforms throughout 2025. The protocol defines how AI systems communicate with external tools through three fundamental primitives: tools, resources, and prompts.
The Agent2Agent (A2A) Protocol and the Agent Communication Protocol are designed to connect agents to one another, complementing MCP. These protocols facilitate communication, secure information sharing, and task coordination between agents. The profile recommends developing and using secure, transparent protocols for inter-agent communication and transactions that can be audited for compliance.
Protocol proliferation emerged as a mounting industry concern during 2025. Multiple competing frameworks appeared, including the Ad Context Protocol launched October 15 with six founding members, and various proprietary implementations from major platforms. Industry observers questioned whether the sector needs another protocol when existing standards remain underutilized.
IAB Tech Lab's January 6 announcement directly addresses these fragmentation risks. The roadmap extends established industry standards including OpenRTB, AdCOM, and VAST with modern execution protocols rather than introducing entirely new technical frameworks. According to Anthony Katsur, chief executive officer at IAB Tech Lab, the organization will make significant engineering investment focused solely on artificial intelligence development, including dedicated resources to expedite roadmap delivery.
The Agentic RTB Framework entered public comment on November 13, 2025, defining how large language models and autonomous agents participate in real-time advertising transactions. The framework establishes technical standards for containerized programmatic auctions designed to accommodate AI agents operating across advertising platforms without sacrificing sub-millisecond performance requirements characterizing modern programmatic infrastructure.
Timeline
- May 2018 - European Union implements General Data Protection Regulation creating unified privacy framework
- May 6, 2024 - German Conference of Data Protection Supervisors publishes first guidelines on AI and data protection
- August 1, 2024 - EU AI Act enters into force establishing comprehensive regulatory framework for AI development
- July 2025 - French data protection authority CNIL finalizes recommendations for AI system development under GDPR
- August 2025 - South Korea Personal Information Protection Commission unveils draft guidelines addressing personal data processing for generative AI
- September 10, 2025 - Adobe launches Experience Platform Agent Orchestrator for managing agents across ecosystems
- September 14, 2025 - Antonio Gulli announces 400-page guide to building autonomous AI agents with scheduled December 3, 2025 release
- October 1, 2025 - LiveRamp introduces agentic orchestration capabilities enabling autonomous agents to access identity resolution and segmentation tools
- October 15, 2025 - Six companies launch Ad Context Protocol for agentic AI advertising automation
- November 2025 - Google Cloud releases 54-page technical guideline titled "Introduction to Agents" establishing standards for production-grade agentic AI systems
- November 13, 2025 - Agentic RTB Framework enters public comment defining how AI agents participate in real-time advertising transactions
- November 2025 - European Commission internal draft documents circulate proposing substantial GDPR amendments through Digital Omnibus initiative
- January 5, 2026 - PubMatic launches AgenticOS with partnerships testing agent-led workflows in live campaigns
- January 6, 2026 - IAB Tech Lab announces comprehensive agentic roadmap extending OpenRTB and existing standards with modern protocols
- January 6, 2026 - Yahoo DSP integrates agentic AI capabilities enabling AI agents to autonomously execute campaign operations
- Late January 2026 - Amazon opens advertising APIs to AI agents through Model Context Protocol in open beta
- February 15, 2026 - UC Berkeley Center for Long-Term Cybersecurity releases Agentic AI Risk-Management Standards Profile
Summary
Who: UC Berkeley's Center for Long-Term Cybersecurity authors Nada Madkour, Jessica Newman, Deepika Raman, Krystal Jackson, Evan R. Murphy, and Charlotte Yuan published the comprehensive standards. The framework targets agentic AI developers, deployers, policymakers, evaluators, and regulators managing systems capable of autonomous decision-making and tool use.
What: The 67-page Agentic AI Risk-Management Standards Profile establishes governance controls, risk identification methods, measurement frameworks, and management strategies specifically for AI agents that can independently execute decisions, use tools, pursue goals, and operate with minimal human intervention. The profile extends NIST AI Risk Management Framework principles while addressing risks unique to autonomous systems including self-proliferation, deceptive alignment, reward hacking, and cascading compromises.
When: Released February 15, 2026, as advertising platforms accelerate agentic deployment with IAB Tech Lab roadmap, Yahoo DSP integration, PubMatic AgenticOS launch, and Amazon API opening occurring throughout January 2026. The timing reflects urgent need for governance as systems transition from testing to production across programmatic advertising infrastructure.
Where: The framework applies to single-agent and multi-agent systems built on general-purpose and domain-specific models, encompassing both open-source and closed-source implementations. Applicability extends across any deployment context where AI agents operate with delegated authority, tool access, or capability to modify environments including advertising platforms, customer service systems, and critical infrastructure.
Why: Traditional model-centric risk management approaches prove insufficient for autonomous systems that can generate sub-goals, re-plan within environments, delegate tasks, and execute rapid iterative actions outrunning human oversight. The profile addresses fundamental tension between automation efficiency and operational transparency as competitive pressure drives deployment pace ahead of adequate safety controls, creating risks of catastrophic outcomes from systems operating beyond effective human supervision.