AI agents are rapidly moving from demos and copilots into production systems that browse the web, call APIs, execute workflows, and take real‑world actions. As this transition happens, a critical truth is becoming unavoidable: any agent with meaningful capability will be attacked—and most are easy to break. Jailbreaking and prompt injection attacks are not theoretical research topics or rare edge cases; they are an inevitable outcome of deploying autonomous, instruction‑following systems in adversarial environments.
This talk is a practical, engineering‑focused primer on how AI agents fail under real‑world pressure, and what organizations must understand before shipping an agent into production. Rather than focusing on sensational examples or hypothetical risks, we will examine the concrete mechanisms that attackers use today, why they work, and why many popular defenses provide little real protection.
We begin with a clear, accessible overview of jailbreaking and prompt injection attacks. Attendees will learn how attackers manipulate model instructions, context windows, and tool‑calling behavior to override intended safeguards. We’ll cover both direct prompt injection (explicitly malicious instructions) and indirect prompt injection, where hostile content is embedded in webpages, documents, emails, or user‑generated data that agents are designed to consume. These attacks are especially dangerous because they exploit normal, expected behavior rather than software bugs.
From there, we’ll explore several recurring failure modes that appear across nearly all production agent architectures:
Excessive agency: Agents are often given broader permissions and autonomy than necessary, turning minor instruction hijacks into high‑impact incidents. Prompt leakage: System prompts, policies, secrets, and internal instructions are frequently exposed or inferable, providing attackers with a roadmap for further exploitation. Vector and embedding weaknesses: Retrieval‑augmented generation systems can be poisoned or manipulated, allowing malicious content to outrank trusted sources and influence agent decisions. Tool and browser abuse: Agents that browse the web or execute actions are uniquely vulnerable to hostile environments intentionally crafted to manipulate them. A key focus of the talk is why AI guardrails don’t work the way many teams expect. We’ll examine common approaches—prompt‑based restrictions, content filters, and policy‑layer defenses—and explain why they are brittle, bypassable, and often fail silently. Rather than stopping attacks, these mechanisms frequently create a false sense of security that masks deeper architectural risks.
We’ll also address a common misconception in the industry: “If these vulnerabilities are so serious, why haven’t we seen major AI security incidents yet?” The answer is not that systems are safe, but that most deployments are still constrained—limited autonomy, limited blast radius, and cautious rollout. As organizations move toward browser agents, long‑running autonomous workflows, and systems with real operational authority, the conditions that have so far prevented large‑scale incidents will disappear. When that happens, these attack classes will move from curiosity to crisis.
The final section of the talk focuses on what actually works. Instead of recommending yet another AI security product or guardrail framework, we will outline practical, proven steps organizations can take today, grounded in decades of security engineering experience:
Applying least privilege and minimizing agent capabilities Isolating tools, credentials, and execution environments Designing for failure and containment, not perfect prevention Monitoring agent behavior for abuse patterns rather than policy violations Performing threat modeling that treats prompts and context as untrusted input Attendees will leave with a clear mental model of how AI agents are attacked, why these attacks succeed, and how to reduce risk without relying on ineffective silver bullets. This talk is intended for engineers, security practitioners, and technical leaders building or deploying AI agents who want to understand the real risks—and take responsible action—before putting these systems into production.