Security Conference

Posted Aug 11, 2021

By Grace L

3 min read

Table of contents:

Black Hat

Attack Type	Description	Example	Risk Level
Prompt Injection	Malicious instructions hidden in prompts or external content.	AI told to give away secrets or execute destructive commands.	🔴 High
Context Poisoning	Corrupting training or inference-time data fed to model.	Adding misleading info to a database LLM queries.	🔴 High
Access Control Issues	“Confused deputy” – agent has more privileges than user.	Low-priv user triggers high-priv agent action.	🔴 High
Tool Misuse	Exploiting agents’ tool access to run harmful commands.	AI executes file delete instead of file read.	🟠 Medium
Memory Poisoning	Altering short/long-term memory to influence future behavior.	Fake “facts” stored in agent memory.	🟠 Medium
Cascading Hallucination	Error from one agent spreads to others.	Wrong data from Agent A causes bad output in Agent B.	🟠 Medium

Control	Purpose	Tools/Methods
Model Scanning	Detect malware in model files.	Antivirus + AI-specific scanners.
Runtime Security	Catch abnormal AI behavior during execution.	Anomaly detection, runtime firewalls.
Human-in-the-loop	Approve/deny high-risk AI actions.	Review checkpoints in workflows.
AuthN/AuthZ	Control agent-to-agent communications.	OAuth 2.0, OIDC, mTLS, X.509 certs.
Context Sanitization	Clean and validate external inputs.	Filtering, regex checks, schema validation.
Logging & Auditing	Trace agent actions & context changes.	Centralized logging, immutable audit logs.

Use Case	How AI Helps	Example
Threat Intel	Correlate CVEs, exploits, chatter in real time.	AI parses GitHub, social media, CVE feeds.
Vuln Prioritization	Rank vulnerabilities by exploit activity & business impact.	Prioritize based on attacker chatter.
AI Code Review	Multi-agent PR analysis by security domain.	Developer agent + Architect agent + Security agent.

AI agents can execute tools, access data, and interact with other agents — all of which expand the attack surface.
Risks include:
- Prompt injection (malicious instructions in prompts)
- Context poisoning (compromising the data fed into the model)
- Access control issues (e.g., confused deputy problem)
- Tool misuse and cascading hallucinations
- Memory poisoning (short-term & long-term)
Orchestrator → specialized agent architectures multiply risks.

Context comes from instructions, memory, tools, and other models.
Protect both inference-time and training-time context.
Context is King
- Sources of context:
- Direct instructions (prompts)
- Short-term memory (chat history)
- Long-term memory (vector DB, docs, code)
- Tool outputs
- Other LLMs
LLMs do NOT distinguish between system prompt and user prompt — corruption anywhere is dangerous.
- LLMs treat all context (system prompts, user prompts) as one — making poisoned context highly dangerous.
Context Protection
- Techniques to validate, sanitize, and isolate context sources.
- Monitoring for unexpected context changes.

LLM apps → Tool-using agents → MCP-based agents → upcoming auto-discovery & self-modifying agents → general-purpose agents.

Scan models for malicious code.
Add runtime security and logging for AI apps.
Use human-in-the-loop for high-risk actions.
Apply traditional security principles (auth, authz, least privilege) to AI systems.
Consider protocols for agent-to-agent authentication (OAuth2.0, mTLS, certificates).
Authentication & Authorization for AI Agents
- How OAuth2.0, OpenID Connect, mTLS, and certificate-based auth can be applied to agent-to-agent communications.
- Limitations of existing protocols in AI agent ecosystems.

Prioritize open-weight models from trusted vendors.
Key pillars:
- Traceable & transparent stack
- Secure stack (T&E, monitoring, adversarial testing, AI BOM)
Guardrails (e.g., Llama Guard) help but don’t solve model misalignment.
Model Alignment & Governance
- Guardrails vs. alignment issues.
- AI Bill of Materials (AI BOM) concept.
- Approaches to AI red teaming.

This post is licensed under CC BY 4.0 by the author.