Why You Need to Sandbox Your AI Agent (And How to Do It)

So I’ve been building AI agents recently and at some point I started wondering: what actually happens when an agent runs some code it generated? Or when it browses the web and reads something sketchy?

Turns out this is a real problem and there’s a name for the solution: sandboxing. I read a bunch of articles about it and want to share what I learned in a way that’s easy to understand.

What is an AI Agent Sandbox?

A sandbox is basically a safe box where your agent does its work. It can run code, browse pages, and use tools, but everything it does is stuck inside that box. If something goes wrong, your real machine and real data are not affected.

Think of it like giving your agent a spare computer that you can throw away after the task is done.

Why Does This Matter?

When AI was just answering questions, the worst thing that could happen was a bad answer. But now agents can:

  • Run shell commands
  • Write and execute code
  • Browse websites
  • Call APIs with real credentials
  • Read and write files

Each of those is a way things can go wrong. Here are the two main risks I found:

Autonomous mistakes. An agent with write access to your database can accidentally drop tables. An agent with shell access can delete the wrong folder. These are not hypothetical, they happen when the agent misunderstands your instruction or hallucinates a parameter.

Prompt injection. This one is sneaky. When an agent visits a website, it reads that page into its memory. An attacker can hide instructions in the page (white text on white background, tiny fonts, hidden HTML) that the agent reads and follows as if you gave the instruction yourself.

A security researcher named Johann Rehberger showed this with Claude Computer Use in 2024. A hidden payload on a webpage made Claude download and run a binary, connecting the computer to an attacker’s server. On the first try.

The only real fix for prompt injection is isolation. Better prompts are not enough.

Isolation Technologies

Not all sandboxes are the same. Here is a quick summary:

Standard Docker containers are fast and easy but they share the host kernel. A bad exploit can escape the container. Good for trusted code, not good enough for AI-generated code.

gVisor (open source, by Google) sits between the container and the host kernel. It catches system calls before they reach the real kernel. Much safer than plain Docker, with only a small performance cost.

Firecracker (open source, by AWS) creates tiny virtual machines that each have their own kernel. This is what AWS Lambda uses internally. Boots in about 125ms, uses less than 5MB of memory per VM. Very strong isolation.

Kata Containers wraps Firecracker (or another VM engine) behind a standard container API. So from Kubernetes’ side it looks like a normal container, but under the hood it’s a real VM.

seccomp, AppArmor, SELinux are Linux tools that limit what system calls a process can make. Use them on top of the above, not as a replacement.

What you’re doingWhat to use
Agent executes untrusted codeFirecracker microVM
Multi-tenant Kubernetes clusterKata Containers
Compute-heavy agent, limited I/OgVisor
Trusted internal automationHardened Docker + seccomp
Browsing the webDocker + block all outbound network

Security Rules to Follow

These apply no matter which tool you use:

Give the agent only what it needs. If it only reads files, don’t give it write access. Use short-lived credentials that expire fast.

Don’t trust anything the agent reads. An API response, a web page, a CSV file, all of these can contain injected instructions. Make sure your agent’s actions still match your original task, not something it picked up along the way.

Keep thinking and acting separate. Let the LLM reason on your normal infrastructure. But the moment it needs to do something (run code, call an API, write a file), that action must happen inside a sandbox.

Set timeouts everywhere. Per tool call, per task loop, per sandbox. A stuck agent can run forever and cost you a lot of money.

Block the network by default. Start with --network=none and only open what you actually need.

Log everything. Every shell command, every file write, every outbound request. If something bad happens, you want to be able to see exactly what the agent did.

Quick Start Examples

macOS note

If you’re on a Mac (like most of us at work), you can’t run gVisor or Firecracker directly because those need Linux kernel features. Docker Desktop needs a paid license for commercial use, so use Colima or Rancher Desktop instead. Both are free and open source.

1. Hardened Docker container (local baseline)

This is the most practical starting point on a Mac. Colima and Rancher Desktop run containers in a Linux VM so Linux security features still work inside.

docker run 
  --cap-drop ALL 
  --cap-add NET_BIND_SERVICE 
  --no-new-privileges 
  --read-only 
  --tmpfs /tmp:size=100m 
  --network=none 
  --memory=256m 
  --cpus=0.5 
  --pids-limit=100 
  python:3.12-slim 
  python /app/agent_task.py

This is not as strong as a microVM but it is much better than running with no restrictions at all.

2. Python subprocess wrapper (quick testing)

For quickly testing if your agent code runs before setting up a full container. Always put this inside a container, not directly on your machine.

import subprocess
import tempfile
import os

def run_agent_code(code: str, timeout_seconds: int = 30) -> dict:
    with tempfile.NamedTemporaryFile(mode="w", suffix=".py", delete=False) as f:
        f.write(code)
        script_path = f.name

    try:
        result = subprocess.run(
            ["python", script_path],
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
            env={"PATH": "/usr/local/bin:/usr/bin:/bin", "HOME": "/tmp"},
        )
        return {"stdout": result.stdout, "stderr": result.stderr, "returncode": result.returncode}
    except subprocess.TimeoutExpired:
        return {"error": "timed out", "returncode": -1}
    finally:
        os.unlink(script_path)

This only adds a timeout. It does not block network or filesystem access. Use it inside a container.

3. AWS Bedrock AgentCore Code Interpreter

If you’re already on AWS, the AgentCore Code Interpreter is the easiest production option. It runs agent code in a managed isolated container on AWS. You don’t set up Firecracker yourself, it’s already handled.

You need:

  • Python 3.10+
  • bedrock-agentcore, strands-agents, strands-agents-tools installed
  • AWS credentials with these IAM permissions (replace ap-southeast-1 and <account_id>):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "BedrockAgentCoreCodeInterpreterFullAccess",
      "Effect": "Allow",
      "Action": [
        "bedrock-agentcore:CreateCodeInterpreter",
        "bedrock-agentcore:StartCodeInterpreterSession",
        "bedrock-agentcore:InvokeCodeInterpreter",
        "bedrock-agentcore:StopCodeInterpreterSession",
        "bedrock-agentcore:DeleteCodeInterpreter",
        "bedrock-agentcore:ListCodeInterpreters",
        "bedrock-agentcore:GetCodeInterpreter",
        "bedrock-agentcore:GetCodeInterpreterSession",
        "bedrock-agentcore:ListCodeInterpreterSessions"
      ],
      "Resource": "arn:aws:bedrock-agentcore:ap-southeast-1:<account_id>:code-interpreter/*"
    },
    {
      "Sid": "BedrockModelInvocation",
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": "arn:aws:bedrock:ap-southeast-1::foundation-model/*"
    }
  ]
}
  • Anthropic Claude Sonnet 4.0 model access enabled in the Amazon Bedrock console

Then the code is pretty simple:

from strands import Agent
from strands_tools.code_interpreter import AgentCoreCodeInterpreter

code_interpreter_tool = AgentCoreCodeInterpreter(region="ap-southeast-1")

agent = Agent(
    tools=[code_interpreter_tool.code_interpreter],
    system_prompt=(
        "You are an AI assistant that validates answers through code execution. "
        "When asked about code or calculations, write and run Python code to verify."
    ),
)

response = agent("Calculate the first 10 Fibonacci numbers.")
print(response.message["content"][0]["text"])

The agent writes Python code, AgentCore runs it in an isolated container on AWS, and only the output comes back to you. Your machine never touches the execution.

What the sandbox handles for you:

  • Each session runs in its own isolated container, no shared state between sessions
  • Sessions are destroyed after timeout, all state gone with them
  • Network access can be set to SANDBOX mode to block all outbound calls

Before You Ship to Production

A checklist I put together from everything I read:

  • Agent runs inside a container or VM, not directly on the host
  • The container has no access to the host Docker socket
  • Credentials are short-lived and injected as secrets, not baked into images
  • Network egress is filtered, only the hosts the agent actually needs are reachable
  • Timeouts are set at the task level and the sandbox level
  • All shell commands, file writes, and outbound requests are logged
  • Agent has minimum filesystem permissions
  • Memory and CPU limits are set
  • High-risk actions like deleting data or making payments require human approval

That’s what I’ve learned so far. If you’re building agents I hope this saves you some time figuring out the security side. The tools are out there and most of them are free to use.

© 2026 Ismandra Eka Nugraha (ienugr)