From Sandbox Struggles to Docker Bliss: Running Headless Claude Code in Production

The Idea

What if you could point Claude at your GitHub project board and have it autonomously assess tickets, plan implementations, write code, run tests, and open PRs, while you sip coffee?

That’s what we built: GitHub Claudomator, an automated pipeline that picks up GitHub issues from a Project board, assesses and plans them using Claude Code in headless mode, and implements them as pull requests. All controlled via a real-time Next.js dashboard.

The core loop is simple:

GitHub Issue → Assess → Plan → Implement → PR

But making Claude Code run reliably in headless mode, without a human to approve tool calls, turned out to be a journey. Specifically, the isolation journey: how do you give an AI unrestricted tool access while keeping your system safe?

Starting Point: Headless Claude with No Guardrails

The first version was straightforward. Claude Code has a -p flag for non-interactive use, and --dangerously-skip-permissions to skip nearly all permission prompts:

claude -p "<prompt>" --output-format stream-json --dangerously-skip-permissions

Without --dangerously-skip-permissions (or equivalent permission configuration like --allowedTools or --permission-prompt-tool), headless Claude blocks on every permission prompt, waiting for approval that never comes. With the flag, almost everything is auto-approved (writes to protected directories like .git and .claude still prompt). Anthropic’s best practices recommend running this flag in a sandboxed environment, and the devcontainer reference shows how to do this inside a Docker container.

The problem is right there in the flag name: dangerously.

We wrapped this in claude-stream.sh, a central script that all pipeline stages route through. It handles streaming output to the dashboard, capturing the final JSON result, process management with FIFO pipes, and clean shutdown on SIGTERM. Every stage (assess, plan, implement, verify) calls this one script.
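The core of that pattern can be sketched as follows. This is an illustrative reconstruction, not the real claude-stream.sh: the producer command is a parameter so the plumbing is visible on its own, where the pipeline passes the `claude -p … --output-format stream-json` invocation.

```shell
# Illustrative sketch of the claude-stream.sh pattern (not the real script):
# run a streaming producer through a FIFO, forward every line to a log,
# and keep the final JSON "result" event as the stage's output.
run_stage() {
  local log_file="$1"; shift
  local result_file="$1"; shift

  local fifo
  fifo="$(mktemp -u)"
  mkfifo "$fifo"
  trap 'rm -f "$fifo"' EXIT TERM

  # Producer (in the pipeline: claude -p ... --output-format stream-json)
  # writes stream-json lines into the FIFO in the background.
  "$@" > "$fifo" &
  local producer_pid=$!

  # Consumer: every line goes to the dashboard log; only the final
  # "result" event is captured as this stage's output.
  tee "$log_file" < "$fifo" | jq -c 'select(.type == "result")' > "$result_file"
  wait "$producer_pid"
  rm -f "$fifo"
}
```

The SIGTERM trap is what makes dashboard-initiated interrupts clean: the FIFO is torn down whether the stage finishes or is killed.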

Each ticket gets its own git worktree, so multiple tickets can be worked on in parallel without conflicts. The dashboard tracks PIDs in SQLite, streams logs in real-time, and can interrupt any running process.
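The mechanics are plain `git worktree`. A minimal demo in a throwaway repo, with illustrative names (in the pipeline this runs inside the target project’s checkout):

```shell
# Minimal demo of the per-ticket worktree pattern, in a throwaway repo.
repo="$(mktemp -d)"
git -C "$repo" init -q
git -C "$repo" -c user.name=demo -c user.email=demo@example.com \
  commit -q --allow-empty -m "init"

# One worktree (and branch) per ticket: independent checkout, shared history,
# so parallel tickets never step on each other.
git -C "$repo" worktree add -q "$repo-issue-42" -b issue-42-add-feature

# On dismiss/archive, the worktree and branch are removed again.
git -C "$repo" worktree remove "$repo-issue-42"
git -C "$repo" branch -q -D issue-42-add-feature
```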

It worked. But we had no isolation. Claude had full access to everything.

The Sandbox Era

Claude Code offers optional sandboxing using Seatbelt (macOS) or bubblewrap (Linux) for OS-level filesystem and network restrictions. You enable it with /sandbox in interactive mode, or via settings. According to Anthropic, sandboxing reduces permission prompts by 84% while providing strong isolation against prompt injection attacks. We leaned into it.

The Prohibition Prompt

First concern: what if Claude tries to disable the sandbox from inside the sandbox? We added a hard rule to every prompt:

“NEVER attempt to disable or bypass the sandbox. Do not modify sandbox settings, do not run commands that circumvent filesystem restrictions.”

Initially this was copy-pasted into each pipeline script. Then we realized we’d forget one: the verify and archive prompts were already missing it. So we centralized it in claude-stream.sh, prepended to every prompt automatically.

The PATH Problem

Here’s something you don’t discover until you try it: when Claude runs inside a Seatbelt sandbox, the shell doesn’t source your profile. No .bashrc, no .zshrc, no nvm, no fnm. Which means node, mvn, and java aren’t on PATH.

Claude would try to run npm test and fail because node wasn’t found. Not because of a permission issue (the binary was allowed), but because the sandboxed shell simply didn’t know where it was.

The fix: resolve tool paths on the host before launching Claude, then prepend a PATH export hint to the prompt:

# Resolve paths outside the sandbox
NODE_PATH=$(which node)
MVN_PATH=$(which mvn)

# Tell Claude to set PATH before running anything
"Before running any tools, execute: export PATH=/usr/local/bin/nvm/versions/node/v22/bin:..."

This worked but felt fragile. Every new tool needed manual PATH resolution. And it only worked for tools we anticipated.

The Settings Maze

Then came the next discovery: in our setup, project-level sandbox settings (.claude/settings.json in the repo) didn’t apply to headless claude -p invocations. We had to fall back to global settings at ~/.claude/settings.json. The Claude Code settings docs describe the settings precedence model, but this headless-mode behavior isn’t explicitly documented; it’s something we learned the hard way. (The CLI does offer a --setting-sources flag that may help control this.)

This meant configuring write paths for build tool caches (~/.m2, ~/.npm), temp directories (/tmp, /private/tmp), and network access for package registries, all in global settings that affect every Claude session on your machine, not just the pipeline.
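For reference, the global config ended up shaped roughly like this. The `sandbox.enabled` flag is the same one we later used to turn sandboxing off; the network key names below are illustrative from memory, so check the current settings reference rather than copying them:

```json
{
  "sandbox": {
    "enabled": true,
    "network": {
      "allowedDomains": ["registry.npmjs.org", "repo.maven.apache.org"]
    }
  }
}
```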

We tried propagating sandbox network configs from the automation project settings into each worktree’s settings file. We wrote an ensure_worktree_sandbox_network helper that copied allowed domains and write paths. This kind of worked, but it was complex, error-prone, and didn’t cover all edge cases.
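A hedged sketch of what that helper did, using jq to graft the sandbox block from the automation project’s settings onto a worktree’s settings file (the function name matches the post; the jq merge is illustrative):

```shell
# Sketch of ensure_worktree_sandbox_network: copy the automation project's
# sandbox block (allowed domains, write paths) into a worktree's settings.
ensure_worktree_sandbox_network() {
  local src="$1/.claude/settings.json"   # automation project settings
  local dst="$2/.claude/settings.json"   # per-worktree settings
  mkdir -p "$(dirname "$dst")"
  [ -f "$dst" ] || echo '{}' > "$dst"
  # Keep the worktree's own settings, but overwrite its sandbox block.
  jq -s '.[1] * { sandbox: .[0].sandbox }' "$src" "$dst" > "$dst.tmp" \
    && mv "$dst.tmp" "$dst"
}
```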

The Documentation Smell

At some point we had written extensive sandbox best practices documentation. That’s usually a sign that something is too complicated. When you need a guide explaining how to diagnose permission errors, which directories need write access for Maven vs npm, and why global settings are required instead of project-level ones, the solution has too much surface area.

The Turning Point

After a couple of weeks we’d accumulated enough sandbox pain points:

  • PATH resolution was fragile and incomplete
  • Global settings polluted the host configuration
  • Different build tools needed different permission sets
  • Sandbox errors were hard to diagnose (generic “permission denied” with no hint about which sandbox rule triggered)
  • The configuration couldn’t be checked into the repo; it lived in ~/.claude/settings.json on each developer’s machine

We needed a different approach. Anthropic recommends running --dangerously-skip-permissions in a sandboxed environment and provides a reference devcontainer for Docker-based isolation, so we took that advice seriously.

Three Isolation Modes

We introduced configurable isolation modes:

isolation:
  mode: "docker"  # "docker" | "sandbox" | "none"

The sandbox mode still existed for those who wanted it. But docker became the new default recommendation. And none was there for trusted environments (with a persistent warning banner on the dashboard).

The sandbox code got extracted into sandbox-hints.sh, only sourced when ISOLATION_MODE=sandbox. No more prompt bloat or PATH hacks in Docker mode.
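The dispatch in claude-stream.sh reduces to something like this (the function wrapper is illustrative; ISOLATION_MODE and sandbox-hints.sh are the names used above):

```shell
# Only pull in the sandbox machinery (PATH resolution, prompt hints)
# when the sandbox mode is actually selected.
isolation_setup() {
  case "${ISOLATION_MODE:-docker}" in
    sandbox)      . ./sandbox-hints.sh ;;  # conditionally sourced, nothing else sees it
    docker|none)  : ;;                     # Docker/none: no extra setup here
  esac
}
```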

Building the Claude Docker Image

Anthropic provides a reference devcontainer setup for running Claude Code in Docker. We took a similar approach but tailored it for headless pipeline use, a purpose-built image with our specific toolchain:

FROM node:22-slim
RUN apt-get update && apt-get install -y git jq openjdk-25-jdk maven
RUN npm install -g @anthropic-ai/claude-code

All the tools Claude needs (Node.js, Java, Maven, git) pre-installed on PATH. No sandbox configuration. No PATH resolution hacks. No global settings. It just works.

The Root User Surprise

First Docker run: crash. Claude Code refuses --dangerously-skip-permissions when running as root (source). This is an undocumented security check in the CLI itself: root + skip-permissions is too dangerous even for headless mode. (There’s also an IS_SANDBOX=1 environment variable that bypasses this check, but we opted for a non-root user instead.)

Fix: create a non-root user in the Dockerfile:

RUN useradd -m -s /bin/bash claude
USER claude

We also disabled the sandbox inside the container, since Docker itself serves as the isolation boundary:

{ "sandbox": { "enabled": false } }

Note: Anthropic’s reference devcontainer takes a different approach: it keeps the sandbox active and adds a firewall layer on top. We chose to disable it because the sandbox’s OS-level restrictions (Seatbelt/bubblewrap) can conflict with containerized environments, and Docker already provides the isolation we need.

The Native Module Crash

Second surprise: the dashboard container crashed on startup with “invalid ELF header.” We were copying node_modules/ from macOS into the Linux container, and native modules (like better-sqlite3) compiled for Darwin don’t work on Linux.

Fix: .dockerignore to exclude node_modules/ and npm install inside the container.
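In file form, the fix is a .dockerignore entry plus installing dependencies inside the image. This dashboard Dockerfile fragment is illustrative:

```dockerfile
# .dockerignore must contain:
#   node_modules/
# so host-compiled (Darwin) native modules never enter the build context.

# Then install inside the image, compiling better-sqlite3 et al. for Linux:
COPY package.json package-lock.json ./
RUN npm ci
COPY . .
```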

Container Reuse

Initially, every Claude invocation got a fresh docker run --rm container. This meant:

  • Dependencies re-downloaded every step (no npm/Maven cache)
  • Container startup overhead per invocation
  • No state preserved between plan → implement → verify

The breakthrough was persistent containers: the first worktree step creates a named container with sleep infinity, and subsequent steps docker exec into it:

# First step: create container
docker run -d --name automation-issue-42 automation-claude:latest sleep infinity
docker exec automation-issue-42 claude -p "Plan this ticket..."

# Later steps: reuse
docker exec automation-issue-42 claude -p "Implement the plan..."

Container names are deterministic, derived from the ticket’s change name (e.g., automation-issue-42-add-feature). Cleanup happens on dismiss, restart, archive, and interrupt. A recovery routine on dashboard startup finds and removes orphaned containers.
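The orphan-detection step of that recovery routine can be sketched independently of Docker: given the names `docker ps -a` reports and the containers the dashboard still tracks in SQLite, anything with our prefix that nothing claims gets `docker rm -f`’d. Function name and prefix here are illustrative:

```shell
# Print containers that match our deterministic naming prefix but are
# no longer claimed by any tracked ticket; the caller removes them with
# `docker rm -f`. Inputs are newline-separated name lists (intentionally
# left unquoted below so they split into one name per line).
find_orphans() {
  local containers="$1"   # from: docker ps -a --format '{{.Names}}'
  local active="$2"       # active container names from the dashboard DB
  printf '%s\n' $containers \
    | grep '^automation-issue-' \
    | grep -F -x -v -f <(printf '%s\n' $active) || true
}
```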

Credential Forwarding

Claude needs authentication, either an API key or OAuth credentials (for Claude Max subscriptions). In Docker mode, we need to get those into the container.

We wrote a credential resolver that checks three sources in order:

  1. ANTHROPIC_API_KEY environment variable
  2. macOS Keychain (security find-generic-password)
  3. ~/.claude/.credentials.json (Linux)

For ephemeral containers (assessment), credentials are passed per-run. For persistent containers, they’re written to a stable file path in the worktree and refreshed before each docker exec.
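In shell form, the resolver is roughly this; treat the function shape and especially the Keychain service name as illustrative, and verify them against your Claude Code version:

```shell
# Sketch of the three-source credential resolver (names illustrative).
resolve_credentials() {
  # 1. Explicit API key wins.
  if [ -n "${ANTHROPIC_API_KEY:-}" ]; then
    printf '%s\n' "$ANTHROPIC_API_KEY"
    return 0
  fi
  # 2. macOS Keychain (OAuth credentials for Claude Max subscriptions).
  if command -v security >/dev/null 2>&1; then
    security find-generic-password -s "Claude Code-credentials" -w 2>/dev/null \
      && return 0
  fi
  # 3. Linux fallback: credentials file on disk.
  if [ -f "$HOME/.claude/.credentials.json" ]; then
    cat "$HOME/.claude/.credentials.json"
    return 0
  fi
  echo "no Claude credentials found" >&2
  return 1
}
```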

The Final Architecture

Here’s what we landed on:

Dashboard (Next.js) → API route → bash script → claude-stream.sh
                                                      ↓
                                            ┌─────────┴──────────┐
                                            │  ISOLATION_MODE?   │
                                            └─────────┬──────────┘
                                    ┌─────────────────┼────────────────┐
                                    ↓                 ↓                ↓
                                 docker            sandbox           none
                                    ↓                 ↓                ↓
                          docker exec/run    seatbelt + PATH    claude -p directly
                          into container      hints to prompt

Docker mode is the sweet spot:

  • Tools pre-installed in the image, no PATH gymnastics
  • Container boundary is the isolation, no sandbox config needed
  • Persistent containers preserve state across pipeline stages
  • Cache mounts (~/.m2, ~/.npm) avoid redundant downloads
  • Non-root user satisfies Claude CLI’s security policy
  • Orphaned container cleanup on startup prevents resource leaks
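Assembled, the per-ticket container launch looks roughly like this; the argument-building function is illustrative, mirroring the names and mount points above:

```shell
# Build the arguments for the persistent per-ticket container:
# cache mounts for Maven/npm, the worktree as the working directory,
# and `sleep infinity` so later steps can `docker exec` into it.
docker_run_args() {
  local change_name="$1" worktree="$2"
  printf '%s\n' \
    -d --name "automation-${change_name}" \
    -v "${worktree}:/workspace" -w /workspace \
    -v "${HOME}/.m2:/home/claude/.m2" \
    -v "${HOME}/.npm:/home/claude/.npm" \
    automation-claude:latest sleep infinity
}
# Launched as: docker run $(docker_run_args issue-42-add-feature "$worktree_path")
```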

Sandbox mode still works for those who prefer OS-level isolation, with all the conditional PATH/prompt machinery in a separate script.

No isolation works for trusted environments, with the dashboard showing a warning so you don’t forget.

The Dashboard Experience

The Claudomator dashboard isn’t just a status page; it’s a control center. It shows:

  • Real-time streaming logs with terminal-style headers, JSON pretty-printing, and stage dividers
  • Ticket states flowing through the pipeline: queued → assessing → assessed → planning → planned → implementing → implemented → archived
  • Assessment, plan, and implementation summaries rendered as markdown
  • Action controls at every stage: Plan, Implement, Instruct, Retry, Restart, Interrupt, Dismiss, Archive
  • Isolation health warnings when no effective isolation is detected
  • Multi-project support with a project selector dropdown

The pipeline is self-healing: if Claude produces no changes, it retries automatically (the “Ralph loop”). If a process dies, recovery detects it on startup and parses the last output to determine the correct state.
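The retry logic is simple to state precisely. A sketch, with `implement_step` as a placeholder for the real pipeline stage:

```shell
# "Ralph loop" sketch: re-run the implement stage until the worktree
# actually contains changes, up to a retry budget. implement_step is a
# placeholder for the real stage invocation.
ralph_loop() {
  local worktree="$1" max_tries="${2:-3}" try=1
  while [ "$try" -le "$max_tries" ]; do
    implement_step "$worktree"
    # Any tracked or untracked change counts as progress.
    if [ -n "$(git -C "$worktree" status --porcelain)" ]; then
      return 0
    fi
    try=$((try + 1))
  done
  return 1   # gave up: surface as a failed stage on the dashboard
}
```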

What We Learned

1. Sandbox is powerful but high-friction for headless use. The sandbox is designed for interactive sessions where a human can troubleshoot permission errors. In headless mode, you need everything configured perfectly upfront, no room for “Allow once” clicks.

2. Docker is the natural isolation boundary for headless AI. You define the environment once in a Dockerfile, and every invocation gets the same predictable setup. No per-machine configuration, no PATH hacks, no global settings pollution. This is consistent with Anthropic’s best practices to run --dangerously-skip-permissions in a sandboxed environment, and the devcontainer reference that demonstrates Docker-based isolation.

3. Container reuse matters for multi-step pipelines. Fresh containers per invocation waste time re-downloading dependencies and lose cross-step state. Persistent containers with docker exec give you the best of both worlds.

4. The CLI has opinions. Claude Code refusing --dangerously-skip-permissions as root (source) is a good security decision, but it’s the kind of thing you only discover at runtime. Read the CLI reference, and expect surprises.

5. Start simple, add isolation later. Our first version had zero isolation and it was fine for development. We added sandbox when we wanted safety, then migrated to Docker when sandbox friction became too high. Each step was informed by real pain points, not theoretical concerns.
