
How We Built Agentro's Parallel Agent Execution Engine

Agentro Team · 2026-03-20 · 7 min read

TL;DR: Agentro runs up to 20 AI coding agents in parallel on the same codebase with zero conflicts. The architecture is a 4-layer stack: Git worktrees for branch isolation, ephemeral Docker containers for process isolation, Celery task queue for lifecycle management, and WebSocket streaming for real-time observability. Agent startup takes under 3 seconds.

The challenge of parallel agents

Running one AI coding agent is straightforward. Running twenty at the same time on the same codebase — without merge conflicts, race conditions, or corrupted state — is an entirely different problem. When we set out to build Agentro's execution engine, this was the core constraint: every agent must operate in complete isolation while still working against the same repository.

Most existing tools sidestep this problem. They run one agent at a time, or they require manual branch management, or they simply hope for the best. We wanted to build something that treated parallelism as a first-class concern — not an afterthought.

The 4-layer architecture

Agentro's execution engine is built on four layers, each solving a specific isolation or orchestration problem.

Layer 1: Git worktrees for branch isolation

The key insight was Git worktrees. Rather than forcing every agent to share a single clone, each agent session gets its own worktree — a lightweight, independent working directory backed by the same Git object store. This means each agent has its own branch, its own file state, and its own staging area. No locks, no contention, no conflicts during execution.

Worktree creation is effectively instant. Because Git worktrees share the object store with the main clone, creating a new worktree copies no files and touches no network. It is a zero-copy, local filesystem operation that completes in milliseconds, even for repositories with hundreds of thousands of files.
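In a Python worker, worktree setup can be as small as one `git worktree add` invocation. The sketch below is illustrative, not Agentro's actual code: the `agent/<session-id>` branch-naming scheme and the sibling `wt-<session-id>` directory layout are assumptions.

```python
import subprocess
from pathlib import Path

def worktree_cmd(repo: str, session_id: str, base: str = "main") -> list[str]:
    """Build the git command that creates an isolated worktree for one session."""
    branch = f"agent/{session_id}"                      # hypothetical naming scheme
    path = str(Path(repo).parent / f"wt-{session_id}")  # sibling dir next to the clone
    # `git worktree add -b <branch> <path> <base>` shares the object store with
    # the main clone, so no files are copied and no network I/O happens.
    return ["git", "-C", repo, "worktree", "add", "-b", branch, path, base]

def create_worktree(repo: str, session_id: str) -> None:
    """Run the command; raises CalledProcessError if git refuses (e.g. branch exists)."""
    subprocess.run(worktree_cmd(repo, session_id), check=True)
```

Keeping the command construction separate from the `subprocess` call makes the naming logic trivially unit-testable without a real repository.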

When an agent finishes its work, the result is a clean branch with well-scoped commits. Merging happens through the normal merge request flow, where humans review the output before it lands on the main branch.

Layer 2: Ephemeral Docker containers for process isolation

Each agent session runs inside an ephemeral Docker container. The container is provisioned on demand, pre-loaded with the worktree, and destroyed when the session ends. This gives us hard process-level isolation — one agent cannot see or interfere with another agent's filesystem, environment, or network.

We chose Docker over heavier virtualization for startup speed. A new agent session goes from queued to running in under three seconds. For teams running dozens of agents daily, that latency matters.

Each container runs with explicit CPU and memory limits. A runaway agent that enters an infinite loop or tries to allocate 32 GB of RAM gets killed cleanly without affecting other agents. The resource limits are configurable per workspace — Enterprise customers can allocate more resources to high-priority sessions.
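Translated into the Docker SDK for Python, those limits are just keyword arguments to `containers.run`. The image name, mount path, and default limits below are illustrative assumptions, not Agentro's real configuration.

```python
def container_config(image: str, worktree: str, cpus: float = 2.0, mem_gb: int = 4) -> dict:
    """Translate per-workspace resource limits into Docker SDK run() kwargs."""
    return {
        "image": image,
        "nano_cpus": int(cpus * 1e9),   # Docker expresses CPU quota in units of 1e-9 CPUs
        "mem_limit": f"{mem_gb}g",      # exceeding this triggers the kernel OOM killer
        "volumes": {worktree: {"bind": "/workspace", "mode": "rw"}},
        "detach": True,                 # return immediately; logs are streamed separately
        "remove": True,                 # ephemeral: container is deleted when it exits
    }

def launch(cfg: dict):
    """Start the container. `docker` is imported lazily so the config stays testable."""
    import docker
    return docker.from_env().containers.run(**cfg)
```

Because a session's limits are workspace-configurable, isolating them in a pure function keeps the Docker call itself trivial.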

Layer 3: Celery task queue for lifecycle management

Behind the scenes, every agent session is a Celery task. The task queue handles scheduling, retry logic, concurrency limits, and state transitions. Each session follows a strict state machine:

  • queued — task is waiting for a worker slot
  • running — agent is actively executing
  • paused — user paused execution (agent state is preserved)
  • awaiting_review — agent has finished and is waiting for human review
  • completed — task finished successfully
  • failed — task encountered an unrecoverable error
  • cancelled — user cancelled the task

An advisory lock ensures that no two workers can transition the same session simultaneously. This prevents the classic race condition where two workers both see a session as "queued" and try to start it at the same time.
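With Postgres (an assumption; the post does not name the database), the usual tool is `pg_advisory_xact_lock`, which takes a signed 64-bit key and releases automatically on commit or rollback. A sketch of deriving a stable key from a session id:

```python
import hashlib

def advisory_lock_key(session_id: str) -> int:
    """Map a session id onto the signed 64-bit key space Postgres advisory locks use."""
    digest = hashlib.sha256(session_id.encode()).digest()
    return int.from_bytes(digest[:8], "big", signed=True)

# Inside a transaction, a worker claims the session before reading its state.
# The second worker blocks here until the first commits, then sees the session
# already in "running" and backs off.
CLAIM_SQL = "SELECT pg_advisory_xact_lock(%s);"
```

Hashing keeps the key deterministic across workers without a shared counter; collisions merely serialize two unrelated sessions, which is safe if slightly wasteful.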

Celery's priority queues also let us implement fair scheduling across workspaces. Enterprise customers can burst to higher concurrency without starving other tenants. The Beat scheduler (embedded in the worker via the -B flag) handles periodic health checks, session timeouts, and cleanup of orphaned containers.
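A Beat schedule for those periodic jobs might look like the fragment below; the task names and intervals are illustrative assumptions, not Agentro's configuration.

```python
# Periodic jobs the embedded Beat scheduler (celery worker -B) would run.
# Intervals are in seconds; with Celery installed this dict would be
# assigned to app.conf.beat_schedule.
BEAT_SCHEDULE = {
    "health-check": {
        "task": "agentro.tasks.health_check",           # hypothetical task path
        "schedule": 30.0,
    },
    "enforce-session-timeouts": {
        "task": "agentro.tasks.enforce_timeouts",
        "schedule": 60.0,
    },
    "cleanup-orphaned-containers": {
        "task": "agentro.tasks.cleanup_orphans",
        "schedule": 300.0,
    },
}
```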

Layer 4: Real-time streaming via WebSocket

Developers want to watch their agents work in real time. We stream container logs directly over WebSocket connections — no polling, no Redis pub/sub intermediary. The FastAPI backend attaches to the Docker container's log stream and forwards events to connected clients with sub-second latency.

This architecture keeps the real-time layer simple and stateless. If a WebSocket disconnects, the client reconnects and picks up the stream from where it left off. The agent keeps running regardless — the observation layer is completely decoupled from the execution layer.
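Resumable streaming only requires the server to remember recent lines with sequence numbers: on reconnect, the client sends the last sequence it saw and replay starts just after it. A minimal per-session ring buffer, independent of the actual implementation:

```python
from collections import deque

class LogBuffer:
    """Bounded per-session buffer that lets a reconnecting client resume the stream."""

    def __init__(self, maxlen: int = 10_000):
        self._lines: deque = deque(maxlen=maxlen)  # (seq, line); old entries fall off
        self._next_seq = 0

    def append(self, line: str) -> int:
        """Record one log line and return its sequence number."""
        seq = self._next_seq
        self._next_seq += 1
        self._lines.append((seq, line))
        return seq

    def replay_after(self, last_seen: int) -> list:
        """Everything the client missed; pass -1 to replay the whole buffer."""
        return [(seq, line) for seq, line in self._lines if seq > last_seen]
```

Because the buffer is bounded, a long-dead client can't pin unbounded memory; it simply loses lines older than the window, while the agent runs on untouched.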

What happens when an agent fails

Failure handling was one of the most important design decisions in the execution engine. Agents fail for many reasons: the model produces invalid code, a dependency isn't available in the container, the model gets stuck in a loop, or the task is simply too ambiguous.

Here's how Agentro handles each case:

Model errors and invalid output. If the agent produces code that doesn't compile or fails tests, the session transitions to failed with the error output preserved. The developer can review the logs, adjust the spec, and retry from a clean state.

Container resource limits. If a container exceeds its CPU or memory allocation, Docker kills the process. The Celery worker detects the OOM event, transitions the session to failed, and logs the resource usage. No other sessions are affected.

Session timeouts. Every session has a configurable maximum runtime. If an agent runs for longer than the limit (default: 30 minutes), the worker sends a graceful shutdown signal, waits 10 seconds, and then force-kills the container. The session transitions to failed with a timeout reason.
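The terminate-wait-kill sequence is easy to get wrong, so here it is sketched against a local subprocess. With the Docker SDK, `container.stop(timeout=10)` performs the same SIGTERM-then-SIGKILL escalation in one call.

```python
import signal
import subprocess

def graceful_stop(proc: subprocess.Popen, grace: float = 10.0) -> int:
    """Ask the process to exit; force-kill it if it ignores us past the grace period."""
    proc.send_signal(signal.SIGTERM)        # polite shutdown request
    try:
        return proc.wait(timeout=grace)     # give it `grace` seconds to clean up
    except subprocess.TimeoutExpired:
        proc.kill()                         # SIGKILL: no further chances
        return proc.wait()
```

The important property is that the function always returns with the process dead, so the worker can transition the session to failed without leaking a container.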

Orphaned containers. The Beat scheduler runs a periodic cleanup task that identifies containers whose corresponding Celery task no longer exists. This handles edge cases like worker crashes or network partitions. Orphaned containers are stopped and removed automatically.
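At its core, orphan detection is a comparison between the containers the Docker daemon reports and the sessions Celery still knows about. A sketch, with a hypothetical `agentro.session` container label:

```python
def find_orphans(containers: list, active_task_ids: set) -> list:
    """Return containers whose session label no longer maps to a live Celery task.

    `containers` mimics the label metadata the Docker API reports; the
    `agentro.session` label name is an assumption for illustration.
    """
    return [
        c for c in containers
        if c.get("labels", {}).get("agentro.session") not in active_task_ids
    ]
```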

Network failures. If the connection between the worker and the Docker daemon drops, the worker retries the connection with exponential backoff. If the connection can't be re-established within 60 seconds, the session transitions to failed and the container is marked for cleanup.
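An exponential backoff that respects the 60-second reconnection budget can be precomputed up front; the `base` and `factor` values below are assumptions, not Agentro's tuning.

```python
def backoff_delays(base: float = 1.0, factor: float = 2.0, budget: float = 60.0) -> list:
    """Exponential retry delays whose total fits within the reconnection budget."""
    delays, total, delay = [], 0.0, base
    while total + delay <= budget:      # stop before the next sleep would bust the budget
        delays.append(delay)
        total += delay
        delay *= factor
    return delays
```

With the defaults this yields waits of 1, 2, 4, 8, and 16 seconds; once the list is exhausted the worker gives up, transitions the session to failed, and marks the container for cleanup.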

Performance characteristics

We've optimized the execution engine for the metrics that matter most to engineering teams:

  • Agent startup time: < 3 seconds from queued to running, including worktree creation, container provisioning, and agent initialization.
  • Zero-copy worktree creation via Git's shared object store — no network I/O, no file copying.
  • Resource-isolated containers with configurable CPU and memory limits — one misbehaving agent can't affect others.
  • Linear scaling — adding more agents is a matter of adding more compute. The architecture has no central bottleneck.
  • Sub-second log streaming — developers see agent output in real time over WebSocket, with automatic reconnection on disconnect.

What we learned

Building for parallelism forced us to think carefully about resource isolation, state management, and failure modes from day one. A few lessons that might be useful if you're building something similar:

  1. Treat the state machine as the source of truth. Every possible transition must be explicitly defined. If a state transition isn't in the diagram, it shouldn't be possible in code.
  2. Use advisory locks, not optimistic locking. When two workers race to claim the same session, you want one to win and one to wait — not both to proceed and then one to fail on save.
  3. Design for failure from the start. Every network call, every Docker API invocation, every database query can fail. The question isn't "will it fail?" but "what happens when it fails?"
  4. Decouple observation from execution. The fact that a developer is watching an agent should have zero impact on the agent's behavior. WebSocket disconnects should never cause agent failures.

The combination of Git worktrees, ephemeral containers, and a task queue gives us a foundation that scales linearly — adding more agents is just a matter of adding more compute, not rearchitecting the system.

Try parallel agent execution today — Start Free →

Ready to try Agentro?

Start with Idea Mode today. No credit card required.