
Speccing for AI: How to Write Tasks Agents Can Actually Execute

Agentro Team | 2026-03-15 | 9 min read

TL;DR: AI agents are literal executors — they build exactly what you describe, not what you meant. Five rules produce consistently better results: state the outcome (not the process), define the boundary, include acceptance criteria, provide context (not instructions), and assign one task per agent. Teams that follow these rules see dramatically fewer rework cycles.

Why specs matter more than ever

When a human developer picks up a vaguely written ticket, they fill in the gaps with context, intuition, and a quick Slack message to the author. AI agents don't have that luxury. They take your words at face value and execute accordingly. A spec that would be "good enough" for a human teammate will produce unpredictable results when handed to an agent.

This isn't a limitation of current models — it's a fundamental property of how AI agents work. Even the most capable models interpret ambiguity by making assumptions. Sometimes those assumptions align with your intent. Often they don't. The spec is your only lever for controlling the output.

After watching hundreds of agent sessions across our beta, we've distilled five rules that consistently produce better outcomes. Each rule addresses a specific failure mode we observed in real sessions.

Rule 1: State the outcome, not the process

The failure mode: You describe how to do something, the agent follows your steps literally even when a step doesn't apply, and the result is broken code that technically followed instructions.

Before (without this rule):

"Refactor the authentication module to use a new pattern."

The agent interprets "new pattern" as whatever it considers modern — maybe it switches from session-based auth to JWT, changes the database schema, updates 14 files, and breaks three integration tests. It did refactor the auth module. It did use a new pattern. But it wasn't the pattern you wanted.

After (applying this rule):

"Switch password hashing in the auth module from bcrypt to argon2id. All existing tests must still pass. No changes to the public API."

The agent knows the exact end state. It can verify its own work against the criteria. The result is a focused MR that changes the hashing algorithm and nothing else.

Why it works: Agents are optimizers. When you give them a clear target, they find efficient paths to reach it. When you give them vague process instructions, they follow the process — even when the process leads somewhere wrong.
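To make "same public API, different internals" concrete, here's a rough sketch of what the agent's change might look like. Since argon2id requires an external package, Node's built-in scrypt stands in for it here, and the function names are illustrative assumptions rather than any real auth module's API:

```typescript
// Illustrative sketch: swap the hashing algorithm behind a stable
// public API, as the Rule 1 spec demands. A real implementation would
// use argon2id (e.g. via an argon2 package); Node's built-in scrypt
// stands in so this sketch has no external dependencies.
import { randomBytes, scryptSync, timingSafeEqual } from "node:crypto";

// Public API: signatures unchanged, so callers and existing tests keep working.
export function hashPassword(password: string): string {
  const salt = randomBytes(16).toString("hex");
  const hash = scryptSync(password, salt, 32).toString("hex");
  return `${salt}:${hash}`; // only the internal storage format changes
}

export function verifyPassword(password: string, stored: string): boolean {
  const [salt, hash] = stored.split(":");
  const candidate = scryptSync(password, salt, 32).toString("hex");
  return timingSafeEqual(
    Buffer.from(hash, "hex"),
    Buffer.from(candidate, "hex")
  );
}
```

Because the signatures don't move, "all existing tests must still pass" is something the agent can actually check before opening the MR.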

Rule 2: Define the boundary

The failure mode: The agent "helpfully" modifies files outside the intended scope — renaming variables for consistency, updating imports across the codebase, or refactoring adjacent modules.

Before (without this rule):

"Add input validation to the user registration endpoint."

The agent adds validation to the registration endpoint, but also notices that the login endpoint has similar validation gaps. It updates both endpoints, refactors the shared validation utilities, and renames a few variables for consistency. The MR touches 11 files across 4 directories. The reviewer now has to verify that all 11 changes are correct, even though only 2 were requested.

After (applying this rule):

"Add input validation to the user registration endpoint in src/api/auth/register.ts. Only modify this file and its corresponding test file src/api/auth/__tests__/register.test.ts. Do not change any other files."

The agent produces a 2-file MR that's trivial to review. The scope is explicit. There are no surprises.

Why it works: Boundary definitions counteract the agent's tendency to optimize for "correctness" at the expense of reviewability. A well-scoped MR is more valuable than a "comprehensive" one.
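For a sense of what the scoped change looks like, here's a minimal validation sketch. The field names and rules are assumptions for illustration, not the real register.ts contract:

```typescript
// Hypothetical sketch of the validation the Rule 2 spec asks for,
// confined to the registration endpoint's own file. Fields and rules
// are illustrative assumptions.
interface RegistrationInput {
  email: string;
  password: string;
}

function validateRegistration(input: RegistrationInput): string[] {
  const errors: string[] = [];
  // Minimal email shape check; a real endpoint might use a library validator.
  if (!/^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(input.email)) {
    errors.push("email must be a valid address");
  }
  if (input.password.length < 12) {
    errors.push("password must be at least 12 characters");
  }
  return errors; // empty array means the input is valid
}
```

The login endpoint's similar gap stays untouched — that's a separate, equally well-scoped task.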

Rule 3: Include acceptance criteria

The failure mode: The agent finishes and reports success, but "success" means "the code compiles" — not "the feature works correctly."

Before (without this rule):

"Build an API endpoint that returns a list of active users."

The agent builds the endpoint. It compiles. It returns data. But it returns all users, not just active ones, because the agent's definition of "active" differs from yours. Is "active" based on a boolean flag? Last login date? Account status? The agent guessed, and it guessed wrong.

After (applying this rule):

"Build a GET endpoint at /api/users/active that returns users where status = 'active' and last_login > now() - interval '90 days'. Response format: { users: [{ id, email, name, lastLogin }] }. Must include pagination (default 20 per page). All existing tests must pass. Add a new test file with at least 3 test cases: normal response, empty result, and pagination."

The acceptance criteria define exactly what "done" looks like. The agent can self-verify against each criterion. The reviewer can check each criterion in the MR.

Why it works: Acceptance criteria transform a vague request into a testable contract. They eliminate the gap between "what you meant" and "what the agent understood."
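Notice how directly the criteria translate into code. Here's a rough sketch of the core filtering and pagination logic they pin down — the types and function shape are assumptions, and a real endpoint would push the filter into a database query rather than filtering in memory:

```typescript
// Hedged sketch of logic satisfying the Rule 3 acceptance criteria:
// "active" = status flag plus a 90-day last-login window, paginated
// at 20 per page by default. Types and names are illustrative.
interface User {
  id: number;
  email: string;
  name: string;
  lastLogin: Date;
  status: string;
}

function activeUsers(users: User[], now: Date, page = 1, perPage = 20) {
  const cutoff = new Date(now.getTime() - 90 * 24 * 60 * 60 * 1000);
  const matching = users.filter(
    (u) => u.status === "active" && u.lastLogin > cutoff
  );
  const start = (page - 1) * perPage;
  return {
    users: matching
      .slice(start, start + perPage)
      .map(({ id, email, name, lastLogin }) => ({ id, email, name, lastLogin })),
  };
}
```

Each criterion — the status check, the 90-day cutoff, the response shape, the page size — maps to a line the reviewer can point at.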

Rule 4: Provide context, not instructions

The failure mode: You give step-by-step instructions that were correct when you wrote them, but the agent follows them rigidly — even when the current state of the code makes some steps unnecessary or harmful.

Before (without this rule):

"Step 1: Create a new migration file. Step 2: Add a preferences column to the users table. Step 3: Create a new model class. Step 4: Add a new API endpoint. Step 5: Update the frontend form."

The agent follows each step in order. But step 3 is unnecessary because the existing ORM auto-generates model classes from migrations. The agent creates a duplicate model class that conflicts with the auto-generated one. Now the build is broken, and the developer has to figure out which model class to keep.

After (applying this rule):

"Add a user preferences feature. Users should be able to set timezone and notification preferences. The codebase uses SQLAlchemy with auto-generated models from migrations (see src/db/models/). The frontend uses React Hook Form (see src/components/forms/). The existing user profile page is at src/pages/profile.tsx. Performance requirement: preferences load in under 200ms."

The agent has the context to make good decisions. It examines the existing patterns, sees the auto-generated models, and skips the redundant step. The result fits naturally into the existing codebase.

Why it works: Context enables intelligent decision-making. Instructions enforce rigid execution. Agents with context adapt to the actual state of the code. Agents with instructions follow the plan regardless.

Rule 5: One task, one agent

The failure mode: A multi-part task gets assigned to one agent. The agent completes part 1 correctly, makes a mistake in part 2, and now the MR is a tangled mix of good and bad changes that's painful to review.

Before (without this rule):

"Add the new payment endpoint, update the admin dashboard to show payment stats, and fix the CSS alignment bug on the settings page."

The agent produces a 47-file MR that spans three unrelated features. The payment endpoint is correct. The dashboard changes have a bug. The CSS fix accidentally breaks another page. The reviewer can't approve the payment changes without also approving the broken dashboard code. The entire MR gets sent back for revisions.

After (applying this rule):

Three separate agent sessions:

  1. "Add the new payment endpoint in src/api/payments/..."
  2. "Update the admin dashboard to show payment stats in src/pages/admin/..."
  3. "Fix the CSS alignment bug on the settings page in src/pages/settings/..."

Each produces an independent MR. The payment MR gets approved immediately. The dashboard MR gets one comment and a quick fix. The CSS MR gets approved. Total review time is less than reviewing the monolithic MR, and the good changes ship without waiting for the broken ones.

Why it works: Small, focused MRs are faster to review, easier to approve, and simpler to revert. Agentro makes this easy — spinning up multiple parallel agents is the entire point.

A spec template

Here's a template you can use for any agent task. Copy it, fill in the sections, and hand it to your agent.

Goal: [What should be true when this is done]
Boundary: [Files to touch / files to leave alone]
Acceptance Criteria: [How to verify it works]
Context: [Constraints, performance requirements, dependencies]
Non-goals: [What this task should NOT do]

Example, filled in:

Goal: Add rate limiting to the public API endpoints.
Boundary: Only modify files in src/middleware/ and src/api/routes/.
  Do not change authentication logic or database schemas.
Acceptance Criteria:
  - Rate limit of 100 requests per minute per API key
  - Returns 429 with Retry-After header when limit exceeded
  - Rate limit state stored in Redis (existing connection in src/lib/redis.ts)
  - Add tests covering: normal request, rate limited request, limit reset
Context: The API uses Express.js with middleware chain pattern.
  See src/middleware/auth.ts for an example middleware.
  Redis is already configured and available via src/lib/redis.ts.
  Current traffic: ~50 requests/second peak.
Non-goals: Do not implement per-endpoint rate limits (that's a future task).
  Do not add rate limiting to internal/admin endpoints.
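A spec this precise leaves little room for guesswork. As a rough illustration of the logic it pins down, here's a fixed-window limiter sketch — an in-memory Map stands in for the Redis store the spec names, and the class shape is an assumption, not a real codebase's API:

```typescript
// Illustrative fixed-window rate limiter matching the filled-in spec:
// 100 requests per minute per API key, with a retry delay a middleware
// would surface as a 429 response plus a Retry-After header. The
// in-memory Map is a stand-in for the Redis store the spec requires.
type Window = { count: number; resetAt: number };

class FixedWindowLimiter {
  private windows = new Map<string, Window>();
  constructor(private limit = 100, private windowMs = 60_000) {}

  check(apiKey: string, now: number): { allowed: boolean; retryAfterSeconds?: number } {
    const w = this.windows.get(apiKey);
    if (!w || now >= w.resetAt) {
      // New key or expired window: start a fresh count.
      this.windows.set(apiKey, { count: 1, resetAt: now + this.windowMs });
      return { allowed: true };
    }
    if (w.count < this.limit) {
      w.count += 1;
      return { allowed: true };
    }
    return { allowed: false, retryAfterSeconds: Math.ceil((w.resetAt - now) / 1000) };
  }
}
```

The acceptance criteria in the spec map straight onto test cases: a normal request, a rate-limited request, and a limit reset after the window expires.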

This template maps directly to how Agentro's Idea Mode structures its output. If you use Idea Mode, you get this format automatically after the conversational phase. If you prefer to write specs manually, this template gives you the same structure.

Putting it all together

These five rules aren't theoretical. They emerged from watching hundreds of real agent sessions and identifying the patterns that separate successful sessions from frustrating ones. Teams that adopt these rules consistently report:

  • Fewer review cycles per MR
  • Higher confidence in delegating complex tasks to agents
  • Faster time-to-merge for agent-generated code

The underlying principle is simple: invest two minutes in spec quality to save two hours in review time. AI agents are powerful tools, but they amplify the quality of their input. A great spec produces great code. A vague spec produces code that technically matches the words you wrote — but not the feature you wanted.

Agentro's Idea Mode automates the hardest part of spec writing — the part where you figure out what you forgot to specify. Try it today — Start Free →

Ready to try Agentro?

Start with Idea Mode today. No credit card required.