
MITIGATION · m-reflection-cap

Reflection-loop depth limit — a ceiling on how often an agent reworks its own answer

An AI agent can review and rewrite its own answer to improve it. If that review runs too long it ties up resources and stops the agent responding in time, and an attacker can deliberately trigger those endless cycles to stall the system. A reflection-loop depth limit prevents that: it sets how many review rounds an agent may run before it has to stop.

Last reviewed 2026-05-12 · Status: published

At a glance

MATURITY

Tier 2

Available off-the-shelf or as a documented pattern, but newer or less broadly proven. Expect integration work and some operational nuance.

PLACES ON

node

Restricted to node kinds: agent

COVERAGE

2 threats

T4 · T5

TRADE-OFFS

LAT

low

COST

low

DEV

low

Latency · cost · UX friction · dev effort.

TL;DR

It sets a limit on how many times an agent can review and rewrite its own answer before it has to stop.
Without that limit, an agent can get stuck looping. It repeats itself, runs up compute and cost, and sometimes makes the answer worse each round. An attacker can trigger this on purpose.
Every major agent framework already has a built-in setting for it. The only real work is choosing a sensible limit for each kind of task.

How it behaves

The agent picks up a task and starts reviewing and rewriting its own answer.

On each pass through the loop, the system compares the number of rounds completed so far against the limit set for this kind of task.

The agent keeps going until it is satisfied with its answer or reaches the limit.

If the limit is reached first, the loop is stopped: the agent hands back its best answer so far (or a clear refusal), and the event is logged for review.

The limit counts rounds, not answer quality. A well-built agent should normally finish on its own well before reaching it.
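
The steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not any framework's implementation: `run_with_cap`, `LoopResult`, and the three callables (`draft`, `revise`, `is_satisfied`) are hypothetical names, and the latest revision stands in for "best answer so far".

```python
import logging
from dataclasses import dataclass
from typing import Callable

logger = logging.getLogger("reflection_cap")


@dataclass
class LoopResult:
    answer: str       # latest revision, used here as "best answer so far"
    rounds: int       # review rounds actually run
    hit_limit: bool   # True if the cap, not the agent, ended the loop


def run_with_cap(
    draft: Callable[[], str],
    revise: Callable[[str], str],
    is_satisfied: Callable[[str], bool],
    max_rounds: int,
) -> LoopResult:
    """Run a self-review loop that stops after at most max_rounds revisions."""
    answer = draft()
    rounds = 0
    while not is_satisfied(answer):
        if rounds >= max_rounds:
            # The cap counts rounds only; it says nothing about quality.
            logger.warning("reflection cap reached after %d rounds", rounds)
            return LoopResult(answer, rounds, hit_limit=True)
        answer = revise(answer)
        rounds += 1
    return LoopResult(answer, rounds, hit_limit=False)
```

The `hit_limit` flag is what a caller would log or count, so that cap hits can be reviewed separately from loops that ended on their own.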

What it is

An agent is, at heart, an LLM running in a loop: it generates an answer, evaluates it, and feeds the result back in to revise. A reflection, or self-critique, loop is exactly that cycle. The looping is what makes an agent useful, but if it never stops, it becomes a problem. A depth limit is the fix. It sets the maximum number of iterations an agent may run on a single task: more for open-ended work like research or planning, as few as one for a simple query. When the agent reaches the limit, it stops and returns its best answer so far, or a structured refusal.
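
One way to express "more for open-ended work, as few as one for a simple query" is a lookup keyed by task type. The task-type names and numbers below are illustrative assumptions, not recommended values; real limits would come from calibration against your own workload.

```python
# Hypothetical task types and limits, for illustration only.
REFLECTION_LIMITS = {
    "simple_query": 1,
    "summarisation": 3,
    "research": 8,
    "planning": 8,
}

# Conservative fallback for task types that have not been calibrated yet.
DEFAULT_LIMIT = 3


def limit_for(task_type: str) -> int:
    """Return the reflection-round limit for a task type."""
    return REFLECTION_LIMITS.get(task_type, DEFAULT_LIMIT)
```

Falling back to a low default means a new or mislabelled task type is still bounded rather than left without a limit.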

Detection signals

Rate of tasks that reach the limit. A rising rate means loops are being driven to it, typically by a prompt injection, an unusually hard task, or broken stop logic.
Cost or tokens per task. A task costing far more than normal suggests a loop ran on a code path where the limit was never applied.
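
Both signals can be computed from per-task records. The sketch below assumes a hypothetical `TaskRecord` shape and an arbitrary outlier threshold of five times the median token count; neither comes from a specific monitoring product.

```python
from dataclasses import dataclass
from statistics import median
from typing import List


@dataclass
class TaskRecord:
    task_id: str
    hit_limit: bool   # did this task stop because it reached the cap?
    tokens: int       # total tokens spent on the task


def cap_hit_rate(records: List[TaskRecord]) -> float:
    """Fraction of tasks that were stopped by the limit."""
    if not records:
        return 0.0
    return sum(r.hit_limit for r in records) / len(records)


def cost_outliers(records: List[TaskRecord], factor: float = 5.0) -> List[str]:
    """IDs of tasks whose token use exceeds factor x the median."""
    if not records:
        return []
    baseline = median(r.tokens for r in records)
    return [r.task_id for r in records if r.tokens > factor * baseline]
```

A task that appears in `cost_outliers` without `hit_limit` set is the case worth investigating first, since it spent heavily without ever being stopped by the cap.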

Threats it covers

T4 Resource Overload −1 severity step

WHY IT HELPS Resource Overload is the deliberate exhaustion of an agent's compute, memory, or budget. An unbounded reflection loop is one route to it, because the agent consumes resources each time it reprocesses its own output. A depth limit bounds how much a single task can consume that way.
T5 Cascading Hallucination Attacks −1 severity step

WHY IT HELPS Cascading Hallucination Attacks turn one false output into many: the falsehood is embedded in context and later steps treat it as fact. Every extra reflection round is another chance for that error to compound, so limiting the number of rounds restricts how far it spreads.

Principle coverage

Defence-in-Depth stage: Prevent. It advances:

Least Agency / Minimal Autonomy A depth limit removes the agent's unbounded freedom to keep re-deciding, holding self-revision to a budget you set rather than to the agent's own judgment.
Rate-limiting / Budgets / Loop prevention The iteration limit is a hard ceiling on consumption for reflection loops, bounding how much compute a single task may spend.

Design & governance principles (open design, economy of mechanism, accountability, …) are architectural, not advanced by a single placed control.

Implementation options

Use whichever fits the framework you already run, and prefer its built-in setting over writing your own wrapper.

LangChain The built-in option if you already use LangChain. It limits how many times the agent loops.

Why choose it: Set max_iterations=N on AgentExecutor (default 15), and pair it with max_execution_time as a wall-clock backstop. No wrapper code, since it is built in. Lower the default for simpler tasks.

More details:

AgentExecutor API reference ↗

AutoGen The built-in option for AutoGen multi-agent pipelines. It limits how many replies an agent makes in a row.

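Why choose it: Set max_consecutive_auto_reply=N on the agent (class default 100). It counts conversation replies, not individual tool calls, and behaviour varies by human_input_mode: with NEVER the conversation stops when the limit is reached, while with TERMINATE the agent prompts for human input at that point instead.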

More details:

ConversableAgent reference ↗

CrewAI The built-in option for CrewAI. It limits each agent's reasoning loops on a task.

Why choose it: Set max_iter=N on the Agent (default 20). It controls per-task reasoning loops, not the turns between agents. A good fit for crew pipelines where each agent has its own role.

More details:

CrewAI Agents documentation ↗

Reflexion A build-it-yourself pattern from the Reflexion paper, for loops you wrote without a framework.

Why choose it: Limit a trial counter to N (the paper uses up to ~12 for AlfWorld, with memory bounded to 1–3 stored episodes); each trial appends a written reflection to context. The reference code sets it with a num_trials variable in run_reflexion.sh.
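
For a hand-written loop, the pattern reduces to a capped trial counter plus a bounded reflection memory. The sketch below is not the Reflexion reference code: `run_trials`, `attempt`, `reflect`, and `memory_size` are hypothetical names, and only `num_trials` echoes the variable mentioned above.

```python
from collections import deque
from typing import Callable, List, Tuple


def run_trials(
    attempt: Callable[[List[str]], Tuple[bool, str]],
    reflect: Callable[[str], str],
    num_trials: int,
    memory_size: int = 3,
) -> Tuple[bool, int, List[str]]:
    """Retry a task up to num_trials times, keeping only recent reflections.

    attempt(reflections) returns (success, trajectory).
    reflect(trajectory) returns a written reflection on a failed trial.
    """
    # deque(maxlen=...) silently drops the oldest reflection when full,
    # so the context added by reflections stays bounded.
    memory: deque = deque(maxlen=memory_size)
    for trial in range(1, num_trials + 1):
        success, trajectory = attempt(list(memory))
        if success:
            return True, trial, list(memory)
        memory.append(reflect(trajectory))
    return False, num_trials, list(memory)
```

Bounding the memory as well as the trial count keeps both the number of rounds and the context growth per round under control.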


Self-Refine A build-it-yourself pattern from the Self-Refine paper, for loops you wrote without a framework.

Why choose it: Set a hard ceiling of N refinement rounds (the paper uses max 4) and stop early when is_refinement_sufficient() returns true. The four-round limit is documented in the paper and the project on GitHub.


Trade-offs

Set the limit too low and the agent stops before reaching a good answer on genuinely hard tasks.
Set it too high and a runaway loop can still consume compute up to that limit before it stops.
Enabling it is trivial, since every framework exposes the setting. The effort is calibration: choosing the right value per task type takes roughly two weeks of production monitoring.
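
One possible calibration heuristic, offered as an assumption rather than a prescribed method: collect the round counts of tasks that finished on their own during monitoring, take a high percentile, and add a little headroom. The function name, the 95th-percentile default, and the headroom of one round are all illustrative choices.

```python
import math
from typing import Sequence


def suggest_limit(
    rounds_observed: Sequence[int],
    percentile: float = 0.95,
    headroom: int = 1,
) -> int:
    """Suggest a round limit from rounds used by tasks that ended naturally."""
    if not rounds_observed:
        raise ValueError("need at least one observation")
    ordered = sorted(rounds_observed)
    # Nearest-rank percentile: the smallest observed value with at least
    # `percentile` of all observations at or below it.
    rank = max(1, math.ceil(percentile * len(ordered)))
    return ordered[rank - 1] + headroom
```

Running this per task type, rather than over all tasks pooled together, keeps a few long research tasks from inflating the limit for simple queries.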

When NOT to use

Agents that answer in one pass and never loop. There is nothing to limit.
As a substitute for good stopping logic. If the limit is reached on most tasks, the agent's own termination rule is broken, and that is what to fix.
When the cost you care about is the tokens used in a single call rather than the number of loops. Limit token use per call separately for that.

Limitations

The limit counts iterations; it cannot judge the quality of the answer.
An adaptive attack that produces fresh output on each iteration will run until it reaches the limit.
Choosing the right value is harder than adding the limit. Teams who say "we have a reflection limit" but never tuned it per task type often have a false sense of safety.

Maturity tier reasoning

Tier 2 fits because the control is universally available: every major agent framework ships an iteration limit as a built-in setting.
Not Tier 1, because there is no canonical value. Its effectiveness depends entirely on per-task calibration that no standard prescribes.
Not Tier 3, because it is production-ready today and requires no novel engineering.

Last verified against upstream docs: 2026-05-30.

PLACEMENT

On the canvas, this control can be placed on:

node

Valid node kinds: agent


MAESTRO LAYERS

L3 L5

ATLAS TECHNIQUES

AML.T0029 Denial of AI Service
Adversary exhausts compute, memory, or rate-limit budgets so the AI system stops responding or stops processing legitimate requests.
AML.T0080 AI Agent Context Poisoning
Adversary contaminates an agent's context store (short-term scratchpad, vector memory, conversation history) so future reasoning is biased toward attacker goals.

ATLAS MITIGATIONS

AML.M0004 Restrict Number of AI Model Queries
Limit query rate / volume to prevent model extraction, optimization attacks, and denial of AI service.
AML.M0022 Generative AI Model Alignment
Train or fine-tune the model so its outputs align with intended behaviour; reduces the residual surface of jailbreak / misalignment attacks.

TRADE-OFFS

latency low
cost low
ux friction low
dev effort low

PLAYBOOKS

2 OWASP v1.1 playbooks recommend this control: