MITIGATION · m-reflection-cap
Reflection-loop depth limit — a ceiling on how often an agent reworks its own answer
An AI agent can review and rewrite its own answer to improve it. If that review runs too long, it ties up resources and keeps the agent from responding in time, and an attacker can deliberately trigger such endless cycles to stall the system. A reflection-loop depth limit prevents that: it sets how many review rounds an agent may run before it has to stop.
At a glance
TL;DR
- It sets a limit on how many times an agent can review and rewrite its own answer before it has to stop.
- Without that limit, an agent can get stuck looping. It repeats itself, runs up compute and cost, and sometimes makes the answer worse each round. An attacker can trigger this on purpose.
- Every major agent framework already has a built-in setting for it. The only real work is choosing a sensible limit for each kind of task.
How it behaves
What it is
An agent is, at heart, an LLM running in a loop: it generates an answer, evaluates it, and feeds the result back in to revise. A reflection, or self-critique, loop is exactly that cycle. The looping is what makes an agent useful, but if it never stops, it becomes a problem. A depth limit is the fix. It sets the maximum number of iterations an agent may run on a single task: more for open-ended work like research or planning, as few as one for a simple query. When the agent reaches the limit, it stops and returns its best answer so far, or a structured refusal.
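The cycle described above can be sketched as a loop with a hard ceiling. This is a minimal sketch, not any framework's implementation: `generate`, `critique`, `ReflectionResult`, and `reflect_with_limit` are hypothetical names, and the two callables stand in for whatever model calls your agent makes.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class ReflectionResult:
    answer: str       # best answer so far when the loop stopped
    rounds_used: int  # number of revisions actually performed
    hit_limit: bool   # True when the loop stopped because the budget ran out


def reflect_with_limit(
    generate: Callable[[str, Optional[str]], str],  # hypothetical: (task, feedback) -> answer
    critique: Callable[[str], Tuple[bool, str]],    # hypothetical: answer -> (good_enough, feedback)
    task: str,
    max_rounds: int,
) -> ReflectionResult:
    """Draft once, then revise at most `max_rounds` times."""
    answer = generate(task, None)  # first draft, no feedback yet
    for round_no in range(1, max_rounds + 1):
        good_enough, feedback = critique(answer)
        if good_enough:
            # The agent's own stop rule fired before the ceiling was reached.
            return ReflectionResult(answer, round_no - 1, hit_limit=False)
        answer = generate(task, feedback)
    # Budget exhausted: return the latest revision rather than looping again.
    return ReflectionResult(answer, max_rounds, hit_limit=True)
```

Recording `hit_limit` on every result is a design choice that makes it possible to count later how often tasks run into the ceiling.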
Detection signals
- Rate of tasks that reach the limit. A rising rate means something is driving loops to the ceiling, usually a prompt injection, an unusually hard task, or broken stop logic.
- Cost or tokens per task. A task that costs far more than normal suggests a loop ran on a code path where the limit was never applied.
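Both signals can be computed from per-task logs. The sketch below assumes a hypothetical `TaskRecord` with a `hit_limit` flag and a token count; the field names and the 5x-median threshold are illustrative assumptions, not recommended values.

```python
from dataclasses import dataclass
from statistics import median
from typing import List


@dataclass
class TaskRecord:  # hypothetical log record, one per completed task
    task_id: str
    hit_limit: bool  # did the task stop because it reached the depth limit?
    tokens: int      # total tokens spent on the task


def limit_hit_rate(records: List[TaskRecord]) -> float:
    """Fraction of tasks that ran into the depth limit."""
    if not records:
        return 0.0
    return sum(r.hit_limit for r in records) / len(records)


def cost_outliers(records: List[TaskRecord], factor: float = 5.0) -> List[str]:
    """IDs of tasks whose token use exceeds `factor` times the median."""
    if not records:
        return []
    typical = median(r.tokens for r in records)
    return [r.task_id for r in records if r.tokens > factor * typical]
```

A task that appears in `cost_outliers` without `hit_limit` set is a candidate for the second signal: it spent heavily without the limit ever firing.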
Threats it covers
- Resource Overload
WHY IT HELPS Resource Overload is the deliberate exhaustion of an agent's compute, memory, or budget. An unbounded reflection loop is one route to it, with the agent consuming resources by repeatedly reprocessing its own output, and a depth limit closes that route.
- Cascading Hallucination Attacks
WHY IT HELPS Cascading Hallucination Attacks turn one false output into many: the error gets embedded, and later steps treat it as fact. Every extra reflection round is another chance for that error to compound, so limiting the number of rounds restricts how far it spreads.
Principle coverage
Defence-in-Depth stage: Prevent. It advances:
- Least Agency / Minimal Autonomy: A depth limit removes the agent's unbounded freedom to keep re-deciding, holding self-revision to a budget you set rather than to the agent's own judgment.
- Rate-limiting / Budgets / Loop prevention: The iteration limit is a hard ceiling on consumption for reflection loops, bounding how much compute a single task may spend.
Design & governance principles (open design, economy of mechanism, accountability, …) are architectural, not advanced by a single placed control.
Implementation options
Use whichever fits the framework you already run, and prefer its built-in setting over writing your own wrapper.
LangChain
The built-in option if you already use LangChain. It limits how many times the agent loops.
Why choose it: Set max_iterations=N on AgentExecutor (default 15), and pair it with max_execution_time as a wall-clock backstop. No wrapper code, since it is built in. Lower the default for simpler tasks.
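One way to lower the default per task type is to keep the two settings in a small lookup and unpack them into the executor. Only `max_iterations` and `max_execution_time` come from the text above; the task-type names, the numeric values, and the `executor_limits` helper are hypothetical placeholders to calibrate against your own traffic.

```python
from typing import Dict, Union

Limits = Dict[str, Union[int, float]]

# Hypothetical per-task values, shown only to illustrate the shape.
EXECUTOR_LIMITS: Dict[str, Limits] = {
    "simple_query": {"max_iterations": 1, "max_execution_time": 15.0},
    "code_fix": {"max_iterations": 5, "max_execution_time": 60.0},
    "research": {"max_iterations": 12, "max_execution_time": 300.0},
}


def executor_limits(task_type: str) -> Limits:
    """Return loop and wall-clock limits; unknown task types get the strictest."""
    limits = EXECUTOR_LIMITS.get(task_type, EXECUTOR_LIMITS["simple_query"])
    return dict(limits)  # copy so callers cannot mutate the shared table


# Usage, assuming `agent` and `tools` are already built elsewhere:
#   from langchain.agents import AgentExecutor
#   executor = AgentExecutor(agent=agent, tools=tools, **executor_limits("research"))
```

Falling back to the strictest entry for unrecognised task types means a new code path starts out bounded rather than inheriting the framework default.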
AutoGen
The built-in option for AutoGen multi-agent pipelines. It limits how many replies an agent makes in a row.
Why choose it: Set max_consecutive_auto_reply=N on the agent (class default 100). It counts conversation replies, not individual tool calls, and behaviour varies by human_input_mode.
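To make "in a row" concrete, here is a toy model of a consecutive-reply cap. It is not AutoGen's code: the class and method names are hypothetical, and it assumes that a human message resets the count, which is one reason behaviour can differ by human_input_mode. Tool calls never touch the counter in this model because only replies are counted.

```python
class ConsecutiveReplyCounter:
    """Toy model of a consecutive-auto-reply cap (illustrative, not AutoGen's code)."""

    def __init__(self, max_consecutive_auto_reply: int) -> None:
        self.limit = max_consecutive_auto_reply
        self.count = 0  # automatic replies sent since the last human message

    def on_human_reply(self) -> None:
        # Assumption: human input breaks the run of automatic replies.
        self.count = 0

    def may_auto_reply(self) -> bool:
        """Return True and count the reply if the agent is still under the cap."""
        if self.count >= self.limit:
            return False
        self.count += 1
        return True
```

With a cap of 2, the third automatic reply in a row is refused, and replies are allowed again once `on_human_reply()` has been called.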
CrewAI
The built-in option for CrewAI. It limits each agent's reasoning loops on a task.
Why choose it: Set max_iter=N on the Agent (default 20). It controls per-task reasoning loops, not the turns between agents. A good fit for crew pipelines where each agent has its own role.
Reflexion
A build-it-yourself pattern from the Reflexion paper, for loops you wrote without a framework.
Why choose it: Limit a trial counter to N (the paper uses up to ~12 for AlfWorld, with memory bounded to 1–3 stored episodes); each trial appends a written reflection to context. The reference code sets it with a num_trials variable in run_reflexion.sh.
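For a hand-written loop, the trial cap and the bounded reflection memory might look like the sketch below. It is a simplified illustration, not the reference implementation: `run_trials` and the `attempt` callable are hypothetical, and `memory_size=3` is chosen only to match the 1–3 range mentioned above.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple


def run_trials(
    attempt: Callable[[List[str]], Tuple[bool, str]],  # hypothetical: reflections -> (success, new reflection)
    num_trials: int,
    memory_size: int = 3,
) -> Tuple[bool, int, List[str]]:
    """Run at most `num_trials` attempts, keeping only the newest reflections."""
    # deque(maxlen=...) silently drops the oldest reflection once it is full,
    # so the context passed to each attempt stays bounded.
    reflections: Deque[str] = deque(maxlen=memory_size)
    for trial in range(1, num_trials + 1):
        success, reflection = attempt(list(reflections))
        if success:
            return True, trial, list(reflections)
        reflections.append(reflection)
    # Trial budget exhausted without success.
    return False, num_trials, list(reflections)
```

Bounding the memory as well as the trial count keeps both the number of rounds and the size of each round's context under control.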
Self-Refine
A build-it-yourself pattern from the Self-Refine paper, for loops you wrote without a framework.
Why choose it: Set a hard ceiling of N refinement rounds (the paper uses max 4) and stop early when is_refinement_sufficient() returns true. The four-round limit is documented in the paper and the project on GitHub.
Trade-offs
- Set the limit too low and the agent stops before reaching a good answer on genuinely hard tasks.
- Set it too high and a runaway loop can still consume compute up to that limit before it stops.
- Enabling it is trivial, since every framework exposes the setting. The effort is calibration: choosing the right value per task type takes roughly two weeks of production monitoring.
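Calibration can start from the monitoring data itself. The sketch below suggests a limit per task type from the number of rounds that tasks needed when they stopped on their own. The nearest-rank percentile and the 0.95 default are assumptions made for illustration, not a prescribed method, and `suggest_limit` is a hypothetical helper.

```python
import math
from typing import List


def suggest_limit(rounds_observed: List[int], quantile: float = 0.95, floor: int = 1) -> int:
    """Suggest a depth limit that covers `quantile` of observed tasks.

    `rounds_observed` should hold the rounds used by tasks that stopped
    by themselves, ideally collected under a generous temporary limit.
    """
    if not rounds_observed:
        return floor  # no data yet: fall back to the strictest setting
    ordered = sorted(rounds_observed)
    # Nearest-rank percentile: smallest value covering the requested share.
    rank = max(1, math.ceil(quantile * len(ordered)))
    return max(floor, ordered[rank - 1])
```

A suggestion like this is a starting point to review by hand: a single extreme outlier in the sample can pull a high quantile upward, which is the "too high" trade-off described above.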
When NOT to use
- Agents that answer in one pass and never loop. There is nothing to limit.
- As a substitute for good stopping logic. If the limit is reached on most tasks, the agent's own termination rule is broken, and that is what to fix.
- When the cost you care about is the tokens used in a single call rather than the number of loops. Limit token use per call separately for that.
Limitations
- The limit counts iterations; it cannot judge the quality of the answer.
- An adaptive attack that produces fresh output on each iteration will run until it reaches the limit.
- Choosing the right value is harder than adding the limit. Teams who say "we have a reflection limit" but never tuned it per task type often have a false sense of safety.
Maturity tier reasoning
- Tier 2 fits because the control is universally available: every major agent framework ships an iteration limit as a built-in setting.
- Not Tier 1, because there is no canonical value. Its effectiveness depends entirely on per-task calibration that no standard prescribes.
- Not Tier 3, because it is production-ready today and requires no novel engineering.
Last verified against upstream docs: 2026-05-30.