Agent Workflow Patterns Explained Visually
When NOT to Use an Agent
What is the workflow-vs-agent decision?
It is the question you answer before writing a single line of an agent loop: does this task need a model that picks its own next step, or just a sequence of model-shaped boxes wired together in code? Anthropic's Building effective agents is blunt about the default — "add multi-step agentic systems only when simpler solutions fall short." A workflow is a path you draw in advance; an agent is a loop that lets the model choose the path turn by turn. Most "agent" problems are workflows in disguise.
Four questions to ask first
Run any candidate task through these four checks. The more "yes" answers you get, the more strongly the task belongs in a workflow.
- Is the step count known up front? If you can draw the DAG on a whiteboard, you don't need a loop to discover it.
- Same input → same path every time? Predictable branching is a switch statement, not a planner.
- Are the tools well-defined and bounded? A model that has to invent the tool sequence is doing agent work; a model that picks from {lookup, refund, escalate} is doing routing.
- Is the cost of a wrong action high? High-stakes workflows want named branches with retries and human review at the seams — not a model deciding mid-run.
A "no" on one question is usually fine. Two or more "no"s and the task has earned a real agent loop.
What does an agent actually cost?
Loops are not free. From the data behind Anthropic's multi-agent research system post: agents use roughly 4× the tokens of a single chat turn, and multi-agent systems hit ~15× — every tick rebuilds the prompt from accumulated state until the model decides it is done. On top of the bill, an agent is harder to debug (you replay a loop trace) and harder to make reliable (failure modes are emergent). A workflow fails in a named place — the classifier returned an unknown label, the lookup hit a 404 — and named places are where you put metrics, retries, and humans.
You've decided workflow — now choose which one
Workflows vs Agents in Module 1 drew the binary from the loop side. This step is the gate into the workflow lane itself: the four questions above sort tasks into "workflow" or "agent," and the rest of this module — chaining, routing, parallelization, orchestrator-workers, evaluator-optimizer — is the menu of which workflow shape to reach for once you're inside the lane. The loop-shaped agent comes later in the track.
Try it: Click each task in the right pane and watch the four questions decide its fate. The counter at the top accumulates as you go — by the time you've scored a few, the asymmetry should be obvious. Most production work lands in Workflow. Reach for an agent only when the path itself is the unknown.
Chaining + Routing
What do "chaining" and "routing" actually mean?
Both names come from Anthropic's Building effective agents post. Chaining is a fixed sequence of model calls — the output of step 1 feeds step 2, which feeds step 3. Routing is a single classify-then-dispatch step — one model call picks one of N downstream paths, and the rest of the program is a switch statement. Neither pattern lets the model decide what to do next at runtime. The shape of the program is drawn in advance, in code; the LLM just fills in the boxes. That's why both belong in the workflow lane, not the agent lane.
Chaining — when each step has one obvious successor
Reach for a chain when the work decomposes into a known number of stages and each stage takes the previous stage's output as-is. The left pane runs a three-step chain — translate EN → KO, summarize, then rewrite the summary in a casual tone. Three model calls, no branching, same path on every input.
The failure mode worth knowing: if any link breaks, the rest of the chain runs on garbage. A bad translation produces a clean summary of the wrong sentence. Production chains usually add a validation gate between steps (regex, JSON-schema check, or a cheap LLM-judge call) — the simulation here keeps the pipeline ungated so the fixed-shape contrast with routing stays clean.
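In code, a chain is just function composition over model calls. A minimal sketch of the three-step pipeline with the validation gate described above; llm is a stand-in for whatever client you use, and the gate here is deliberately trivial:

```python
def llm(prompt: str) -> str:
    """Stand-in for a real model call; wire up your own client here."""
    raise NotImplementedError

def gate(text: str, step: str) -> str:
    # Production chains validate between links so a broken step 1 doesn't
    # feed garbage to step 2. A real gate might be a JSON-schema check or a
    # cheap LLM-judge call; this one only catches empty output.
    if not text.strip():
        raise ValueError(f"validation failed after {step!r}")
    return text

def run_chain(text: str) -> str:
    ko = gate(llm(f"Translate to Korean:\n{text}"), "translate")
    summary = gate(llm(f"Summarize in one sentence:\n{ko}"), "summarize")
    return llm(f"Rewrite this summary in a casual tone:\n{summary}")
```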
Routing — when the input space is one but the handling diverges
Reach for a router when every request enters through one inbox but the right next move depends on what kind of request it is. The right pane runs a support-ticket triage with five destinations — billing, technical, sales, account, escalate-human. Each ticket is classified once, then handed to the matching downstream handler.
Here is the design point that trips people up: the classifier is itself a model call. So routing is "an LLM deciding the path" — but only at one point. The downstream branches are still ordinary code. The freedom is bounded to picking a label from a known set, not inventing the next action.
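In code, the bounded freedom is literal: the classifier's output is coerced into a known label set, and everything downstream is a dictionary dispatch. A sketch with hypothetical handler names:

```python
def llm(prompt: str) -> str:
    raise NotImplementedError  # stand-in for a real model call

def handle_billing(t: str): ...
def handle_technical(t: str): ...
def handle_sales(t: str): ...
def handle_account(t: str): ...
def escalate_human(t: str): ...

HANDLERS = {
    "billing": handle_billing, "technical": handle_technical,
    "sales": handle_sales, "account": handle_account,
    "escalate-human": escalate_human,
}

def route(ticket: str):
    # The one model decision in the program: pick a label from a known set.
    label = llm(
        f"Classify this support ticket as one of {sorted(HANDLERS)}. "
        f"Reply with the label only.\n\n{ticket}"
    ).strip().lower()
    # Unknown labels fall through to a human, not to a guessed branch.
    return HANDLERS.get(label, escalate_human)(ticket)
```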
Try it: Click a ticket in the right pane and watch the classifier light up exactly one branch. The chain pane has no choices — same input, same path, every time. That contrast is the whole stage.
Steps 3-5 cover three more named patterns from the same Anthropic post: parallelization, orchestrator-workers, and evaluator-optimizer. All still workflows. The agent loop comes later in the track.
Parallelization + Voting
Two ways to use parallel calls
Anthropic's Building effective agents post groups two distinct patterns under "parallelization." Sectioning splits one job into independent subparts and runs them at the same time, then aggregates. Voting runs the same job N times and aggregates the answers — usually majority vote. Both are still workflows: the shape is fixed in code, the model is just called more than once at the same instant.
Sectioning — when subtasks are genuinely independent
The test is whether one subtask needs another's output. If summarizing the results section requires having read method first, that's a chain. If each section stands alone and you stitch them at the end, you have parallelizable work.
The payoff is the part most teams underestimate: total compute is roughly unchanged — five 1-second calls cost five model-seconds either way — but wall-clock latency drops to the longest single call plus a small aggregator. Five sections in parallel finish in ~1.2s instead of 5.0s. The rate-limiter is no longer the sum; it's the slowest worker.
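The latency math falls out of any fan-out primitive. A sketch using a thread pool (asyncio works equally well); llm stands in for a blocking model call:

```python
from concurrent.futures import ThreadPoolExecutor

def llm(prompt: str) -> str:
    raise NotImplementedError  # stand-in for a real (blocking) model call

def summarize_paper(sections: list[str]) -> str:
    # Each call is independent: no summary needs another's output. That is
    # the whole test for sectioning over chaining.
    with ThreadPoolExecutor(max_workers=len(sections)) as pool:
        parts = list(pool.map(
            lambda s: llm(f"Summarize this section:\n{s}"), sections))
    # Wall-clock ~ slowest single call + this aggregator; total tokens are
    # the same as running the calls one after another.
    return llm("Merge these section summaries:\n\n" + "\n\n".join(parts))
```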
Voting — when accuracy is the bottleneck
Voting trades cost for reliability. Run the same prompt three times, take the majority answer; a noisy ~80%-accurate classifier becomes roughly ~90% accurate as a majority-of-3 (Condorcet jury theorem — independent voters above 50% accuracy aggregate higher as N grows). The right pane runs ten rounds with a hand-seeded vote sequence — single-call lands ~77%, majority-of-3 lands 100% on that seed; over a longer run the gap converges on the Condorcet curve.
Two honest caveats. First, voting only helps when errors are independent and noisy — Condorcet's lift assumes the voters fail differently. If the model is systematically wrong on a class of input (same prompt, same blind spot), three identical mistakes still vote wrong; reach for diverse prompts, models, or temperatures if you want real independence. Second, voting costs N× the calls; reach for it only when individual calls are cheap and accuracy is the bottleneck.
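The ~80% → ~90% claim is plain binomial arithmetic: with p = 0.8 per call, a majority-of-3 is right when at least two of the three calls are, i.e. p³ + 3p²(1 − p) = 0.512 + 0.384 ≈ 0.90. A sketch of the voting step; llm is a stand-in and its temperature knob is hypothetical:

```python
from collections import Counter

def llm(prompt: str, temperature: float = 0.7) -> str:
    # Stand-in; the temperature parameter is a hypothetical knob here, one
    # cheap way to keep the voters from failing identically.
    raise NotImplementedError

def majority_vote(prompt: str, n: int = 3) -> str:
    # p**3 + 3 * p**2 * (1 - p) = 0.896 for p = 0.8 -- but only if the
    # errors are independent. Same prompt, same blind spot, and three
    # identical mistakes still vote wrong.
    answers = [llm(prompt).strip() for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]
```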
Try it: Toggle Sectioning vs Voting in the right pane. In Sectioning, the parallel timeline finishes roughly 4× sooner than the sequential one (5.0s → 1.2s including the aggregator) — same total compute, lower wall-clock. In Voting, run several rounds and watch single-call vs majority-of-3 diverge: any single round is noisy, but over many rounds the majority is more reliable than any individual call.
Step 4 nests this idea — an orchestrator that itself dispatches to parallel workers, deciding the split at runtime instead of in advance.
Orchestrator-Workers + Subagents
A lead model receives a request, decides what subtasks are needed, dispatches them to worker calls, and synthesizes the results. The shape comes from Anthropic's Building effective agents post — same family as Step 3, but the split happens at runtime instead of in code.
How is this different from Step 3?
This is the paragraph that earns Step 4 its existence. Step 3's parallelization works when you know the split in advance — five sections of a paper, three voters on the same prompt, hard-coded. Step 4 is for when the orchestrator decides the split from the input. "Translate the README to French" needs one worker. "Competitive brief on Notion / Linear / Asana across pricing, integrations, and free-tier limits" might need nine. Same code path; different runtime topology. Fixed-N sectioning leaves workers idle on the short input and, on the complex one, either drops subtasks or jams several onto one worker.
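In code, the whole difference from Step 3 is one extra model call that returns the split. A sketch; the planning prompt and the JSON reply contract are assumptions, not a fixed API:

```python
import json
from concurrent.futures import ThreadPoolExecutor

def llm(prompt: str) -> str:
    raise NotImplementedError  # stand-in for a real model call

def orchestrate(request: str) -> str:
    # The runtime decision: one planning call returns the subtask list.
    # One subtask for the README translation, nine for the competitive
    # brief -- same code path, different topology.
    subtasks = json.loads(llm(
        "Break this request into independent subtasks. "
        f"Reply as a JSON array of strings.\n\n{request}"))
    with ThreadPoolExecutor(max_workers=max(1, len(subtasks))) as pool:
        results = list(pool.map(
            lambda s: llm(f"Subtask: {s}\n\nOriginal request: {request}"),
            subtasks))
    return llm("Synthesize these worker results into one answer:\n\n"
               + "\n\n".join(results))
```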
When the dynamic split pays for itself
When the input shape varies enough that no static split fits all cases — research, competitive analysis, anything where "what's actually in this question" decides the work. Anthropic's multi-agent research system post reports their lead-researcher / specialist-subagents architecture lifts research-task quality substantially over a single-agent baseline, at the cost of roughly 15× the tokens of chat in their internal evaluations. The cost is real; so is the lift. Reach for orchestration when the value of the answer justifies the spend.
When it doesn't
Fixed-shape inputs — translation, summarization, classification, anything where the same DAG handles every request — are cheaper and simpler with predefined sectioning (Step 3) or a single well-prompted call. Don't reach for an orchestrator just because the topology sounds more capable; reach for it when the topology has to be decided.
Same primitive, different lesson
This module covers subagents as task-decomposition topology — how the orchestrator splits work across workers. Module 5 (Context Engineering) reframes the same primitive as context isolation — how subagents keep one another's intermediate state out of each other's window. Same machine, different lesson.
For a lifecycle taxonomy of subagents — persistent vs per-request, invoked-as-tool, etc. — Phil Schmid's Four Subagent Patterns is the right next read. That's the implementation primitive; this step is the workflow pattern that uses it.
Everything above keeps the orchestrator at the center: workers fan out, results fan back in. The next escalation is peer-to-peer — a shared task list teammates self-claim from, and a mailbox that lets them message each other directly without going through the lead. Anthropic's experimental Agent Teams in Claude Code is the canonical example, and that production-tier coordination layer (file locking, hooks for quality gates, terminal display modes) is covered in Track B: Multi-Agent in Production, not here.
Try it: Pick a request in the right pane. Watch the orchestrator's planning card — the number of workers it spawns changes per input (1, 5, 6, or 9). Then toggle compare to Step 3 sectioning to see fixed-N either leave workers idle (short input) or cram subtasks together (complex input).
Step 5 closes the named patterns with evaluator-optimizer — a tight critic-then-revise loop that often beats both bigger topologies and bigger models on accuracy per dollar.
Evaluator-Optimizer
What is the evaluator-optimizer pattern?
The last named pattern in Anthropic's Building effective agents post, and the only one with an actual feedback loop. A generator call produces a draft, an evaluator call critiques it, the generator revises — repeat until a quality bar is met or a budget is exhausted. Same model, two roles, one tight loop.
Why this pays
The split exploits an asymmetry. One-shot perfection is hard — the model has to land audience, tone, structure, facts, and brevity in one pass. Drafting alone is easier; critiquing a concrete draft is easier still, because the work shifts from "imagine the answer" to "inspect this artifact and name what's wrong." On tasks with a clear quality signal this consistently lifts over single-shot — Anthropic flags it as one of the highest accuracy-per-dollar moves in the post.
When this fails
The pattern is only as honest as the critic. If the evaluator is the same model with the same prompt and no extra signal, it can rubber-stamp the draft — two calls, one answer. The fix: give the evaluator something the generator didn't have — a checklist, a programmatic verifier, an external grader, or a stricter persona. Module 6 covers planning + reflection; Module 7 covers evaluators-as-graders. For now: a critic with no extra signal is theatre.
Termination
Every loop needs a stopping rule — quality threshold, max iterations, or budget. Without one, the loop spins or, worse, over-revises: each pass adds qualifiers until the output is mush. Three iterations is a defensible default; if quality is still climbing at three, the bottleneck is probably the prompt or the task framing.
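Putting the stopping rules together: a minimal sketch where the evaluator is handed a checklist the generator never saw (the extra signal the failure-mode paragraph above asks for). The score-first reply format and the threshold are assumptions:

```python
def llm(prompt: str) -> str:
    raise NotImplementedError  # stand-in for a real model call

CHECKLIST = "audience named; one claim per sentence; under 120 words"

def draft_and_revise(task: str, threshold: int = 85, max_iters: int = 3) -> str:
    draft = llm(f"Write: {task}")
    for _ in range(max_iters):                    # stop rule 1: iteration cap
        critique = llm(
            f"Score this draft 0-100 against the checklist ({CHECKLIST}), "
            f"then list concrete fixes. Put the score first.\n\n{draft}")
        # Assumes the evaluator leads with a bare score; a real system would
        # demand structured output instead of parsing free text.
        score = int(critique.split()[0])
        if score >= threshold:                    # stop rule 2: quality bar
            break
        draft = llm("Revise the draft to address the critique.\n\n"
                    f"Draft:\n{draft}\n\nCritique:\n{critique}")
    return draft
```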
Try it: Toggle No evaluator in the right pane. Three single-shot drafts wander. Toggle back to With evaluator — same task, same model, critic between drafts: scores climb 62 → 78 → 91. The shape of the line is the teaching, not the absolute numbers.
Module 3 wrap-up
Five steps, five named patterns: chaining, routing, parallelization, orchestrator-workers, evaluator-optimizer. The thread tying them together is the Module 1 framing: the default is a workflow; an agent is an escalation when the path itself is the unknown. Most production work fits one of these five shapes (or a small composition). Next: Retrieval & RAG — the workflow primitive most agents reach for first.