What is compiled trajectory replay?

Compiled trajectory replay is PreAct's technique for letting a computer-using agent reuse a task it has already solved. It compiles a successful run — the full sequence of screen observations and actions — into a lightweight state-machine program of check-then-click steps, then replays that program directly with no per-step model call, for a reported 8.5–13× speedup on repeats.
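To make "compiles a run into a state-machine program" concrete, here is a minimal Python sketch. It assumes a trajectory is a list of (screen signature, action) pairs and that a screen can be reduced to a comparable string. `Node`, `compile_trajectory`, and the signature and action formats are hypothetical names for illustration, not PreAct's actual data structures.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Node:
    """One state of the compiled program: check a screen, fire an action, move on."""
    expected_screen: str    # hypothetical screen signature the node must see before acting
    action: str             # recorded action, e.g. "tap:Settings"
    next_id: Optional[int]  # index of the next node, or None when the task is done

def compile_trajectory(trajectory: list[tuple[str, str]]) -> dict[int, Node]:
    """Turn a successful (observation, action) trajectory into a linear state machine."""
    program: dict[int, Node] = {}
    for i, (screen, action) in enumerate(trajectory):
        # Each recorded step becomes a check-then-act node that points at the next one.
        next_id = i + 1 if i + 1 < len(trajectory) else None
        program[i] = Node(expected_screen=screen, action=action, next_id=next_id)
    return program

run = [
    ("home", "tap:Settings"),
    ("settings", "tap:Accounts"),
    ("accounts", "tap:Add account"),
]
program = compile_trajectory(run)
for node_id, node in program.items():
    print(node_id, node.expected_screen, "->", node.action, "then", node.next_id)
```

A real recording would branch as well as chain, but a single successful run yields a straight line of states, which is why a plain `next_id` is enough here.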

Why is PreAct faster than a normal agent?

A normal ReAct-style GUI agent calls the language model once for every action, even on a task it has done before, and those per-step calls dominate the time. PreAct turns a repeated task into a program lookup: the recorded program runs the actions itself and only checks the screen at each step, so the model is out of the loop. The paper also reports 1.75–2.6 more completed tasks per benchmark across mobile, desktop, and web.

What happens when the screen changes and the program no longer fits?

PreAct validates the screen state before each recorded action, so a drifted layout or moved button is caught rather than misclicked. On a mismatch it falls back to fresh, model-driven exploration to solve that step, then continues. A program is also independently verified to complete the task before it is ever stored, so replay only runs vetted programs.
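The validate-then-act loop with its fallback can be sketched as below. This is a hypothetical illustration rather than PreAct's implementation. Screens are compared as exact string signatures. `read_screen`, `do_action`, and `explore_step` are placeholder callbacks for the real screen reader, input driver, and model. After a fallback, the sketch assumes the live step lands on the screen the next recorded node expects.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass(frozen=True)
class Node:
    expected_screen: str    # screen signature this node must see before acting
    action: str             # recorded action to fire when the check passes
    next_id: Optional[int]  # next node, or None when the program is finished

def replay(program: dict[int, Node],
           read_screen: Callable[[], str],
           do_action: Callable[[str], None],
           explore_step: Callable[[str], str]) -> tuple[list[str], int]:
    """Replay a compiled program, calling the model only when a screen check fails.

    Returns the actions actually taken and the number of model-driven fallbacks.
    """
    taken: list[str] = []
    fallbacks = 0
    node_id: Optional[int] = 0
    while node_id is not None:
        node = program[node_id]
        screen = read_screen()
        if screen == node.expected_screen:
            action = node.action            # check passed: fire the recorded action, no model call
        else:
            action = explore_step(screen)   # drift: hand this one step back to the model
            fallbacks += 1
        do_action(action)
        taken.append(action)
        node_id = node.next_id              # assumption: the step landed where the program expects
    return taken, fallbacks

program = {
    0: Node("home", "tap:Settings", 1),
    1: Node("settings", "tap:Accounts", 2),
    2: Node("accounts", "tap:Add account", None),
}
screens = iter(["home", "settings_v2", "accounts"])  # simulated run: the Settings screen drifted
log: list[str] = []
taken, fallbacks = replay(program,
                          read_screen=lambda: next(screens),
                          do_action=log.append,
                          explore_step=lambda screen: "tap:Accounts (model)")
print(taken, fallbacks)  # ['tap:Settings', 'tap:Accounts (model)', 'tap:Add account'] 1
```

Only the drifted step pays for a model call. The other two steps run straight from the recording.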

PreAct compiles agent runs into replayable programs — Compiled trajectory replay

Jargon

Computer-using agent (CUA): An agent that operates a real graphical interface — mobile, desktop, or web — by reading the screen and issuing clicks, taps, and keystrokes, rather than calling a clean API. PreAct is built for this setting, where every action is a screen interaction.
Trajectory: The full sequence of observation → action steps an agent took to finish one task — look at the screen, decide, click, look again. PreAct compiles a successful trajectory into something it can replay.
Per-step model call: Calling the language model once for every action the agent takes. In a GUI agent this is the dominant cost, and it repeats on every run — see the think → act → observe loop.
State machine: A program made of states (nodes) and transitions: each state checks a condition, fires an action, and moves to the next. PreAct distills a trajectory into one of these — the agent equivalent of a fixed workflow rather than an open-ended agent.
Screen-state validation: Before firing a recorded action, checking that the current screen matches what the program expects. It is what catches drift — a moved button, a changed layout — instead of blindly clicking the wrong place.
Store-time verification gate: An independent check that a compiled program actually completes the task before it is saved for reuse. A program that does not pass is never stored, so replay only ever runs vetted programs.
Fallback to exploration: When screen-state validation fails, PreAct hands control back to the model to solve the step live — the same retry-or-replan safety valve a robust harness keeps.
Memoization: Caching the result of an expensive computation so a repeat becomes a lookup. PreAct is the agent-trajectory analog of memoization — see result caching — except it caches the plan, not just the answer.
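The memoization analogy can be sketched in a few lines of Python. This is a hypothetical illustration, not PreAct's code. It keys programs by an exact task string, which is a simplification because the article does not say how a new request is matched to a stored program. It also leaves out screen validation and the store-time gate to isolate the caching idea. `ProgramLibrary` and `fake_solver` are invented names.

```python
from typing import Callable

class ProgramLibrary:
    """Memoization for agent runs: the cache key is the task, the cached value is the plan."""

    def __init__(self) -> None:
        self._plans: dict[str, list[str]] = {}
        self.model_runs = 0  # full model-driven runs actually paid for

    def run(self, task: str, solve_with_model: Callable[[str], list[str]]) -> list[str]:
        if task in self._plans:
            # Hit: reuse the stored action sequence; no model call at all.
            return list(self._plans[task])
        # Miss: solve the task step by step with the model (the expensive path).
        self.model_runs += 1
        plan = solve_with_model(task)
        # Cache the plan (how to get there), not just the final answer.
        self._plans[task] = list(plan)
        return plan

def fake_solver(task: str) -> list[str]:
    # Hypothetical stand-in for a ReAct-style agent; returns the actions it took.
    return ["tap:Settings", "tap:Accounts", "tap:Add account"]

lib = ProgramLibrary()
first = lib.run("add account", fake_solver)   # miss: the model runs
second = lib.run("add account", fake_solver)  # hit: replayed from the cache
print(lib.model_runs, first == second)        # 1 True
```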

The news. On June 16, 2026, researchers released PreAct, a method that lets a computer-using agent speed up on tasks it has done before. It compiles a successful run into a lightweight state-machine program — screen checks plus click actions — and replays it directly, with no model call per step, for an 8.5–13× speedup. Each program is validated by an independent check before it is stored, and the method reports 1.75–2.6 more completed tasks per benchmark across mobile, desktop, and web; when no stored program matches, PreAct falls back to fresh exploration.

Picture an agent that just nailed a fiddly screen workflow — open the app, tap through three menus, fill a form, hit submit. It figured out every click by asking the model what to do at each step. The result was right, but watch where the time went: the slow part was never the clicking; it was asking the model what to click at every single step. Run the same task again tomorrow and a normal think-act loop pays the identical bill — another full sequence of model calls for a task it has already solved.

PreAct's move is to record that run as a screen macro. It compiles the successful trajectory into a state-machine program — a fixed list of "check the screen, then click" steps — and replays it straight through, with no model in the loop. The macro is not blind: before each action it validates that the screen matches what it expects, so it catches a moved button instead of clicking into the void. This is the agent version of memoization — but it caches the plan, not just the final answer.

A recorded macro is brittle the moment the screen drifts, so PreAct guards it at two points. Before a program is ever stored, an independent check confirms it actually completes the task — a bad recording never makes it into the library. And at replay time, every node revalidates the screen before firing; a mismatch hands control straight back to the model to solve the step live, then carries on. So the speedup only ever applies where it is safe, and an unfamiliar screen quietly reverts to exploration.
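The store-time verification gate reduces to a guard in front of the library write. In this hypothetical sketch, `verify` stands in for PreAct's independent check, whose details the article does not give. The toy verifier used here only requires that the program ends by submitting the form; a real one would presumably re-run the program and check the outcome.

```python
from typing import Callable

Verifier = Callable[[str, list[str]], bool]

def store_if_verified(library: dict[str, list[str]], task: str,
                      program: list[str], verify: Verifier) -> bool:
    """Save a compiled program only if an independent check says it completes the task."""
    if not verify(task, program):
        return False  # a bad recording never makes it into the library
    library[task] = program
    return True

def ends_with_submit(task: str, program: list[str]) -> bool:
    # Toy verifier (hypothetical): checks only that the last action submits the form.
    return bool(program) and program[-1] == "tap:Submit"

library: dict[str, list[str]] = {}
ok = store_if_verified(library, "fill form",
                       ["tap:Form", "type:name", "tap:Submit"], ends_with_submit)
rejected = store_if_verified(library, "fill form v2",
                             ["tap:Form", "type:name"], ends_with_submit)
print(ok, rejected, sorted(library))  # True False ['fill form']
```

Because the gate sits at write time, the replay path never has to ask whether a stored program was ever valid. It only has to check whether the current screen still matches.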

Mode	Model call per step?	Speed	When it runs
First run (ReAct-style)	Yes — one per action	Baseline	A new or unseen task
Replay (compiled program)	No	~8.5–13× faster (paper, varies by benchmark)	A task with a stored, validated program
Fallback	Yes	Baseline	The screen no longer matches the program

Put the cost on a clock. Say a workflow takes 12 GUI steps. The first time, the agent makes roughly 12 model calls — one per step — and each call adds latency and burns tokens. On a repeat, PreAct replays the compiled program: 12 screen-checks and 12 clicks, zero model calls. If the per-step model call is what dominates wall-clock time — which, for a GUI agent, it usually is — then collapsing those per-step calls toward zero is the main intuition behind the paper's 8.5–13× speedup (the 12-step figure is illustrative; the 8.5–13× is the paper's). What you buy back is not just speed: across mobile, desktop, and web benchmarks PreAct also completes 1.75–2.6 more tasks, because a vetted program runs the same way every time instead of re-rolling the dice on each step.
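The clock arithmetic above can be written as a tiny additive latency model. The per-step timings (50 ms to read the screen, 2,000 ms per model call, 200 ms per action) are made-up round numbers for illustration, not measurements from the paper. What carries over is the shape of the result: the speedup is roughly (observe + model + act) / (observe + act), and every fallback step pays a model call back. `run_cost_ms` is a hypothetical helper.

```python
def run_cost_ms(steps: int, model_calls: int,
                observe_ms: int = 50, model_ms: int = 2000, act_ms: int = 200) -> int:
    """Wall-clock cost of one run under a simple additive latency model (hypothetical numbers)."""
    # Every step reads the screen and fires one action; only some steps pay for a model call.
    return steps * (observe_ms + act_ms) + model_calls * model_ms

steps = 12
first_run = run_cost_ms(steps, model_calls=steps)  # ReAct-style: one model call per step
replay = run_cost_ms(steps, model_calls=0)          # compiled program: zero model calls
one_drift = run_cost_ms(steps, model_calls=1)       # replay where one step falls back to the model

print(first_run, replay, one_drift)               # 27000 3000 5000
print(first_run / replay, first_run / one_drift)  # 9.0 5.4
```

With these invented numbers the 12-step replay comes out 9× faster, and a single drifted step cuts that to 5.4×. In this model, the speedup shrinks as more steps fall back to the model.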


Related explainers

CacheRL — cached rollouts for agent RL — the other end of the timeline: CacheRL caches tool results during training, where PreAct caches the trajectory at inference time
EvoMem — patch-based agent memory — also reuse at inference time, but it stores what the agent learned rather than a replayable program of what it did
AgentPerf — trajectory-replay benchmarking — replaying recorded agent runs as the unit of work, here to measure infra instead of to skip the model

