Initialization & Handoff

Every agent session has two bookends, and most failures attributed to "the model lost the thread" actually live in them. Initialization is how a session starts from a verified, unambiguous state instead of re-deriving the project from scratch; handoff is how it ends leaving that state behind for the next one. Treat the agent as a competent engineer whose short-term memory is wiped between sessions: everything it needs to resume must be on disk, and everything it leaves behind must be clean.

A new session that spends fifteen minutes rediscovering how to run the tests pays that cost every single session — and redundant re-diagnosis eats 30–50% of session time in un-instrumented setups. Good bookends compress it to minutes.

15 → 3 min: rebuild cost per session, before vs. after persistent state

Five questions a fresh session must answer from the repo alone

Five clean-state conditions before a session may end

No feature code written during initialization

Initialization is its own phase

Setting up infrastructure and building features are different kinds of work with different optimization targets — and an agent asked to do both at once reliably favors the visible one (feature code) over the invisible one (infrastructure). So the harness makes initialization a dedicated first phase whose output is infrastructure, not business code, ideally starting from a template rather than an empty directory:

A runnable environment

Dependencies installed, the project starts, setup succeeds from scratch — proven by running it, not by assuming it.

A verifiable test rig

At least one passing example test — evidence the test framework itself is wired, so every later claim of green means something.

The operating record

A readiness doc (how to start, verify, where things stand), an ordered feature list with acceptance commands, and a clean git checkpoint all later work builds on.
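An ordered feature list with acceptance commands could be kept in a machine-readable form along the following lines. This is a minimal sketch: the `Feature` fields, the `next_feature` helper, and the example commands and test paths are all illustrative assumptions, not a format this guide prescribes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Feature:
    """One entry in the ordered feature list (field names are hypothetical)."""
    name: str
    acceptance_command: str  # shell command that must exit 0 for the feature to count as done
    passing: bool = False    # set to True only after the command has actually been run


def next_feature(features: list[Feature]) -> Optional[Feature]:
    """Return the first feature, in list order, that is not yet passing."""
    for feature in features:
        if not feature.passing:
            return feature
    return None


# Example list as an initialization phase might leave it: everything pending.
# The feature names and test paths are placeholders.
FEATURES = [
    Feature("user signup", "pytest tests/test_signup.py"),
    Feature("password reset", "pytest tests/test_reset.py"),
]
```

Because the list is ordered and each entry carries its own acceptance command, a later session can pick its next task and know how to prove it done without asking anyone.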

Not extra cost — upfront investment. Dedicated-initialization projects recover the setup time within a few sessions and complete measurably more of the feature list across a multi-session run.

The acceptance test for the phase is the fresh-session test: open a brand-new session with nothing but the repo, and it must be able to answer — What is this system? How is it organized? How do I run it? How do I verify it? Where are we now? If any answer requires a human or a Slack thread, initialization isn't done. (This is the system-of-record principle applied to time: the repo carries the project not just across services, but across sessions.)
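Part of the fresh-session test can be automated as a cheap pre-check. The sketch below assumes each of the five questions is answered by a specific file; the file names in `REQUIRED_ANSWERS` are hypothetical, and a project would substitute its own layout.

```python
from pathlib import Path

# Hypothetical mapping from each fresh-session question to the file expected to answer it.
REQUIRED_ANSWERS = {
    "What is this system?": "README.md",
    "How is it organized?": "docs/ARCHITECTURE.md",
    "How do I run it?": "docs/RUNNING.md",
    "How do I verify it?": "docs/VERIFYING.md",
    "Where are we now?": "PROGRESS.md",
}


def unanswered_questions(repo_root: Path) -> list[str]:
    """Return the questions whose answering file is missing or empty."""
    missing = []
    for question, relative_path in REQUIRED_ANSWERS.items():
        path = repo_root / relative_path
        if not path.is_file() or not path.read_text(encoding="utf-8").strip():
            missing.append(question)
    return missing
```

A non-empty file is a necessary condition, not a sufficient one: the check catches a missing answer but cannot tell whether the answer is correct. The real test is still a new session working from the repo alone.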

Effective harnesses for long-running agents

Anthropic Engineering · 2025 · Blog

Anthropic makes initialization a dedicated agent in its long-running-agent harness: an initializer sets up the environment, git repo, and init scripts, and writes a structured feature list and progress log before any feature work begins. Compaction alone proved insufficient for cross-session continuity — durable on-disk state had to be a first-class setup phase, not a side effect of the first coding session.

Clock in, clock out

Between the bookends, the working rhythm is a pair of routines the harness runs every session, one at the start and one at the end, the same way an engineer reads the standup notes before coding and commits before leaving.

Two files carry the thread. A progress file holds current state — latest commit, test status, what's done, what's in flight, ordered next steps. A decisions file holds the why: what was chosen, what was rejected, and the constraint behind it. The distinction matters because compaction and session resets preserve the what (the code survives) while silently dropping the why — and a next session that doesn't know Redis-over-materialized-view was a deliberate choice will happily "optimize" it away. Git commits at each atomic unit complete the set: free, versioned state snapshots.
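The two records can be sketched as plain data structures serialized to JSON. The class and field names below are assumptions chosen to mirror the description above, and the constraint text is deliberately left as a placeholder because the reason behind a choice is project-specific.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Progress:
    """Current state: the 'what'. Field names are illustrative."""
    latest_commit: str
    tests_passing: bool
    done: list[str] = field(default_factory=list)
    in_flight: list[str] = field(default_factory=list)
    next_steps: list[str] = field(default_factory=list)  # kept in priority order


@dataclass
class Decision:
    """One recorded choice: the 'why'."""
    chosen: str
    rejected: list[str]
    constraint: str


def to_json(record) -> str:
    """Serialize a record so it can be written to disk and committed."""
    return json.dumps(asdict(record), indent=2)


# Example entry for the Redis-versus-materialized-view choice mentioned above.
decision = Decision(
    chosen="Redis cache",
    rejected=["materialized view"],
    constraint="(the constraint that drove the choice goes here)",
)
```

Keeping `rejected` and `constraint` as required fields means a decision cannot be recorded without its reasoning, which is the part a session reset would otherwise drop.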

Leave a clean state, every time

Entropy is the default. Agents copy whatever patterns exist, including the debug cruft and broken tests the last session left, so drift compounds: in tracked comparisons, build pass rates decay from 100% toward the 60% range over a quarter without exit discipline, while startup time balloons. The exit gate is non-negotiable, and missing any condition means the session is not done:

Session exit checklist (all five, every session)

Build passes: the project compiles; no "it was green when I started" handoffs

Tests pass: the suite is green, or the failure is recorded as the active task

Progress recorded: progress and decisions files reflect reality as of this commit

No stale artifacts: no debug prints, commented-out blocks, or orphaned temp files left behind

Startup path works: the documented run command still starts the system
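The checklist can be expressed as a gate that runs every condition and reports which ones failed. In this sketch each check is a plain callable supplied by the caller; real checks would shell out to the project's own build, test, and run commands, which are project-specific and not shown. The function names are hypothetical.

```python
from typing import Callable

# Each check returns True when its condition holds.
Check = Callable[[], bool]


def exit_gate(checks: dict[str, Check]) -> list[str]:
    """Run every check and return the names of the ones that failed.

    All checks run even after a failure, so the session sees the full list.
    A check that raises an exception is treated as failed.
    """
    failed = []
    for name, check in checks.items():
        try:
            ok = check()
        except Exception:
            ok = False
        if not ok:
            failed.append(name)
    return failed


def session_may_end(checks: dict[str, Check]) -> bool:
    """The session is done only when no condition is missing."""
    return not exit_gate(checks)
```

Running all five checks rather than stopping at the first failure gives the session one complete to-do list instead of a fix-and-retry loop.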

Session integrity is a transaction: commit fully to a clean state or roll back to the last one — no middle ground. Cleanup scripts should be idempotent, so a crashed exit can simply run again.
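The commit-or-roll-back rule and the idempotent cleanup can be sketched together. The glob patterns, function names, and the callable-based `end_session` signature are assumptions made for illustration; a real cleanup script would also skip version-control directories.

```python
from pathlib import Path
from typing import Callable


def remove_temp_files(repo_root: Path, patterns: tuple[str, ...] = ("*.tmp", "*.orig")) -> int:
    """Delete files matching the glob patterns; return how many were found and removed.

    Idempotent: a second run finds nothing left to delete and returns 0.
    The default patterns are placeholders, not a recommended list.
    """
    removed = 0
    for pattern in patterns:
        # Materialize the matches first so deletion does not disturb the directory walk.
        for path in list(repo_root.rglob(pattern)):
            if path.is_file():
                path.unlink(missing_ok=True)  # tolerate a file vanishing mid-run
                removed += 1
    return removed


def end_session(gate_passes: Callable[[], bool],
                commit: Callable[[], None],
                rollback: Callable[[], None]) -> str:
    """Commit if the exit gate passes, otherwise roll back to the last clean state."""
    if gate_passes():
        commit()
        return "committed"
    rollback()
    return "rolled back"
```

In a git repository, `commit` might wrap `git commit` and `rollback` might wrap `git reset --hard` to the last checkpoint, though that mapping is an assumption here. Because `remove_temp_files` is safe to repeat, an exit that crashes halfway can simply be run again from the top.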

The handoff this produces is what makes everything upstream work: durable resume assumes the journaled state is trustworthy, the next session's initialization assumes the checkpoint is clean, and the feature list assumes passing still passes. Always ready to hand off is the steady-state goal — at any moment, the repo alone should let a fresh agent take over without a word of verbal explanation.

Pitfalls

Mixing init into the first feature — the multi-objective blend reliably shortchanges infrastructure; session two then pays for it in implicit-assumption landmines (session 1 picked Vitest, session 2 unknowingly adds Jest).
Persisting the what but not the why — progress files without decision records produce next sessions that re-litigate settled choices. The Amnesiac Agent isn't only about forgetting facts; it forgets reasons.
"Clean up later" — later never arrives, and the next session inherits — and imitates — the mess. Cleanup is part of this session's definition of done.
