State management for AI agents: a practical guide

State is the hardest part of production AI agents — not because it's technically complicated, but because "state" means three different things, and teams often conflate them until an incident forces them to untangle the mess.

This guide separates the three types of agent state, describes the four storage approaches, and explains which works at what scale.

The three types of agent state

1. Conversation state

The message history the model sees: user messages, assistant turns, tool call inputs and outputs. This is what fills the context window. It's read on every LLM request and written after every turn.

2. Run state

The record of what happened in a specific agent run: which steps completed, which tools were called, what they returned, whether any retries occurred. Used for resuming interrupted runs and for post-run debugging.
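A run-state record can be sketched as a small data structure kept apart from the message history. The names used here (RunState, StepRecord, next_step) are illustrative assumptions, not taken from any particular framework.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class StepRecord:
    """One step of a run: which tool was called and what came back."""
    name: str
    tool: Optional[str] = None
    output: Any = None
    completed: bool = False
    retries: int = 0


@dataclass
class RunState:
    """Record of a single agent run, separate from the conversation messages."""
    run_id: str
    steps: list[StepRecord] = field(default_factory=list)

    def next_step(self) -> Optional[StepRecord]:
        """Return the first step that has not completed, or None if the run is done."""
        for step in self.steps:
            if not step.completed:
                return step
        return None


# A hypothetical run that was interrupted after its first step.
run = RunState(
    run_id="run_001",
    steps=[
        StepRecord(name="search", tool="search", output={"hits": 3}, completed=True),
        StepRecord(name="summarize", retries=1),
    ],
)

# Resuming means picking up at the first incomplete step.
resume_from = run.next_step()
```

The same record serves both purposes named above: next_step supports resuming, and the per-step tool, output, and retries fields are what you read when debugging after the fact.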

3. Long-term memory

Information that should persist across separate runs: user preferences, past interactions, learned patterns. This is optional for most agents — not every agent needs to "remember" things — but when it's needed, it requires a different storage pattern than the first two types.

The four storage approaches

Approach 1: In-memory (development only)

State lives in a Python dict or similar in-process structure. Works for local development and single-user deployments. Fails immediately when you run more than one process, restart a server, or need to debug a failed run after the fact.

Use when: Local development, prototypes, demos.

Approach 2: Database-per-session (simple, common)

Each conversation or run gets a row in a Postgres table. State is serialized (usually as JSON) and read at the start of each turn, written at the end. Simple to implement, works at moderate scale, and gives you durable history you can query.

The downside: each turn involves a database read and write. At high request rates, this becomes a bottleneck. For agents with short turns (under 1 second), the Postgres round trip is a noticeable share of each turn.

Use when: Production deployments up to a few hundred concurrent agents. Good default choice.
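The read-at-start, write-at-end cycle of the database-per-session approach can be sketched as follows. To keep the example runnable without a server, sqlite3 stands in for Postgres; the table layout and the function names (load_state, save_state, run_turn) are assumptions made for illustration.

```python
import json
import sqlite3

# sqlite3 stands in for Postgres so the sketch runs without a server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sessions ("
    "session_id TEXT PRIMARY KEY, user_id TEXT NOT NULL, state TEXT NOT NULL)"
)


def load_state(session_id: str, user_id: str) -> dict:
    """Read the session row at the start of a turn; start empty if none exists."""
    row = conn.execute(
        "SELECT state FROM sessions WHERE session_id = ? AND user_id = ?",
        (session_id, user_id),
    ).fetchone()
    return json.loads(row[0]) if row else {"messages": []}


def save_state(session_id: str, user_id: str, state: dict) -> None:
    """Write the serialized state back at the end of a turn (insert or update)."""
    conn.execute(
        "INSERT INTO sessions (session_id, user_id, state) VALUES (?, ?, ?) "
        "ON CONFLICT(session_id) DO UPDATE SET state = excluded.state "
        "WHERE sessions.user_id = excluded.user_id",
        (session_id, user_id, json.dumps(state)),
    )
    conn.commit()


def run_turn(session_id: str, user_id: str, user_message: str) -> dict:
    """One turn costs one read and one write. The model call is a placeholder."""
    state = load_state(session_id, user_id)
    state["messages"].append({"role": "user", "content": user_message})
    state["messages"].append({"role": "assistant", "content": "(model reply goes here)"})
    save_state(session_id, user_id, state)
    return state


run_turn("sess_abc123", "usr_xyz", "hello")
state = run_turn("sess_abc123", "usr_xyz", "and again")
```

Because every turn goes through load_state and save_state, the history is durable and queryable, and the per-turn cost of one read plus one write is easy to see.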

Approach 3: Redis for hot state, Postgres for cold

Active conversation state lives in Redis (fast reads, sub-millisecond latency). On session end, state is flushed to Postgres for durable storage. This pattern handles high concurrency without sacrificing durability.

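The complexity: you need to handle the Redis-to-Postgres flush correctly. If an application process crashes between turns, the latest state exists only in Redis until the flush runs, so anything you recover from Postgres is the last flushed copy rather than the latest turn. If you also care about surviving a Redis crash, you need Redis persistence configured (AOF or RDB).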

Use when: High-concurrency deployments, latency-sensitive agents, 1000+ concurrent users.
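The hot/cold split and the flush ordering can be sketched as below. Both backends are plain dicts standing in for Redis and Postgres, and the class and method names (HotColdStore, save_turn, end_session) are hypothetical; a real deployment would replace the dict operations with client calls and handle their failures.

```python
import json


class HotColdStore:
    """Hot state in a fast key-value store, flushed to a durable store on session end."""

    def __init__(self) -> None:
        self.hot: dict[str, str] = {}   # stand-in for Redis
        self.cold: dict[str, str] = {}  # stand-in for a Postgres table

    def load(self, key: str) -> dict:
        """Prefer the hot copy; fall back to the last durable copy if it is gone."""
        raw = self.hot.get(key) or self.cold.get(key)
        return json.loads(raw) if raw else {"messages": []}

    def save_turn(self, key: str, state: dict) -> None:
        """Per-turn writes touch only the hot store."""
        self.hot[key] = json.dumps(state)

    def end_session(self, key: str) -> None:
        """Flush to the durable store first, and only then drop the hot copy."""
        raw = self.hot.get(key)
        if raw is None:
            return
        self.cold[key] = raw  # the durable write happens first
        del self.hot[key]     # the hot copy is removed only afterwards


store = HotColdStore()
key = "usr_xyz:sess_abc123"

state = store.load(key)
state["messages"].append({"role": "user", "content": "hello"})
store.save_turn(key, state)

store.end_session(key)
recovered = store.load(key)  # now served from the durable copy
```

The ordering in end_session is the design choice worth noting: writing the durable copy before deleting the hot one means a crash between the two steps leaves a duplicate rather than a gap.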

Approach 4: Event-sourced state

Rather than storing the current state, you store every event that produced it: "user message received," "tool called," "tool returned," "model responded." State at any point is computed by replaying events.

This is powerful for debugging — you can replay any run exactly as it happened — but expensive to implement and overkill for most use cases. Suited for high-audit environments (financial, legal, medical) where you need a complete, immutable record.

Use when: Audit requirements, compliance use cases, when replay is a first-class need.
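Replaying events to compute state can be sketched in a few lines. The event kinds mirror the four examples given above; the Event class, the payload fields, and the shape of the derived state are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class Event:
    """One immutable entry in the event log."""
    kind: str
    payload: dict[str, Any] = field(default_factory=dict)


def apply_event(state: dict, event: Event) -> dict:
    """Return a new state with one event applied; the input state is not mutated."""
    messages = list(state["messages"])
    tool_calls = state["tool_calls"]
    if event.kind == "user_message_received":
        messages.append({"role": "user", "content": event.payload["content"]})
    elif event.kind == "tool_called":
        tool_calls += 1
    elif event.kind == "tool_returned":
        messages.append(
            {"role": "tool", "name": event.payload["name"], "output": event.payload["output"]}
        )
    elif event.kind == "model_responded":
        messages.append({"role": "assistant", "content": event.payload["content"]})
    return {"messages": messages, "tool_calls": tool_calls}


def replay(events: list[Event]) -> dict:
    """Compute state by folding the events, in order, over an empty state."""
    state = {"messages": [], "tool_calls": 0}
    for event in events:
        state = apply_event(state, event)
    return state


log = [
    Event("user_message_received", {"content": "find the report"}),
    Event("tool_called", {"name": "search", "input": {"q": "report"}}),
    Event("tool_returned", {"name": "search", "output": {"hits": 2}}),
    Event("model_responded", {"content": "Found 2 results."}),
]

final_state = replay(log)
state_after_two = replay(log[:2])  # state at an earlier point in the run
```

Slicing the log, as in state_after_two, is what makes point-in-time debugging possible: any intermediate state is a replay of a prefix of the events.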

What state shape looks like in practice

A practical conversation state object for a production agent looks like this:

{
  "session_id": "sess_abc123",
  "user_id": "usr_xyz",
  "created_at": "2025-03-15T10:22:31Z",
  "updated_at": "2025-03-15T10:24:17Z",
  "messages": [
    {"role": "user", "content": "...", "timestamp": "..."},
    {"role": "assistant", "content": "...", "timestamp": "..."},
    {"role": "tool", "name": "search", "input": {...}, "output": {...}}
  ],
  "run_metadata": {
    "model": "gpt-4o",
    "total_tokens": 1842,
    "tool_calls": 3,
    "retries": 1,
    "wall_time_ms": 4200
  }
}

The key insight: run_metadata is not just nice-to-have. It's how you answer "why did this agent run cost so much?" and "why did this turn take 4 seconds?" in a post-mortem.
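One way to make sure run_metadata is always populated is to accumulate it as the turn executes. This is a minimal sketch: the field names follow the object above, while RunMetadata and TurnTimer are hypothetical helpers, and the token count is hard-coded where a real agent would read it from the model provider's usage report.

```python
import time
from dataclasses import asdict, dataclass


@dataclass
class RunMetadata:
    """Counters matching the run_metadata fields in the state object."""
    model: str
    total_tokens: int = 0
    tool_calls: int = 0
    retries: int = 0
    wall_time_ms: int = 0


class TurnTimer:
    """Context manager that adds elapsed wall time to a RunMetadata on exit."""

    def __init__(self, meta: RunMetadata) -> None:
        self.meta = meta
        self._start = 0.0

    def __enter__(self) -> RunMetadata:
        self._start = time.monotonic()
        return self.meta

    def __exit__(self, exc_type, exc, tb) -> bool:
        elapsed_ms = int((time.monotonic() - self._start) * 1000)
        self.meta.wall_time_ms += elapsed_ms
        return False  # never swallow exceptions raised inside the turn


meta = RunMetadata(model="gpt-4o")
with TurnTimer(meta):
    meta.tool_calls += 1       # incremented whenever a tool is invoked
    meta.retries += 1          # incremented whenever a call is retried
    meta.total_tokens += 1842  # placeholder for the provider-reported usage

record = asdict(meta)  # ready to store under "run_metadata"
```

Because the timer records wall time even when the turn raises, failed turns still leave the cost and latency data a post-mortem needs.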

The state isolation problem

In multi-user deployments, state isolation is non-negotiable. User A's conversation state must never be readable by User B, must never leak into User B's agent context, and must never be corrupted by a bug in User B's run.

This sounds obvious but it fails in subtle ways. A shared cache key collision. A Redis key without proper namespacing. An ORM that returns the wrong session due to a missing tenant filter. Any of these can cause state cross-contamination that's extremely hard to debug after the fact.

The pattern: session IDs that include the user ID as a prefix (e.g., usr_xyz:sess_abc123) make accidental cross-tenant access immediately visible. Filter by user ID in every query, not just in the session lookup.
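The prefixed-key pattern can be sketched as a store that checks ownership on every read. The dict stands in for Redis or a database table, and state_key and IsolatedStore are hypothetical names used only for this example.

```python
from typing import Optional


def state_key(user_id: str, session_id: str) -> str:
    """Build a namespaced key such as 'usr_xyz:sess_abc123'."""
    if ":" in user_id or ":" in session_id:
        raise ValueError("IDs must not contain ':'")
    return f"{user_id}:{session_id}"


class IsolatedStore:
    """Key-value store that refuses reads across user boundaries."""

    def __init__(self) -> None:
        self._data: dict[str, dict] = {}  # stand-in for Redis or a table

    def put(self, user_id: str, session_id: str, state: dict) -> str:
        """Store state under the namespaced key and return that key."""
        key = state_key(user_id, session_id)
        self._data[key] = state
        return key

    def get(self, requesting_user_id: str, key: str) -> Optional[dict]:
        """Check the owner prefix on every read, not just at session lookup."""
        owner, _, _ = key.partition(":")
        if owner != requesting_user_id:
            raise PermissionError(f"{requesting_user_id} may not read {key}")
        return self._data.get(key)


store = IsolatedStore()
key = store.put("usr_xyz", "sess_abc123", {"messages": []})

own_state = store.get("usr_xyz", key)  # allowed: the prefix matches

try:
    store.get("usr_other", key)        # rejected: the prefix belongs to usr_xyz
    cross_tenant_blocked = False
except PermissionError:
    cross_tenant_blocked = True
```

With the owner encoded in the key, a cross-tenant read fails loudly at the access point instead of silently returning another user's state.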

Agent Chassis manages all three types of state with proper isolation by default: external storage, Redis-backed hot state, and full run metadata on every turn.
